mirror of
https://github.com/php/php-src.git
synced 2026-04-10 09:33:06 +02:00
Added PCRE 3.01.
242
ext/pcre/pcrelib/doc/Tech.Notes
Normal file
@@ -0,0 +1,242 @@
Technical Notes about PCRE
--------------------------

Many years ago I implemented some regular expression functions to an algorithm
suggested by Martin Richards. These were not Unix-like in form, and were quite
restricted in what they could do by comparison with Perl. The interesting part
about the algorithm was that the amount of space required to hold the compiled
form of an expression was known in advance. The code to apply an expression did
not operate by backtracking, as the Henry Spencer and Perl code does, but
instead checked all possibilities simultaneously by keeping a list of current
states and checking all of them as it advanced through the subject string. (In
the terminology of Jeffrey Friedl's book, it was a "DFA algorithm".) When the
pattern was all used up, all remaining states were possible matches, and the
one matching the longest substring of the subject string was chosen. This did
not necessarily maximize the individual wild portions of the pattern, as is
expected in Unix and Perl-style regular expressions.

By contrast, the code originally written by Henry Spencer and subsequently
heavily modified for Perl actually compiles the expression twice: once in a
dummy mode in order to find out how much store will be needed, and then for
real. The execution function operates by backtracking and maximizing (or,
optionally, minimizing in Perl) the amount of the subject that matches
individual wild portions of the pattern. This is an "NFA algorithm" in Friedl's
terminology.

For the set of functions that forms PCRE (which are unrelated to those
mentioned above), I tried at first to invent an algorithm that used an amount
of store bounded by a multiple of the number of characters in the pattern, to
save on compiling time. However, because of the greater complexity in Perl
regular expressions, I couldn't do this. In any case, a first pass through the
pattern is needed, in order to find internal flag settings like (?i) at top
level. So PCRE works by running a very degenerate first pass to calculate a
maximum store size, and then a second pass to do the real compile - which may
use a bit less than the predicted amount of store. The idea is that this is
going to turn out faster because the first pass is degenerate and the second
pass can just store stuff straight into the vector. It does make the compiling
functions bigger, of course, but they have got quite big anyway to handle all
the Perl stuff.

The compiled form of a pattern is a vector of bytes, containing items of
variable length. The first byte in an item is an opcode, and the length of the
item is either implicit in the opcode or contained in the data bytes which
follow it. A list of all the opcodes follows:

Opcodes with no following data
------------------------------

These items are all just one byte long

  OP_END                 end of pattern
  OP_ANY                 match any character
  OP_SOD                 match start of data: \A
  OP_CIRC                ^ (start of data, or after \n in multiline)
  OP_NOT_WORD_BOUNDARY   \B
  OP_WORD_BOUNDARY       \b
  OP_NOT_DIGIT           \D
  OP_DIGIT               \d
  OP_NOT_WHITESPACE      \S
  OP_WHITESPACE          \s
  OP_NOT_WORDCHAR        \W
  OP_WORDCHAR            \w
  OP_EODN                match end of data or \n at end: \Z
  OP_EOD                 match end of data: \z
  OP_DOLL                $ (end of data, or before \n in multiline)
  OP_RECURSE             match the pattern recursively


Repeating single characters
---------------------------

The common repeats (*, +, ?) when applied to a single character appear as
two-byte items using the following opcodes:

  OP_STAR
  OP_MINSTAR
  OP_PLUS
  OP_MINPLUS
  OP_QUERY
  OP_MINQUERY

Those with "MIN" in their name are the minimizing versions. Each is followed by
the character that is to be repeated. Other repeats make use of

  OP_UPTO
  OP_MINUPTO
  OP_EXACT

which are followed by a two-byte count (most significant first) and the
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
OP_UPTO (or OP_MINUPTO).
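
The layout just described can be sketched in C. This is only an illustration,
not PCRE's own code: the opcode values and the names emit_counted,
emit_char_repeat and read_count are hypothetical, and the sketch assumes that
the OP_UPTO count is the maximum minus the minimum. Only the item layout
(opcode, two-byte count with the most significant byte first, then the
character) and the OP_EXACT-then-OP_UPTO ordering come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode values, chosen only for this sketch; the real
   numbers are internal to PCRE and are not given in these notes. */
enum { OP_UPTO = 1, OP_MINUPTO = 2, OP_EXACT = 3 };

/* Write one item: opcode, two-byte count (most significant byte first),
   then the repeated character. Returns the number of bytes written (4). */
static size_t emit_counted(unsigned char *code, int op, unsigned count,
                           unsigned char ch)
{
    assert(count <= 0xFFFFu);            /* the count must fit in two bytes */
    code[0] = (unsigned char)op;
    code[1] = (unsigned char)(count >> 8);
    code[2] = (unsigned char)(count & 0xFF);
    code[3] = ch;
    return 4;
}

/* Encode ch{min,max} with a fixed maximum (max >= min). A non-zero minimum
   becomes OP_EXACT; any remaining optional repeats become OP_UPTO, or
   OP_MINUPTO when the repeat is minimizing. Returns bytes written. */
size_t emit_char_repeat(unsigned char *code, unsigned char ch,
                        unsigned min, unsigned max, int minimize)
{
    size_t n = 0;
    assert(code != NULL && max >= min);
    if (min > 0)
        n += emit_counted(code + n, OP_EXACT, min, ch);
    if (max > min)
        n += emit_counted(code + n, minimize ? OP_MINUPTO : OP_UPTO,
                          max - min, ch);
    return n;
}

/* Read back a two-byte count stored most significant byte first. */
unsigned read_count(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}
```

Under these assumptions, a{2,5} occupies eight bytes: an OP_EXACT item with
count 2 followed by an OP_UPTO item with count 3.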


Repeating character types
-------------------------

Repeats of things like \d are done exactly as for single characters, except
that instead of a character, the opcode for the type is stored in the data
byte. The opcodes are:

  OP_TYPESTAR
  OP_TYPEMINSTAR
  OP_TYPEPLUS
  OP_TYPEMINPLUS
  OP_TYPEQUERY
  OP_TYPEMINQUERY
  OP_TYPEUPTO
  OP_TYPEMINUPTO
  OP_TYPEEXACT


Matching a character string
---------------------------

The OP_CHARS opcode is followed by a one-byte count and then that number of
characters. If there are more than 255 characters in sequence, successive
instances of OP_CHARS are used.
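
As a rough sketch of this layout, the C fragment below splits a literal string
into OP_CHARS items. It is not taken from PCRE: the opcode value and the names
chars_size and emit_chars are hypothetical. The pairing of a sizing function
with an emitting function loosely mirrors the two-pass idea described earlier,
although here the size is exact rather than an upper bound.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { OP_CHARS = 1 };   /* hypothetical value, for illustration only */

/* Sizing pass: bytes needed to hold len literal characters as OP_CHARS
   items (opcode byte + count byte + up to 255 characters per item). */
size_t chars_size(size_t len)
{
    size_t items = (len + 254) / 255;
    return len + 2 * items;
}

/* Emitting pass: write the items into code, which must have room for
   chars_size(len) bytes. Returns the number of bytes written. */
size_t emit_chars(unsigned char *code, const char *s, size_t len)
{
    size_t n = 0;
    assert(code != NULL && s != NULL);
    while (len > 0) {
        size_t chunk = len > 255 ? 255 : len;
        code[n++] = OP_CHARS;
        code[n++] = (unsigned char)chunk;   /* one-byte count */
        memcpy(code + n, s, chunk);
        n += chunk;
        s += chunk;
        len -= chunk;
    }
    return n;
}
```

A 300-character literal therefore becomes two items, one holding 255
characters and one holding the remaining 45.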


Character classes
-----------------

OP_CLASS is used for a character class, provided there are at least two
characters in the class. If there is only one character, OP_CHARS is used for a
positive class, and OP_NOT for a negative one (that is, for something like
[^a]). Another set of repeating opcodes (OP_NOTSTAR etc.) are used for a
repeated, negated, single-character class. The normal ones (OP_STAR etc.) are
used for a repeated positive single-character class.

OP_CLASS is followed by a 32-byte bit map containing a 1 bit for every
character that is acceptable. The bits are counted from the least significant
end of each byte.
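
A bit map of this kind can be handled with a few lines of C. The sketch below
is illustrative only: the function names are hypothetical, and it assumes that
character c lives in byte c/8 of the map, at bit c%8 counted from the least
significant end.

```c
#include <assert.h>
#include <string.h>

#define CLASS_BYTES 32   /* 32 bytes * 8 bits = 256 possible characters */

/* Clear the map so that no character is acceptable. */
void class_clear(unsigned char map[CLASS_BYTES])
{
    assert(map != NULL);
    memset(map, 0, CLASS_BYTES);
}

/* Mark character c as acceptable. */
void class_add(unsigned char map[CLASS_BYTES], unsigned char c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if character c is acceptable, 0 otherwise. */
int class_has(const unsigned char map[CLASS_BYTES], unsigned char c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}
```

Testing a subject character against the class is then a single index, shift
and mask, whatever the size of the class.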


Back references
---------------

OP_REF is followed by a single byte containing the reference number.


Repeating character classes and back references
-----------------------------------------------

Single-character classes are handled specially (see above). What follows
applies to OP_CLASS and OP_REF. In both cases, the repeat information follows
the base item. The matching code looks at the following opcode to see if it is
one of

  OP_CRSTAR
  OP_CRMINSTAR
  OP_CRPLUS
  OP_CRMINPLUS
  OP_CRQUERY
  OP_CRMINQUERY
  OP_CRRANGE
  OP_CRMINRANGE

All but the last two are just single-byte items. The others are followed by
four bytes of data, comprising the minimum and maximum repeat counts (two bytes
each).


Brackets and alternation
------------------------

A pair of non-capturing (round) brackets is wrapped round each expression at
compile time, so alternation always happens in the context of brackets.
Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
speakers, including myself, can be round, square, curly, or pointy. Hence this
usage.]

A bracket opcode is followed by two bytes which give the offset to the next
alternative OP_ALT or, if there aren't any branches, to the matching KET
opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
or to the KET opcode.

OP_KET is used for subpatterns that do not repeat indefinitely, while
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
maximally respectively. All three are followed by two bytes giving (as a
positive number) the offset back to the matching BRA opcode.

If a subpattern is quantified such that it is permitted to match zero times, it
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
opcodes which tell the matcher that skipping this subpattern entirely is a
valid branch.

A subpattern with an indefinite maximum repetition is replicated in the
compiled data its minimum number of times (or once with a BRAZERO if the
minimum is zero), with the final copy terminating with a KETRMIN or KETRMAX as
appropriate.

A subpattern with a bounded maximum repetition is replicated in a nested
fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
each replication after the minimum, so that, for example, (abc){2,5} is
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 200-bracket limit does not
apply to these internally generated brackets.
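
The nesting in that example can be reproduced at the level of pattern text.
The C sketch below is an illustration only: PCRE replicates compiled items and
inserts BRAZERO opcodes rather than rewriting the pattern string, and the
names append and expand_bounded are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src to dst if it fits in cap bytes (including the terminator).
   Returns 0 on success, -1 if dst would overflow. */
static int append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst), need = strlen(src);
    if (used + need + 1 > cap)
        return -1;
    memcpy(dst + used, src, need + 1);
    return 0;
}

/* Write into out the pattern that group{min,max} is equivalent to after
   replication, where group is a bracketed subpattern such as "(abc)" and
   min <= max. Returns 0 on success, -1 if out is too small. */
int expand_bounded(const char *group, int min, int max,
                   char *out, size_t cap)
{
    int optional = max - min;
    int i;

    assert(group != NULL && out != NULL && cap > 0);
    assert(min >= 0 && max >= min);
    out[0] = '\0';

    for (i = 0; i < min; i++)                /* mandatory copies */
        if (append(out, cap, group) < 0)
            return -1;

    for (i = 0; i < optional; i++) {         /* nested optional copies */
        if (i < optional - 1 && append(out, cap, "(") < 0)
            return -1;
        if (append(out, cap, group) < 0)
            return -1;
    }
    for (i = 0; i < optional; i++)           /* close each nesting level */
        if (append(out, cap, i == 0 ? "?" : ")?") < 0)
            return -1;
    return 0;
}
```

Calling expand_bounded("(abc)", 2, 5, buf, sizeof buf) fills buf with the
expansion quoted above, provided buf is large enough.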


Assertions
----------

Forward assertions are just like other subpatterns, but starting with one of
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
is OP_REVERSE, followed by a two byte count of the number of characters to move
back the pointer in the subject string. A separate count is present in each
alternative of a lookbehind assertion, allowing them to have different fixed
lengths.


Once-only subpatterns
---------------------

These are also just like other subpatterns, but they start with the opcode
OP_ONCE.


Conditional subpatterns
-----------------------

These are like other subpatterns, but they start with the opcode OP_COND. If
the condition is a back reference, this is stored at the start of the
subpattern using the opcode OP_CREF followed by one byte containing the
reference number. Otherwise, a conditional subpattern will always start with
one of the assertions.


Changing options
----------------

If any of the /i, /m, or /s options are changed within a parenthesized group,
an OP_OPT opcode is compiled, followed by one byte containing the new settings
of these flags. If there are several alternatives in a group, there is an
occurrence of OP_OPT at the start of all those following the first options
change, to set appropriate options for the start of the alternative.
Immediately after the end of the group there is another such item to reset the
flags to their previous values. Other changes of flag within the pattern can be
handled entirely at compile time, and so do not cause anything to be put into
the compiled data.


Philip Hazel
February 2000
1702
ext/pcre/pcrelib/doc/pcre.3
Normal file
File diff suppressed because it is too large
2259
ext/pcre/pcrelib/doc/pcre.html
Normal file
File diff suppressed because it is too large
1978
ext/pcre/pcrelib/doc/pcre.txt
Normal file
File diff suppressed because it is too large
141
ext/pcre/pcrelib/doc/pcreposix.3
Normal file
@@ -0,0 +1,141 @@
.TH PCRE 3
.SH NAME
pcreposix - POSIX API for Perl-compatible regular expressions.
.SH SYNOPSIS
.B #include <pcreposix.h>
.PP
.SM
.br
.B int regcomp(regex_t *\fIpreg\fR, const char *\fIpattern\fR,
.ti +5n
.B int \fIcflags\fR);
.PP
.br
.B int regexec(regex_t *\fIpreg\fR, const char *\fIstring\fR,
.ti +5n
.B size_t \fInmatch\fR, regmatch_t \fIpmatch\fR[], int \fIeflags\fR);
.PP
.br
.B size_t regerror(int \fIerrcode\fR, const regex_t *\fIpreg\fR,
.ti +5n
.B char *\fIerrbuf\fR, size_t \fIerrbuf_size\fR);
.PP
.br
.B void regfree(regex_t *\fIpreg\fR);


.SH DESCRIPTION
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the \fBpcre\fR documentation for a description of the native API,
which contains additional functionality.

The functions described here are just wrapper functions that ultimately call
the native API. Their prototypes are defined in the \fBpcreposix.h\fR header
file, and on Unix systems the library itself is called \fBpcreposix.a\fR, so
can be accessed by adding \fB-lpcreposix\fR to the command for linking an
application which uses them. Because the POSIX functions call the native ones,
it is also necessary to add \fB-lpcre\fR.

I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
with the value zero. They have no effect, but since programs that are written
to the POSIX interface often use them, this makes it easier to slot in PCRE as
a replacement library. Other POSIX options are not even defined.

When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below.

The header for these functions is supplied as \fBpcreposix.h\fR to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as \fBregex.h\fR, which is the "correct" name. It provides two
structure types, \fIregex_t\fR for compiled internal forms, and
\fIregmatch_t\fR for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.


.SH COMPILING A PATTERN

The function \fBregcomp()\fR is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument \fIpattern\fR. The \fIpreg\fR argument is a pointer
to a regex_t structure which is used as a base for storing information about
the compiled expression.

The argument \fIcflags\fR is either zero, or contains one or more of the bits
defined by the following macros:

  REG_ICASE

The PCRE_CASELESS option is set when the expression is passed for compilation
to the native function.

  REG_NEWLINE

The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function.

The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
\fIpreg\fR structure is filled in on success, and one member of the structure
is publicized: \fIre_nsub\fR contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.


.SH MATCHING A PATTERN
The function \fBregexec()\fR is called to match a pre-compiled pattern
\fIpreg\fR against a given \fIstring\fR, which is terminated by a zero byte,
subject to the options in \fIeflags\fR. These can be:

  REG_NOTBOL

The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.

  REG_NOTEOL

The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.

The portion of the string that was matched, and also any captured substrings,
are returned via the \fIpmatch\fR argument, which points to an array of
\fInmatch\fR structures of type \fIregmatch_t\fR, containing the members
\fIrm_so\fR and \fIrm_eo\fR. These contain the offset to the first character of
each substring and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector relates to the entire
portion of \fIstring\fR that was matched; subsequent elements relate to the
capturing subpatterns of the regular expression. Unused entries in the array
have both structure members set to -1.

A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.


.SH ERROR MESSAGES
The \fBregerror()\fR function maps a non-zero errorcode from either
\fBregcomp\fR or \fBregexec\fR to a printable message. If \fIpreg\fR is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in \fIerrbuf\fR. The length of the
message, including the zero, is limited to \fIerrbuf_size\fR. The yield of the
function is the size of buffer needed to hold the whole message.


.SH STORAGE
Compiling a regular expression causes memory to be allocated and associated
with the \fIpreg\fR structure. The function \fBregfree()\fR frees all such
memory, after which \fIpreg\fR may no longer be used as a compiled expression.


.SH AUTHOR
Philip Hazel <ph10@cam.ac.uk>
.br
University Computing Service,
.br
New Museums Site,
.br
Cambridge CB2 3QG, England.
.br
Phone: +44 1223 334714

Copyright (c) 1997-1999 University of Cambridge.
182
ext/pcre/pcrelib/doc/pcreposix.html
Normal file
@@ -0,0 +1,182 @@
<HTML>
<HEAD>
<TITLE>pcreposix specification</TITLE>
</HEAD>
<body bgcolor="#FFFFFF" text="#00005A">
<H1>pcreposix specification</H1>
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page in case the
conversion went wrong.
<UL>
<LI><A NAME="TOC1" HREF="#SEC1">NAME</A>
<LI><A NAME="TOC2" HREF="#SEC2">SYNOPSIS</A>
<LI><A NAME="TOC3" HREF="#SEC3">DESCRIPTION</A>
<LI><A NAME="TOC4" HREF="#SEC4">COMPILING A PATTERN</A>
<LI><A NAME="TOC5" HREF="#SEC5">MATCHING A PATTERN</A>
<LI><A NAME="TOC6" HREF="#SEC6">ERROR MESSAGES</A>
<LI><A NAME="TOC7" HREF="#SEC7">STORAGE</A>
<LI><A NAME="TOC8" HREF="#SEC8">AUTHOR</A>
</UL>
<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
<P>
pcreposix - POSIX API for Perl-compatible regular expressions.
</P>
<LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
<P>
<B>#include &lt;pcreposix.h&gt;</B>
</P>
<P>
<B>int regcomp(regex_t *<I>preg</I>, const char *<I>pattern</I>,</B>
<B>int <I>cflags</I>);</B>
</P>
<P>
<B>int regexec(regex_t *<I>preg</I>, const char *<I>string</I>,</B>
<B>size_t <I>nmatch</I>, regmatch_t <I>pmatch</I>[], int <I>eflags</I>);</B>
</P>
<P>
<B>size_t regerror(int <I>errcode</I>, const regex_t *<I>preg</I>,</B>
<B>char *<I>errbuf</I>, size_t <I>errbuf_size</I>);</B>
</P>
<P>
<B>void regfree(regex_t *<I>preg</I>);</B>
</P>
<LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the <B>pcre</B> documentation for a description of the native API,
which contains additional functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the native API. Their prototypes are defined in the <B>pcreposix.h</B> header
file, and on Unix systems the library itself is called <B>pcreposix.a</B>, so
can be accessed by adding <B>-lpcreposix</B> to the command for linking an
application which uses them. Because the POSIX functions call the native ones,
it is also necessary to add <B>-lpcre</B>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
with the value zero. They have no effect, but since programs that are written
to the POSIX interface often use them, this makes it easier to slot in PCRE as
a replacement library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below.
</P>
<P>
The header for these functions is supplied as <B>pcreposix.h</B> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <B>regex.h</B>, which is the "correct" name. It provides two
structure types, <I>regex_t</I> for compiled internal forms, and
<I>regmatch_t</I> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<LI><A NAME="SEC4" HREF="#TOC1">COMPILING A PATTERN</A>
<P>
The function <B>regcomp()</B> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <I>pattern</I>. The <I>preg</I> argument is a pointer
to a regex_t structure which is used as a base for storing information about
the compiled expression.
</P>
<P>
The argument <I>cflags</I> is either zero, or contains one or more of the bits
defined by the following macros:
</P>
<P>
<PRE>
REG_ICASE
</PRE>
</P>
<P>
The PCRE_CASELESS option is set when the expression is passed for compilation
to the native function.
</P>
<P>
<PRE>
REG_NEWLINE
</PRE>
</P>
<P>
The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function.
</P>
<P>
The yield of <B>regcomp()</B> is zero on success, and non-zero otherwise. The
<I>preg</I> structure is filled in on success, and one member of the structure
is publicized: <I>re_nsub</I> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
<LI><A NAME="SEC5" HREF="#TOC1">MATCHING A PATTERN</A>
<P>
The function <B>regexec()</B> is called to match a pre-compiled pattern
<I>preg</I> against a given <I>string</I>, which is terminated by a zero byte,
subject to the options in <I>eflags</I>. These can be:
</P>
<P>
<PRE>
REG_NOTBOL
</PRE>
</P>
<P>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
<PRE>
REG_NOTEOL
</PRE>
</P>
<P>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
The portion of the string that was matched, and also any captured substrings,
are returned via the <I>pmatch</I> argument, which points to an array of
<I>nmatch</I> structures of type <I>regmatch_t</I>, containing the members
<I>rm_so</I> and <I>rm_eo</I>. These contain the offset to the first character of
each substring and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector relates to the entire
portion of <I>string</I> that was matched; subsequent elements relate to the
capturing subpatterns of the regular expression. Unused entries in the array
have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
<LI><A NAME="SEC6" HREF="#TOC1">ERROR MESSAGES</A>
<P>
The <B>regerror()</B> function maps a non-zero errorcode from either
<B>regcomp</B> or <B>regexec</B> to a printable message. If <I>preg</I> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <I>errbuf</I>. The length of the
message, including the zero, is limited to <I>errbuf_size</I>. The yield of the
function is the size of buffer needed to hold the whole message.
</P>
<LI><A NAME="SEC7" HREF="#TOC1">STORAGE</A>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <I>preg</I> structure. The function <B>regfree()</B> frees all such
memory, after which <I>preg</I> may no longer be used as a compiled expression.
</P>
<LI><A NAME="SEC8" HREF="#TOC1">AUTHOR</A>
<P>
Philip Hazel &lt;ph10@cam.ac.uk&gt;
<BR>
University Computing Service,
<BR>
New Museums Site,
<BR>
Cambridge CB2 3QG, England.
<BR>
Phone: +44 1223 334714
</P>
<P>
Copyright (c) 1997-1999 University of Cambridge.
150
ext/pcre/pcrelib/doc/pcreposix.txt
Normal file
@@ -0,0 +1,150 @@
NAME
pcreposix - POSIX API for Perl-compatible regular expressions.



SYNOPSIS
#include <pcreposix.h>

int regcomp(regex_t *preg, const char *pattern,
     int cflags);

int regexec(regex_t *preg, const char *string,
     size_t nmatch, regmatch_t pmatch[], int eflags);

size_t regerror(int errcode, const regex_t *preg,
     char *errbuf, size_t errbuf_size);

void regfree(regex_t *preg);



DESCRIPTION
This set of functions provides a POSIX-style API to the PCRE
regular expression package. See the pcre documentation for a
description of the native API, which contains additional
functionality.

The functions described here are just wrapper functions that
ultimately call the native API. Their prototypes are defined
in the pcreposix.h header file, and on Unix systems the
library itself is called pcreposix.a, so can be accessed by
adding -lpcreposix to the command for linking an application
which uses them. Because the POSIX functions call the native
ones, it is also necessary to add -lpcre.

I have implemented only those option bits that can be
reasonably mapped to PCRE native options. In addition, the
options REG_EXTENDED and REG_NOSUB are defined with the
value zero. They have no effect, but since programs that are
written to the POSIX interface often use them, this makes it
easier to slot in PCRE as a replacement library. Other POSIX
options are not even defined.

When PCRE is called via these functions, it is only the API
that is POSIX-like in style. The syntax and semantics of the
regular expressions themselves are still those of Perl,
subject to the setting of various PCRE options, as described
below.

The header for these functions is supplied as pcreposix.h to
avoid any potential clash with other POSIX libraries. It
can, of course, be renamed or aliased as regex.h, which is
the "correct" name. It provides two structure types, regex_t
for compiled internal forms, and regmatch_t for returning
captured substrings. It also defines some constants whose
names start with "REG_"; these are used for setting options
and identifying error codes.



COMPILING A PATTERN
The function regcomp() is called to compile a pattern into
an internal form. The pattern is a C string terminated by a
binary zero, and is passed in the argument pattern. The preg
argument is a pointer to a regex_t structure which is used
as a base for storing information about the compiled
expression.

The argument cflags is either zero, or contains one or more
of the bits defined by the following macros:

  REG_ICASE

The PCRE_CASELESS option is set when the expression is
passed for compilation to the native function.

  REG_NEWLINE

The PCRE_MULTILINE option is set when the expression is
passed for compilation to the native function.

The yield of regcomp() is zero on success, and non-zero
otherwise. The preg structure is filled in on success, and one
member of the structure is publicized: re_nsub contains the
number of capturing subpatterns in the regular expression.
Various error codes are defined in the header file.
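
A small wrapper shows how these pieces fit together. It is a sketch under
stated assumptions rather than code from PCRE: the name compile_pattern is
hypothetical, and the system <regex.h> is used as a stand-in so that the
fragment builds without PCRE installed. With PCRE, include pcreposix.h
instead and link with -lpcreposix and -lpcre.

```c
#include <assert.h>
#include <regex.h>     /* stand-in; with PCRE use <pcreposix.h> instead */
#include <stddef.h>

/* Compile pattern, optionally caselessly. On success returns 0 and stores
   the number of capturing subpatterns in *nsub; on failure returns the
   non-zero code from regcomp(). The caller must call regfree(preg) after
   a successful compile. */
int compile_pattern(regex_t *preg, const char *pattern, int caseless,
                    size_t *nsub)
{
    int cflags = REG_EXTENDED;       /* defined as zero by pcreposix.h */
    int rc;

    assert(preg != NULL && pattern != NULL && nsub != NULL);
    if (caseless)
        cflags |= REG_ICASE;         /* mapped to PCRE_CASELESS */
    rc = regcomp(preg, pattern, cflags);
    if (rc == 0)
        *nsub = preg->re_nsub;       /* the one publicized member */
    return rc;
}
```

Passing REG_EXTENDED costs nothing under pcreposix.h, where it is zero, and
keeps the same source usable with a native POSIX library.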



MATCHING A PATTERN
The function regexec() is called to match a pre-compiled
pattern preg against a given string, which is terminated by
a zero byte, subject to the options in eflags. These can be:

  REG_NOTBOL

The PCRE_NOTBOL option is set when calling the underlying
PCRE matching function.

  REG_NOTEOL

The PCRE_NOTEOL option is set when calling the underlying
PCRE matching function.

The portion of the string that was matched, and also any
captured substrings, are returned via the pmatch argument,
which points to an array of nmatch structures of type
regmatch_t, containing the members rm_so and rm_eo. These
contain the offset to the first character of each substring
and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector
relates to the entire portion of string that was matched;
subsequent elements relate to the capturing subpatterns of
the regular expression. Unused entries in the array have
both structure members set to -1.

A successful match yields a zero return; various error codes
are defined in the header file, of which REG_NOMATCH is the
"expected" failure code.
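
The use of rm_so and rm_eo can be sketched as follows. The helper name
match_group, the limit MAX_GROUPS and the 1/0/-1 return convention are all
hypothetical, and the system <regex.h> again stands in for pcreposix.h so that
the fragment builds without PCRE.

```c
#include <assert.h>
#include <regex.h>     /* stand-in; with PCRE use <pcreposix.h> instead */
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10  /* arbitrary limit chosen for this sketch */

/* Match pattern against subject and copy substring number group
   (0 = whole match) into out. Returns 1 if the substring was copied,
   0 if there was no match or the group is unset, -1 on any error. */
int match_group(const char *pattern, const char *subject, size_t group,
                char *out, size_t cap)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int rc, result = 0;

    assert(pattern != NULL && subject != NULL && out != NULL);
    assert(group < MAX_GROUPS && cap > 0);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, subject, MAX_GROUPS, pm, 0);
    if (rc == 0 && pm[group].rm_so != -1) {
        /* rm_eo is the offset just past the end, so the difference
           is the length of the substring. */
        size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (len < cap) {
            memcpy(out, subject + pm[group].rm_so, len);
            out[len] = '\0';
            result = 1;
        } else {
            result = -1;             /* output buffer too small */
        }
    } else if (rc != 0 && rc != REG_NOMATCH) {
        result = -1;                 /* a failure other than "no match" */
    }
    regfree(&re);
    return result;
}
```

Checking rm_so against -1 before using an entry matters because a capturing
subpattern that took no part in the match leaves its entry unused.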


ERROR MESSAGES
The regerror() function maps a non-zero errorcode from
either regcomp or regexec to a printable message. If preg is
not NULL, the error should have arisen from the use of that
structure. A message terminated by a binary zero is placed
in errbuf. The length of the message, including the zero, is
limited to errbuf_size. The yield of the function is the
size of buffer needed to hold the whole message.
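
Because the yield is the size needed for the whole message, a caller can ask
for the size first and then allocate. The sketch below assumes that a call
with a zero errbuf_size leaves errbuf untouched, which is what POSIX specifies
for regerror(); whether a particular PCRE release behaves the same way should
be checked against its source. The name error_text is hypothetical and the
system <regex.h> stands in for pcreposix.h.

```c
#include <assert.h>
#include <regex.h>     /* stand-in; with PCRE use <pcreposix.h> instead */
#include <stdlib.h>

/* Return a malloc'd copy of the message for errcode, or NULL if memory
   cannot be obtained. The caller frees the result. */
char *error_text(int errcode, const regex_t *preg)
{
    size_t need;
    char *msg;

    assert(errcode != 0);
    need = regerror(errcode, preg, NULL, 0);   /* size including the zero */
    msg = malloc(need);
    if (msg != NULL)
        regerror(errcode, preg, msg, need);
    return msg;
}
```

A fixed buffer works too, but then a long message is cut short at errbuf_size.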



STORAGE
Compiling a regular expression causes memory to be allocated
and associated with the preg structure. The function
regfree() frees all such memory, after which preg may no
longer be used as a compiled expression.



AUTHOR
Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
New Museums Site,
Cambridge CB2 3QG, England.
Phone: +44 1223 334714

Copyright (c) 1997-1999 University of Cambridge.
216
ext/pcre/pcrelib/doc/pcretest.txt
Normal file
@@ -0,0 +1,216 @@
|
||||
The pcretest program
|
||||
--------------------
|
||||
|
||||
This program is intended for testing PCRE, but it can also be used for
|
||||
experimenting with regular expressions.
|
||||
|
||||
If it is given two filename arguments, it reads from the first and writes to
|
||||
the second. If it is given only one filename argument, it reads from that file
|
||||
and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and
|
||||
prompts for each line of input, using "re>" to prompt for regular expressions,
|
||||
and "data>" to prompt for data lines.
|
||||
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern. An empty line signals the end of the
|
||||
data lines, at which point a new regular expression is read. The regular
|
||||
expressions are given enclosed in any non-alphameric delimiters other than
|
||||
backslash, for example
|
||||
|
||||
/(a|bc)x+yz/
|
||||
|
||||
White space before the initial delimiter is ignored. A regular expression may
|
||||
be continued over several input lines, in which case the newline characters are
|
||||
included within it. See the test input files in the testdata directory for many
|
||||
examples. It is possible to include the delimiter within the pattern by
|
||||
escaping it, for example
|
||||
|
||||
/abc\/def/
|
||||
|
||||
If you do so, the escape and the delimiter form part of the pattern, but since
|
||||
delimiters are always non-alphameric, this does not affect its interpretation.
|
||||
If the terminating delimiter is immediately followed by a backslash, for
|
||||
example,
|
||||
|
||||
/abc/\
|
||||
|
||||
then a backslash is added to the end of the pattern. This is done to provide a
|
||||
way of testing the error condition that arises if a pattern finishes with a
|
||||
backslash, because
|
||||
|
||||
/abc\/
|
||||
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
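
The delimiter rules above can be sketched as a small Python function. It is an illustration of the rules as described, not a transcription of pcretest.c; the name split_pattern and the convention of returning None for "needs a continuation line" are assumptions made for this sketch.

```python
def split_pattern(text):
    """Split one input line into (pattern, modifiers).

    Returns None when no closing delimiter is found, meaning that the
    pattern continues on the next input line.
    """
    text = text.lstrip()  # white space before the delimiter is ignored
    if not text:
        raise ValueError("empty pattern line")
    delim = text[0]
    if delim.isalnum() or delim == "\\":
        raise ValueError("delimiter must be non-alphameric and not a backslash")
    i = 1
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 2  # an escaped character, possibly the delimiter itself
            continue
        if text[i] == delim:
            break
        i += 1
    else:
        return None  # ran off the end: a continuation line is needed
    pattern = text[1:i]  # escape and delimiter stay part of the pattern
    rest = text[i + 1:]
    if rest.startswith("\\"):
        pattern += "\\"  # a backslash after the delimiter joins the pattern
        rest = rest[1:]
    return pattern, rest.rstrip("\n")
```

With this sketch, /abc\/def/ yields the pattern abc\/def, /abc/\ yields abc\ and /abc\/ yields None because the only candidate closing delimiter is escaped.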
|
||||
|
||||
The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
|
||||
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
|
||||
example:
|
||||
|
||||
/caseless/i
|
||||
|
||||
These modifier letters have the same effect as they do in Perl. There are
|
||||
others which set PCRE options that do not correspond to anything in Perl: /A,
|
||||
/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
|
||||
|
||||
Searching for all possible matches within each subject string can be requested
|
||||
by the /g or /G modifier. After finding a match, PCRE is called again to search
|
||||
the remainder of the subject string. The difference between /g and /G is that
|
||||
the former uses the startoffset argument to pcre_exec() to start searching at
|
||||
a new point within the entire string (which is in effect what Perl does),
|
||||
whereas the latter passes over a shortened substring. This makes a difference
|
||||
to the matching process if the pattern begins with a lookbehind assertion
|
||||
(including \b or \B).
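
The difference between the two modifiers can be illustrated with Python's re module standing in for PCRE (an assumption of this sketch). Passing a start position to search() resembles the startoffset argument, because lookbehinds can still see the characters before that position, whereas slicing the subject hides them. The names matches_g and matches_G are hypothetical, and the handling of empty matches is simplified to stepping on by one character.

```python
import re


def matches_g(pattern, subject):
    """Like /g: keep the whole subject and move the start offset forward."""
    rx = re.compile(pattern)
    found, pos = [], 0
    while pos <= len(subject):
        m = rx.search(subject, pos)
        if m is None:
            break
        found.append(m.group(0))
        # step past an empty match so that the loop always advances
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found


def matches_G(pattern, subject):
    """Like /G: search again in the shortened remainder of the subject."""
    rx = re.compile(pattern)
    found, rest = [], subject
    while True:
        m = rx.search(rest)
        if m is None:
            break
        found.append(m.group(0))
        cut = m.end() if m.end() > m.start() else m.end() + 1
        if cut > len(rest):
            break
        rest = rest[cut:]  # earlier characters are no longer visible
    return found
```

For the pattern (?<=[Ms])iss and the subject Mississippi, matches_g finds two matches, because the second "iss" is preceded by an "s" that the lookbehind can still see. matches_G finds only one, because the remainder "issippi" has nothing before its first character.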
|
||||
|
||||
If any call to pcre_exec() in a /g or /G sequence matches an empty string, the
|
||||
next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an
|
||||
empty string again at the same point. If, however, this second match fails, the
|
||||
start offset is advanced by one, and the match is retried. This imitates the
|
||||
way Perl handles such cases when using the /g modifier or the split() function.
|
||||
|
||||
There are a number of other modifiers for controlling the way pcretest
|
||||
operates.
|
||||
|
||||
The /+ modifier requests that as well as outputting the substring that matched
|
||||
the entire pattern, pcretest should in addition output the remainder of the
|
||||
subject string. This is useful for tests where the subject contains multiple
|
||||
copies of the same substring.
|
||||
|
||||
The /L modifier must be followed directly by the name of a locale, for example,
|
||||
|
||||
/pattern/Lfr
|
||||
|
||||
For this reason, it must be the last modifier letter. The given locale is set,
|
||||
pcre_maketables() is called to build a set of character tables for the locale,
|
||||
and this is then passed to pcre_compile() when compiling the regular
|
||||
expression. Without an /L modifier, NULL is passed as the tables pointer; that
|
||||
is, /L applies only to the expression on which it appears.
|
||||
|
||||
The /I modifier requests that pcretest output information about the compiled
|
||||
expression (whether it is anchored, has a fixed first character, and so on). It
|
||||
does this by calling pcre_fullinfo() after compiling an expression, and
|
||||
outputting the information it gets back. If the pattern is studied, the results
|
||||
of that are also output.
|
||||
|
||||
The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
|
||||
the internal form of compiled regular expressions to be output after
|
||||
compilation.
|
||||
|
||||
The /S modifier causes pcre_study() to be called after the expression has been
|
||||
compiled, and the results used when the expression is matched.
|
||||
|
||||
The /M modifier causes the size of memory block used to hold the compiled
|
||||
pattern to be output.
|
||||
|
||||
Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API
|
||||
rather than its native API. When this is done, all other modifiers except /i,
|
||||
/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is
|
||||
set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,
|
||||
and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
|
||||
Before each data line is passed to pcre_exec(), leading and trailing whitespace
|
||||
is removed, and it is then scanned for \ escapes. The following are recognized:
|
||||
|
||||
\a alarm (= BEL)
|
||||
\b backspace
|
||||
\e escape
|
||||
\f formfeed
|
||||
\n newline
|
||||
\r carriage return
|
||||
\t tab
|
||||
\v vertical tab
|
||||
\nnn octal character (up to 3 octal digits)
|
||||
\xhh hexadecimal character (up to 2 hex digits)
|
||||
|
||||
\A pass the PCRE_ANCHORED option to pcre_exec()
|
||||
\B pass the PCRE_NOTBOL option to pcre_exec()
|
||||
\Cdd call pcre_copy_substring() for substring dd after a successful match
|
||||
(any decimal number less than 32)
|
||||
\Gdd call pcre_get_substring() for substring dd after a successful match
|
||||
(any decimal number less than 32)
|
||||
\L call pcre_get_substringlist() after a successful match
|
||||
\N pass the PCRE_NOTEMPTY option to pcre_exec()
|
||||
\Odd set the size of the output vector passed to pcre_exec() to dd
|
||||
(any number of decimal digits)
|
||||
\Z pass the PCRE_NOTEOL option to pcre_exec()
|
||||
|
||||
A backslash followed by anything else just escapes the anything else. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
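
The escape processing described above can be sketched in Python. This follows the table as written rather than the source of pcretest; the function and dictionary names are hypothetical, and two details are assumptions of the sketch: a missing number after \x, \C, \G or \O is treated as zero, and octal values are reduced to a single byte.

```python
SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
EXEC_OPTIONS = {"A": "PCRE_ANCHORED", "B": "PCRE_NOTBOL",
                "N": "PCRE_NOTEMPTY", "Z": "PCRE_NOTEOL"}
OCTAL, DECIMAL = "01234567", "0123456789"
HEX = "0123456789abcdefABCDEF"


def take(text, pos, allowed, limit=None):
    """Collect characters from `allowed` starting at pos, at most `limit`."""
    end = pos
    while end < len(text) and text[end] in allowed:
        if limit is not None and end - pos >= limit:
            break
        end += 1
    return text[pos:end], end


def parse_data_line(line):
    """Return (subject, requests) for one pcretest-style data line."""
    text = line.strip()  # leading and trailing whitespace is removed first
    subject = []
    req = {"options": [], "copy": [], "get": [], "list": False,
           "ovecsize": None}
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch != "\\":
            subject.append(ch)
            continue
        if i == len(text):
            break  # a backslash as the very last character is ignored
        esc = text[i]
        i += 1
        if esc in SIMPLE:
            subject.append(SIMPLE[esc])
        elif esc in OCTAL:
            digits, i = take(text, i - 1, OCTAL, 3)  # up to 3 octal digits
            subject.append(chr(int(digits, 8) & 0xFF))
        elif esc == "x":
            digits, i = take(text, i, HEX, 2)  # up to 2 hex digits
            subject.append(chr(int(digits or "0", 16)))
        elif esc in EXEC_OPTIONS:
            req["options"].append(EXEC_OPTIONS[esc])
        elif esc in "CG":
            digits, i = take(text, i, DECIMAL)
            req["copy" if esc == "C" else "get"].append(int(digits or "0"))
        elif esc == "L":
            req["list"] = True
        elif esc == "O":
            digits, i = take(text, i, DECIMAL)
            req["ovecsize"] = int(digits or "0")
        else:
            subject.append(esc)  # anything else just escapes itself
    return "".join(subject), req
```

For example, the data line \O12abcb gives the subject abcb with an output vector size of 12, and a line consisting of a single backslash gives an empty subject.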
|
||||
|
||||
If /P was present on the regex, causing the POSIX wrapper API to be used, only
|
||||
\B and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
|
||||
regexec() respectively.
|
||||
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
pcre_exec() returns, starting with number 0 for the string that matched the
|
||||
whole pattern. Here is an example of an interactive pcretest run.
|
||||
|
||||
$ pcretest
|
||||
PCRE version 2.06 08-Jun-1999
|
||||
|
||||
re> /^abc(\d+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
|
||||
If the strings contain any non-printing characters, they are output as \xhh
|
||||
escapes. If the pattern has the /+ modifier, then the output for substring 0 is
|
||||
followed by the rest of the subject string, identified by "0+" like this:
|
||||
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
|
||||
If the pattern has the /g or /G modifier, the results of successive matching
|
||||
attempts are output in sequence, like this:
|
||||
|
||||
re> /\Bi(\w\w)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
|
||||
"No match" is output only if the first match attempt fails.
|
||||
|
||||
If any of \C, \G, or \L are present in a data line that is successfully
|
||||
matched, the substrings extracted by the convenience functions are output with
|
||||
C, G, or L after the string number instead of a colon. This is in addition to
|
||||
the normal full list. The string length (that is, the return from the
|
||||
extraction function) is given in parentheses after each string for \C and \G.
|
||||
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines may not. However, newlines can be
|
||||
included in data by means of the \n escape.
|
||||
|
||||
If the -p option is given to pcretest, it is equivalent to adding /P to each
|
||||
regular expression: the POSIX wrapper API is used to call PCRE. None of the
|
||||
following flags has any effect in this case.
|
||||
|
||||
If the option -d is given to pcretest, it is equivalent to adding /D to each
|
||||
regular expression: the internal form is output after compilation.
|
||||
|
||||
If the option -i is given to pcretest, it is equivalent to adding /I to each
|
||||
regular expression: information about the compiled pattern is given after
|
||||
compilation.
|
||||
|
||||
If the option -m is given to pcretest, it outputs the size of each compiled
|
||||
pattern after it has been compiled. It is equivalent to adding /M to each
|
||||
regular expression. For compatibility with earlier versions of pcretest, -s is
|
||||
a synonym for -m.
|
||||
|
||||
If the -t option is given, each compile, study, and match is run 20000 times
|
||||
while being timed, and the resulting time per compile or match is output in
|
||||
milliseconds. Do not set -t with -s, because you will then get the size output
|
||||
20000 times and the timing will be distorted. If you want to change the number
|
||||
of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
|
||||
pcretest.c.
|
||||
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
January 2000
|
||||
23
ext/pcre/pcrelib/doc/perltest.txt
Normal file
@@ -0,0 +1,23 @@
|
||||
The perltest program
|
||||
--------------------
|
||||
|
||||
The perltest program tests Perl's regular expressions; it has the same
|
||||
specification as pcretest, and so can be given identical input, except that
|
||||
input patterns can be followed only by Perl's lower case modifiers and /+ (as
|
||||
used by pcretest), which is recognized and handled by the program.
|
||||
|
||||
The data lines are processed as Perl double-quoted strings, so if they contain
|
||||
" \ $ or @ characters, these have to be escaped. For this reason, all such
|
||||
characters in testinput1 and testinput3 are escaped so that they can be used
|
||||
for perltest as well as for pcretest, and the special upper case modifiers such
|
||||
as /A that pcretest recognizes are not used in these files. The output should
|
||||
be identical, apart from the initial identifying banner.
|
||||
|
||||
The testinput2 and testinput4 files are not suitable for feeding to perltest,
|
||||
since they do make use of the special upper case modifiers and escapes that
|
||||
pcretest uses to test some features of PCRE. The first of these files also
|
||||
contains malformed regular expressions, in order to check that PCRE diagnoses
|
||||
them correctly.
|
||||
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
January 2000
|
||||
76
ext/pcre/pcrelib/doc/pgrep.1
Normal file
@@ -0,0 +1,76 @@
|
||||
.TH PGREP 1
|
||||
.SH NAME
|
||||
pgrep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.B pgrep [-Vchilnsvx] pattern [file] ...
|
||||
|
||||
|
||||
.SH DESCRIPTION
|
||||
\fBpgrep\fR searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
\fBpcre(3)\fR for a full description of syntax and semantics.
|
||||
|
||||
If no files are specified, \fBpgrep\fR reads the standard input. By default,
|
||||
each line that matches the pattern is copied to the standard output, and if
|
||||
there is more than one file, the file name is printed before each line of
|
||||
output. However, there are options that can change how \fBpgrep\fR behaves.
|
||||
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in \fB<stdio.h>\fR.
|
||||
The newline character is removed from the end of each line before it is matched
|
||||
against the pattern.
|
||||
|
||||
|
||||
.SH OPTIONS
|
||||
.TP 10
|
||||
\fB-V\fR
|
||||
Write the version number of the PCRE library being used to the standard error
|
||||
stream.
|
||||
.TP
|
||||
\fB-c\fR
|
||||
Do not print individual lines; instead just print a count of the number of
|
||||
lines that would otherwise have been printed. If several files are given, a
|
||||
count is printed for each of them.
|
||||
.TP
|
||||
\fB-h\fR
|
||||
Suppress printing of filenames when searching multiple files.
|
||||
.TP
|
||||
\fB-i\fR
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
.TP
|
||||
\fB-l\fR
|
||||
Instead of printing lines from the files, just print the names of the files
|
||||
containing lines that would have been printed. Each file name is printed
|
||||
once, on a separate line.
|
||||
.TP
|
||||
\fB-n\fR
|
||||
Precede each line by its line number in the file.
|
||||
.TP
|
||||
\fB-s\fR
|
||||
Work silently, that is, display nothing except error messages.
|
||||
The exit status indicates whether any matches were found.
|
||||
.TP
|
||||
\fB-v\fR
|
||||
Invert the sense of the match, so that lines which do \fInot\fR match the
|
||||
pattern are now the ones that are found.
|
||||
.TP
|
||||
\fB-x\fR
|
||||
Force the pattern to be anchored (it must start matching at the beginning of
|
||||
the line) and in addition, require it to match the entire line. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in the regular expression.
|
||||
|
||||
|
||||
.SH SEE ALSO
|
||||
\fBpcre(3)\fR, Perl 5 documentation
|
||||
|
||||
|
||||
.SH DIAGNOSTICS
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors or inaccessible files (even if matches were found).
|
||||
|
||||
|
||||
.SH AUTHOR
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
.br
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
||||
105
ext/pcre/pcrelib/doc/pgrep.html
Normal file
@@ -0,0 +1,105 @@
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<TITLE>pgrep specification</TITLE>
|
||||
</HEAD>
|
||||
<body bgcolor="#FFFFFF" text="#00005A">
|
||||
<H1>pgrep specification</H1>
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page in case the
|
||||
conversion went wrong.
|
||||
<UL>
|
||||
<LI><A NAME="TOC1" HREF="#SEC1">NAME</A>
|
||||
<LI><A NAME="TOC2" HREF="#SEC2">SYNOPSIS</A>
|
||||
<LI><A NAME="TOC3" HREF="#SEC3">DESCRIPTION</A>
|
||||
<LI><A NAME="TOC4" HREF="#SEC4">OPTIONS</A>
|
||||
<LI><A NAME="TOC5" HREF="#SEC5">SEE ALSO</A>
|
||||
<LI><A NAME="TOC6" HREF="#SEC6">DIAGNOSTICS</A>
|
||||
<LI><A NAME="TOC7" HREF="#SEC7">AUTHOR</A>
|
||||
</UL>
|
||||
<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
|
||||
<P>
|
||||
pgrep - a grep with Perl-compatible regular expressions.
|
||||
</P>
|
||||
<LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
|
||||
<P>
|
||||
<B>pgrep [-Vchilnsvx] pattern [file] ...</B>
|
||||
</P>
|
||||
<LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
|
||||
<P>
|
||||
<B>pgrep</B> searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
<B>pcre(3)</B> for a full description of syntax and semantics.
|
||||
</P>
|
||||
<P>
|
||||
If no files are specified, <B>pgrep</B> reads the standard input. By default,
|
||||
each line that matches the pattern is copied to the standard output, and if
|
||||
there is more than one file, the file name is printed before each line of
|
||||
output. However, there are options that can change how <B>pgrep</B> behaves.
|
||||
</P>
|
||||
<P>
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <B>&lt;stdio.h&gt;</B>.
|
||||
The newline character is removed from the end of each line before it is matched
|
||||
against the pattern.
|
||||
</P>
|
||||
<LI><A NAME="SEC4" HREF="#TOC1">OPTIONS</A>
|
||||
<P>
|
||||
<B>-V</B>
|
||||
Write the version number of the PCRE library being used to the standard error
|
||||
stream.
|
||||
</P>
|
||||
<P>
|
||||
<B>-c</B>
|
||||
Do not print individual lines; instead just print a count of the number of
|
||||
lines that would otherwise have been printed. If several files are given, a
|
||||
count is printed for each of them.
|
||||
</P>
|
||||
<P>
|
||||
<B>-h</B>
|
||||
Suppress printing of filenames when searching multiple files.
|
||||
</P>
|
||||
<P>
|
||||
<B>-i</B>
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
</P>
|
||||
<P>
|
||||
<B>-l</B>
|
||||
Instead of printing lines from the files, just print the names of the files
|
||||
containing lines that would have been printed. Each file name is printed
|
||||
once, on a separate line.
|
||||
</P>
|
||||
<P>
|
||||
<B>-n</B>
|
||||
Precede each line by its line number in the file.
|
||||
</P>
|
||||
<P>
|
||||
<B>-s</B>
|
||||
Work silently, that is, display nothing except error messages.
|
||||
The exit status indicates whether any matches were found.
|
||||
</P>
|
||||
<P>
|
||||
<B>-v</B>
|
||||
Invert the sense of the match, so that lines which do <I>not</I> match the
|
||||
pattern are now the ones that are found.
|
||||
</P>
|
||||
<P>
|
||||
<B>-x</B>
|
||||
Force the pattern to be anchored (it must start matching at the beginning of
|
||||
the line) and in addition, require it to match the entire line. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in the regular expression.
|
||||
</P>
|
||||
<LI><A NAME="SEC5" HREF="#TOC1">SEE ALSO</A>
|
||||
<P>
|
||||
<B>pcre(3)</B>, Perl 5 documentation
|
||||
</P>
|
||||
<LI><A NAME="SEC6" HREF="#TOC1">DIAGNOSTICS</A>
|
||||
<P>
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors or inaccessible files (even if matches were found).
|
||||
</P>
|
||||
<LI><A NAME="SEC7" HREF="#TOC1">AUTHOR</A>
|
||||
<P>
|
||||
Philip Hazel &lt;ph10@cam.ac.uk&gt;
|
||||
<BR>
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
||||
86
ext/pcre/pcrelib/doc/pgrep.txt
Normal file
@@ -0,0 +1,86 @@
|
||||
NAME
|
||||
pgrep - a grep with Perl-compatible regular expressions.
|
||||
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
pgrep [-Vchilnsvx] pattern [file] ...
|
||||
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
pgrep searches files for character patterns, in the same way
|
||||
as other grep commands do, but it uses the PCRE regular
|
||||
expression library to support patterns that are compatible
|
||||
with the regular expressions of Perl 5. See pcre(3) for a
|
||||
full description of syntax and semantics.
|
||||
|
||||
If no files are specified, pgrep reads the standard input.
|
||||
By default, each line that matches the pattern is copied to
|
||||
the standard output, and if there is more than one file, the
|
||||
file name is printed before each line of output. However,
|
||||
there are options that can change how pgrep behaves.
|
||||
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in
|
||||
<stdio.h>. The newline character is removed from the end of
|
||||
each line before it is matched against the pattern.
|
||||
|
||||
|
||||
|
||||
OPTIONS
|
||||
-V Write the version number of the PCRE library being
|
||||
used to the standard error stream.
|
||||
|
||||
-c Do not print individual lines; instead just print
|
||||
a count of the number of lines that would
otherwise have been printed. If several files are
|
||||
given, a count is printed for each of them.
|
||||
|
||||
-h Suppress printing of filenames when searching
multiple files.
|
||||
|
||||
-i Ignore upper/lower case distinctions during
comparisons.
|
||||
|
||||
-l Instead of printing lines from the files, just
|
||||
print the names of the files containing lines that
|
||||
would have been printed. Each file name is printed
|
||||
once, on a separate line.
|
||||
|
||||
-n Precede each line by its line number in the file.
|
||||
|
||||
-s Work silently, that is, display nothing except
|
||||
error messages. The exit status indicates whether
|
||||
any matches were found.
|
||||
|
||||
-v Invert the sense of the match, so that lines which
|
||||
do not match the pattern are now the ones that are
|
||||
found.
|
||||
|
||||
-x Force the pattern to be anchored (it must start
|
||||
matching at the beginning of the line) and in
|
||||
addition, require it to match the entire line.
|
||||
This is equivalent to having ^ and $ characters at
|
||||
the start and end of each alternative branch in
|
||||
the regular expression.
|
||||
|
||||
|
||||
|
||||
SEE ALSO
|
||||
pcre(3), Perl 5 documentation
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
DIAGNOSTICS
|
||||
Exit status is 0 if any matches were found, 1 if no matches
|
||||
were found, and 2 for syntax errors or inaccessible files
|
||||
(even if matches were found).
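
The way the options and the exit status fit together can be sketched in Python. This is a model of the behaviour described in this document, not the pgrep source: Python's re module stands in for PCRE, the function name and keyword arguments are hypothetical, and only the -i, -v, -x and -c options are modelled.

```python
import re


def pgrep(pattern, lines, ignore_case=False, invert=False,
          whole_line=False, count_only=False):
    """Return (output_lines, exit_status) for a grep-like scan of `lines`."""
    try:
        rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    except re.error:
        return [], 2  # a syntax error in the pattern gives status 2
    hits = []
    for raw in lines:
        # the newline is removed from the end of each line before matching
        line = raw[:-1] if raw.endswith("\n") else raw
        # -x: the pattern must match the entire line
        m = rx.fullmatch(line) if whole_line else rx.search(line)
        # -v: select the lines that do not match
        if (m is not None) != invert:
            hits.append(line)
    # -c: print a count instead of the selected lines
    output = [str(len(hits))] if count_only else hits
    return output, (0 if hits else 1)
```

Calling pgrep("cat", ["cat\n", "concat\n"], whole_line=True) selects only the first line, and a pattern such as (abc returns status 2 without scanning any input.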
|
||||
|
||||
|
||||
|
||||
AUTHOR
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
||||
|
||||
1902
ext/pcre/pcrelib/testdata/testinput1
vendored
Normal file
File diff suppressed because it is too large
710
ext/pcre/pcrelib/testdata/testinput2
vendored
Normal file
@@ -0,0 +1,710 @@
|
||||
/(a)b|/
|
||||
|
||||
/abc/
|
||||
abc
|
||||
defabc
|
||||
\Aabc
|
||||
*** Failers
|
||||
\Adefabc
|
||||
ABC
|
||||
|
||||
/^abc/
|
||||
abc
|
||||
\Aabc
|
||||
*** Failers
|
||||
defabc
|
||||
\Adefabc
|
||||
|
||||
/a+bc/
|
||||
|
||||
/a*bc/
|
||||
|
||||
/a{3}bc/
|
||||
|
||||
/(abc|a+z)/
|
||||
|
||||
/^abc$/
|
||||
abc
|
||||
*** Failers
|
||||
def\nabc
|
||||
|
||||
/ab\gdef/X
|
||||
|
||||
/(?X)ab\gdef/X
|
||||
|
||||
/x{5,4}/
|
||||
|
||||
/z{65536}/
|
||||
|
||||
/[abcd/
|
||||
|
||||
/[\B]/
|
||||
|
||||
/[a-\w]/
|
||||
|
||||
/[z-a]/
|
||||
|
||||
/^*/
|
||||
|
||||
/(abc/
|
||||
|
||||
/(?# abc/
|
||||
|
||||
/(?z)abc/
|
||||
|
||||
/.*b/
|
||||
|
||||
/.*?b/
|
||||
|
||||
/cat|dog|elephant/
|
||||
this sentence eventually mentions a cat
|
||||
this sentences rambles on and on for a while and then reaches elephant
|
||||
|
||||
/cat|dog|elephant/S
|
||||
this sentence eventually mentions a cat
|
||||
this sentences rambles on and on for a while and then reaches elephant
|
||||
|
||||
/cat|dog|elephant/iS
|
||||
this sentence eventually mentions a CAT cat
|
||||
this sentences rambles on and on for a while to elephant ElePhant
|
||||
|
||||
/a|[bcd]/S
|
||||
|
||||
/(a|[^\dZ])/S
|
||||
|
||||
/(a|b)*[\s]/S
|
||||
|
||||
/(ab\2)/
|
||||
|
||||
/{4,5}abc/
|
||||
|
||||
/(a)(b)(c)\2/
|
||||
abcb
|
||||
\O0abcb
|
||||
\O3abcb
|
||||
\O6abcb
|
||||
\O9abcb
|
||||
\O12abcb
|
||||
|
||||
/(a)bc|(a)(b)\2/
|
||||
abc
|
||||
\O0abc
|
||||
\O3abc
|
||||
\O6abc
|
||||
aba
|
||||
\O0aba
|
||||
\O3aba
|
||||
\O6aba
|
||||
\O9aba
|
||||
\O12aba
|
||||
|
||||
/abc$/E
|
||||
abc
|
||||
*** Failers
|
||||
abc\n
|
||||
abc\ndef
|
||||
|
||||
/(a)(b)(c)(d)(e)\6/
|
||||
|
||||
/the quick brown fox/
|
||||
the quick brown fox
|
||||
this is a line with the quick brown fox
|
||||
|
||||
/the quick brown fox/A
|
||||
the quick brown fox
|
||||
*** Failers
|
||||
this is a line with the quick brown fox
|
||||
|
||||
/ab(?z)cd/
|
||||
|
||||
/^abc|def/
|
||||
abcdef
|
||||
abcdef\B
|
||||
|
||||
/.*((abc)$|(def))/
|
||||
defabc
|
||||
\Zdefabc
|
||||
|
||||
/abc/P
|
||||
abc
|
||||
*** Failers
|
||||
|
||||
/^abc|def/P
|
||||
abcdef
|
||||
abcdef\B
|
||||
|
||||
/.*((abc)$|(def))/P
|
||||
defabc
|
||||
\Zdefabc
|
||||
|
||||
/the quick brown fox/P
|
||||
the quick brown fox
|
||||
*** Failers
|
||||
The Quick Brown Fox
|
||||
|
||||
/the quick brown fox/Pi
|
||||
the quick brown fox
|
||||
The Quick Brown Fox
|
||||
|
||||
/abc.def/P
|
||||
*** Failers
|
||||
abc\ndef
|
||||
|
||||
/abc$/P
|
||||
abc
|
||||
abc\n
|
||||
|
||||
/(abc)\2/P
|
||||
|
||||
/(abc\1)/P
|
||||
abc
|
||||
|
||||
/)/
|
||||
|
||||
/a[]b/
|
||||
|
||||
/[^aeiou ]{3,}/
|
||||
co-processors, and for
|
||||
|
||||
/<.*>/
|
||||
abc<def>ghi<klm>nop
|
||||
|
||||
/<.*?>/
|
||||
abc<def>ghi<klm>nop
|
||||
|
||||
/<.*>/U
|
||||
abc<def>ghi<klm>nop
|
||||
|
||||
/<.*>(?U)/
|
||||
abc<def>ghi<klm>nop
|
||||
|
||||
/<.*?>/U
|
||||
abc<def>ghi<klm>nop
|
||||
|
||||
/={3,}/U
|
||||
abc========def
|
||||
|
||||
/(?U)={3,}?/
|
||||
abc========def
|
||||
|
||||
/(?<!bar|cattle)foo/
|
||||
foo
|
||||
catfoo
|
||||
*** Failers
|
||||
the barfoo
|
||||
and cattlefoo
|
||||
|
||||
/(?<=a+)b/
|
||||
|
||||
/(?<=aaa|b{0,3})b/
|
||||
|
||||
/(?<!(foo)a\1)bar/
|
||||
|
||||
/(?i)abc/
|
||||
|
||||
/(a|(?m)a)/
|
||||
|
||||
/(?i)^1234/
|
||||
|
||||
/(^b|(?i)^d)/
|
||||
|
||||
/(?s).*/
|
||||
|
||||
/[abcd]/S
|
||||
|
||||
/(?i)[abcd]/S
|
||||
|
||||
/(?m)[xy]|(b|c)/S
|
||||
|
||||
/(^a|^b)/m
|
||||
|
||||
/(?i)(^a|^b)/m
|
||||
|
||||
/(a)(?(1)a|b|c)/
|
||||
|
||||
/(?(?=a)a|b|c)/
|
||||
|
||||
/(?(1a)/
|
||||
|
||||
/(?(?i))/
|
||||
|
||||
/(?(abc))/
|
||||
|
||||
/(?(?<ab))/
|
||||
|
||||
/((?s)blah)\s+\1/
|
||||
|
||||
/((?i)blah)\s+\1/
|
||||
|
||||
/((?i)b)/DS
|
||||
|
||||
/(a*b|(?i:c*(?-i)d))/S
|
||||
|
||||
/a$/
|
||||
a
|
||||
a\n
|
||||
*** Failers
|
||||
\Za
|
||||
\Za\n
|
||||
|
||||
/a$/m
|
||||
a
|
||||
a\n
|
||||
\Za\n
|
||||
*** Failers
|
||||
\Za
|
||||
|
||||
/\Aabc/m
|
||||
|
||||
/^abc/m
|
||||
|
||||
/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/
|
||||
aaaaabbbbbcccccdef
|
||||
|
||||
/(?<=foo)[ab]/S
|
||||
|
||||
/(?<!foo)(alpha|omega)/S
|
||||
|
||||
/(?!alphabet)[ab]/S
|
||||
|
||||
/(?<=foo\n)^bar/m
|
||||
|
||||
/(?>^abc)/m
|
||||
abc
|
||||
def\nabc
|
||||
*** Failers
|
||||
defabc
|
||||
|
||||
/(?<=ab(c+)d)ef/
|
||||
|
||||
/(?<=ab(?<=c+)d)ef/
|
||||
|
||||
/(?<=ab(c|de)f)g/
|
||||
|
||||
/The next three are in testinput2 because they have variable length branches/
|
||||
|
||||
/(?<=bullock|donkey)-cart/
|
||||
the bullock-cart
|
||||
a donkey-cart race
|
||||
*** Failers
|
||||
cart
|
||||
horse-and-cart
|
||||
|
||||
/(?<=ab(?i)x|y|z)/
|
||||
|
||||
/(?>.*)(?<=(abcd)|(xyz))/
|
||||
alphabetabcd
|
||||
endingxyz
|
||||
|
||||
/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/
|
||||
abxyZZ
|
||||
abXyZZ
|
||||
ZZZ
|
||||
zZZ
|
||||
bZZ
|
||||
BZZ
|
||||
*** Failers
|
||||
ZZ
|
||||
abXYZZ
|
||||
zzz
|
||||
bzz
|
||||
|
||||
/(?<!(foo)a)bar/
|
||||
bar
|
||||
foobbar
|
||||
*** Failers
|
||||
fooabar
|
||||
|
||||
/This one is here because Perl 5.005_02 doesn't fail it/
|
||||
|
||||
/^(a)?(?(1)a|b)+$/
|
||||
*** Failers
|
||||
a
|
||||
|
||||
/This one is here because I think Perl 5.005_02 gets the setting of $1 wrong/
|
||||
|
||||
/^(a\1?){4}$/
|
||||
aaaaaa
|
||||
|
||||
/These are syntax tests from Perl 5.005/
|
||||
|
||||
/a[b-a]/
|
||||
|
||||
/a[]b/
|
||||
|
||||
/a[/
|
||||
|
||||
/*a/
|
||||
|
||||
/(*)b/
|
||||
|
||||
/abc)/
|
||||
|
||||
/(abc/
|
||||
|
||||
/a**/
|
||||
|
||||
/)(/
|
||||
|
||||
/\1/
|
||||
|
||||
/\2/
|
||||
|
||||
/(a)|\2/
|
||||
|
||||
/a[b-a]/i
|
||||
|
||||
/a[]b/i
|
||||
|
||||
/a[/i
|
||||
|
||||
/*a/i
|
||||
|
||||
/(*)b/i
|
||||
|
||||
/abc)/i
|
||||
|
||||
/(abc/i
|
||||
|
||||
/a**/i
|
||||
|
||||
/)(/i
|
||||
|
||||
/:(?:/
|
||||
|
||||
/(?<%)b/
|
||||
|
||||
/a(?{)b/
|
||||
|
||||
/a(?{{})b/
|
||||
|
||||
/a(?{}})b/
|
||||
|
||||
/a(?{"{"})b/
|
||||
|
||||
/a(?{"{"}})b/
|
||||
|
||||
/(?(1?)a|b)/
|
||||
|
||||
/(?(1)a|b|c)/
|
||||
|
||||
/[a[:xyz:/
|
||||
|
||||
/(?<=x+)y/
|
||||
|
||||
/a{37,17}/
|
||||
|
||||
/abc/\
|
||||
|
||||
/abc/\P
|
||||
|
||||
/abc/\i
|
||||
|
||||
/(a)bc(d)/
|
||||
abcd
|
||||
abcd\C2
|
||||
abcd\C5
|
||||
|
||||
/(.{20})/
|
||||
abcdefghijklmnopqrstuvwxyz
|
||||
abcdefghijklmnopqrstuvwxyz\C1
|
||||
abcdefghijklmnopqrstuvwxyz\G1
|
||||
|
||||
/(.{15})/
|
||||
abcdefghijklmnopqrstuvwxyz
|
||||
abcdefghijklmnopqrstuvwxyz\C1\G1
|
||||
|
||||
/(.{16})/
|
||||
abcdefghijklmnopqrstuvwxyz
|
||||
abcdefghijklmnopqrstuvwxyz\C1\G1\L
|
||||
|
||||
/^(a|(bc))de(f)/
|
||||
adef\G1\G2\G3\G4\L
|
||||
bcdef\G1\G2\G3\G4\L
|
||||
adefghijk\C0
|
||||
|
||||
/^abc\00def/
|
||||
abc\00def\L\C0
|
||||
|
||||
/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
|
||||
)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
|
||||
)?)?)?)?)?)?)?)?)?otherword/M
|
||||
|
||||
/.*X/D
|
||||
|
||||
/.*X/Ds
|
||||
|
||||
/(.*X|^B)/D
|
||||
|
||||
/(.*X|^B)/Ds
|
||||
|
||||
/(?s)(.*X|^B)/D
|
||||
|
||||
/(?s:.*X|^B)/D
|
||||
|
||||
/\Biss\B/+
|
||||
Mississippi
|
||||
|
||||
/\Biss\B/+P
|
||||
Mississippi
|
||||
|
||||
/iss/G+
|
||||
Mississippi
|
||||
|
||||
/\Biss\B/G+
|
||||
Mississippi
|
||||
|
||||
/\Biss\B/g+
|
||||
Mississippi
|
||||
*** Failers
|
||||
Mississippi\A
|
||||
|
||||
/(?<=[Ms])iss/g+
|
||||
Mississippi
|
||||
|
||||
/(?<=[Ms])iss/G+
|
||||
Mississippi
|
||||
|
||||
/^iss/g+
|
||||
ississippi
|
||||
|
||||
/.*iss/g+
|
||||
abciss\nxyzisspqr
|
||||
|
||||
/.i./+g
|
||||
Mississippi
|
||||
Mississippi\A
|
||||
Missouri river
Missouri river\A

/^.is/+g
Mississippi

/^ab\n/g+
ab\nab\ncd

/^ab\n/mg+
ab\nab\ncd

/abc/

/abc|bac/

/(abc|bac)/

/(abc|(c|dc))/

/(abc|(d|de)c)/

/a*/

/a+/

/(baa|a+)/

/a{0,3}/

/baa{3,}/

/"([^\\"]+|\\.)*"/

/(abc|ab[cd])/

/(a|.)/

/a|ba|\w/

/abc(?=pqr)/

/...(?<=abc)/

/abc(?!pqr)/

/ab./

/ab[xyz]/

/abc*/

/ab.c*/

/a.c*/

/.c*/

/ac*/

/(a.c*|b.c*)/

/a.c*|aba/

/.+a/

/(?=abcda)a.*/

/(?=a)a.*/

/a(b)*/

/a\d*/

/ab\d*/

/a(\d)*/

/abcde{0,0}/

/ab\d+/

/a(?(1)b)/

/a(?(1)bag|big)/

/a(?(1)bag|big)*/

/a(?(1)bag|big)+/

/a(?(1)b..|b..)/

/ab\d{0}e/

/a?b?/
a
b
ab
\
*** Failers
\N

/|-/
abcd
-abc
\Nab-c
*** Failers
\Nabc

/a*(b+)(z)(z)/P
aaaabbbbzzzz
aaaabbbbzzzz\O0
aaaabbbbzzzz\O1
aaaabbbbzzzz\O2
aaaabbbbzzzz\O3
aaaabbbbzzzz\O4
aaaabbbbzzzz\O5

/^.?abcd/S

/\( # ( at start
(?: # Non-capturing bracket
(?>[^()]+) # Either a sequence of non-brackets (no backtracking)
| # Or
(?R) # Recurse - i.e. nested bracketed string
)* # Zero or more contents
\) # Closing )
/x
(abcd)
(abcd)xyz
xyz(abcd)
(ab(xy)cd)pqr
(ab(xycd)pqr
() abc ()
12(abcde(fsh)xyz(foo(bar))lmno)89
*** Failers
abcd
abcd)
(abcd

/\( ( (?>[^()]+) | (?R) )* \) /xg
(ab(xy)cd)pqr
1(abcd)(x(y)z)pqr

/\( (?: (?>[^()]+) | (?R) ) \) /x
(abcd)
(ab(xy)cd)
(a(b(c)d)e)
((ab))
*** Failers
()

/\( (?: (?>[^()]+) | (?R) )? \) /x
()
12(abcde(fsh)xyz(foo(bar))lmno)89

/\( ( (?>[^()]+) | (?R) )* \) /x
(ab(xy)cd)

/\( ( ( (?>[^()]+) | (?R) )* ) \) /x
(ab(xy)cd)

/\( (123)? ( ( (?>[^()]+) | (?R) )* ) \) /x
(ab(xy)cd)
(123ab(xy)cd)

/\( ( (123)? ( (?>[^()]+) | (?R) )* ) \) /x
(ab(xy)cd)
(123ab(xy)cd)

/\( (((((((((( ( (?>[^()]+) | (?R) )* )))))))))) \) /x
(ab(xy)cd)

/\( ( ( (?>[^()<>]+) | ((?>[^()]+)) | (?R) )* ) \) /x
(abcd(xyz<p>qrs)123)

/\( ( ( (?>[^()]+) | ((?R)) )* ) \) /x
(ab(cd)ef)
(ab(cd(ef)gh)ij)

/^[[:alnum:]]/D

/^[[:alpha:]]/D

/^[[:ascii:]]/D

/^[[:cntrl:]]/D

/^[[:digit:]]/D

/^[[:graph:]]/D

/^[[:lower:]]/D

/^[[:print:]]/D

/^[[:punct:]]/D

/^[[:space:]]/D

/^[[:upper:]]/D

/^[[:xdigit:]]/D

/^[[:word:]]/D

/^[[:^cntrl:]]/D

/^[12[:^digit:]]/D

/[01[:alpha:]%]/D

/[[.ch.]]/

/[[=ch=]]/

/[[:rhubarb:]]/

/[[:upper:]]/i
A
a

/[[:lower:]]/i
A
a

/((?-i)[[:lower:]])[[:lower:]]/i
ab
aB
*** Failers
Ab
AB

/ End of test input /
1692
ext/pcre/pcrelib/testdata/testinput3
vendored
Normal file
File diff suppressed because it is too large
64
ext/pcre/pcrelib/testdata/testinput4
vendored
Normal file
@@ -0,0 +1,64 @@
/^[\w]+/
*** Failers
École

/^[\w]+/Lfr
École

/^[\w]+/
*** Failers
École

/^[\W]+/
École

/^[\W]+/Lfr
*** Failers
École

/[\b]/
\b
*** Failers
a

/[\b]/Lfr
\b
*** Failers
a

/^\w+/
*** Failers
École

/^\w+/Lfr
École

/(.+)\b(.+)/
École

/(.+)\b(.+)/Lfr
*** Failers
École

/École/i
École
*** Failers
école

/École/iLfr
École
école

/\w/IS

/\w/ISLfr

/^[\xc8-\xc9]/iLfr
École
école

/^[\xc8-\xc9]/Lfr
École
*** Failers
école
2925
ext/pcre/pcrelib/testdata/testoutput1
vendored
Normal file
File diff suppressed because it is too large
2072
ext/pcre/pcrelib/testdata/testoutput2
vendored
Normal file
File diff suppressed because it is too large
2929
ext/pcre/pcrelib/testdata/testoutput3
vendored
Normal file
File diff suppressed because it is too large
115
ext/pcre/pcrelib/testdata/testoutput4
vendored
Normal file
@@ -0,0 +1,115 @@
PCRE version 3.1 09-Feb-2000

/^[\w]+/
*** Failers
No match
École
No match

/^[\w]+/Lfr
École
0: École

/^[\w]+/
*** Failers
No match
École
No match

/^[\W]+/
École
0: \xc9

/^[\W]+/Lfr
*** Failers
0: ***
École
No match

/[\b]/
\b
0: \x08
*** Failers
No match
a
No match

/[\b]/Lfr
\b
0: \x08
*** Failers
No match
a
No match

/^\w+/
*** Failers
No match
École
No match

/^\w+/Lfr
École
0: École

/(.+)\b(.+)/
École
0: \xc9cole
1: \xc9
2: cole

/(.+)\b(.+)/Lfr
*** Failers
0: *** Failers
1: ***
2: Failers
École
No match

/École/i
École
0: \xc9cole
*** Failers
No match
école
No match

/École/iLfr
École
0: École
école
0: école

/\w/IS
Capturing subpattern count = 0
No options
No first char
No need char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z

/\w/ISLfr
Capturing subpattern count = 0
No options
No first char
No need char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å
æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ

/^[\xc8-\xc9]/iLfr
École
0: É
école
0: é

/^[\xc8-\xc9]/Lfr
École
0: É
*** Failers
No match
école
No match