mirror of
https://github.com/php/php-src.git
synced 2026-04-24 16:38:25 +02:00
upgrade PCRE to version 7.5 (as asked by Ilia
@@ -6,6 +6,7 @@ PHP NEWS
|
||||
(Ilia)
|
||||
- Fixed a bug with PDO::FETCH_COLUMN|PDO::FETCH_GROUP mode when a column # by
|
||||
which to group by data is specified. (Ilia)
|
||||
- Upgraded PCRE to version 7.5 (Nuno)
|
||||
|
||||
- Fixed faulty fix for bug #40189
|
||||
(endless loop in zlib.inflate stream filter) (Greg)
|
||||
|
||||
@@ -1,6 +1,218 @@
|
||||
ChangeLog for PCRE
|
||||
------------------
|
||||
|
||||
Version 7.5 10-Jan-08
|
||||
---------------------
|
||||
|
||||
1. Applied a patch from Craig: "This patch makes it possible to 'ignore'
|
||||
values in parens when parsing an RE using the C++ wrapper."
|
||||
|
||||
2. Negative specials like \S did not work in character classes in UTF-8 mode.
|
||||
Characters greater than 255 were excluded from the class instead of being
|
||||
included.
|
||||
|
||||
3. The same bug as (2) above applied to negated POSIX classes such as
|
||||
[:^space:].
|
||||
|
||||
4. PCRECPP_STATIC was referenced in pcrecpp_internal.h, but nowhere was it
|
||||
defined or documented. It seems to have been a typo for PCRE_STATIC, so
|
||||
I have changed it.
|
||||
|
||||
5. The construct (?&) was not diagnosed as a syntax error (it referenced the
|
||||
first named subpattern) and a construct such as (?&a) would reference the
|
||||
first named subpattern whose name started with "a" (in other words, the
|
||||
length check was missing). Both these problems are fixed. "Subpattern name
|
||||
expected" is now given for (?&) (a zero-length name), and this patch also
|
||||
makes it give the same error for \k'' (previously it complained that that
|
||||
was a reference to a non-existent subpattern).
|
||||
|
||||
6. The erroneous patterns (?+-a) and (?-+a) give different error messages;
|
||||
this is right because (?- can be followed by option settings as well as by
|
||||
digits. I have, however, made the messages clearer.
|
||||
|
||||
7. Patterns such as (?(1)a|b) (a pattern that contains fewer subpatterns
|
||||
than the number used in the conditional) now cause a compile-time error.
|
||||
This is actually not compatible with Perl, which accepts such patterns, but
|
||||
treats the conditional as always being FALSE (as PCRE used to), but it
|
||||
seems to me that giving a diagnostic is better.
|
||||
|
||||
8. Change "alphameric" to the more common word "alphanumeric" in comments
|
||||
and messages.
|
||||
|
||||
9. Fix two occurrences of "backslash" in comments that should have been
|
||||
"backspace".
|
||||
|
||||
10. Remove two redundant lines of code that can never be obeyed (their function
|
||||
was moved elsewhere).
|
||||
|
||||
11. The program that makes PCRE's Unicode character property table had a bug
|
||||
which caused it to generate incorrect table entries for sequences of
|
||||
characters that have the same character type, but are in different scripts.
|
||||
It amalgamated them into a single range, with the script of the first of
|
||||
them. In other words, some characters were in the wrong script. There were
|
||||
thirteen such cases, affecting characters in the following ranges:
|
||||
|
||||
    U+002b0 - U+002c1
    U+0060c - U+0060d
    U+0061e - U+00612
    U+0064b - U+0065e
    U+0074d - U+0076d
    U+01800 - U+01805
    U+01d00 - U+01d77
    U+01d9b - U+01dbf
    U+0200b - U+0200f
    U+030fc - U+030fe
    U+03260 - U+0327f
    U+0fb46 - U+0fbb1
    U+10450 - U+1049d
|
||||
|
||||
12. The -o option (show only the matching part of a line) for pcregrep was not
|
||||
compatible with GNU grep in that, if there was more than one match in a
|
||||
line, it showed only the first of them. It now behaves in the same way as
|
||||
GNU grep.
|
||||
|
||||
13. If the -o and -v options were combined for pcregrep, it printed a blank
|
||||
line for every non-matching line. GNU grep prints nothing, and pcregrep now
|
||||
does the same. The return code can be used to tell if there were any
|
||||
non-matching lines.
|
||||
|
||||
14. Added --file-offsets and --line-offsets to pcregrep.
|
||||
|
||||
15. The pattern (?=something)(?R) was not being diagnosed as a potentially
|
||||
infinitely looping recursion. The bug was that positive lookaheads were not
|
||||
being skipped when checking for a possible empty match (negative lookaheads
|
||||
and both kinds of lookbehind were skipped).
|
||||
|
||||
16. Fixed two typos in the Windows-only code in pcregrep.c, and moved the
|
||||
inclusion of <windows.h> to before rather than after the definition of
|
||||
INVALID_FILE_ATTRIBUTES (patch from David Byron).
|
||||
|
||||
17. Specifying a possessive quantifier with a specific limit for a Unicode
|
||||
character property caused pcre_compile() to compile bad code, which led at
|
||||
runtime to PCRE_ERROR_INTERNAL (-14). Examples of patterns that caused this
|
||||
are: /\p{Zl}{2,3}+/8 and /\p{Cc}{2}+/8. It was the possessive "+" that
|
||||
caused the error; without that there was no problem.
|
||||
|
||||
18. Added --enable-pcregrep-libz and --enable-pcregrep-libbz2.
|
||||
|
||||
19. Added --enable-pcretest-libreadline.
|
||||
|
||||
20. In pcrecpp.cc, the variable 'count' was incremented twice in
|
||||
RE::GlobalReplace(). As a result, the number of replacements returned was
|
||||
double what it should be. I removed one of the increments, but Craig sent a
|
||||
later patch that removed the other one (the right fix) and added unit tests
|
||||
that check the return values (which was not done before).
|
||||
|
||||
21. Several CMake things:
|
||||
|
||||
(1) Arranged that, when cmake is used on Unix, the libraries end up with
|
||||
the names libpcre and libpcreposix, not just pcre and pcreposix.
|
||||
|
||||
(2) The above change means that pcretest and pcregrep are now correctly
|
||||
linked with the newly-built libraries, not previously installed ones.
|
||||
|
||||
(3) Added PCRE_SUPPORT_LIBREADLINE, PCRE_SUPPORT_LIBZ, PCRE_SUPPORT_LIBBZ2.
|
||||
|
||||
22. In UTF-8 mode, with newline set to "any", a pattern such as .*a.*=.b.*
|
||||
crashed when matching a string such as a\x{2029}b (note that \x{2029} is a
|
||||
UTF-8 newline character). The key issue is that the pattern starts .*;
|
||||
this means that the match must be either at the beginning, or after a
|
||||
newline. The bug was in the code for advancing after a failed match and
|
||||
checking that the new position followed a newline. It was not taking
|
||||
account of UTF-8 characters correctly.
|
||||
|
||||
23. PCRE was behaving differently from Perl in the way it recognized POSIX
|
||||
character classes. PCRE was not treating the sequence [:...:] as a
|
||||
character class unless the ... were all letters. Perl, however, seems to
|
||||
allow any characters between [: and :], though of course it rejects as
|
||||
unknown any "names" that contain non-letters, because all the known class
|
||||
names consist only of letters. Thus, Perl gives an error for [[:1234:]],
|
||||
for example, whereas PCRE did not - it did not recognize a POSIX character
|
||||
class. This seemed a bit dangerous, so the code has been changed to be
|
||||
closer to Perl. The behaviour is not identical to Perl, because PCRE will
|
||||
diagnose an unknown class for, for example, [[:l\ower:]] where Perl will
|
||||
treat it as [[:lower:]]. However, PCRE does now give "unknown" errors where
|
||||
Perl does, and where it didn't before.
|
||||
|
||||
24. Rewrite so as to remove the single use of %n from pcregrep because in some
|
||||
Windows environments %n is disabled by default.
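
The position check described in item 22 above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code in PCRE: the function names (prev_char, is_any_newline, at_line_start) are hypothetical, the subject is assumed to be valid UTF-8, and the newline set is the one used when newline is set to "any". The point is that the character before the new position has to be found by stepping back over UTF-8 continuation bytes rather than by looking at a single byte.

```c
#include <stddef.h>

/* Hypothetical helper: decode the UTF-8 character that ends just before p,
   never reading before start. Assumes valid UTF-8. Returns the code point
   and stores the character's byte length in *len (0 if p == start). */
static unsigned long prev_char(const unsigned char *start,
                               const unsigned char *p, size_t *len)
{
  const unsigned char *q = p;
  unsigned long c;
  size_t n, i;

  *len = 0;
  if (p == start) return 0;

  /* Step back over continuation bytes (10xxxxxx) to the lead byte. */
  do { q--; } while (q > start && (*q & 0xC0) == 0x80);

  n = (size_t)(p - q);
  *len = n;
  if (n == 1) return *q;                      /* single-byte character */

  /* The lead byte of an n-byte sequence carries (7 - n) payload bits. */
  c = *q & (0xFFu >> (n + 1));
  for (i = 1; i < n; i++) c = (c << 6) | (q[i] & 0x3Fu);
  return c;
}

/* True if c counts as a newline when newline is set to "any". */
static int is_any_newline(unsigned long c)
{
  switch (c) {
    case 0x0A: case 0x0B: case 0x0C: case 0x0D:   /* LF, VT, FF, CR */
    case 0x85:                                    /* NEL */
    case 0x2028: case 0x2029:                     /* LS, PS */
      return 1;
    default:
      return 0;
  }
}

/* True if p is at the start of the subject or follows a newline. */
static int at_line_start(const unsigned char *start, const unsigned char *p)
{
  size_t len;
  unsigned long c = prev_char(start, p, &len);
  return len == 0 || is_any_newline(c);
}
```

For the subject a\x{2029}b, the position of "b" follows the three-byte sequence E2 80 A9, so a check that inspects only the last byte (A9) would not recognise the newline.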
|
||||
|
||||
|
||||
Version 7.4 21-Sep-07
|
||||
---------------------
|
||||
|
||||
1. Change 7.3/28 was implemented for classes by looking at the bitmap. This
|
||||
means that a class such as [\s] counted as "explicit reference to CR or
|
||||
LF". That isn't really right - the whole point of the change was to try to
|
||||
help when there was an actual mention of one of the two characters. So now
|
||||
the change happens only if \r or \n (or a literal CR or LF) character is
|
||||
encountered.
|
||||
|
||||
2. The 32-bit options word was also used for 6 internal flags, but the numbers
|
||||
of both had grown to the point where there were only 3 bits left.
|
||||
Fortunately, there was spare space in the data structure, and so I have
|
||||
moved the internal flags into a new 16-bit field to free up more option
|
||||
bits.
|
||||
|
||||
3. The appearance of (?J) at the start of a pattern set the DUPNAMES option,
|
||||
but did not set the internal JCHANGED flag - either of these is enough to
|
||||
control the way the "get" function works - but the PCRE_INFO_JCHANGED
|
||||
facility is supposed to tell if (?J) was ever used, so now (?J) at the
|
||||
start sets both bits.
|
||||
|
||||
4. Added options (at build time, compile time, exec time) to change \R from
|
||||
matching any Unicode line ending sequence to just matching CR, LF, or CRLF.
|
||||
|
||||
5. doc/pcresyntax.html was missing from the distribution.
|
||||
|
||||
6. Put back the definition of PCRE_ERROR_NULLWSLIMIT, for backward
|
||||
compatibility, even though it is no longer used.
|
||||
|
||||
7. Added macro for snprintf to pcrecpp_unittest.cc and also for strtoll and
|
||||
strtoull to pcrecpp.cc to select the available functions in WIN32 when the
|
||||
windows.h file is present (where different names are used). [This was
|
||||
reversed later after testing - see 16 below.]
|
||||
|
||||
8. Changed all #include <config.h> to #include "config.h". There were also
|
||||
some further <pcre.h> cases that I changed to "pcre.h".
|
||||
|
||||
9. When pcregrep was used with the --colour option, it missed the line ending
|
||||
sequence off the lines that it output.
|
||||
|
||||
10. It was pointed out to me that arrays of string pointers cause lots of
|
||||
relocations when a shared library is dynamically loaded. A technique of
|
||||
using a single long string with a table of offsets can drastically reduce
|
||||
these. I have refactored PCRE in four places to do this. The result is
|
||||
dramatic:
|
||||
|
||||
    Originally:                           290
    After changing UCP table:             187
    After changing error message table:    43
    After changing table of "verbs":       36
    After changing table of Posix names:   22
|
||||
|
||||
Thanks to the folks working on Gregex for glib for this insight.
|
||||
|
||||
11. --disable-stack-for-recursion caused compiling to fail unless
--enable-unicode-properties was also set.
|
||||
|
||||
12. Updated the tests so that they work when \R is defaulted to ANYCRLF.
|
||||
|
||||
13. Added checks for ANY and ANYCRLF to pcrecpp.cc where it previously
|
||||
checked only for CRLF.
|
||||
|
||||
14. Added casts to pcretest.c to avoid compiler warnings.
|
||||
|
||||
15. Added Craig's patch to various pcrecpp modules to avoid compiler warnings.
|
||||
|
||||
16. Added Craig's patch to remove the WINDOWS_H tests, that were not working,
|
||||
and instead check for _strtoi64 explicitly, and avoid the use of snprintf()
|
||||
entirely. This removes changes made in 7 above.
|
||||
|
||||
17. The CMake files have been updated, and there is now more information about
|
||||
building with CMake in the NON-UNIX-USE document.
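
The relocation-saving technique from item 10 above (one long string plus a table of offsets, instead of an array of string pointers) can be sketched as below. This is a minimal illustration, not PCRE's actual table: the identifiers posix_names, posix_name_offsets and find_posix_name are hypothetical, and the list of class names is only an example.

```c
#include <stddef.h>
#include <string.h>

/* All names live in one char array, separated by NUL bytes. The array
   contains no pointers, so the dynamic loader has nothing to relocate. */
static const char posix_names[] =
  "alpha\0" "lower\0" "upper\0" "alnum\0" "ascii\0" "blank\0" "cntrl\0"
  "digit\0" "graph\0" "print\0" "punct\0" "space\0" "word\0"  "xdigit";

/* Offset of each name within posix_names. Plain integers need no
   relocation either. */
static const unsigned char posix_name_offsets[] = {
  0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 77
};

#define POSIX_NAME_COUNT \
  (sizeof(posix_name_offsets) / sizeof(posix_name_offsets[0]))

/* Return the index of the name with the given length, or -1 if unknown. */
static int find_posix_name(const char *name, size_t len)
{
  size_t i;
  for (i = 0; i < POSIX_NAME_COUNT; i++) {
    const char *candidate = posix_names + posix_name_offsets[i];
    if (strlen(candidate) == len && strncmp(candidate, name, len) == 0)
      return (int)i;
  }
  return -1;
}
```

By contrast, a declaration such as `static const char *names[] = { "alpha", "lower", ... }` stores one pointer per entry, and each of those pointers typically needs a relocation when a shared library is loaded.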
|
||||
|
||||
|
||||
Version 7.3 28-Aug-07
|
||||
---------------------
|
||||
|
||||
|
||||
@@ -1,6 +1,25 @@
|
||||
News about PCRE releases
|
||||
------------------------
|
||||
|
||||
Release 7.5 10-Jan-08
|
||||
---------------------
|
||||
|
||||
This is mainly a bug-fix release. However the ability to link pcregrep with
|
||||
libz or libbz2 and the ability to link pcretest with libreadline have been
|
||||
added. Also the --line-offsets and --file-offsets options were added to
|
||||
pcregrep.
|
||||
|
||||
|
||||
Release 7.4 21-Sep-07
|
||||
---------------------
|
||||
|
||||
The only change of specification is the addition of options to control whether
|
||||
\R matches any Unicode line ending (the default) or just CR, LF, and CRLF.
|
||||
Otherwise, the changes are bug fixes and a refactoring to reduce the number of
|
||||
relocations needed in a shared library. There have also been some documentation
|
||||
updates, in particular, some more information about using CMake to build PCRE
|
||||
has been added to the NON-UNIX-USE file.
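
The difference between the two \R settings can be illustrated with a small sketch. It is not PCRE code and does not use the PCRE API: the function line_ending_length is hypothetical, and it works on an array of already-decoded code points. The extra characters accepted in the Unicode setting (VT, FF, NEL, LS, PS) are taken from the PCRE pattern documentation rather than from the text above.

```c
#include <stddef.h>

/* Return how many code points the line ending at the start of s[0..n)
   spans, or 0 if there is none. With anycrlf non-zero only CR, LF and
   CRLF count; otherwise the wider Unicode set is accepted as well. */
static size_t line_ending_length(const unsigned long *s, size_t n, int anycrlf)
{
  if (n == 0) return 0;
  if (s[0] == 0x0D)                        /* CR, possibly followed by LF */
    return (n > 1 && s[1] == 0x0A) ? 2 : 1;
  if (s[0] == 0x0A) return 1;              /* LF */
  if (anycrlf) return 0;
  switch (s[0]) {
    case 0x0B: case 0x0C:                  /* VT, FF */
    case 0x85:                             /* NEL */
    case 0x2028: case 0x2029:              /* LS, PS */
      return 1;
    default:
      return 0;
  }
}
```

In both settings CRLF is treated as a single two-character line ending; only the handling of the other Unicode line-ending characters differs.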
|
||||
|
||||
|
||||
Release 7.3 28-Aug-07
|
||||
---------------------
|
||||
|
||||
@@ -9,6 +9,7 @@ This document contains the following sections:
|
||||
Building for virtual Pascal
|
||||
Stack size in Windows environments
|
||||
Comments about Win32 builds
|
||||
Building PCRE with CMake
|
||||
Building under Windows with BCC5.5
|
||||
Building PCRE on OpenVMS
|
||||
|
||||
@@ -30,9 +31,10 @@ library consists entirely of code written in Standard C, and so should compile
|
||||
successfully on any system that has a Standard C compiler and library. The C++
|
||||
wrapper functions are a separate issue (see below).
|
||||
|
||||
The PCRE distribution contains some experimental support for "cmake", but this
|
||||
is incomplete and not documented. However if you are a "cmake" user you might
|
||||
like to try building with "cmake".
|
||||
The PCRE distribution includes support for CMake. This support is relatively
|
||||
new, but has already been used successfully to build PCRE in multiple build
|
||||
environments on Windows. There are some instructions in the section entitled
|
||||
"Building PCRE with CMake" below.
|
||||
|
||||
|
||||
GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
|
||||
@@ -42,10 +44,13 @@ The following are generic comments about building the PCRE C library "by hand".
|
||||
(1) Copy or rename the file config.h.generic as config.h, and edit the macro
|
||||
settings that it contains to whatever is appropriate for your environment.
|
||||
In particular, if you want to force a specific value for newline, you can
|
||||
define the NEWLINE macro.
|
||||
define the NEWLINE macro. When you compile any of the PCRE modules, you
|
||||
must specify -DHAVE_CONFIG_H to your compiler so that config.h is included
|
||||
in the sources.
|
||||
|
||||
An alternative approach is not to edit config.h, but to use -D on the
|
||||
compiler command line to make any changes that you need.
|
||||
compiler command line to make any changes that you need to the
|
||||
configuration options. In this case -DHAVE_CONFIG_H must not be set.
|
||||
|
||||
NOTE: There have been occasions when the way in which certain parameters
|
||||
in config.h are used has changed between releases. (In the configure/make
|
||||
@@ -59,13 +64,14 @@ The following are generic comments about building the PCRE C library "by hand".
|
||||
Copy or rename file pcre_chartables.c.dist as pcre_chartables.c.
|
||||
|
||||
OR:
|
||||
Compile dftables.c as a stand-alone program, and then run it with the
|
||||
single argument "pcre_chartables.c". This generates a set of standard
|
||||
character tables and writes them to that file. The tables are generated
|
||||
using the default C locale for your system. If you want to use a locale
|
||||
that is specified by LC_xxx environment variables, add the -L option to
|
||||
the dftables command. You must use this method if you are building on
|
||||
a system that uses EBCDIC code.
|
||||
Compile dftables.c as a stand-alone program (using -DHAVE_CONFIG_H if
|
||||
you have set up config.h), and then run it with the single argument
|
||||
"pcre_chartables.c". This generates a set of standard character tables
|
||||
and writes them to that file. The tables are generated using the default
|
||||
C locale for your system. If you want to use a locale that is specified
|
||||
by LC_xxx environment variables, add the -L option to the dftables
|
||||
command. You must use this method if you are building on a system that
|
||||
uses EBCDIC code.
|
||||
|
||||
The tables in pcre_chartables.c are defaults. The caller of PCRE can
|
||||
specify alternative tables at run time.
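
As a rough idea of what "tables generated using the default C locale" means, the sketch below fills two 256-entry tables from the C library's ctype functions. It is a simplified illustration only: build_case_tables is a hypothetical name, and the real tables written by dftables contain more than these two.

```c
#include <ctype.h>

/* Fill two 256-entry tables from the current C library locale:
   lower[c] is the lower-case form of c, and flip[c] is c with its case
   swapped (or c itself if it has no case). */
static void build_case_tables(unsigned char lower[256], unsigned char flip[256])
{
  int c;
  for (c = 0; c < 256; c++) {
    lower[c] = (unsigned char)tolower(c);
    flip[c]  = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
  }
}
```

A program that wants the tables to follow the LC_xxx environment variables would call setlocale(LC_ALL, "") from <locale.h> before building them; without that call the C library stays in the "C" locale.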
|
||||
@@ -78,11 +84,13 @@ The following are generic comments about building the PCRE C library "by hand".
|
||||
ucptable.h
|
||||
|
||||
(5) Also ensure that you have the following file, which is #included as source
|
||||
when building a debugging version of PCRE and is also used by pcretest.
|
||||
when building a debugging version of PCRE, and is also used by pcretest.
|
||||
|
||||
pcre_printint.src
|
||||
|
||||
(6) Compile the following source files:
|
||||
(6) Compile the following source files, setting -DHAVE_CONFIG_H as a compiler
|
||||
option if you have set up config.h with your configuration, or else use
|
||||
other -D settings to change the configuration as required.
|
||||
|
||||
pcre_chartables.c
|
||||
pcre_compile.c
|
||||
@@ -115,18 +123,21 @@ The following are generic comments about building the PCRE C library "by hand".
|
||||
your system has static and shared libraries, you may have to do this once
|
||||
for each type.
|
||||
|
||||
(8) Similarly, compile pcreposix.c and link the result (on its own) as the
|
||||
pcreposix library.
|
||||
(8) Similarly, compile pcreposix.c (remembering -DHAVE_CONFIG_H if necessary)
|
||||
and link the result (on its own) as the pcreposix library.
|
||||
|
||||
(9) Compile the test program pcretest.c. This needs the functions in the
|
||||
pcre and pcreposix libraries when linking. It also needs the
|
||||
pcre_printint.src source file, which it #includes.
|
||||
(9) Compile the test program pcretest.c (again, don't forget -DHAVE_CONFIG_H).
|
||||
This needs the functions in the pcre and pcreposix libraries when linking.
|
||||
It also needs the pcre_printint.src source file, which it #includes.
|
||||
|
||||
(10) Run pcretest on the testinput files in the testdata directory, and check
|
||||
that the output matches the corresponding testoutput files. Note that the
|
||||
supplied files are in Unix format, with just LF characters as line
|
||||
terminators. You may need to edit them to change this if your system uses
|
||||
a different convention.
|
||||
a different convention. If you are using Windows, you probably should use
|
||||
the wintestinput3 file instead of testinput3 (and the corresponding output
|
||||
file). This is a locale test; wintestinput3 sets the locale to "french"
|
||||
rather than "fr_FR", and there are some minor output differences.
|
||||
|
||||
(11) If you want to use the pcregrep command, compile and link pcregrep.c; it
|
||||
uses only the basic PCRE library (it does not need the pcreposix library).
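
The -DHAVE_CONFIG_H convention used throughout the steps above can be sketched as follows. This is a minimal illustration of the pattern, not an excerpt from the PCRE sources: newline_bytes is a hypothetical helper, and the fallback value 10 and the encoding of CRLF as 3338 are the ones documented for the NEWLINE macro in config.h.

```c
/* Pull in config.h only when the build asks for it with -DHAVE_CONFIG_H. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

/* Fall back to LF if neither config.h nor -DNEWLINE=... supplied a value. */
#ifndef NEWLINE
#define NEWLINE 10
#endif

/* Split a NEWLINE value into its bytes. Returns the number of bytes
   written to out (1 or 2), or 0 for the special negative values
   -1 (ANY) and -2 (ANYCRLF), which do not name a fixed sequence. */
static int newline_bytes(int value, unsigned char out[2])
{
  if (value < 0) return 0;
  if (value > 255) {                       /* two bytes, e.g. 3338 = CR LF */
    out[0] = (unsigned char)((value >> 8) & 255);
    out[1] = (unsigned char)(value & 255);
    return 2;
  }
  out[0] = (unsigned char)value;
  return 1;
}
```

Compiled with no options, the fallback applies and NEWLINE is 10. Compiling with, for example, -DNEWLINE=3338 selects CRLF without editing any file, which is the "use -D on the compiler command line" approach; in that case -DHAVE_CONFIG_H is left off so that config.h is not read.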
|
||||
@@ -158,11 +169,15 @@ fail because of this. Normally, running out of stack causes a crash, but there
|
||||
have been cases where the test program has just died silently. See your linker
|
||||
documentation for how to increase stack size if you experience problems. The
|
||||
Linux default of 8Mb is a reasonable choice for the stack, though even that can
|
||||
be too small for some pattern/subject combinations. There is more about stack
|
||||
usage in the "pcrestack" documentation.
|
||||
be too small for some pattern/subject combinations.
|
||||
|
||||
PCRE has a compile configuration option to disable the use of stack for
|
||||
recursion so that heap is used instead. However, pattern matching is
|
||||
significantly slower when this is done. There is more about stack usage in the
|
||||
"pcrestack" documentation.
|
||||
|
||||
|
||||
COMMENTS ABOUT WIN32 BUILDS
|
||||
COMMENTS ABOUT WIN32 BUILDS (see also "BUILDING PCRE WITH CMAKE" below)
|
||||
|
||||
There are two ways of building PCRE using the "configure, make, make install"
|
||||
paradigm on Windows systems: using MinGW or using Cygwin. These are not at all
|
||||
@@ -237,6 +252,60 @@ terminators in order to get some of the tests to work. We hope to improve
|
||||
things in this area in future.
|
||||
|
||||
|
||||
BUILDING PCRE WITH CMAKE
|
||||
|
||||
CMake is an alternative build facility that can be used instead of the
|
||||
traditional Unix "configure". CMake version 2.4.7 supports Borland makefiles,
|
||||
MinGW makefiles, MSYS makefiles, NMake makefiles, UNIX makefiles, Visual Studio
|
||||
6, Visual Studio 7, Visual Studio 8, and Watcom W8. The following instructions
|
||||
were contributed by a PCRE user.
|
||||
|
||||
1. Download CMake 2.4.7 or above from http://www.cmake.org/, install and ensure
|
||||
that cmake\bin is on your path.
|
||||
|
||||
2. Unzip (retaining folder structure) the PCRE source tree into a source
|
||||
directory such as C:\pcre.
|
||||
|
||||
3. Create a new, empty build directory: C:\pcre\build\
|
||||
|
||||
4. Run CMakeSetup from the shell environment of your build tool, e.g., Msys
for Msys/MinGW or Visual Studio Command Prompt for VC/VC++.
|
||||
|
||||
5. Enter C:\pcre\pcre-xx and C:\pcre\build for the source and build
|
||||
directories, respectively
|
||||
|
||||
6. Hit the "Configure" button.
|
||||
|
||||
7. Select the particular IDE / build tool that you are using (Visual Studio,
|
||||
MSYS makefiles, MinGW makefiles, etc.)
|
||||
|
||||
8. The GUI will then list several configuration options. This is where you can
|
||||
enable UTF-8 support, etc.
|
||||
|
||||
9. Hit "Configure" again. The adjacent "OK" button should now be active.
|
||||
|
||||
10. Hit "OK".
|
||||
|
||||
11. The build directory should now contain a usable build system, be it a
|
||||
solution file for Visual Studio, makefiles for MinGW, etc.
|
||||
|
||||
Testing with RunTest.bat
|
||||
|
||||
1. Copy RunTest.bat into the directory where pcretest.exe has been created.
|
||||
|
||||
2. Edit RunTest.bat and insert a line that identifies the relative location of
|
||||
the pcre source, e.g.:
|
||||
|
||||
set srcdir=..\pcre-7.4-RC3
|
||||
|
||||
3. Run RunTest.bat from a command shell environment. Test outputs will
|
||||
automatically be compared to expected results, and discrepancies will be
identified in the console output.
|
||||
|
||||
4. To test pcrecpp, run pcrecpp_unittest.exe, pcre_stringpiece_unittest.exe and
|
||||
pcre_scanner_unittest.exe.
|
||||
|
||||
|
||||
BUILDING UNDER WINDOWS WITH BCC5.5
|
||||
|
||||
Michael Roy sent these comments about building PCRE under Windows with BCC5.5:
|
||||
@@ -315,5 +384,5 @@ $! Locale could not be set to fr
|
||||
$!
|
||||
=========================
|
||||
|
||||
Last Updated: 01 August 2007
|
||||
Last Updated: 21 September 2007
|
||||
****
|
||||
|
||||
+32
-3
@@ -103,7 +103,9 @@ Building PCRE on non-Unix systems
|
||||
|
||||
For a non-Unix system, please read the comments in the file NON-UNIX-USE,
|
||||
though if your system supports the use of "configure" and "make" you may be
|
||||
able to build PCRE in the same way as for Unix-like systems.
|
||||
able to build PCRE in the same way as for Unix-like systems. PCRE can also be
|
||||
configured in many platform environments using the GUI facility of CMake's
|
||||
CMakeSetup. It creates Makefiles, solution files, etc.
|
||||
|
||||
PCRE has been compiled on many different operating systems. It should be
|
||||
straightforward to build PCRE on any system that has a Standard C compiler and
|
||||
@@ -184,6 +186,12 @@ library. You can read more about them in the pcrebuild man page.
|
||||
--enable-newline-is-any, many tests should succeed, but there may be some
|
||||
failures.
|
||||
|
||||
. By default, the sequence \R in a pattern matches any Unicode line ending
|
||||
sequence. This is independent of the option specifying what PCRE considers to
|
||||
be the end of a line (see above). However, the caller of PCRE can restrict \R
|
||||
to match only CR, LF, or CRLF. You can make this the default by adding
|
||||
--enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
|
||||
|
||||
. When called via the POSIX interface, PCRE uses malloc() to get additional
|
||||
storage for processing capturing parentheses if there are more than 10 of
|
||||
them in a pattern. You can increase this threshold by setting, for example,
|
||||
@@ -250,6 +258,24 @@ library. You can read more about them in the pcrebuild man page.
|
||||
|
||||
This automatically implies --enable-rebuild-chartables (see above).
|
||||
|
||||
. It is possible to compile pcregrep to use libz and/or libbz2, in order to
|
||||
read .gz and .bz2 files (respectively), by specifying one or both of
|
||||
|
||||
--enable-pcregrep-libz
|
||||
--enable-pcregrep-libbz2
|
||||
|
||||
Of course, the relevant libraries must be installed on your system.
|
||||
|
||||
. It is possible to compile pcretest so that it links with the libreadline
|
||||
library, by specifying
|
||||
|
||||
--enable-pcretest-libreadline
|
||||
|
||||
If this is done, when pcretest's input is from a terminal, it reads it using
|
||||
the readline() function. This provides line-editing and history facilities.
|
||||
Note that libreadline is GPL-licenced, so if you distribute a binary of
|
||||
pcretest linked in this way, there may be licensing issues.
|
||||
|
||||
The "configure" script builds the following files for the basic C library:
|
||||
|
||||
. Makefile is the makefile that builds the library
|
||||
@@ -500,7 +526,10 @@ in the comparison output, it means that locale is not available on your system,
|
||||
despite being listed by "locale". This does not mean that PCRE is broken.
|
||||
|
||||
[If you are trying to run this test on Windows, you may be able to get it to
|
||||
work by changing "fr_FR" to "french" everywhere it occurs.]
|
||||
work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
|
||||
RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
|
||||
Windows versions of test 2. More info on using RunTest.bat is included in the
|
||||
document entitled NON-UNIX-USE.]
|
||||
|
||||
The fourth test checks the UTF-8 support. It is not run automatically unless
|
||||
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
|
||||
@@ -714,4 +743,4 @@ The distribution should contain the following files:
|
||||
Philip Hazel
|
||||
Email local part: ph10
|
||||
Email domain: cam.ac.uk
|
||||
Last updated: 24 April 2007
|
||||
Last updated: 18 December 2007
|
||||
|
||||
@@ -20,13 +20,24 @@ it to run on SunOS4 and other "close to standard" systems.
|
||||
|
||||
If you are going to build PCRE "by hand" on a system without "configure" you
|
||||
should copy the distributed config.h.generic to config.h, and then set up the
|
||||
macros the way you need them. Alternatively, you can avoid editing by using -D
|
||||
on the compiler command line to set the macro values.
|
||||
macro definitions the way you need them. You must then add -DHAVE_CONFIG_H to
|
||||
all of your compile commands, so that config.h is included at the start of
|
||||
every source.
|
||||
|
||||
Alternatively, you can avoid editing by using -D on the compiler command line
|
||||
to set the macro values. In this case, you do not have to set -DHAVE_CONFIG_H.
|
||||
|
||||
PCRE uses memmove() if HAVE_MEMMOVE is set to 1; otherwise it uses bcopy() if
|
||||
HAVE_BCOPY is set to 1. If your system has neither bcopy() nor memmove(), set
|
||||
them both to 0; an emulation function will be used. */
|
||||
|
||||
/* By default, the \R escape sequence matches any Unicode line ending
|
||||
character or sequence of characters. If BSR_ANYCRLF is defined, this is
|
||||
changed so that backslash-R matches only CR, LF, or CRLF. The build-time
|
||||
default can be overridden by the user of PCRE at runtime. On systems that
|
||||
support it, "configure" can be used to override the default. */
|
||||
/* #undef BSR_ANYCRLF */
|
||||
|
||||
/* If you are compiling for a system that uses EBCDIC instead of ASCII
|
||||
character codes, define this macro as 1. On systems that can use
|
||||
"configure", this can be done via --enable-ebcdic. */
|
||||
@@ -40,6 +51,11 @@ them both to 0; an emulation function will be used. */
|
||||
/* Define to 1 if you have the <bits/type_traits.h> header file. */
|
||||
/* #undef HAVE_BITS_TYPE_TRAITS_H */
|
||||
|
||||
/* Define to 1 if you have the <bzlib.h> header file. */
|
||||
#ifndef HAVE_BZLIB_H
|
||||
#define HAVE_BZLIB_H 1
|
||||
#endif
|
||||
|
||||
/* Define to 1 if you have the <dirent.h> header file. */
|
||||
#ifndef HAVE_DIRENT_H
|
||||
#define HAVE_DIRENT_H 1
|
||||
@@ -75,6 +91,16 @@ them both to 0; an emulation function will be used. */
|
||||
#define HAVE_MEMORY_H 1
|
||||
#endif
|
||||
|
||||
/* Define to 1 if you have the <readline/history.h> header file. */
|
||||
#ifndef HAVE_READLINE_HISTORY_H
|
||||
#define HAVE_READLINE_HISTORY_H 1
|
||||
#endif
|
||||
|
||||
/* Define to 1 if you have the <readline/readline.h> header file. */
|
||||
#ifndef HAVE_READLINE_READLINE_H
|
||||
#define HAVE_READLINE_READLINE_H 1
|
||||
#endif
|
||||
|
||||
/* Define to 1 if you have the <stdint.h> header file. */
|
||||
#ifndef HAVE_STDINT_H
|
||||
#define HAVE_STDINT_H 1
|
||||
@@ -141,6 +167,14 @@ them both to 0; an emulation function will be used. */
|
||||
/* Define to 1 if you have the <windows.h> header file. */
|
||||
/* #undef HAVE_WINDOWS_H */
|
||||
|
||||
/* Define to 1 if you have the <zlib.h> header file. */
|
||||
#ifndef HAVE_ZLIB_H
|
||||
#define HAVE_ZLIB_H 1
|
||||
#endif
|
||||
|
||||
/* Define to 1 if you have the `_strtoi64' function. */
|
||||
/* #undef HAVE__STRTOI64 */
|
||||
|
||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||
as offsets within the compiled regex. The default is 2, which allows for
|
||||
compiled patterns up to 64K long. This covers the vast majority of cases.
|
||||
@@ -189,10 +223,10 @@ them both to 0; an emulation function will be used. */
|
||||
#define MAX_NAME_SIZE 32
|
||||
#endif
|
||||
|
||||
/* The value of NEWLINE determines the newline character sequence. On
|
||||
Unix-like systems, "configure" can be used to override the default, which
|
||||
is 10. The possible values are 10 (LF), 13 (CR), 3338 (CRLF), -1 (ANY), or
|
||||
-2 (ANYCRLF). */
|
||||
/* The value of NEWLINE determines the newline character sequence. On systems
|
||||
that support it, "configure" can be used to override the default, which is
|
||||
10. The possible values are 10 (LF), 13 (CR), 3338 (CRLF), -1 (ANY), or -2
|
||||
(ANYCRLF). */
|
||||
#ifndef NEWLINE
|
||||
#define NEWLINE 10
|
||||
#endif
|
||||
@@ -217,13 +251,13 @@ them both to 0; an emulation function will be used. */
|
||||
#define PACKAGE_NAME "PCRE"
|
||||
|
||||
/* Define to the full name and version of this package. */
|
||||
#define PACKAGE_STRING "PCRE 7.3"
|
||||
#define PACKAGE_STRING "PCRE 7.5"
|
||||
|
||||
/* Define to the one symbol short name of this package. */
|
||||
#define PACKAGE_TARNAME "pcre"
|
||||
|
||||
/* Define to the version of this package. */
|
||||
#define PACKAGE_VERSION "7.3"
|
||||
#define PACKAGE_VERSION "7.5"
|
||||
|
||||
|
||||
/* If you are compiling for a system other than a Unix-like system or
|
||||
@@ -257,6 +291,17 @@ them both to 0; an emulation function will be used. */
|
||||
#define STDC_HEADERS 1
|
||||
#endif
|
||||
|
||||
/* Define to allow pcregrep to be linked with libbz2, so that it is able to
|
||||
handle .bz2 files. */
|
||||
/* #undef SUPPORT_LIBBZ2 */
|
||||
|
||||
/* Define to allow pcretest to be linked with libreadline. */
|
||||
/* #undef SUPPORT_LIBREADLINE */
|
||||
|
||||
/* Define to allow pcregrep to be linked with libz, so that it is able to
|
||||
handle .gz files. */
|
||||
/* #undef SUPPORT_LIBZ */
|
||||
|
||||
/* Define to enable support for Unicode properties */
|
||||
/* #undef SUPPORT_UCP */
|
||||
|
||||
@@ -265,7 +310,7 @@ them both to 0; an emulation function will be used. */
|
||||
|
||||
/* Version number of package */
|
||||
#ifndef VERSION
|
||||
#define VERSION "7.3"
|
||||
#define VERSION "7.5"
|
||||
#endif
|
||||
|
||||
/* Define to empty if `const' does not conform to ANSI C. */
|
||||
|
||||
@@ -43,7 +43,7 @@ character tables for PCRE. The tables are built according to the current
|
||||
locale. Now that pcre_maketables is a function visible to the outside world, we
|
||||
make use of its code from here in order to be consistent. */
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include <ctype.h>
|
||||
#include <stdio.h>
|
||||
@@ -108,7 +108,7 @@ fprintf(f,
|
||||
"outside this compilation unit might reference this\" and so it will always\n"
|
||||
"be supplied to the linker. */\n\n"
|
||||
"#ifdef HAVE_CONFIG_H\n"
|
||||
"#include <config.h>\n"
|
||||
"#include \"config.h\"\n"
|
||||
"#endif\n\n"
|
||||
"#include \"pcre_internal.h\"\n\n");
|
||||
fprintf(f,
|
||||
|
||||
+825
-687
File diff suppressed because it is too large
@@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
/* The current PCRE version information. */
|
||||
|
||||
#define PCRE_MAJOR 7
|
||||
#define PCRE_MINOR 3
|
||||
#define PCRE_MINOR 5
|
||||
#define PCRE_PRERELEASE
|
||||
#define PCRE_DATE 2007-08-28
|
||||
#define PCRE_DATE 2008-01-10
|
||||
|
||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||
imported have to be identified as such. When building PCRE, the appropriate
|
||||
@@ -122,6 +122,8 @@ extern "C" {
|
||||
#define PCRE_NEWLINE_CRLF 0x00300000
|
||||
#define PCRE_NEWLINE_ANY 0x00400000
|
||||
#define PCRE_NEWLINE_ANYCRLF 0x00500000
|
||||
#define PCRE_BSR_ANYCRLF 0x00800000
|
||||
#define PCRE_BSR_UNICODE 0x01000000
|
||||
|
||||
/* Exec-time and get/set-time error codes */
|
||||
|
||||
@@ -147,7 +149,7 @@ extern "C" {
|
||||
#define PCRE_ERROR_DFA_WSSIZE (-19)
|
||||
#define PCRE_ERROR_DFA_RECURSE (-20)
|
||||
#define PCRE_ERROR_RECURSIONLIMIT (-21)
|
||||
#define PCRE_ERROR_NOTUSED (-22)
|
||||
#define PCRE_ERROR_NULLWSLIMIT (-22) /* No longer actually used */
|
||||
#define PCRE_ERROR_BADNEWLINE (-23)
|
||||
|
||||
/* Request types for pcre_fullinfo() */
|
||||
@@ -180,6 +182,7 @@ compatible. */
|
||||
#define PCRE_CONFIG_STACKRECURSE 5
|
||||
#define PCRE_CONFIG_UNICODE_PROPERTIES 6
|
||||
#define PCRE_CONFIG_MATCH_LIMIT_RECURSION 7
|
||||
#define PCRE_CONFIG_BSR 8
|
||||
|
||||
/* Bit flags for the pcre_extra structure. Do not re-arrange or redefine
|
||||
these bits, just add new ones on the end, in order to remain compatible. */
|
||||
|
||||
@@ -20,7 +20,7 @@ and dead code stripping is activated. This leads to link errors. Pulling in the
|
||||
header ensures that the array gets flagged as "someone outside this compilation
|
||||
unit might reference this" and so it will always be supplied to the linker. */
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
+318
-172
@@ -42,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
supporting internal functions that are not used by other modules. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#define NLBLOCK cd /* Block containing newline information */
|
||||
#define PSSTART start_pattern /* Field containing processed string start */
|
||||
@@ -138,35 +138,47 @@ static const short int escapes[] = {
|
||||
#endif
|
||||
|
||||
|
||||
/* Table of special "verbs" like (*PRUNE) */
|
||||
/* Table of special "verbs" like (*PRUNE). This is a short table, so it is
|
||||
searched linearly. Put all the names into a single string, in order to reduce
|
||||
the number of relocations when a shared library is dynamically linked. */
|
||||
|
||||
typedef struct verbitem {
|
||||
const char *name;
|
||||
int len;
|
||||
int op;
|
||||
} verbitem;
|
||||
|
||||
static const char verbnames[] =
|
||||
"ACCEPT\0"
|
||||
"COMMIT\0"
|
||||
"F\0"
|
||||
"FAIL\0"
|
||||
"PRUNE\0"
|
||||
"SKIP\0"
|
||||
"THEN";
|
||||
|
||||
static verbitem verbs[] = {
|
||||
{ "ACCEPT", 6, OP_ACCEPT },
|
||||
{ "COMMIT", 6, OP_COMMIT },
|
||||
{ "F", 1, OP_FAIL },
|
||||
{ "FAIL", 4, OP_FAIL },
|
||||
{ "PRUNE", 5, OP_PRUNE },
|
||||
{ "SKIP", 4, OP_SKIP },
|
||||
{ "THEN", 4, OP_THEN }
|
||||
{ 6, OP_ACCEPT },
|
||||
{ 6, OP_COMMIT },
|
||||
{ 1, OP_FAIL },
|
||||
{ 4, OP_FAIL },
|
||||
{ 5, OP_PRUNE },
|
||||
{ 4, OP_SKIP },
|
||||
{ 4, OP_THEN }
|
||||
};
|
||||
|
||||
static int verbcount = sizeof(verbs)/sizeof(verbitem);
|
||||
|
||||
|
||||
/* Tables of names of POSIX character classes and their lengths. The list is
|
||||
terminated by a zero length entry. The first three must be alpha, lower, upper,
|
||||
as this is assumed for handling case independence. */
|
||||
/* Tables of names of POSIX character classes and their lengths. The names are
|
||||
now all in a single string, to reduce the number of relocations when a shared
|
||||
library is dynamically loaded. The list of lengths is terminated by a zero
|
||||
length entry. The first three must be alpha, lower, upper, as this is assumed
|
||||
for handling case independence. */
|
||||
|
||||
static const char *const posix_names[] = {
|
||||
"alpha", "lower", "upper",
|
||||
"alnum", "ascii", "blank", "cntrl", "digit", "graph",
|
||||
"print", "punct", "space", "word", "xdigit" };
|
||||
static const char posix_names[] =
|
||||
"alpha\0" "lower\0" "upper\0" "alnum\0" "ascii\0" "blank\0"
|
||||
"cntrl\0" "digit\0" "graph\0" "print\0" "punct\0" "space\0"
|
||||
"word\0" "xdigit";
|
||||
|
||||
static const uschar posix_name_lengths[] = {
|
||||
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 6, 0 };
|
||||
@@ -205,84 +217,90 @@ static const int posix_class_maps[] = {
|
||||
/* The texts of compile-time error messages. These are "char *" because they
|
||||
are passed to the outside world. Do not ever re-use any error number, because
|
||||
they are documented. Always add a new error instead. Messages marked DEAD below
|
||||
are no longer used. */
|
||||
are no longer used. This used to be a table of strings, but in order to reduce
|
||||
the number of relocations needed when a shared library is loaded dynamically,
|
||||
it is now one long string. We cannot use a table of offsets, because the
|
||||
lengths of inserts such as XSTRING(MAX_NAME_SIZE) are not known. Instead, we
|
||||
simply count through to the one we want - this isn't a performance issue
|
||||
because these strings are used only when there is a compilation error. */
|
||||
|
||||
static const char *error_texts[] = {
|
||||
"no error",
|
||||
"\\ at end of pattern",
|
||||
"\\c at end of pattern",
|
||||
"unrecognized character follows \\",
|
||||
"numbers out of order in {} quantifier",
|
||||
static const char error_texts[] =
|
||||
"no error\0"
|
||||
"\\ at end of pattern\0"
|
||||
"\\c at end of pattern\0"
|
||||
"unrecognized character follows \\\0"
|
||||
"numbers out of order in {} quantifier\0"
|
||||
/* 5 */
|
||||
"number too big in {} quantifier",
|
||||
"missing terminating ] for character class",
|
||||
"invalid escape sequence in character class",
|
||||
"range out of order in character class",
|
||||
"nothing to repeat",
|
||||
"number too big in {} quantifier\0"
|
||||
"missing terminating ] for character class\0"
|
||||
"invalid escape sequence in character class\0"
|
||||
"range out of order in character class\0"
|
||||
"nothing to repeat\0"
|
||||
/* 10 */
|
||||
"operand of unlimited repeat could match the empty string", /** DEAD **/
|
||||
"internal error: unexpected repeat",
|
||||
"unrecognized character after (?",
|
||||
"POSIX named classes are supported only within a class",
|
||||
"missing )",
|
||||
"operand of unlimited repeat could match the empty string\0" /** DEAD **/
|
||||
"internal error: unexpected repeat\0"
|
||||
"unrecognized character after (? or (?-\0"
|
||||
"POSIX named classes are supported only within a class\0"
|
||||
"missing )\0"
|
||||
/* 15 */
|
||||
"reference to non-existent subpattern",
|
||||
"erroffset passed as NULL",
|
||||
"unknown option bit(s) set",
|
||||
"missing ) after comment",
|
||||
"parentheses nested too deeply", /** DEAD **/
|
||||
"reference to non-existent subpattern\0"
|
||||
"erroffset passed as NULL\0"
|
||||
"unknown option bit(s) set\0"
|
||||
"missing ) after comment\0"
|
||||
"parentheses nested too deeply\0" /** DEAD **/
|
||||
/* 20 */
|
||||
"regular expression is too large",
|
||||
"failed to get memory",
|
||||
"unmatched parentheses",
|
||||
"internal error: code overflow",
|
||||
"unrecognized character after (?<",
|
||||
"regular expression is too large\0"
|
||||
"failed to get memory\0"
|
||||
"unmatched parentheses\0"
|
||||
"internal error: code overflow\0"
|
||||
"unrecognized character after (?<\0"
|
||||
/* 25 */
|
||||
"lookbehind assertion is not fixed length",
|
||||
"malformed number or name after (?(",
|
||||
"conditional group contains more than two branches",
|
||||
"assertion expected after (?(",
|
||||
"(?R or (?[+-]digits must be followed by )",
|
||||
"lookbehind assertion is not fixed length\0"
|
||||
"malformed number or name after (?(\0"
|
||||
"conditional group contains more than two branches\0"
|
||||
"assertion expected after (?(\0"
|
||||
"(?R or (?[+-]digits must be followed by )\0"
|
||||
/* 30 */
|
||||
"unknown POSIX class name",
|
||||
"POSIX collating elements are not supported",
|
||||
"this version of PCRE is not compiled with PCRE_UTF8 support",
|
||||
"spare error", /** DEAD **/
|
||||
"character value in \\x{...} sequence is too large",
|
||||
"unknown POSIX class name\0"
|
||||
"POSIX collating elements are not supported\0"
|
||||
"this version of PCRE is not compiled with PCRE_UTF8 support\0"
|
||||
"spare error\0" /** DEAD **/
|
||||
"character value in \\x{...} sequence is too large\0"
|
||||
/* 35 */
|
||||
"invalid condition (?(0)",
|
||||
"\\C not allowed in lookbehind assertion",
|
||||
"PCRE does not support \\L, \\l, \\N, \\U, or \\u",
|
||||
"number after (?C is > 255",
|
||||
"closing ) for (?C expected",
|
||||
"invalid condition (?(0)\0"
|
||||
"\\C not allowed in lookbehind assertion\0"
|
||||
"PCRE does not support \\L, \\l, \\N, \\U, or \\u\0"
|
||||
"number after (?C is > 255\0"
|
||||
"closing ) for (?C expected\0"
|
||||
/* 40 */
|
||||
"recursive call could loop indefinitely",
|
||||
"unrecognized character after (?P",
|
||||
"syntax error in subpattern name (missing terminator)",
|
||||
"two named subpatterns have the same name",
|
||||
"invalid UTF-8 string",
|
||||
"recursive call could loop indefinitely\0"
|
||||
"unrecognized character after (?P\0"
|
||||
"syntax error in subpattern name (missing terminator)\0"
|
||||
"two named subpatterns have the same name\0"
|
||||
"invalid UTF-8 string\0"
|
||||
/* 45 */
|
||||
"support for \\P, \\p, and \\X has not been compiled",
|
||||
"malformed \\P or \\p sequence",
|
||||
"unknown property name after \\P or \\p",
|
||||
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)",
|
||||
"too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")",
|
||||
"support for \\P, \\p, and \\X has not been compiled\0"
|
||||
"malformed \\P or \\p sequence\0"
|
||||
"unknown property name after \\P or \\p\0"
|
||||
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)\0"
|
||||
"too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0"
|
||||
/* 50 */
|
||||
"repeated subpattern is too long", /** DEAD **/
|
||||
"octal value is greater than \\377 (not in UTF-8 mode)",
|
||||
"internal error: overran compiling workspace",
|
||||
"internal error: previously-checked referenced subpattern not found",
|
||||
"DEFINE group contains more than one branch",
|
||||
"repeated subpattern is too long\0" /** DEAD **/
|
||||
"octal value is greater than \\377 (not in UTF-8 mode)\0"
|
||||
"internal error: overran compiling workspace\0"
|
||||
"internal error: previously-checked referenced subpattern not found\0"
|
||||
"DEFINE group contains more than one branch\0"
|
||||
/* 55 */
|
||||
"repeating a DEFINE group is not allowed",
|
||||
"inconsistent NEWLINE options",
|
||||
"\\g is not followed by a braced name or an optionally braced non-zero number",
|
||||
"(?+ or (?- or (?(+ or (?(- must be followed by a non-zero number",
|
||||
"(*VERB) with an argument is not supported",
|
||||
"repeating a DEFINE group is not allowed\0"
|
||||
"inconsistent NEWLINE options\0"
|
||||
"\\g is not followed by a braced name or an optionally braced non-zero number\0"
|
||||
"(?+ or (?- or (?(+ or (?(- must be followed by a non-zero number\0"
|
||||
"(*VERB) with an argument is not supported\0"
|
||||
/* 60 */
|
||||
"(*VERB) not recognized",
|
||||
"number is too big"
|
||||
};
|
||||
"(*VERB) not recognized\0"
|
||||
"number is too big\0"
|
||||
"subpattern name expected\0"
|
||||
"digit expected after (?+";
|
||||
|
||||
|
||||
/* Table to identify digits and hex digits. This is used when compiling
|
||||
@@ -417,6 +435,28 @@ static BOOL
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Find an error text *
|
||||
*************************************************/
|
||||
|
||||
/* The error texts are now all in one long string, to save on relocations. As
|
||||
some of the text is of unknown length, we can't use a table of offsets.
|
||||
Instead, just count through the strings. This is not a performance issue
|
||||
because it happens only when there has been a compilation error.
|
||||
|
||||
Argument: the error number
|
||||
Returns: pointer to the error string
|
||||
*/
|
||||
|
||||
static const char *
|
||||
find_error_text(int n)
|
||||
{
|
||||
const char *s = error_texts;
|
||||
for (; n > 0; n--) while (*s++ != 0);
|
||||
return s;
|
||||
}
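
The counting lookup used by find_error_text() can be tried in isolation. The sketch below uses a three-entry table and the names `demo_texts` and `demo_find_text`, which are illustrative and not part of PCRE; it advances with strlen() rather than the character loop in the diff, which should be equivalent for valid indexes.

```c
#include <string.h>

/* Illustrative table only: three messages in one string, separated by NULs.
   The compiler appends the final terminating NUL after the last entry. */
static const char demo_texts[] =
    "no error\0"
    "\\ at end of pattern\0"
    "nothing to repeat";

/* Skip n NUL-terminated strings and return a pointer to the next one.
   The caller must ensure 0 <= n < number of entries; there is no bounds
   check, just as in the function shown in the diff. */
static const char *demo_find_text(int n)
{
    const char *s = demo_texts;
    for (; n > 0; n--) s += strlen(s) + 1;   /* step over one string and its NUL */
    return s;
}
```

The single array needs no per-string pointer, so the dynamic linker has nothing to relocate for it; the cost is a linear scan, which only runs on the error path.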
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Handle escapes *
|
||||
*************************************************/
|
||||
@@ -456,16 +496,16 @@ ptr--; /* Set pointer back to the last byte */
|
||||
|
||||
if (c == 0) *errorcodeptr = ERR1;
|
||||
|
||||
/* Non-alphamerics are literals. For digits or letters, do an initial lookup in
|
||||
a table. A non-zero result is something that can be returned immediately.
|
||||
/* Non-alphanumerics are literals. For digits or letters, do an initial lookup
|
||||
in a table. A non-zero result is something that can be returned immediately.
|
||||
Otherwise further processing may be required. */
|
||||
|
||||
#ifndef EBCDIC /* ASCII coding */
|
||||
else if (c < '0' || c > 'z') {} /* Not alphameric */
|
||||
else if (c < '0' || c > 'z') {} /* Not alphanumeric */
|
||||
else if ((i = escapes[c - '0']) != 0) c = i;
|
||||
|
||||
#else /* EBCDIC coding */
|
||||
else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {} /* Not alphameric */
|
||||
else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {} /* Not alphanumeric */
|
||||
else if ((i = escapes[c - 0x48]) != 0) c = i;
|
||||
#endif
|
||||
|
||||
@@ -682,10 +722,10 @@ else
|
||||
break;
|
||||
|
||||
/* PCRE_EXTRA enables extensions to Perl in the matter of escapes. Any
|
||||
other alphameric following \ is an error if PCRE_EXTRA was set; otherwise,
|
||||
for Perl compatibility, it is a literal. This code looks a bit odd, but
|
||||
there used to be some cases other than the default, and there may be again
|
||||
in future, so I haven't "optimized" it. */
|
||||
other alphanumeric following \ is an error if PCRE_EXTRA was set;
|
||||
otherwise, for Perl compatibility, it is a literal. This code looks a bit
|
||||
odd, but there used to be some cases other than the default, and there may
|
||||
be again in future, so I haven't "optimized" it. */
|
||||
|
||||
default:
|
||||
if ((options & PCRE_EXTRA) != 0) switch(c)
|
||||
@@ -774,7 +814,7 @@ top = _pcre_utt_size;
|
||||
while (bot < top)
|
||||
{
|
||||
i = (bot + top) >> 1;
|
||||
c = strcmp(name, _pcre_utt[i].name);
|
||||
c = strcmp(name, _pcre_utt_names + _pcre_utt[i].name_offset);
|
||||
if (c == 0)
|
||||
{
|
||||
*dptr = _pcre_utt[i].value;
|
||||
@@ -1466,8 +1506,9 @@ for (;;)
|
||||
can match the empty string or not. It is called from could_be_empty()
|
||||
below and from compile_branch() when checking for an unlimited repeat of a
|
||||
group that can match nothing. Note that first_significant_code() skips over
|
||||
assertions. If we hit an unclosed bracket, we return "empty" - this means we've
|
||||
struck an inner bracket whose current branch will already have been scanned.
|
||||
backward and negative forward assertions when its final argument is TRUE. If we
|
||||
hit an unclosed bracket, we return "empty" - this means we've struck an inner
|
||||
bracket whose current branch will already have been scanned.
|
||||
|
||||
Arguments:
|
||||
code points to start of search
|
||||
@@ -1489,6 +1530,16 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE
|
||||
|
||||
c = *code;
|
||||
|
||||
/* Skip over forward assertions; the other assertions are skipped by
|
||||
first_significant_code() with a TRUE final argument. */
|
||||
|
||||
if (c == OP_ASSERT)
|
||||
{
|
||||
do code += GET(code, 1); while (*code == OP_ALT);
|
||||
c = *code;
|
||||
continue;
|
||||
}
|
||||
|
||||
/* Groups with zero repeats can of course be empty; skip them. */
|
||||
|
||||
if (c == OP_BRAZERO || c == OP_BRAMINZERO)
|
||||
@@ -1684,29 +1735,48 @@ return TRUE;
|
||||
*************************************************/
|
||||
|
||||
/* This function is called when the sequence "[:" or "[." or "[=" is
|
||||
encountered in a character class. It checks whether this is followed by an
|
||||
optional ^ and then a sequence of letters, terminated by a matching ":]" or
|
||||
".]" or "=]".
|
||||
encountered in a character class. It checks whether this is followed by a
|
||||
sequence of characters terminated by a matching ":]" or ".]" or "=]". If we
|
||||
reach an unescaped ']' without the special preceding character, return FALSE.
|
||||
|
||||
Argument:
|
||||
Originally, this function only recognized a sequence of letters between the
|
||||
terminators, but it seems that Perl recognizes any sequence of characters,
|
||||
though of course unknown POSIX names are subsequently rejected. Perl gives an
|
||||
"Unknown POSIX class" error for [:f\oo:] for example, where previously PCRE
|
||||
didn't consider this to be a POSIX class. Likewise for [:1234:].
|
||||
|
||||
The problem in trying to be exactly like Perl is in the handling of escapes. We
|
||||
have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
|
||||
class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
|
||||
below handles the special case of \], but does not try to do any other escape
|
||||
processing. This makes it different from Perl for cases such as [:l\ower:]
|
||||
where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
|
||||
"l\ower". This is a lesser evil than not diagnosing bad classes when Perl does,
|
||||
I think.
|
||||
|
||||
Arguments:
|
||||
ptr pointer to the initial [
|
||||
endptr where to return the end pointer
|
||||
cd pointer to compile data
|
||||
|
||||
Returns: TRUE or FALSE
|
||||
*/
|
||||
|
||||
static BOOL
|
||||
check_posix_syntax(const uschar *ptr, const uschar **endptr, compile_data *cd)
|
||||
check_posix_syntax(const uschar *ptr, const uschar **endptr)
|
||||
{
|
||||
int terminator; /* Don't combine these lines; the Solaris cc */
|
||||
terminator = *(++ptr); /* compiler warns about "non-constant" initializer. */
|
||||
if (*(++ptr) == '^') ptr++;
|
||||
while ((cd->ctypes[*ptr] & ctype_letter) != 0) ptr++;
|
||||
if (*ptr == terminator && ptr[1] == ']')
|
||||
for (++ptr; *ptr != 0; ptr++)
|
||||
{
|
||||
*endptr = ptr;
|
||||
return TRUE;
|
||||
if (*ptr == '\\' && ptr[1] == ']') ptr++; else
|
||||
{
|
||||
if (*ptr == ']') return FALSE;
|
||||
if (*ptr == terminator && ptr[1] == ']')
|
||||
{
|
||||
*endptr = ptr;
|
||||
return TRUE;
|
||||
}
|
||||
}
|
||||
}
|
||||
return FALSE;
|
||||
}
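
The new scanning loop can be exercised on its own. The following is a standalone sketch under stated assumptions: it works on plain `char` strings, returns the end pointer directly instead of through an out-parameter, and the name `demo_posix_end` is hypothetical.

```c
#include <stddef.h>

/* Standalone sketch of the scanning loop; s points at the initial '['.
   Returns a pointer to the terminator character (':', '.' or '=') that is
   immediately followed by ']', or NULL if an unescaped ']' or the end of
   the string is reached first. */
static const char *demo_posix_end(const char *s)
{
    char terminator = s[1];
    if (terminator == '\0') return NULL;     /* guard added for the sketch */
    for (s += 2; *s != '\0'; s++) {
        if (*s == '\\' && s[1] == ']') {     /* \] is skipped, not a class end */
            s++;
            continue;
        }
        if (*s == ']') return NULL;          /* class ended before a terminator */
        if (*s == terminator && s[1] == ']') return s;
    }
    return NULL;
}
```

Under this sketch `[:x\]pqr]` is rejected and `[:x\]pqr:]]` is accepted, matching the two cases discussed in the comment above the function.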
|
||||
@@ -1731,11 +1801,13 @@ Returns: a value representing the name, or -1 if unknown
|
||||
static int
|
||||
check_posix_name(const uschar *ptr, int len)
|
||||
{
|
||||
const char *pn = posix_names;
|
||||
register int yield = 0;
|
||||
while (posix_name_lengths[yield] != 0)
|
||||
{
|
||||
if (len == posix_name_lengths[yield] &&
|
||||
strncmp((const char *)ptr, posix_names[yield], len) == 0) return yield;
|
||||
strncmp((const char *)ptr, pn, len) == 0) return yield;
|
||||
pn += posix_name_lengths[yield] + 1;
|
||||
yield++;
|
||||
}
|
||||
return -1;
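
The packed-name lookup pairs the single string with the parallel table of lengths: each step advances the name pointer by the current length plus one for the NUL. A reduced sketch with five names follows; `demo_names`, `demo_lengths` and `demo_check_name` are illustrative names, not PCRE identifiers.

```c
#include <string.h>

/* Reduced illustrative tables: names packed into one string, with a
   parallel list of lengths terminated by a zero entry. */
static const char demo_names[] =
    "alpha\0" "lower\0" "upper\0" "word\0" "xdigit";
static const unsigned char demo_lengths[] = { 5, 5, 5, 4, 6, 0 };

/* Returns the index of the name ptr[0..len), or -1 if it is unknown. */
static int demo_check_name(const char *ptr, int len)
{
    const char *pn = demo_names;
    int yield = 0;
    while (demo_lengths[yield] != 0) {
        if (len == demo_lengths[yield] &&
            strncmp(ptr, pn, (size_t)len) == 0) return yield;
        pn += demo_lengths[yield] + 1;   /* step over the name and its NUL */
        yield++;
    }
    return -1;
}
```

The length comparison comes first, so strncmp() is only called against an entry of exactly the same length.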
|
||||
@@ -2341,6 +2413,7 @@ req_caseopt = ((options & PCRE_CASELESS) != 0)? REQ_CASELESS : 0;
|
||||
for (;; ptr++)
|
||||
{
|
||||
BOOL negate_class;
|
||||
BOOL should_flip_negation;
|
||||
BOOL possessive_quantifier;
|
||||
BOOL is_quantifier;
|
||||
BOOL is_recurse;
|
||||
@@ -2564,7 +2637,7 @@ for (;; ptr++)
|
||||
they are encountered at the top level, so we'll do that too. */
|
||||
|
||||
if ((ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
|
||||
check_posix_syntax(ptr, &tempptr, cd))
|
||||
check_posix_syntax(ptr, &tempptr))
|
||||
{
|
||||
*errorcodeptr = (ptr[1] == ':')? ERR13 : ERR31;
|
||||
goto FAILED;
|
||||
@@ -2589,6 +2662,12 @@ for (;; ptr++)
|
||||
else break;
|
||||
}
|
||||
|
||||
/* If a class contains a negative special such as \S, we need to flip the
|
||||
negation flag at the end, so that support for characters > 255 works
|
||||
correctly (they are all included in the class). */
|
||||
|
||||
should_flip_negation = FALSE;
|
||||
|
||||
/* Keep a count of chars with values < 256 so that we can optimize the case
|
||||
of just a single character (as long as it's < 256). However, for higher
|
||||
valued UTF-8 characters, we don't yet do any optimization. */
|
||||
@@ -2644,7 +2723,7 @@ for (;; ptr++)
|
||||
|
||||
if (c == '[' &&
|
||||
(ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
|
||||
check_posix_syntax(ptr, &tempptr, cd))
|
||||
check_posix_syntax(ptr, &tempptr))
|
||||
{
|
||||
BOOL local_negate = FALSE;
|
||||
int posix_class, taboffset, tabopt;
|
||||
@@ -2661,6 +2740,7 @@ for (;; ptr++)
|
||||
if (*ptr == '^')
|
||||
{
|
||||
local_negate = TRUE;
|
||||
should_flip_negation = TRUE; /* Note negative special */
|
||||
ptr++;
|
||||
}
|
||||
|
||||
@@ -2735,7 +2815,7 @@ for (;; ptr++)
|
||||
c = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
|
||||
if (*errorcodeptr != 0) goto FAILED;
|
||||
|
||||
if (-c == ESC_b) c = '\b'; /* \b is backslash in a class */
|
||||
if (-c == ESC_b) c = '\b'; /* \b is backspace in a class */
|
||||
else if (-c == ESC_X) c = 'X'; /* \X is literal X in a class */
|
||||
else if (-c == ESC_R) c = 'R'; /* \R is literal R in a class */
|
||||
else if (-c == ESC_Q) /* Handle start of quoted string */
|
||||
@@ -2763,6 +2843,7 @@ for (;; ptr++)
|
||||
continue;
|
||||
|
||||
case ESC_D:
|
||||
should_flip_negation = TRUE;
|
||||
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_digit];
|
||||
continue;
|
||||
|
||||
@@ -2771,6 +2852,7 @@ for (;; ptr++)
|
||||
continue;
|
||||
|
||||
case ESC_W:
|
||||
should_flip_negation = TRUE;
|
||||
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_word];
|
||||
continue;
|
||||
|
||||
@@ -2780,13 +2862,11 @@ for (;; ptr++)
|
||||
continue;
|
||||
|
||||
case ESC_S:
|
||||
should_flip_negation = TRUE;
|
||||
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_space];
|
||||
classbits[1] |= 0x08; /* Perl 5.004 onwards omits VT from \s */
|
||||
continue;
|
||||
|
||||
case ESC_E: /* Perl ignores an orphan \E */
|
||||
continue;
|
||||
|
||||
default: /* Not recognized; fall through */
|
||||
break; /* Need "default" setting to stop compiler warning. */
|
||||
}
|
||||
@@ -2974,6 +3054,12 @@ for (;; ptr++)
|
||||
|
||||
oldptr = ptr;
|
||||
|
||||
/* Remember \r or \n */
|
||||
|
||||
if (c == '\r' || c == '\n') cd->external_flags |= PCRE_HASCRORLF;
|
||||
|
||||
/* Check for range */
|
||||
|
||||
if (!inescq && ptr[1] == '-')
|
||||
{
|
||||
int d;
|
||||
@@ -3015,7 +3101,7 @@ for (;; ptr++)
|
||||
d = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
|
||||
if (*errorcodeptr != 0) goto FAILED;
|
||||
|
||||
/* \b is backslash; \X is literal X; \R is literal R; any other
|
||||
/* \b is backspace; \X is literal X; \R is literal R; any other
|
||||
special means the '-' was literal */
|
||||
|
||||
if (d < 0)
|
||||
@@ -3041,6 +3127,10 @@ for (;; ptr++)
|
||||
|
||||
if (d == c) goto LONE_SINGLE_CHARACTER; /* A few lines below */
|
||||
|
||||
/* Remember \r or \n */
|
||||
|
||||
if (d == '\r' || d == '\n') cd->external_flags |= PCRE_HASCRORLF;
|
||||
|
||||
/* In UTF-8 mode, if the upper limit is > 255, or > 127 for caseless
|
||||
matching, we have to use an XCLASS with extra data items. Caseless
|
||||
matching for characters > 127 is available only if UCP support is
|
||||
@@ -3194,16 +3284,24 @@ for (;; ptr++)
|
||||
goto FAILED;
|
||||
}
|
||||
|
||||
|
||||
/* This code has been disabled because it would mean that \s counts as
|
||||
an explicit \r or \n reference, and that's not really what is wanted. Now
|
||||
we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
|
||||
#if 0
|
||||
/* Remember whether \r or \n are in this class */
|
||||
|
||||
if (negate_class)
|
||||
{
|
||||
if ((classbits[1] & 0x24) != 0x24) cd->external_options |= PCRE_HASCRORLF;
|
||||
if ((classbits[1] & 0x24) != 0x24) cd->external_flags |= PCRE_HASCRORLF;
|
||||
}
|
||||
else
|
||||
{
|
||||
if ((classbits[1] & 0x24) != 0) cd->external_options |= PCRE_HASCRORLF;
|
||||
if ((classbits[1] & 0x24) != 0) cd->external_flags |= PCRE_HASCRORLF;
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
/* If class_charcount is 1, we saw precisely one character whose value is
|
||||
less than 256. As long as there were no characters >= 128 and there was no
|
||||
@@ -3267,11 +3365,14 @@ for (;; ptr++)
|
||||
zeroreqbyte = reqbyte;
|
||||
|
||||
/* If there are characters with values > 255, we have to compile an
|
||||
extended class, with its own opcode. If there are no characters < 256,
|
||||
we can omit the bitmap in the actual compiled code. */
|
||||
extended class, with its own opcode, unless there was a negated special
|
||||
such as \S in the class, because in that case all characters > 255 are in
|
||||
the class, so any that were explicitly given as well can be ignored. If
|
||||
(when there are explicit characters > 255 that must be listed) there are no
|
||||
characters < 256, we can omit the bitmap in the actual compiled code. */
|
||||
|
||||
#ifdef SUPPORT_UTF8
|
||||
if (class_utf8)
|
||||
if (class_utf8 && !should_flip_negation)
|
||||
{
|
||||
*class_utf8data++ = XCL_END; /* Marks the end of extra data */
|
||||
*code++ = OP_XCLASS;
|
||||
@@ -3297,20 +3398,19 @@ for (;; ptr++)
|
||||
}
|
||||
#endif
|
||||
|
||||
/* If there are no characters > 255, negate the 32-byte map if necessary,
|
||||
and copy it into the code vector. If this is the first thing in the branch,
|
||||
there can be no first char setting, whatever the repeat count. Any reqbyte
|
||||
setting must remain unchanged after any kind of repeat. */
|
||||
/* If there are no characters > 255, set the opcode to OP_CLASS or
|
||||
OP_NCLASS, depending on whether the whole class was negated and whether
|
||||
there were negative specials such as \S in the class. Then copy the 32-byte
|
||||
map into the code vector, negating it if necessary. */
|
||||
|
||||
*code++ = (negate_class == should_flip_negation) ? OP_CLASS : OP_NCLASS;
|
||||
if (negate_class)
|
||||
{
|
||||
*code++ = OP_NCLASS;
|
||||
if (lengthptr == NULL) /* Save time in the pre-compile phase */
|
||||
for (c = 0; c < 32; c++) code[c] = ~classbits[c];
|
||||
}
|
||||
else
|
||||
{
|
||||
*code++ = OP_CLASS;
|
||||
memcpy(code, classbits, 32);
|
||||
}
|
||||
code += 32;
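
The opcode choice above amounts to an exclusive-or of the two flags: the class gets OP_NCLASS when exactly one of "whole class negated" and "contains a negative special" holds. A small sketch of that truth table follows; the enum values are placeholders and `demo_class_opcode` is a hypothetical name.

```c
/* Placeholder opcode values for illustration only; the real OP_CLASS and
   OP_NCLASS are defined in PCRE's internal header. */
enum { DEMO_OP_CLASS, DEMO_OP_NCLASS };

/* Both arguments are treated as booleans (zero or non-zero). */
static int demo_class_opcode(int negate_class, int should_flip_negation)
{
    /* Normalise with ! so that any non-zero value counts as TRUE. */
    return (!negate_class == !should_flip_negation)
        ? DEMO_OP_CLASS : DEMO_OP_NCLASS;
}
```

For example, [a] and [^\S] map to the plain opcode, while [^a] and [\S] map to the negated one, the variant under which characters above 255 are treated as members.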
|
||||
@@ -3496,7 +3596,7 @@ for (;; ptr++)
|
||||
/* All real repeats make it impossible to handle partial matching (maybe
|
||||
one day we will be able to remove this restriction). */
|
||||
|
||||
if (repeat_max != 1) cd->nopartial = TRUE;
|
||||
if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
|
||||
|
||||
/* Combine the op_type with the repeat_type */
|
||||
|
||||
@@ -3646,7 +3746,7 @@ for (;; ptr++)
|
||||
/* All real repeats make it impossible to handle partial matching (maybe
|
||||
one day we will be able to remove this restriction). */
|
||||
|
||||
if (repeat_max != 1) cd->nopartial = TRUE;
|
||||
if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
|
||||
|
||||
if (repeat_min == 0 && repeat_max == -1)
|
||||
*code++ = OP_CRSTAR + repeat_type;
|
||||
@@ -3946,7 +4046,9 @@ for (;; ptr++)
|
||||
int len;
|
||||
if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
|
||||
*tempcode == OP_NOTEXACT)
|
||||
tempcode += _pcre_OP_lengths[*tempcode];
|
||||
tempcode += _pcre_OP_lengths[*tempcode] +
|
||||
((*tempcode == OP_TYPEEXACT &&
|
||||
(tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP))? 2:0);
|
||||
len = code - tempcode;
|
||||
if (len > 0) switch (*tempcode)
|
||||
{
|
||||
@@ -4004,6 +4106,7 @@ for (;; ptr++)
|
||||
if (*(++ptr) == '*' && (cd->ctypes[ptr[1]] & ctype_letter) != 0)
|
||||
{
|
||||
int i, namelen;
|
||||
const char *vn = verbnames;
|
||||
const uschar *name = ++ptr;
|
||||
previous = NULL;
|
||||
while ((cd->ctypes[*++ptr] & ctype_letter) != 0);
|
||||
@@ -4021,12 +4124,13 @@ for (;; ptr++)
|
||||
for (i = 0; i < verbcount; i++)
|
||||
{
|
||||
if (namelen == verbs[i].len &&
|
||||
strncmp((char *)name, verbs[i].name, namelen) == 0)
|
||||
strncmp((char *)name, vn, namelen) == 0)
|
||||
{
|
||||
*code = verbs[i].op;
|
||||
if (*code++ == OP_ACCEPT) cd->had_accept = TRUE;
|
||||
break;
|
||||
}
|
||||
vn += verbs[i].len + 1;
|
||||
}
|
||||
if (i < verbcount) continue;
|
||||
*errorcodeptr = ERR60;
|
||||
@@ -4171,16 +4275,13 @@ for (;; ptr++)
|
||||
*errorcodeptr = ERR58;
|
||||
goto FAILED;
|
||||
}
|
||||
if (refsign == '-')
|
||||
recno = (refsign == '-')?
|
||||
cd->bracount - recno + 1 : recno +cd->bracount;
|
||||
if (recno <= 0 || recno > cd->final_bracount)
|
||||
{
|
||||
recno = cd->bracount - recno + 1;
|
||||
if (recno <= 0)
|
||||
{
|
||||
*errorcodeptr = ERR15;
|
||||
goto FAILED;
|
||||
}
|
||||
*errorcodeptr = ERR15;
|
||||
goto FAILED;
|
||||
}
|
||||
else recno += cd->bracount;
|
||||
PUT2(code, 2+LINK_SIZE, recno);
|
||||
break;
|
||||
}
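
The rewritten range check handles both signs in one place: a relative number is converted to an absolute one and then rejected if it falls outside 1..final_bracount. A minimal sketch, assuming plain integer arguments in place of the compile_data fields and returning 0 where the real code sets ERR15; `demo_resolve_relative` is a hypothetical name.

```c
/* Convert a relative group reference to an absolute group number.
   bracount is the number of groups opened so far; final_bracount is the
   total number of groups in the pattern. Returns 0 if out of range. */
static int demo_resolve_relative(int refsign, int recno,
                                 int bracount, int final_bracount)
{
    recno = (refsign == '-') ? bracount - recno + 1 : recno + bracount;
    if (recno <= 0 || recno > final_bracount) return 0;
    return recno;
}
```

With three groups opened out of five, -1 resolves to group 3 and +1 to group 4, while -4 and +3 are both rejected; the removed code only caught the first of those two failures.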
|
||||
@@ -4252,9 +4353,10 @@ for (;; ptr++)
|
||||
skipbytes = 1;
|
||||
}
|
||||
|
||||
/* Check for the "name" actually being a subpattern number. */
|
||||
/* Check for the "name" actually being a subpattern number. We are
|
||||
in the second pass here, so final_bracount is set. */
|
||||
|
||||
else if (recno > 0)
|
||||
else if (recno > 0 && recno <= cd->final_bracount)
|
||||
{
|
||||
PUT2(code, 2+LINK_SIZE, recno);
|
||||
}
|
||||
@@ -4448,7 +4550,9 @@ for (;; ptr++)
|
||||
|
||||
/* We come here from the Python syntax above that handles both
|
||||
references (?P=name) and recursion (?P>name), as well as falling
|
||||
through from the Perl recursion syntax (?&name). */
|
||||
through from the Perl recursion syntax (?&name). We also come here from
|
||||
the Perl \k<name> or \k'name' back reference syntax and the \k{name}
|
||||
.NET syntax. */
|
||||
|
||||
NAMED_REF_OR_RECURSE:
|
||||
name = ++ptr;
|
||||
@@ -4460,6 +4564,11 @@ for (;; ptr++)
|
||||
|
||||
if (lengthptr != NULL)
|
||||
{
|
||||
if (namelen == 0)
|
||||
{
|
||||
*errorcodeptr = ERR62;
|
||||
goto FAILED;
|
||||
}
|
||||
if (*ptr != terminator)
|
||||
{
|
||||
*errorcodeptr = ERR42;
|
||||
@@ -4473,14 +4582,19 @@ for (;; ptr++)
|
||||
recno = 0;
|
||||
}
|
||||
|
||||
/* In the real compile, seek the name in the table */
|
||||
/* In the real compile, seek the name in the table. We check the name
|
||||
first, and then check that we have reached the end of the name in the
|
||||
table. That way, if the name is longer than any in the table,
|
||||
the comparison will fail without reading beyond the table entry. */
|
||||
|
||||
else
|
||||
{
|
||||
slot = cd->name_table;
|
||||
for (i = 0; i < cd->names_found; i++)
|
||||
{
|
||||
if (strncmp((char *)name, (char *)slot+2, namelen) == 0) break;
|
||||
if (strncmp((char *)name, (char *)slot+2, namelen) == 0 &&
|
||||
slot[2+namelen] == 0)
|
||||
break;
|
||||
slot += cd->name_entry_size;
|
||||
}
|
||||
|
||||
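The hunk above tightens the name-table lookup: matching the first `namelen` characters is no longer enough, because the table entry must also end exactly where the requested name ends. The following is a minimal standalone sketch of that check, not PCRE code. It assumes a simplified table of fixed-size slots, each holding a 2-byte group number followed by a NUL-terminated name; `find_group`, `ENTRY_SIZE`, and `sample_table` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ENTRY_SIZE 8   /* hypothetical: 2 number bytes + up to 5 name bytes + NUL */

/* Return the group number for name[0..namelen), or -1 if it is absent.
   Each slot holds a big-endian 2-byte number followed by a
   NUL-terminated name. */
int find_group(const unsigned char *table, int count,
               const char *name, size_t namelen)
{
    const unsigned char *slot = table;
    assert(namelen > 0 && namelen + 3 <= ENTRY_SIZE);
    for (int i = 0; i < count; i++) {
        /* Compare the characters first, then insist that the entry
           ends exactly where the requested name ends. */
        if (strncmp(name, (const char *)slot + 2, namelen) == 0 &&
            slot[2 + namelen] == 0)
            return (slot[0] << 8) | slot[1];
        slot += ENTRY_SIZE;
    }
    return -1;
}

const unsigned char sample_table[2 * ENTRY_SIZE] = {
    0, 1, 'a', 'b', 'c', 0, 0, 0,   /* group 1 is named "abc" */
    0, 2, 'a', 0,   0,   0, 0, 0    /* group 2 is named "a"   */
};
```

With only the `strncmp` test, a lookup of "a" would stop at the "abc" entry. The extra terminator check makes it skip that slot and find the entry that is really named "a".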
@@ -4517,7 +4631,15 @@ for (;; ptr++)
|
||||
{
|
||||
const uschar *called;
|
||||
|
||||
if ((refsign = *ptr) == '+') ptr++;
|
||||
if ((refsign = *ptr) == '+')
|
||||
{
|
||||
ptr++;
|
||||
if ((digitab[*ptr] & ctype_digit) == 0)
|
||||
{
|
||||
*errorcodeptr = ERR63;
|
||||
goto FAILED;
|
||||
}
|
||||
}
|
||||
else if (refsign == '-')
|
||||
{
|
||||
if ((digitab[ptr[1]] & ctype_digit) == 0)
|
||||
@@ -4643,7 +4765,7 @@ for (;; ptr++)
|
||||
|
||||
case 'J': /* Record that it changed in the external options */
|
||||
*optset |= PCRE_DUPNAMES;
|
||||
cd->external_options |= PCRE_JCHANGED;
|
||||
cd->external_flags |= PCRE_JCHANGED;
|
||||
break;
|
||||
|
||||
case 'i': *optset |= PCRE_CASELESS; break;
|
||||
@@ -5063,7 +5185,7 @@ for (;; ptr++)
|
||||
/* Remember if \r or \n were seen */
|
||||
|
||||
if (mcbuffer[0] == '\r' || mcbuffer[0] == '\n')
|
||||
cd->external_options |= PCRE_HASCRORLF;
|
||||
cd->external_flags |= PCRE_HASCRORLF;
|
||||
|
||||
/* Set the first and required bytes appropriately. If no previous first
|
||||
byte, set it from this character, but revert to none on a zero repeat.
|
||||
@@ -5743,24 +5865,46 @@ cd->fcc = tables + fcc_offset;
|
||||
cd->cbits = tables + cbits_offset;
|
||||
cd->ctypes = tables + ctypes_offset;
|
||||
|
||||
/* Check for newline settings at the start of the pattern, and remember the
|
||||
offset for later. */
|
||||
/* Check for global one-time settings at the start of the pattern, and remember
|
||||
the offset for later. */
|
||||
|
||||
if (ptr[0] == '(' && ptr[1] == '*')
|
||||
while (ptr[skipatstart] == '(' && ptr[skipatstart+1] == '*')
|
||||
{
|
||||
int newnl = 0;
|
||||
if (strncmp((char *)(ptr+2), "CR)", 3) == 0)
|
||||
{ skipatstart = 5; newnl = PCRE_NEWLINE_CR; }
|
||||
else if (strncmp((char *)(ptr+2), "LF)", 3) == 0)
|
||||
{ skipatstart = 5; newnl = PCRE_NEWLINE_LF; }
|
||||
else if (strncmp((char *)(ptr+2), "CRLF)", 5) == 0)
|
||||
{ skipatstart = 7; newnl = PCRE_NEWLINE_CR + PCRE_NEWLINE_LF; }
|
||||
else if (strncmp((char *)(ptr+2), "ANY)", 4) == 0)
|
||||
{ skipatstart = 6; newnl = PCRE_NEWLINE_ANY; }
|
||||
else if (strncmp((char *)(ptr+2), "ANYCRLF)", 8) == 0)
|
||||
{ skipatstart = 10; newnl = PCRE_NEWLINE_ANYCRLF; }
|
||||
if (skipatstart > 0)
|
||||
int newbsr = 0;
|
||||
|
||||
if (strncmp((char *)(ptr+skipatstart+2), "CR)", 3) == 0)
|
||||
{ skipatstart += 5; newnl = PCRE_NEWLINE_CR; }
|
||||
else if (strncmp((char *)(ptr+skipatstart+2), "LF)", 3) == 0)
|
||||
{ skipatstart += 5; newnl = PCRE_NEWLINE_LF; }
|
||||
else if (strncmp((char *)(ptr+skipatstart+2), "CRLF)", 5) == 0)
|
||||
{ skipatstart += 7; newnl = PCRE_NEWLINE_CR + PCRE_NEWLINE_LF; }
|
||||
else if (strncmp((char *)(ptr+skipatstart+2), "ANY)", 4) == 0)
|
||||
{ skipatstart += 6; newnl = PCRE_NEWLINE_ANY; }
|
||||
else if (strncmp((char *)(ptr+skipatstart+2), "ANYCRLF)", 8) == 0)
|
||||
{ skipatstart += 10; newnl = PCRE_NEWLINE_ANYCRLF; }
|
||||
|
||||
else if (strncmp((char *)(ptr+skipatstart+2), "BSR_ANYCRLF)", 12) == 0)
|
||||
{ skipatstart += 14; newbsr = PCRE_BSR_ANYCRLF; }
|
||||
else if (strncmp((char *)(ptr+skipatstart+2), "BSR_UNICODE)", 12) == 0)
|
||||
{ skipatstart += 14; newbsr = PCRE_BSR_UNICODE; }
|
||||
|
||||
if (newnl != 0)
|
||||
options = (options & ~PCRE_NEWLINE_BITS) | newnl;
|
||||
else if (newbsr != 0)
|
||||
options = (options & ~(PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) | newbsr;
|
||||
else break;
|
||||
}
|
||||
|
||||
/* Check validity of \R options. */
|
||||
|
||||
switch (options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE))
|
||||
{
|
||||
case 0:
|
||||
case PCRE_BSR_ANYCRLF:
|
||||
case PCRE_BSR_UNICODE:
|
||||
break;
|
||||
default: errorcode = ERR56; goto PCRE_EARLY_ERROR_RETURN;
|
||||
}
|
||||
|
||||
/* Handle different types of newline. The three bits give seven cases. The
|
||||
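The hunk above turns the single start-of-pattern check into a loop, so several `(*NAME)` items can be stacked and `skipatstart` accumulates their lengths. A rough sketch of the same idea follows, under stated assumptions: the enum values are hypothetical stand-ins for the PCRE option bits, and `parse_leading_settings` is an illustrative helper rather than a PCRE function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the PCRE newline and \R option bits. */
enum { NL_NONE, NL_CR, NL_LF, NL_CRLF, NL_ANY, NL_ANYCRLF };
enum { BSR_NONE, BSR_ANYCRLF, BSR_UNICODE };

typedef struct { const char *text; int is_bsr; int value; } setting;

static const setting settings[] = {
    { "CR)",          0, NL_CR },
    { "LF)",          0, NL_LF },
    { "CRLF)",        0, NL_CRLF },
    { "ANY)",         0, NL_ANY },
    { "ANYCRLF)",     0, NL_ANYCRLF },
    { "BSR_ANYCRLF)", 1, BSR_ANYCRLF },
    { "BSR_UNICODE)", 1, BSR_UNICODE },
};

/* Consume every recognised "(*NAME)" item at the start of the pattern.
   Returns the number of bytes to skip; later items override earlier ones. */
size_t parse_leading_settings(const char *pattern, int *newline, int *bsr)
{
    size_t skip = 0;
    assert(pattern != NULL && newline != NULL && bsr != NULL);
    while (pattern[skip] == '(' && pattern[skip + 1] == '*') {
        size_t i, n = sizeof settings / sizeof settings[0];
        for (i = 0; i < n; i++) {
            size_t len = strlen(settings[i].text);
            if (strncmp(pattern + skip + 2, settings[i].text, len) == 0) {
                if (settings[i].is_bsr) *bsr = settings[i].value;
                else *newline = settings[i].value;
                skip += 2 + len;
                break;
            }
        }
        if (i == n) break;   /* unknown item: leave it for the main compiler */
    }
    return skip;
}
```

Because each name is compared together with its closing parenthesis, "CR)" cannot be mistaken for the start of "CRLF)", which mirrors the `strncmp` lengths used in the diff.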
@@ -5822,7 +5966,7 @@ to compile parts of the pattern into; the compiled code is discarded when it is
|
||||
no longer needed, so hopefully this workspace will never overflow, though there
|
||||
is a test for its doing so. */
|
||||
|
||||
cd->bracount = 0;
|
||||
cd->bracount = cd->final_bracount = 0;
|
||||
cd->names_found = 0;
|
||||
cd->name_entry_size = 0;
|
||||
cd->name_table = NULL;
|
||||
@@ -5832,8 +5976,8 @@ cd->hwm = cworkspace;
|
||||
cd->start_pattern = (const uschar *)pattern;
|
||||
cd->end_pattern = (const uschar *)(pattern + strlen(pattern));
|
||||
cd->req_varyopt = 0;
|
||||
cd->nopartial = FALSE;
|
||||
cd->external_options = options;
|
||||
cd->external_flags = 0;
|
||||
|
||||
/* Now do the pre-compile. On error, errorcode will be set non-zero, so we
|
||||
don't need to look at the result of the function here. The initial options have
|
||||
@@ -5872,14 +6016,16 @@ if (re == NULL)
|
||||
goto PCRE_EARLY_ERROR_RETURN;
|
||||
}
|
||||
|
||||
/* Put in the magic number, and save the sizes, initial options, and character
|
||||
table pointer. NULL is used for the default character tables. The nullpad field
|
||||
is at the end; it's there to help in the case when a regex compiled on a system
|
||||
with 4-byte pointers is run on another with 8-byte pointers. */
|
||||
/* Put in the magic number, and save the sizes, initial options, internal
|
||||
flags, and character table pointer. NULL is used for the default character
|
||||
tables. The nullpad field is at the end; it's there to help in the case when a
|
||||
regex compiled on a system with 4-byte pointers is run on another with 8-byte
|
||||
pointers. */
|
||||
|
||||
re->magic_number = MAGIC_NUMBER;
|
||||
re->size = size;
|
||||
re->options = cd->external_options;
|
||||
re->flags = cd->external_flags;
|
||||
re->dummy1 = 0;
|
||||
re->first_byte = 0;
|
||||
re->req_byte = 0;
|
||||
@@ -5897,6 +6043,7 @@ field. Reset the bracket count and the names_found field. Also reset the hwm
|
||||
field; this time it's used for remembering forward references to subpatterns.
|
||||
*/
|
||||
|
||||
cd->final_bracount = cd->bracount; /* Save for checking forward references */
|
||||
cd->bracount = 0;
|
||||
cd->names_found = 0;
|
||||
cd->name_table = (uschar *)re + re->name_table_offset;
|
||||
@@ -5904,7 +6051,6 @@ codestart = cd->name_table + re->name_entry_size * re->name_count;
|
||||
cd->start_code = codestart;
|
||||
cd->hwm = cworkspace;
|
||||
cd->req_varyopt = 0;
|
||||
cd->nopartial = FALSE;
|
||||
cd->had_accept = FALSE;
|
||||
|
||||
/* Set up a starting, non-extracting bracket, then compile the expression. On
|
||||
@@ -5918,8 +6064,8 @@ code = (uschar *)codestart;
|
||||
&errorcode, FALSE, FALSE, 0, &firstbyte, &reqbyte, NULL, cd, NULL);
|
||||
re->top_bracket = cd->bracount;
|
||||
re->top_backref = cd->top_backref;
|
||||
re->flags = cd->external_flags;
|
||||
|
||||
if (cd->nopartial) re->options |= PCRE_NOPARTIAL;
|
||||
if (cd->had_accept) reqbyte = -1; /* Must disable after (*ACCEPT) */
|
||||
|
||||
/* If not reached end of pattern on success, there's an excess bracket. */
|
||||
@@ -5962,7 +6108,7 @@ if (errorcode != 0)
|
||||
PCRE_EARLY_ERROR_RETURN:
|
||||
*erroroffset = ptr - (const uschar *)pattern;
|
||||
PCRE_EARLY_ERROR_RETURN2:
|
||||
*errorptr = error_texts[errorcode];
|
||||
*errorptr = find_error_text(errorcode);
|
||||
if (errorcodeptr != NULL) *errorcodeptr = errorcode;
|
||||
return NULL;
|
||||
}
|
||||
@@ -5991,10 +6137,10 @@ if ((re->options & PCRE_ANCHORED) == 0)
|
||||
int ch = firstbyte & 255;
|
||||
re->first_byte = ((firstbyte & REQ_CASELESS) != 0 &&
|
||||
cd->fcc[ch] == ch)? ch : firstbyte;
|
||||
re->options |= PCRE_FIRSTSET;
|
||||
re->flags |= PCRE_FIRSTSET;
|
||||
}
|
||||
else if (is_startline(codestart, 0, cd->backref_map))
|
||||
re->options |= PCRE_STARTLINE;
|
||||
re->flags |= PCRE_STARTLINE;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -6008,7 +6154,7 @@ if (reqbyte >= 0 &&
|
||||
int ch = reqbyte & 255;
|
||||
re->req_byte = ((reqbyte & REQ_CASELESS) != 0 &&
|
||||
cd->fcc[ch] == ch)? (reqbyte & ~REQ_CASELESS) : reqbyte;
|
||||
re->options |= PCRE_REQCHSET;
|
||||
re->flags |= PCRE_REQCHSET;
|
||||
}
|
||||
|
||||
/* Print out the compiled data if debugging is enabled. This is never the
|
||||
@@ -6021,7 +6167,7 @@ printf("Length = %d top_bracket = %d top_backref = %d\n",
|
||||
|
||||
printf("Options=%08x\n", re->options);
|
||||
|
||||
if ((re->options & PCRE_FIRSTSET) != 0)
|
||||
if ((re->flags & PCRE_FIRSTSET) != 0)
|
||||
{
|
||||
int ch = re->first_byte & 255;
|
||||
const char *caseless = ((re->first_byte & REQ_CASELESS) == 0)?
|
||||
@@ -6030,7 +6176,7 @@ if ((re->options & PCRE_FIRSTSET) != 0)
|
||||
else printf("First char = \\x%02x%s\n", ch, caseless);
|
||||
}
|
||||
|
||||
if ((re->options & PCRE_REQCHSET) != 0)
|
||||
if ((re->flags & PCRE_REQCHSET) != 0)
|
||||
{
|
||||
int ch = re->req_byte & 255;
|
||||
const char *caseless = ((re->req_byte & REQ_CASELESS) == 0)?
|
||||
@@ -6047,7 +6193,7 @@ was compiled can be seen. */
|
||||
if (code - codestart > length)
|
||||
{
|
||||
(pcre_free)(re);
|
||||
*errorptr = error_texts[ERR23];
|
||||
*errorptr = find_error_text(ERR23);
|
||||
*erroroffset = ptr - (uschar *)pattern;
|
||||
if (errorcodeptr != NULL) *errorcodeptr = ERR23;
|
||||
return NULL;
|
||||
|
||||
@@ -41,7 +41,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
/* This module contains the external function pcre_config(). */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -85,6 +85,14 @@ switch (what)
|
||||
*((int *)where) = NEWLINE;
|
||||
break;
|
||||
|
||||
case PCRE_CONFIG_BSR:
|
||||
#ifdef BSR_ANYCRLF
|
||||
*((int *)where) = 1;
|
||||
#else
|
||||
*((int *)where) = 0;
|
||||
#endif
|
||||
break;
|
||||
|
||||
case PCRE_CONFIG_LINK_SIZE:
|
||||
*((int *)where) = LINK_SIZE;
|
||||
break;
|
||||
|
||||
@@ -42,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
pattern matching using an NFA algorithm, trying to mimic Perl as closely as
|
||||
possible. There are also some static supporting functions. */
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#define NLBLOCK md /* Block containing newline information */
|
||||
#define PSSTART start_subject /* Field containing processed string start */
|
||||
@@ -1524,12 +1524,16 @@ for (;;)
|
||||
case 0x000d:
|
||||
if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
|
||||
break;
|
||||
|
||||
case 0x000a:
|
||||
break;
|
||||
|
||||
case 0x000b:
|
||||
case 0x000c:
|
||||
case 0x0085:
|
||||
case 0x2028:
|
||||
case 0x2029:
|
||||
if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
|
||||
break;
|
||||
}
|
||||
ecode++;
|
||||
@@ -2952,12 +2956,16 @@ for (;;)
|
||||
case 0x000d:
|
||||
if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
|
||||
break;
|
||||
|
||||
case 0x000a:
|
||||
break;
|
||||
|
||||
case 0x000b:
|
||||
case 0x000c:
|
||||
case 0x0085:
|
||||
case 0x2028:
|
||||
case 0x2029:
|
||||
if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
|
||||
break;
|
||||
}
|
||||
}
|
||||
@@ -3170,9 +3178,12 @@ for (;;)
|
||||
if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
|
||||
break;
|
||||
case 0x000a:
|
||||
break;
|
||||
|
||||
case 0x000b:
|
||||
case 0x000c:
|
||||
case 0x0085:
|
||||
if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
|
||||
break;
|
||||
}
|
||||
}
|
||||
@@ -3424,11 +3435,14 @@ for (;;)
|
||||
if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
|
||||
break;
|
||||
case 0x000a:
|
||||
break;
|
||||
|
||||
case 0x000b:
|
||||
case 0x000c:
|
||||
case 0x0085:
|
||||
case 0x2028:
|
||||
case 0x2029:
|
||||
if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
|
||||
break;
|
||||
}
|
||||
break;
|
||||
@@ -3580,10 +3594,14 @@ for (;;)
|
||||
case 0x000d:
|
||||
if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
|
||||
break;
|
||||
|
||||
case 0x000a:
|
||||
break;
|
||||
|
||||
case 0x000b:
|
||||
case 0x000c:
|
||||
case 0x0085:
|
||||
if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
|
||||
break;
|
||||
}
|
||||
break;
|
||||
@@ -3881,8 +3899,10 @@ for (;;)
|
||||
}
|
||||
else
|
||||
{
|
||||
if (c != 0x000a && c != 0x000b && c != 0x000c &&
|
||||
c != 0x0085 && c != 0x2028 && c != 0x2029)
|
||||
if (c != 0x000a &&
|
||||
(md->bsr_anycrlf ||
|
||||
(c != 0x000b && c != 0x000c &&
|
||||
c != 0x0085 && c != 0x2028 && c != 0x2029)))
|
||||
break;
|
||||
eptr += len;
|
||||
}
|
||||
@@ -4072,7 +4092,9 @@ for (;;)
|
||||
}
|
||||
else
|
||||
{
|
||||
if (c != 0x000a && c != 0x000b && c != 0x000c && c != 0x0085)
|
||||
if (c != 0x000a &&
|
||||
(md->bsr_anycrlf ||
|
||||
(c != 0x000b && c != 0x000c && c != 0x0085)))
|
||||
break;
|
||||
eptr++;
|
||||
}
|
||||
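The matcher hunks above all apply one rule: when `md->bsr_anycrlf` is set, \R still accepts CR, LF, and CRLF, but the remaining Unicode line endings become a non-match. A simplified byte-mode sketch of that rule is shown below. `match_backslash_R` is a hypothetical helper, and the two UTF-8-only code points (0x2028 and 0x2029) are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many bytes \R would consume at s[0..len), or 0 for no match.
   Byte (non-UTF-8) mode only; anycrlf selects the restricted behaviour. */
size_t match_backslash_R(const unsigned char *s, size_t len, int anycrlf)
{
    assert(s != NULL);
    if (len == 0) return 0;
    switch (s[0]) {
    case 0x0d:                       /* CR, optionally followed by LF */
        return (len > 1 && s[1] == 0x0a) ? 2 : 1;
    case 0x0a:                       /* LF always matches */
        return 1;
    case 0x0b:                       /* VT, FF, NEL: Unicode mode only */
    case 0x0c:
    case 0x85:
        return anycrlf ? 0 : 1;
    default:
        return 0;
    }
}
```

Giving LF its own case, as the diff does, is what lets the restricted mode reject VT, FF, and NEL without touching the LF path.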
@@ -4222,12 +4244,17 @@ HEAP_RETURN:
|
||||
switch (frame->Xwhere)
|
||||
{
|
||||
LBL( 1) LBL( 2) LBL( 3) LBL( 4) LBL( 5) LBL( 6) LBL( 7) LBL( 8)
|
||||
LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16)
|
||||
LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24)
|
||||
LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32)
|
||||
LBL(33) LBL(34) LBL(35) LBL(36) LBL(37) LBL(38) LBL(39) LBL(40)
|
||||
LBL(41) LBL(42) LBL(43) LBL(44) LBL(45) LBL(46) LBL(47) LBL(48)
|
||||
LBL(49) LBL(50) LBL(51) LBL(52) LBL(53) LBL(54)
|
||||
LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(17)
|
||||
LBL(19) LBL(24) LBL(25) LBL(26) LBL(27) LBL(29) LBL(31) LBL(33)
|
||||
LBL(35) LBL(43) LBL(47) LBL(48) LBL(49) LBL(50) LBL(51) LBL(52)
|
||||
LBL(53) LBL(54)
|
||||
#ifdef SUPPORT_UTF8
|
||||
LBL(16) LBL(18) LBL(20) LBL(21) LBL(22) LBL(23) LBL(28) LBL(30)
|
||||
LBL(32) LBL(34) LBL(42) LBL(46)
|
||||
#ifdef SUPPORT_UCP
|
||||
LBL(36) LBL(37) LBL(38) LBL(39) LBL(40) LBL(41) LBL(44) LBL(45)
|
||||
#endif /* SUPPORT_UCP */
|
||||
#endif /* SUPPORT_UTF8 */
|
||||
default:
|
||||
DPRINTF(("jump error in pcre match: label %d non-existent\n", frame->Xwhere));
|
||||
return PCRE_ERROR_INTERNAL;
|
||||
@@ -4406,7 +4433,7 @@ if (re->magic_number != MAGIC_NUMBER)
|
||||
/* Set up other data */
|
||||
|
||||
anchored = ((re->options | options) & PCRE_ANCHORED) != 0;
|
||||
startline = (re->options & PCRE_STARTLINE) != 0;
|
||||
startline = (re->flags & PCRE_STARTLINE) != 0;
|
||||
firstline = (re->options & PCRE_FIRSTLINE) != 0;
|
||||
|
||||
/* The code starts after the real_pcre block and the capture name table. */
|
||||
@@ -4433,11 +4460,37 @@ md->recursive = NULL; /* No recursion at top level */
|
||||
md->lcc = tables + lcc_offset;
|
||||
md->ctypes = tables + ctypes_offset;
|
||||
|
||||
/* Handle different \R options. */
|
||||
|
||||
switch (options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE))
|
||||
{
|
||||
case 0:
|
||||
if ((re->options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) != 0)
|
||||
md->bsr_anycrlf = (re->options & PCRE_BSR_ANYCRLF) != 0;
|
||||
else
|
||||
#ifdef BSR_ANYCRLF
|
||||
md->bsr_anycrlf = TRUE;
|
||||
#else
|
||||
md->bsr_anycrlf = FALSE;
|
||||
#endif
|
||||
break;
|
||||
|
||||
case PCRE_BSR_ANYCRLF:
|
||||
md->bsr_anycrlf = TRUE;
|
||||
break;
|
||||
|
||||
case PCRE_BSR_UNICODE:
|
||||
md->bsr_anycrlf = FALSE;
|
||||
break;
|
||||
|
||||
default: return PCRE_ERROR_BADNEWLINE;
|
||||
}
|
||||
|
||||
/* Handle different types of newline. The three bits give eight cases. If
|
||||
nothing is set at run time, whatever was used at compile time applies. */
|
||||
|
||||
switch ((((options & PCRE_NEWLINE_BITS) == 0)? re->options : (pcre_uint32)options) &
|
||||
PCRE_NEWLINE_BITS)
|
||||
switch ((((options & PCRE_NEWLINE_BITS) == 0)? re->options :
|
||||
(pcre_uint32)options) & PCRE_NEWLINE_BITS)
|
||||
{
|
||||
case 0: newline = NEWLINE; break; /* Compile-time default */
|
||||
case PCRE_NEWLINE_CR: newline = '\r'; break;
|
||||
@@ -4476,7 +4529,7 @@ else
|
||||
/* Partial matching is supported only for a restricted set of regexes at the
|
||||
moment. */
|
||||
|
||||
if (md->partial && (re->options & PCRE_NOPARTIAL) != 0)
|
||||
if (md->partial && (re->flags & PCRE_NOPARTIAL) != 0)
|
||||
return PCRE_ERROR_BADPARTIAL;
|
||||
|
||||
/* Check a UTF-8 string if required. Unfortunately there's no way of passing
|
||||
@@ -4553,7 +4606,7 @@ studied, there may be a bitmap of possible first characters. */
|
||||
|
||||
if (!anchored)
|
||||
{
|
||||
if ((re->options & PCRE_FIRSTSET) != 0)
|
||||
if ((re->flags & PCRE_FIRSTSET) != 0)
|
||||
{
|
||||
first_byte = re->first_byte & 255;
|
||||
if ((first_byte_caseless = ((re->first_byte & REQ_CASELESS) != 0)) == TRUE)
|
||||
@@ -4568,7 +4621,7 @@ if (!anchored)
|
||||
/* For anchored or unanchored matches, there may be a "last known required
|
||||
character" set. */
|
||||
|
||||
if ((re->options & PCRE_REQCHSET) != 0)
|
||||
if ((re->flags & PCRE_REQCHSET) != 0)
|
||||
{
|
||||
req_byte = re->req_byte & 255;
|
||||
req_byte_caseless = (re->req_byte & REQ_CASELESS) != 0;
|
||||
@@ -4615,10 +4668,10 @@ for(;;)
|
||||
if (first_byte_caseless)
|
||||
while (start_match < end_subject &&
|
||||
md->lcc[*start_match] != first_byte)
|
||||
start_match++;
|
||||
{ NEXTCHAR(start_match); }
|
||||
else
|
||||
while (start_match < end_subject && *start_match != first_byte)
|
||||
start_match++;
|
||||
{ NEXTCHAR(start_match); }
|
||||
}
|
||||
|
||||
/* Or to just after a linebreak for a multiline match if possible */
|
||||
@@ -4628,7 +4681,7 @@ for(;;)
|
||||
if (start_match > md->start_subject + start_offset)
|
||||
{
|
||||
while (start_match <= end_subject && !WAS_NEWLINE(start_match))
|
||||
start_match++;
|
||||
{ NEXTCHAR(start_match); }
|
||||
|
||||
/* If we have just passed a CR and the newline option is ANY or ANYCRLF,
|
||||
and we are now at a LF, advance the match position by one more character.
|
||||
@@ -4649,7 +4702,9 @@ for(;;)
|
||||
while (start_match < end_subject)
|
||||
{
|
||||
register unsigned int c = *start_match;
|
||||
if ((start_bits[c/8] & (1 << (c&7))) == 0) start_match++; else break;
|
||||
if ((start_bits[c/8] & (1 << (c&7))) == 0)
|
||||
{ NEXTCHAR(start_match); }
|
||||
else break;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -4790,7 +4845,7 @@ for(;;)
|
||||
if (start_match[-1] == '\r' &&
|
||||
start_match < end_subject &&
|
||||
*start_match == '\n' &&
|
||||
(re->options & PCRE_HASCRORLF) == 0 &&
|
||||
(re->flags & PCRE_HASCRORLF) == 0 &&
|
||||
(md->nltype == NLTYPE_ANY ||
|
||||
md->nltype == NLTYPE_ANYCRLF ||
|
||||
md->nllen == 2))
|
||||
|
||||
@@ -42,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
information about a compiled pattern. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -108,8 +108,8 @@ switch (what)
|
||||
|
||||
case PCRE_INFO_FIRSTBYTE:
|
||||
*((int *)where) =
|
||||
((re->options & PCRE_FIRSTSET) != 0)? re->first_byte :
|
||||
((re->options & PCRE_STARTLINE) != 0)? -1 : -2;
|
||||
((re->flags & PCRE_FIRSTSET) != 0)? re->first_byte :
|
||||
((re->flags & PCRE_STARTLINE) != 0)? -1 : -2;
|
||||
break;
|
||||
|
||||
/* Make sure we pass back the pointer to the bit vector in the external
|
||||
@@ -123,7 +123,7 @@ switch (what)
|
||||
|
||||
case PCRE_INFO_LASTLITERAL:
|
||||
*((int *)where) =
|
||||
((re->options & PCRE_REQCHSET) != 0)? re->req_byte : -1;
|
||||
((re->flags & PCRE_REQCHSET) != 0)? re->req_byte : -1;
|
||||
break;
|
||||
|
||||
case PCRE_INFO_NAMEENTRYSIZE:
|
||||
@@ -143,15 +143,15 @@ switch (what)
|
||||
break;
|
||||
|
||||
case PCRE_INFO_OKPARTIAL:
|
||||
*((int *)where) = (re->options & PCRE_NOPARTIAL) == 0;
|
||||
*((int *)where) = (re->flags & PCRE_NOPARTIAL) == 0;
|
||||
break;
|
||||
|
||||
case PCRE_INFO_JCHANGED:
|
||||
*((int *)where) = (re->options & PCRE_JCHANGED) != 0;
|
||||
*((int *)where) = (re->flags & PCRE_JCHANGED) != 0;
|
||||
break;
|
||||
|
||||
case PCRE_INFO_HASCRORLF:
|
||||
*((int *)where) = (re->options & PCRE_HASCRORLF) != 0;
|
||||
*((int *)where) = (re->flags & PCRE_HASCRORLF) != 0;
|
||||
break;
|
||||
|
||||
default: return PCRE_ERROR_BADOPTION;
|
||||
|
||||
@@ -43,7 +43,7 @@ from the subject string after a regex match has succeeded. The original idea
|
||||
for these functions came from Scott Wimer. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -187,7 +187,7 @@ const real_pcre *re = (const real_pcre *)code;
|
||||
int entrysize;
|
||||
char *first, *last;
|
||||
uschar *entry;
|
||||
if ((re->options & (PCRE_DUPNAMES | PCRE_JCHANGED)) == 0)
|
||||
if ((re->options & PCRE_DUPNAMES) == 0 && (re->flags & PCRE_JCHANGED) == 0)
|
||||
return pcre_get_stringnumber(code, stringname);
|
||||
entrysize = pcre_get_stringtable_entries(code, stringname, &first, &last);
|
||||
if (entrysize <= 0) return entrysize;
|
||||
|
||||
@@ -46,7 +46,7 @@ indirection. These values can be changed by the caller, but are shared between
|
||||
all threads. However, when compiling for Virtual Pascal, things are done
|
||||
differently, and global variables are not used (see pcre.in). */
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
@@ -43,7 +43,7 @@ information about a compiled pattern. However, use of this function is now
|
||||
deprecated, as it has been superseded by pcre_fullinfo(). */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -83,8 +83,8 @@ if (re->magic_number != MAGIC_NUMBER)
|
||||
}
|
||||
if (optptr != NULL) *optptr = (int)(re->options & PUBLIC_OPTIONS);
|
||||
if (first_byte != NULL)
|
||||
*first_byte = ((re->options & PCRE_FIRSTSET) != 0)? re->first_byte :
|
||||
((re->options & PCRE_STARTLINE) != 0)? -1 : -2;
|
||||
*first_byte = ((re->flags & PCRE_FIRSTSET) != 0)? re->first_byte :
|
||||
((re->flags & PCRE_STARTLINE) != 0)? -1 : -2;
|
||||
return re->top_bracket;
|
||||
}
|
||||
|
||||
|
||||
@@ -363,6 +363,7 @@ never be called in byte mode. To make sure it can never even appear when UTF-8
|
||||
support is omitted, we don't even define it. */
|
||||
|
||||
#ifndef SUPPORT_UTF8
|
||||
#define NEXTCHAR(p) p++;
|
||||
#define GETCHAR(c, eptr) c = *eptr;
|
||||
#define GETCHARTEST(c, eptr) c = *eptr;
|
||||
#define GETCHARINC(c, eptr) c = *eptr++;
|
||||
@@ -372,6 +373,13 @@ support is omitted, we don't even define it. */
|
||||
|
||||
#else /* SUPPORT_UTF8 */
|
||||
|
||||
/* Advance a character pointer one byte in non-UTF-8 mode and by one character
|
||||
in UTF-8 mode. */
|
||||
|
||||
#define NEXTCHAR(p) \
|
||||
p++; \
|
||||
if (utf8) { while((*p & 0xc0) == 0x80) p++; }
|
||||
|
||||
/* Get the next UTF-8 character, not advancing the pointer. This is called when
|
||||
we know we are in UTF-8 mode. */
|
||||
|
||||
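The `NEXTCHAR` macro added above steps one byte and then, in UTF-8 mode, skips any continuation bytes (those of the form 10xxxxxx). It expands to more than one statement, which is presumably why the call sites in pcre_exec.c wrap it in braces. The same logic is sketched here as a function for clarity; `next_char` is a hypothetical name, and the sketch shares the macro's assumption that the subject ends in a non-continuation byte such as a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Advance p to the start of the next character: one byte in byte mode,
   or past any UTF-8 continuation bytes (10xxxxxx) in UTF-8 mode.
   Assumes the data ends in a non-continuation byte. */
const unsigned char *next_char(const unsigned char *p, int utf8)
{
    assert(p != NULL);
    p++;
    if (utf8) {
        while ((*p & 0xc0) == 0x80) p++;
    }
    return p;
}
```

For the two-byte sequence 0xC3 0xA9, a single call in UTF-8 mode moves past both bytes, whereas byte mode stops on the continuation byte.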
@@ -481,18 +489,16 @@ Standard C system should have one. */
|
||||
|
||||
#define PCRE_IMS (PCRE_CASELESS|PCRE_MULTILINE|PCRE_DOTALL)
|
||||
|
||||
/* Private options flags start at the most significant end of the four bytes.
|
||||
The public options defined in pcre.h start at the least significant end. Make
|
||||
sure they don't overlap! The bits are getting a bit scarce now -- when we run
|
||||
out, there is a dummy word in the structure that could be used for the private
|
||||
bits. */
|
||||
/* Private flags containing information about the compiled regex. They used to
|
||||
live at the top end of the options word, but that got almost full, so now they
|
||||
are in a 16-bit flags word. */
|
||||
|
||||
#define PCRE_NOPARTIAL 0x80000000 /* can't use partial with this regex */
|
||||
#define PCRE_FIRSTSET 0x40000000 /* first_byte is set */
|
||||
#define PCRE_REQCHSET 0x20000000 /* req_byte is set */
|
||||
#define PCRE_STARTLINE 0x10000000 /* start after \n for multiline */
|
||||
#define PCRE_JCHANGED 0x08000000 /* j option changes within regex */
|
||||
#define PCRE_HASCRORLF 0x04000000 /* explicit \r or \n in pattern */
|
||||
#define PCRE_NOPARTIAL 0x0001 /* can't use partial with this regex */
|
||||
#define PCRE_FIRSTSET 0x0002 /* first_byte is set */
|
||||
#define PCRE_REQCHSET 0x0004 /* req_byte is set */
|
||||
#define PCRE_STARTLINE 0x0008 /* start after \n for multiline */
|
||||
#define PCRE_JCHANGED 0x0010 /* j option used in regex */
|
||||
#define PCRE_HASCRORLF 0x0020 /* explicit \r or \n in pattern */
|
||||
|
||||
/* Options for the "extra" block produced by pcre_study(). */
|
||||
|
||||
@@ -508,15 +514,16 @@ time, run time, or study time, respectively. */
|
||||
(PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
|
||||
PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
|
||||
PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \
|
||||
PCRE_DUPNAMES|PCRE_NEWLINE_BITS)
|
||||
PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)
|
||||
|
||||
#define PUBLIC_EXEC_OPTIONS \
|
||||
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
|
||||
PCRE_PARTIAL|PCRE_NEWLINE_BITS)
|
||||
PCRE_PARTIAL|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)
|
||||
|
||||
#define PUBLIC_DFA_EXEC_OPTIONS \
|
||||
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
|
||||
PCRE_PARTIAL|PCRE_DFA_SHORTEST|PCRE_DFA_RESTART|PCRE_NEWLINE_BITS)
|
||||
PCRE_PARTIAL|PCRE_DFA_SHORTEST|PCRE_DFA_RESTART|PCRE_NEWLINE_BITS| \
|
||||
PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)
|
||||
|
||||
#define PUBLIC_STUDY_OPTIONS 0 /* None defined */
|
||||
|
||||
@@ -872,7 +879,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
|
||||
ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
|
||||
ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
|
||||
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
|
||||
ERR60, ERR61 };
|
||||
ERR60, ERR61, ERR62, ERR63 };
|
||||
|
||||
/* The real format of the start of the pcre block; the index of names and the
|
||||
code vector run on as long as necessary after the end. We store an explicit
|
||||
@@ -894,9 +901,9 @@ NOTE NOTE NOTE:
|
||||
typedef struct real_pcre {
|
||||
pcre_uint32 magic_number;
|
||||
pcre_uint32 size; /* Total that was malloced */
|
||||
pcre_uint32 options;
|
||||
pcre_uint32 dummy1; /* For future use, maybe */
|
||||
|
||||
pcre_uint32 options; /* Public options */
|
||||
pcre_uint16 flags; /* Private flags */
|
||||
pcre_uint16 dummy1; /* For future use */
|
||||
pcre_uint16 top_bracket;
|
||||
pcre_uint16 top_backref;
|
||||
pcre_uint16 first_byte;
|
||||
@@ -935,12 +942,13 @@ typedef struct compile_data {
|
||||
uschar *name_table; /* The name/number table */
|
||||
int names_found; /* Number of entries so far */
|
||||
int name_entry_size; /* Size of each entry */
|
||||
int bracount; /* Count of capturing parens */
|
||||
int bracount; /* Count of capturing parens as we compile */
|
||||
int final_bracount; /* Saved value after first pass */
|
||||
int top_backref; /* Maximum back reference */
|
||||
unsigned int backref_map; /* Bitmap of low back refs */
|
||||
int external_options; /* External (initial) options */
|
||||
int external_flags; /* External flag bits to be set */
|
||||
int req_varyopt; /* "After variable item" flag for reqbyte */
|
||||
BOOL nopartial; /* Set TRUE if partial won't work */
|
||||
BOOL had_accept; /* (*ACCEPT) encountered */
|
||||
int nltype; /* Newline type */
|
||||
int nllen; /* Newline string length */
|
||||
@@ -1000,6 +1008,7 @@ typedef struct match_data {
|
||||
BOOL notempty; /* Empty string match not wanted */
|
||||
BOOL partial; /* PARTIAL flag */
|
||||
BOOL hitend; /* Hit the end of the subject at some point */
|
||||
BOOL bsr_anycrlf; /* \R is just any CRLF, not full Unicode */
|
||||
const uschar *start_code; /* For use when recursing */
|
||||
USPTR start_subject; /* Start of the subject string */
|
||||
USPTR end_subject; /* End of the subject string */
|
||||
@@ -1036,7 +1045,7 @@ typedef struct dfa_match_data {
|
||||
#define ctype_letter 0x02
|
||||
#define ctype_digit 0x04
|
||||
#define ctype_xdigit 0x08
|
||||
#define ctype_word 0x10 /* alphameric or '_' */
|
||||
#define ctype_word 0x10 /* alphanumeric or '_' */
|
||||
#define ctype_meta 0x80 /* regexp meta char or zero (end pattern) */
|
||||
|
||||
/* Offsets for the bitmap tables in pcre_cbits. Each table contains a set
|
||||
@@ -1064,10 +1073,12 @@ total length. */
|
||||
#define tables_length (ctypes_offset + 256)
|
||||
|
||||
/* Layout of the UCP type table that translates property names into types and
|
||||
codes. */
|
||||
codes. Each entry used to point directly to a name, but to reduce the number of
|
||||
relocations in shared libraries, it now has an offset into a single string
|
||||
instead. */
|
||||
|
||||
typedef struct {
|
||||
const char *name;
|
||||
pcre_uint16 name_offset;
|
||||
pcre_uint16 type;
|
||||
pcre_uint16 value;
|
||||
} ucp_type_table;
|
||||
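The change to `ucp_type_table` above replaces a pointer per entry with a 16-bit offset into one shared string, so the table contains no addresses for the dynamic loader to relocate. A small sketch of the technique with a three-entry table follows. The names `names`, `entry`, `table`, and `lookup_type`, and the type values, are made up for illustration; the real data lives in `_pcre_utt_names` and `_pcre_utt`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names live in one string, separated by NULs and kept in sorted order. */
static const char names[] =
    "Any\0"
    "Greek\0"
    "Latin\0";

typedef struct {
    unsigned short name_offset;   /* offset of the name within names[] */
    unsigned short type;          /* hypothetical payload */
} entry;

static const entry table[] = {
    { 0,  10 },   /* "Any"   */
    { 4,  20 },   /* "Greek" */
    { 10, 30 },   /* "Latin" */
};

/* Binary chop over the table; returns the type, or -1 if not found. */
int lookup_type(const char *name)
{
    int bot = 0, top = (int)(sizeof table / sizeof table[0]);
    assert(name != NULL);
    while (bot < top) {
        int mid = (bot + top) / 2;
        int c = strcmp(name, names + table[mid].name_offset);
        if (c == 0) return table[mid].type;
        if (c > 0) bot = mid + 1; else top = mid;
    }
    return -1;
}
```

The cost is that the offsets must be kept in step with the string by hand, which is the maintenance risk the source comment mentions.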
@@ -1085,6 +1096,7 @@ extern const uschar _pcre_utf8_table4[];
|
||||
|
||||
extern const int _pcre_utf8_table1_size;
|
||||
|
||||
extern const char _pcre_utt_names[];
|
||||
extern const ucp_type_table _pcre_utt[];
|
||||
extern const int _pcre_utt_size;
|
||||
|
||||
|
||||
@@ -45,7 +45,7 @@ compilation of dftables.c, in which case the macro DFTABLES is defined. */
|
||||
|
||||
|
||||
#ifndef DFTABLES
|
||||
# include <config.h>
|
||||
# include "config.h"
|
||||
# include "pcre_internal.h"
|
||||
#endif
|
||||
|
||||
|
||||
@@ -47,7 +47,7 @@ and NLTYPE_ANY. The full list of Unicode newline characters is taken from
|
||||
http://unicode.org/unicode/reports/tr18/. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
@@ -41,7 +41,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
/* This file contains a private PCRE function that converts an ordinal
|
||||
character value into a UTF8 string. */
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
@@ -126,7 +126,7 @@ for (i = _pcre_utt_size - 1; i >= 0; i--)
|
||||
{
|
||||
if (ptype == _pcre_utt[i].type && pvalue == _pcre_utt[i].value) break;
|
||||
}
|
||||
return (i >= 0)? _pcre_utt[i].name : "??";
|
||||
return (i >= 0)? _pcre_utt_names + _pcre_utt[i].name_offset : "??";
|
||||
#else
|
||||
/* It gets harder and harder to shut off unwanted compiler warnings. */
|
||||
ptype = ptype * pvalue;
|
||||
|
||||
@@ -44,7 +44,7 @@ pattern data block. This might be helpful in applications where the block is
|
||||
shared by different users. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
@@ -42,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
supporting functions. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -525,7 +525,8 @@ code = (uschar *)re + re->name_table_offset +
|
||||
a multiline pattern that matches only at "line starts", no further processing
|
||||
at present. */
|
||||
|
||||
if ((re->options & (PCRE_ANCHORED|PCRE_FIRSTSET|PCRE_STARTLINE)) != 0)
|
||||
if ((re->options & PCRE_ANCHORED) != 0 ||
|
||||
(re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) != 0)
|
||||
return NULL;
|
||||
|
||||
/* Set the character tables in the block that is passed around */
|
||||
|
||||
+221
-108
@@ -44,7 +44,7 @@ uses macros to change their names from _pcre_xxx to xxxx, thereby avoiding name
|
||||
clashes with the library. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -85,115 +85,228 @@ const uschar _pcre_utf8_table4[] = {
|
||||
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
|
||||
3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 };
|
||||
|
||||
/* This table translates Unicode property names into type and code values. It
|
||||
is searched by binary chop, so must be in collating sequence of name. */
|
||||
/* The pcre_utt[] table below translates Unicode property names into type and
|
||||
code values. It is searched by binary chop, so must be in collating sequence of
|
||||
name. Originally, the table contained pointers to the name strings in the first
|
||||
field of each entry. However, that leads to a large number of relocations when
|
||||
a shared library is dynamically loaded. A significant reduction is made by
|
||||
putting all the names into a single, large string and then using offsets in the
|
||||
table itself. Maintenance is more error-prone, but frequent changes to this
|
||||
data are unlikely. */
|
||||
|
||||
const char _pcre_utt_names[] =
|
||||
"Any\0"
|
||||
"Arabic\0"
|
||||
"Armenian\0"
|
||||
"Balinese\0"
|
||||
"Bengali\0"
|
||||
"Bopomofo\0"
|
||||
"Braille\0"
|
||||
"Buginese\0"
|
||||
"Buhid\0"
|
||||
"C\0"
|
||||
"Canadian_Aboriginal\0"
|
||||
"Cc\0"
|
||||
"Cf\0"
|
||||
"Cherokee\0"
|
||||
"Cn\0"
|
||||
"Co\0"
|
||||
"Common\0"
|
||||
"Coptic\0"
|
||||
"Cs\0"
|
||||
"Cuneiform\0"
|
||||
"Cypriot\0"
|
||||
"Cyrillic\0"
|
||||
"Deseret\0"
|
||||
"Devanagari\0"
|
||||
"Ethiopic\0"
|
||||
"Georgian\0"
|
||||
"Glagolitic\0"
|
||||
"Gothic\0"
|
||||
"Greek\0"
|
||||
"Gujarati\0"
|
||||
"Gurmukhi\0"
|
||||
"Han\0"
|
||||
"Hangul\0"
|
||||
"Hanunoo\0"
|
||||
"Hebrew\0"
|
||||
"Hiragana\0"
|
||||
"Inherited\0"
|
||||
"Kannada\0"
|
||||
"Katakana\0"
|
||||
"Kharoshthi\0"
|
||||
"Khmer\0"
|
||||
"L\0"
|
||||
"L&\0"
|
||||
"Lao\0"
|
||||
"Latin\0"
|
||||
"Limbu\0"
|
||||
"Linear_B\0"
|
||||
"Ll\0"
|
||||
"Lm\0"
|
||||
"Lo\0"
|
||||
"Lt\0"
|
||||
"Lu\0"
|
||||
"M\0"
|
||||
"Malayalam\0"
|
||||
"Mc\0"
|
||||
"Me\0"
|
||||
"Mn\0"
|
||||
"Mongolian\0"
|
||||
"Myanmar\0"
|
||||
"N\0"
|
||||
"Nd\0"
|
||||
"New_Tai_Lue\0"
|
||||
"Nko\0"
|
||||
"Nl\0"
|
||||
"No\0"
|
||||
"Ogham\0"
|
||||
"Old_Italic\0"
|
||||
"Old_Persian\0"
|
||||
"Oriya\0"
|
||||
"Osmanya\0"
|
||||
"P\0"
|
||||
"Pc\0"
|
||||
"Pd\0"
|
||||
"Pe\0"
|
||||
"Pf\0"
|
||||
"Phags_Pa\0"
|
||||
"Phoenician\0"
|
||||
"Pi\0"
|
||||
"Po\0"
|
||||
"Ps\0"
|
||||
"Runic\0"
|
||||
"S\0"
|
||||
"Sc\0"
|
||||
"Shavian\0"
|
||||
"Sinhala\0"
|
||||
"Sk\0"
|
||||
"Sm\0"
|
||||
"So\0"
|
||||
"Syloti_Nagri\0"
|
||||
"Syriac\0"
|
||||
"Tagalog\0"
|
||||
"Tagbanwa\0"
|
||||
"Tai_Le\0"
|
||||
"Tamil\0"
|
||||
"Telugu\0"
|
||||
"Thaana\0"
|
||||
"Thai\0"
|
||||
"Tibetan\0"
|
||||
"Tifinagh\0"
|
||||
"Ugaritic\0"
|
||||
"Yi\0"
|
||||
"Z\0"
|
||||
"Zl\0"
|
||||
"Zp\0"
|
||||
"Zs\0";
|
||||
|
||||
const ucp_type_table _pcre_utt[] = {
|
||||
{ "Any", PT_ANY, 0 },
|
||||
{ "Arabic", PT_SC, ucp_Arabic },
|
||||
{ "Armenian", PT_SC, ucp_Armenian },
|
||||
{ "Balinese", PT_SC, ucp_Balinese },
|
||||
{ "Bengali", PT_SC, ucp_Bengali },
|
||||
{ "Bopomofo", PT_SC, ucp_Bopomofo },
|
||||
{ "Braille", PT_SC, ucp_Braille },
|
||||
{ "Buginese", PT_SC, ucp_Buginese },
|
||||
{ "Buhid", PT_SC, ucp_Buhid },
|
||||
{ "C", PT_GC, ucp_C },
|
||||
{ "Canadian_Aboriginal", PT_SC, ucp_Canadian_Aboriginal },
|
||||
{ "Cc", PT_PC, ucp_Cc },
|
||||
{ "Cf", PT_PC, ucp_Cf },
|
||||
{ "Cherokee", PT_SC, ucp_Cherokee },
|
||||
{ "Cn", PT_PC, ucp_Cn },
|
||||
{ "Co", PT_PC, ucp_Co },
|
||||
{ "Common", PT_SC, ucp_Common },
|
||||
{ "Coptic", PT_SC, ucp_Coptic },
|
||||
{ "Cs", PT_PC, ucp_Cs },
|
||||
{ "Cuneiform", PT_SC, ucp_Cuneiform },
|
||||
{ "Cypriot", PT_SC, ucp_Cypriot },
|
||||
{ "Cyrillic", PT_SC, ucp_Cyrillic },
|
||||
{ "Deseret", PT_SC, ucp_Deseret },
|
||||
{ "Devanagari", PT_SC, ucp_Devanagari },
|
||||
{ "Ethiopic", PT_SC, ucp_Ethiopic },
|
||||
{ "Georgian", PT_SC, ucp_Georgian },
|
||||
{ "Glagolitic", PT_SC, ucp_Glagolitic },
|
||||
{ "Gothic", PT_SC, ucp_Gothic },
|
||||
{ "Greek", PT_SC, ucp_Greek },
|
||||
{ "Gujarati", PT_SC, ucp_Gujarati },
|
||||
{ "Gurmukhi", PT_SC, ucp_Gurmukhi },
|
||||
{ "Han", PT_SC, ucp_Han },
|
||||
{ "Hangul", PT_SC, ucp_Hangul },
|
||||
{ "Hanunoo", PT_SC, ucp_Hanunoo },
|
||||
{ "Hebrew", PT_SC, ucp_Hebrew },
|
||||
{ "Hiragana", PT_SC, ucp_Hiragana },
|
||||
{ "Inherited", PT_SC, ucp_Inherited },
|
||||
{ "Kannada", PT_SC, ucp_Kannada },
|
||||
{ "Katakana", PT_SC, ucp_Katakana },
|
||||
{ "Kharoshthi", PT_SC, ucp_Kharoshthi },
|
||||
{ "Khmer", PT_SC, ucp_Khmer },
|
||||
{ "L", PT_GC, ucp_L },
|
||||
{ "L&", PT_LAMP, 0 },
|
||||
{ "Lao", PT_SC, ucp_Lao },
|
||||
{ "Latin", PT_SC, ucp_Latin },
|
||||
{ "Limbu", PT_SC, ucp_Limbu },
|
||||
{ "Linear_B", PT_SC, ucp_Linear_B },
|
||||
{ "Ll", PT_PC, ucp_Ll },
|
||||
{ "Lm", PT_PC, ucp_Lm },
|
||||
{ "Lo", PT_PC, ucp_Lo },
|
||||
{ "Lt", PT_PC, ucp_Lt },
|
||||
{ "Lu", PT_PC, ucp_Lu },
|
||||
{ "M", PT_GC, ucp_M },
|
||||
{ "Malayalam", PT_SC, ucp_Malayalam },
|
||||
{ "Mc", PT_PC, ucp_Mc },
|
||||
{ "Me", PT_PC, ucp_Me },
|
||||
{ "Mn", PT_PC, ucp_Mn },
|
||||
{ "Mongolian", PT_SC, ucp_Mongolian },
|
||||
{ "Myanmar", PT_SC, ucp_Myanmar },
|
||||
{ "N", PT_GC, ucp_N },
|
||||
{ "Nd", PT_PC, ucp_Nd },
|
||||
{ "New_Tai_Lue", PT_SC, ucp_New_Tai_Lue },
|
||||
{ "Nko", PT_SC, ucp_Nko },
|
||||
{ "Nl", PT_PC, ucp_Nl },
|
||||
{ "No", PT_PC, ucp_No },
|
||||
{ "Ogham", PT_SC, ucp_Ogham },
|
||||
{ "Old_Italic", PT_SC, ucp_Old_Italic },
|
||||
{ "Old_Persian", PT_SC, ucp_Old_Persian },
|
||||
{ "Oriya", PT_SC, ucp_Oriya },
|
||||
{ "Osmanya", PT_SC, ucp_Osmanya },
|
||||
{ "P", PT_GC, ucp_P },
|
||||
{ "Pc", PT_PC, ucp_Pc },
|
||||
{ "Pd", PT_PC, ucp_Pd },
|
||||
{ "Pe", PT_PC, ucp_Pe },
|
||||
{ "Pf", PT_PC, ucp_Pf },
|
||||
{ "Phags_Pa", PT_SC, ucp_Phags_Pa },
|
||||
{ "Phoenician", PT_SC, ucp_Phoenician },
|
||||
{ "Pi", PT_PC, ucp_Pi },
|
||||
{ "Po", PT_PC, ucp_Po },
|
||||
{ "Ps", PT_PC, ucp_Ps },
|
||||
{ "Runic", PT_SC, ucp_Runic },
|
||||
{ "S", PT_GC, ucp_S },
|
||||
{ "Sc", PT_PC, ucp_Sc },
|
||||
{ "Shavian", PT_SC, ucp_Shavian },
|
||||
{ "Sinhala", PT_SC, ucp_Sinhala },
|
||||
{ "Sk", PT_PC, ucp_Sk },
|
||||
{ "Sm", PT_PC, ucp_Sm },
|
||||
{ "So", PT_PC, ucp_So },
|
||||
{ "Syloti_Nagri", PT_SC, ucp_Syloti_Nagri },
|
||||
{ "Syriac", PT_SC, ucp_Syriac },
|
||||
{ "Tagalog", PT_SC, ucp_Tagalog },
|
||||
{ "Tagbanwa", PT_SC, ucp_Tagbanwa },
|
||||
{ "Tai_Le", PT_SC, ucp_Tai_Le },
|
||||
{ "Tamil", PT_SC, ucp_Tamil },
|
||||
{ "Telugu", PT_SC, ucp_Telugu },
|
||||
{ "Thaana", PT_SC, ucp_Thaana },
|
||||
{ "Thai", PT_SC, ucp_Thai },
|
||||
{ "Tibetan", PT_SC, ucp_Tibetan },
|
||||
{ "Tifinagh", PT_SC, ucp_Tifinagh },
|
||||
{ "Ugaritic", PT_SC, ucp_Ugaritic },
|
||||
{ "Yi", PT_SC, ucp_Yi },
|
||||
{ "Z", PT_GC, ucp_Z },
|
||||
{ "Zl", PT_PC, ucp_Zl },
|
||||
{ "Zp", PT_PC, ucp_Zp },
|
||||
{ "Zs", PT_PC, ucp_Zs }
|
||||
{ 0, PT_ANY, 0 },
|
||||
{ 4, PT_SC, ucp_Arabic },
|
||||
{ 11, PT_SC, ucp_Armenian },
|
||||
{ 20, PT_SC, ucp_Balinese },
|
||||
{ 29, PT_SC, ucp_Bengali },
|
||||
{ 37, PT_SC, ucp_Bopomofo },
|
||||
{ 46, PT_SC, ucp_Braille },
|
||||
{ 54, PT_SC, ucp_Buginese },
|
||||
{ 63, PT_SC, ucp_Buhid },
|
||||
{ 69, PT_GC, ucp_C },
|
||||
{ 71, PT_SC, ucp_Canadian_Aboriginal },
|
||||
{ 91, PT_PC, ucp_Cc },
|
||||
{ 94, PT_PC, ucp_Cf },
|
||||
{ 97, PT_SC, ucp_Cherokee },
|
||||
{ 106, PT_PC, ucp_Cn },
|
||||
{ 109, PT_PC, ucp_Co },
|
||||
{ 112, PT_SC, ucp_Common },
|
||||
{ 119, PT_SC, ucp_Coptic },
|
||||
{ 126, PT_PC, ucp_Cs },
|
||||
{ 129, PT_SC, ucp_Cuneiform },
|
||||
{ 139, PT_SC, ucp_Cypriot },
|
||||
{ 147, PT_SC, ucp_Cyrillic },
|
||||
{ 156, PT_SC, ucp_Deseret },
|
||||
{ 164, PT_SC, ucp_Devanagari },
|
||||
{ 175, PT_SC, ucp_Ethiopic },
|
||||
{ 184, PT_SC, ucp_Georgian },
|
||||
{ 193, PT_SC, ucp_Glagolitic },
|
||||
{ 204, PT_SC, ucp_Gothic },
|
||||
{ 211, PT_SC, ucp_Greek },
|
||||
{ 217, PT_SC, ucp_Gujarati },
|
||||
{ 226, PT_SC, ucp_Gurmukhi },
|
||||
{ 235, PT_SC, ucp_Han },
|
||||
{ 239, PT_SC, ucp_Hangul },
|
||||
{ 246, PT_SC, ucp_Hanunoo },
|
||||
{ 254, PT_SC, ucp_Hebrew },
|
||||
{ 261, PT_SC, ucp_Hiragana },
|
||||
{ 270, PT_SC, ucp_Inherited },
|
||||
{ 280, PT_SC, ucp_Kannada },
|
||||
{ 288, PT_SC, ucp_Katakana },
|
||||
{ 297, PT_SC, ucp_Kharoshthi },
|
||||
{ 308, PT_SC, ucp_Khmer },
|
||||
{ 314, PT_GC, ucp_L },
|
||||
{ 316, PT_LAMP, 0 },
|
||||
{ 319, PT_SC, ucp_Lao },
|
||||
{ 323, PT_SC, ucp_Latin },
|
||||
{ 329, PT_SC, ucp_Limbu },
|
||||
{ 335, PT_SC, ucp_Linear_B },
|
||||
{ 344, PT_PC, ucp_Ll },
|
||||
{ 347, PT_PC, ucp_Lm },
|
||||
{ 350, PT_PC, ucp_Lo },
|
||||
{ 353, PT_PC, ucp_Lt },
|
||||
{ 356, PT_PC, ucp_Lu },
|
||||
{ 359, PT_GC, ucp_M },
|
||||
{ 361, PT_SC, ucp_Malayalam },
|
||||
{ 371, PT_PC, ucp_Mc },
|
||||
{ 374, PT_PC, ucp_Me },
|
||||
{ 377, PT_PC, ucp_Mn },
|
||||
{ 380, PT_SC, ucp_Mongolian },
|
||||
{ 390, PT_SC, ucp_Myanmar },
|
||||
{ 398, PT_GC, ucp_N },
|
||||
{ 400, PT_PC, ucp_Nd },
|
||||
{ 403, PT_SC, ucp_New_Tai_Lue },
|
||||
{ 415, PT_SC, ucp_Nko },
|
||||
{ 419, PT_PC, ucp_Nl },
|
||||
{ 422, PT_PC, ucp_No },
|
||||
{ 425, PT_SC, ucp_Ogham },
|
||||
{ 431, PT_SC, ucp_Old_Italic },
|
||||
{ 442, PT_SC, ucp_Old_Persian },
|
||||
{ 454, PT_SC, ucp_Oriya },
|
||||
{ 460, PT_SC, ucp_Osmanya },
|
||||
{ 468, PT_GC, ucp_P },
|
||||
{ 470, PT_PC, ucp_Pc },
|
||||
{ 473, PT_PC, ucp_Pd },
|
||||
{ 476, PT_PC, ucp_Pe },
|
||||
{ 479, PT_PC, ucp_Pf },
|
||||
{ 482, PT_SC, ucp_Phags_Pa },
|
||||
{ 491, PT_SC, ucp_Phoenician },
|
||||
{ 502, PT_PC, ucp_Pi },
|
||||
{ 505, PT_PC, ucp_Po },
|
||||
{ 508, PT_PC, ucp_Ps },
|
||||
{ 511, PT_SC, ucp_Runic },
|
||||
{ 517, PT_GC, ucp_S },
|
||||
{ 519, PT_PC, ucp_Sc },
|
||||
{ 522, PT_SC, ucp_Shavian },
|
||||
{ 530, PT_SC, ucp_Sinhala },
|
||||
{ 538, PT_PC, ucp_Sk },
|
||||
{ 541, PT_PC, ucp_Sm },
|
||||
{ 544, PT_PC, ucp_So },
|
||||
{ 547, PT_SC, ucp_Syloti_Nagri },
|
||||
{ 560, PT_SC, ucp_Syriac },
|
||||
{ 567, PT_SC, ucp_Tagalog },
|
||||
{ 575, PT_SC, ucp_Tagbanwa },
|
||||
{ 584, PT_SC, ucp_Tai_Le },
|
||||
{ 591, PT_SC, ucp_Tamil },
|
||||
{ 597, PT_SC, ucp_Telugu },
|
||||
{ 604, PT_SC, ucp_Thaana },
|
||||
{ 611, PT_SC, ucp_Thai },
|
||||
{ 616, PT_SC, ucp_Tibetan },
|
||||
{ 624, PT_SC, ucp_Tifinagh },
|
||||
{ 633, PT_SC, ucp_Ugaritic },
|
||||
{ 642, PT_SC, ucp_Yi },
|
||||
{ 645, PT_GC, ucp_Z },
|
||||
{ 647, PT_PC, ucp_Zl },
|
||||
{ 650, PT_PC, ucp_Zp },
|
||||
{ 653, PT_PC, ucp_Zs }
|
||||
};
|
||||
|
||||
const int _pcre_utt_size = sizeof(_pcre_utt)/sizeof(ucp_type_table);
|
||||
|
||||
@@ -43,7 +43,7 @@ see if it was compiled with the opposite endianness. If so, it uses an
|
||||
auxiliary local function to flip the appropriate bytes. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -106,6 +106,7 @@ if (byteflip(re->magic_number, sizeof(re->magic_number)) != MAGIC_NUMBER)
|
||||
*internal_re = *re; /* To copy other fields */
|
||||
internal_re->size = byteflip(re->size, sizeof(re->size));
|
||||
internal_re->options = byteflip(re->options, sizeof(re->options));
|
||||
internal_re->flags = (pcre_uint16)byteflip(re->flags, sizeof(re->flags));
|
||||
internal_re->top_bracket =
|
||||
(pcre_uint16)byteflip(re->top_bracket, sizeof(re->top_bracket));
|
||||
internal_re->top_backref =
|
||||
|
||||
@@ -41,7 +41,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
/* This module contains code for searching the table of Unicode character
|
||||
properties. */
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
@@ -42,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
strings. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
@@ -60,7 +60,7 @@ an invalid string are then undefined.
|
||||
Originally, this function checked according to RFC 2279, allowing for values in
|
||||
the range 0 to 0x7fffffff, up to 6 bytes long, but ensuring that they were in
|
||||
the canonical format. Once somebody had pointed out RFC 3629 to me (it
|
||||
obsoletes 2279), additional restrictions were applies. The values are now
|
||||
obsoletes 2279), additional restrictions were applied. The values are now
|
||||
limited to be between 0 and 0x0010ffff, no more than 4 bytes long, and the
|
||||
subrange 0xd800 to 0xdfff is excluded.
|
||||
|
||||
|
||||
@@ -42,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
string that identifies the PCRE version that is in use. */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
@@ -43,7 +43,7 @@ class (one that contains characters whose values are > 255). It is used by both
|
||||
pcre_exec() and pcre_dfa_exec(). */
|
||||
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
|
||||
+263
-35
@@ -37,7 +37,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
#include <config.h>
|
||||
#include "config.h"
|
||||
|
||||
#include <ctype.h>
|
||||
#include <locale.h>
|
||||
@@ -53,7 +53,15 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
#include <unistd.h>
|
||||
#endif
|
||||
|
||||
#include <pcre.h>
|
||||
#ifdef SUPPORT_LIBZ
|
||||
#include <zlib.h>
|
||||
#endif
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
#include <bzlib.h>
|
||||
#endif
|
||||
|
||||
#include "pcre.h"
|
||||
|
||||
#define FALSE 0
|
||||
#define TRUE 1
|
||||
@@ -74,6 +82,10 @@ all values greater than FN_DEFAULT. */
|
||||
|
||||
enum { FN_NONE, FN_DEFAULT, FN_ONLY, FN_NOMATCH_ONLY, FN_FORCE };
|
||||
|
||||
/* File reading styles */
|
||||
|
||||
enum { FR_PLAIN, FR_LIBZ, FR_LIBBZ2 };
|
||||
|
||||
/* Actions for the -d and -D options */
|
||||
|
||||
enum { dee_READ, dee_SKIP, dee_RECURSE };
|
||||
@@ -140,8 +152,10 @@ static int process_options = 0;
|
||||
|
||||
static BOOL count_only = FALSE;
|
||||
static BOOL do_colour = FALSE;
|
||||
static BOOL file_offsets = FALSE;
|
||||
static BOOL hyphenpending = FALSE;
|
||||
static BOOL invert = FALSE;
|
||||
static BOOL line_offsets = FALSE;
|
||||
static BOOL multiline = FALSE;
|
||||
static BOOL number = FALSE;
|
||||
static BOOL only_matching = FALSE;
|
||||
@@ -172,6 +186,8 @@ used to identify them. */
|
||||
#define N_LABEL (-5)
|
||||
#define N_LOCALE (-6)
|
||||
#define N_NULL (-7)
|
||||
#define N_LOFFSETS (-8)
|
||||
#define N_FOFFSETS (-9)
|
||||
|
||||
static option_item optionlist[] = {
|
||||
{ OP_NODATA, N_NULL, NULL, "", " terminate options" },
|
||||
@@ -187,15 +203,17 @@ static option_item optionlist[] = {
|
||||
{ OP_PATLIST, 'e', NULL, "regex(p)", "specify pattern (may be used more than once)" },
|
||||
{ OP_NODATA, 'F', NULL, "fixed_strings", "patterns are sets of newline-separated strings" },
|
||||
{ OP_STRING, 'f', &pattern_filename, "file=path", "read patterns from file" },
|
||||
{ OP_NODATA, N_FOFFSETS, NULL, "file-offsets", "output file offsets, not text" },
|
||||
{ OP_NODATA, 'H', NULL, "with-filename", "force the prefixing filename on output" },
|
||||
{ OP_NODATA, 'h', NULL, "no-filename", "suppress the prefixing filename on output" },
|
||||
{ OP_NODATA, 'i', NULL, "ignore-case", "ignore case distinctions" },
|
||||
{ OP_NODATA, 'l', NULL, "files-with-matches", "print only FILE names containing matches" },
|
||||
{ OP_NODATA, 'L', NULL, "files-without-match","print only FILE names not containing matches" },
|
||||
{ OP_STRING, N_LABEL, &stdin_name, "label=name", "set name for standard input" },
|
||||
{ OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" },
|
||||
{ OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" },
|
||||
{ OP_NODATA, 'M', NULL, "multiline", "run in multiline mode" },
|
||||
{ OP_STRING, 'N', &newline, "newline=type", "specify newline type (CR, LF, CRLF, ANYCRLF or ANY)" },
|
||||
{ OP_STRING, 'N', &newline, "newline=type", "set newline type (CR, LF, CRLF, ANYCRLF or ANY)" },
|
||||
{ OP_NODATA, 'n', NULL, "line-number", "print line number with output lines" },
|
||||
{ OP_NODATA, 'o', NULL, "only-matching", "show only the part of the line that matched" },
|
||||
{ OP_NODATA, 'q', NULL, "quiet", "suppress output, just set return code" },
|
||||
@@ -314,8 +332,9 @@ return isatty(fileno(stdout));
|
||||
|
||||
/* I (Philip Hazel) have no means of testing this code. It was contributed by
|
||||
Lionel Fourquaux. David Burgess added a patch to define INVALID_FILE_ATTRIBUTES
|
||||
when it did not exist. */
|
||||
|
||||
when it did not exist. David Byron added a patch that moved the #include of
|
||||
<windows.h> to before the INVALID_FILE_ATTRIBUTES definition rather than after.
|
||||
*/
|
||||
|
||||
#elif HAVE_WINDOWS_H
|
||||
|
||||
@@ -325,12 +344,13 @@ when it did not exist. */
|
||||
#ifndef WIN32_LEAN_AND_MEAN
|
||||
# define WIN32_LEAN_AND_MEAN
|
||||
#endif
|
||||
|
||||
#include <windows.h>
|
||||
|
||||
#ifndef INVALID_FILE_ATTRIBUTES
|
||||
#define INVALID_FILE_ATTRIBUTES 0xFFFFFFFF
|
||||
#endif
|
||||
|
||||
#include <windows.h>
|
||||
|
||||
typedef struct directory_type
|
||||
{
|
||||
HANDLE handle;
|
||||
@@ -415,7 +435,7 @@ regular if they are not directories. */
|
||||
|
||||
int isregfile(char *filename)
|
||||
{
|
||||
return !isdirectory(filename)
|
||||
return !isdirectory(filename);
|
||||
}
|
||||
|
||||
|
||||
@@ -426,7 +446,7 @@ return !isdirectory(filename)
|
||||
static BOOL
|
||||
is_stdout_tty(void)
|
||||
{
|
||||
FALSE;
|
||||
return FALSE;
|
||||
}
|
||||
|
||||
|
||||
@@ -802,22 +822,27 @@ be in the middle third most of the time, so the bottom third is available for
|
||||
"before" context printing.
|
||||
|
||||
Arguments:
|
||||
in the fopened FILE stream
|
||||
handle the fopened FILE stream for a normal file
|
||||
the gzFile pointer when reading is via libz
|
||||
the BZFILE pointer when reading is via libbz2
|
||||
frtype FR_PLAIN, FR_LIBZ, or FR_LIBBZ2
|
||||
printname the file name if it is to be printed for each match
|
||||
or NULL if the file name is not to be printed
|
||||
it cannot be NULL if filenames[_nomatch]_only is set
|
||||
|
||||
Returns: 0 if there was at least one match
|
||||
1 otherwise (no matches)
|
||||
2 if there is a read error on a .bz2 file
|
||||
*/
|
||||
|
||||
static int
|
||||
pcregrep(FILE *in, char *printname)
|
||||
pcregrep(void *handle, int frtype, char *printname)
|
||||
{
|
||||
int rc = 1;
|
||||
int linenumber = 1;
|
||||
int lastmatchnumber = 0;
|
||||
int count = 0;
|
||||
int filepos = 0;
|
||||
int offsets[99];
|
||||
char *lastmatchrestart = NULL;
|
||||
char buffer[3*MBUFTHIRD];
|
||||
@@ -825,11 +850,46 @@ char *ptr = buffer;
|
||||
char *endptr;
|
||||
size_t bufflength;
|
||||
BOOL endhyphenpending = FALSE;
|
||||
FILE *in = NULL; /* Ensure initialized */
|
||||
|
||||
/* Do the first read into the start of the buffer and set up the pointer to
|
||||
end of what we have. */
|
||||
#ifdef SUPPORT_LIBZ
|
||||
gzFile ingz = NULL;
|
||||
#endif
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
BZFILE *inbz2 = NULL;
|
||||
#endif
|
||||
|
||||
|
||||
/* Do the first read into the start of the buffer and set up the pointer to end
|
||||
of what we have. In the case of libz, a non-zipped .gz file will be read as a
|
||||
plain file. However, if a .bz2 file isn't actually bzipped, the first read will
|
||||
fail. */
|
||||
|
||||
#ifdef SUPPORT_LIBZ
|
||||
if (frtype == FR_LIBZ)
|
||||
{
|
||||
ingz = (gzFile)handle;
|
||||
bufflength = gzread (ingz, buffer, 3*MBUFTHIRD);
|
||||
}
|
||||
else
|
||||
#endif
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
if (frtype == FR_LIBBZ2)
|
||||
{
|
||||
inbz2 = (BZFILE *)handle;
|
||||
bufflength = BZ2_bzread(inbz2, buffer, 3*MBUFTHIRD);
|
||||
if ((int)bufflength < 0) return 2; /* Gotcha: bufflength is size_t; */
|
||||
} /* without the cast it is unsigned. */
|
||||
else
|
||||
#endif
|
||||
|
||||
{
|
||||
in = (FILE *)handle;
|
||||
bufflength = fread(buffer, 1, 3*MBUFTHIRD, in);
|
||||
}
|
||||
|
||||
bufflength = fread(buffer, 1, 3*MBUFTHIRD, in);
|
||||
endptr = buffer + bufflength;
|
||||
|
||||
/* Loop while the current pointer is not at the end of the file. For large
|
||||
@@ -842,6 +902,7 @@ while (ptr < endptr)
|
||||
int i, endlinelength;
|
||||
int mrc = 0;
|
||||
BOOL match = FALSE;
|
||||
char *matchptr = ptr;
|
||||
char *t = ptr;
|
||||
size_t length, linelength;
|
||||
|
||||
@@ -904,13 +965,17 @@ while (ptr < endptr)
|
||||
}
|
||||
#endif
|
||||
|
||||
/* We come back here after a match when the -o option (only_matching) is set,
|
||||
in order to find any further matches in the same line. */
|
||||
|
||||
ONLY_MATCHING_RESTART:
|
||||
|
||||
/* Run through all the patterns until one matches. Note that we don't include
|
||||
the final newline in the subject string. */
|
||||
|
||||
for (i = 0; i < pattern_count; i++)
|
||||
{
|
||||
mrc = pcre_exec(pattern_list[i], hints_list[i], ptr, length, 0, 0,
|
||||
mrc = pcre_exec(pattern_list[i], hints_list[i], matchptr, length, 0, 0,
|
||||
offsets, 99);
|
||||
if (mrc >= 0) { match = TRUE; break; }
|
||||
if (mrc != PCRE_ERROR_NOMATCH)
|
||||
@@ -918,7 +983,7 @@ while (ptr < endptr)
|
||||
fprintf(stderr, "pcregrep: pcre_exec() error %d while matching ", mrc);
|
||||
if (pattern_count > 1) fprintf(stderr, "pattern number %d to ", i+1);
|
||||
fprintf(stderr, "this line:\n");
|
||||
fwrite(ptr, 1, linelength, stderr); /* In case binary zero included */
|
||||
fwrite(matchptr, 1, linelength, stderr); /* In case binary zero included */
|
||||
fprintf(stderr, "\n");
|
||||
if (error_count == 0 &&
|
||||
(mrc == PCRE_ERROR_MATCHLIMIT || mrc == PCRE_ERROR_RECURSIONLIMIT))
|
||||
@@ -965,14 +1030,33 @@ while (ptr < endptr)
|
||||
else if (quiet) return 0;
|
||||
|
||||
/* The --only-matching option prints just the substring that matched, and
|
||||
does not pring any context. */
|
||||
the --file-offsets and --line-offsets options output offsets for the
|
||||
matching substring (they both force --only-matching). None of these options
|
||||
prints any context. Afterwards, adjust the start and length, and then jump
|
||||
back to look for further matches in the same line. If we are in invert
|
||||
mode, however, nothing is printed - this could be still useful because the
|
||||
return code is set. */
|
||||
|
||||
else if (only_matching)
|
||||
{
|
||||
if (printname != NULL) fprintf(stdout, "%s:", printname);
|
||||
if (number) fprintf(stdout, "%d:", linenumber);
|
||||
fwrite(ptr + offsets[0], 1, offsets[1] - offsets[0], stdout);
|
||||
fprintf(stdout, "\n");
|
||||
if (!invert)
|
||||
{
|
||||
if (printname != NULL) fprintf(stdout, "%s:", printname);
|
||||
if (number) fprintf(stdout, "%d:", linenumber);
|
||||
if (line_offsets)
|
||||
fprintf(stdout, "%d,%d", matchptr + offsets[0] - ptr,
|
||||
offsets[1] - offsets[0]);
|
||||
else if (file_offsets)
|
||||
fprintf(stdout, "%d,%d", filepos + matchptr + offsets[0] - ptr,
|
||||
offsets[1] - offsets[0]);
|
||||
else
|
||||
fwrite(matchptr + offsets[0], 1, offsets[1] - offsets[0], stdout);
|
||||
fprintf(stdout, "\n");
|
||||
matchptr += offsets[1];
|
||||
length -= offsets[1];
|
||||
match = FALSE;
|
||||
goto ONLY_MATCHING_RESTART;
|
||||
}
|
||||
}
|
||||
|
||||
/* This is the default case when none of the above options is set. We print
|
||||
@@ -1111,7 +1195,8 @@ while (ptr < endptr)
|
||||
fprintf(stdout, "%c[%sm", 0x1b, colour_string);
|
||||
fwrite(ptr + offsets[0], 1, offsets[1] - offsets[0], stdout);
|
||||
fprintf(stdout, "%c[00m", 0x1b);
|
||||
fwrite(ptr + offsets[1], 1, linelength - offsets[1], stdout);
|
||||
fwrite(ptr + offsets[1], 1, (linelength + endlinelength) - offsets[1],
|
||||
stdout);
|
||||
}
|
||||
else fwrite(ptr, 1, linelength + endlinelength, stdout);
|
||||
}
|
||||
@@ -1145,9 +1230,11 @@ while (ptr < endptr)
|
||||
linelength = endmatch - ptr - ellength;
|
||||
}
|
||||
|
||||
/* Advance to after the newline and increment the line number. */
|
||||
/* Advance to after the newline and increment the line number. The file
|
||||
offset to the current line is maintained in filepos. */
|
||||
|
||||
ptr += linelength + endlinelength;
|
||||
filepos += linelength + endlinelength;
|
||||
linenumber++;
|
||||
|
||||
/* If we haven't yet reached the end of the file (the buffer is full), and
|
||||
@@ -1169,7 +1256,23 @@ while (ptr < endptr)
|
||||
|
||||
memmove(buffer, buffer + MBUFTHIRD, 2*MBUFTHIRD);
|
||||
ptr -= MBUFTHIRD;
|
||||
|
||||
#ifdef SUPPORT_LIBZ
|
||||
if (frtype == FR_LIBZ)
|
||||
bufflength = 2*MBUFTHIRD +
|
||||
gzread (ingz, buffer + 2*MBUFTHIRD, MBUFTHIRD);
|
||||
else
|
||||
#endif
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
if (frtype == FR_LIBBZ2)
|
||||
bufflength = 2*MBUFTHIRD +
|
||||
BZ2_bzread(inbz2, buffer + 2*MBUFTHIRD, MBUFTHIRD);
|
||||
else
|
||||
#endif
|
||||
|
||||
bufflength = 2*MBUFTHIRD + fread(buffer + 2*MBUFTHIRD, 1, MBUFTHIRD, in);
|
||||
|
||||
endptr = buffer + bufflength;
|
||||
|
||||
/* Adjust any last match point */
|
||||
@@ -1233,18 +1336,28 @@ grep_or_recurse(char *pathname, BOOL dir_recurse, BOOL only_one_at_top)
|
||||
{
|
||||
int rc = 1;
|
||||
int sep;
|
||||
FILE *in;
|
||||
int frtype;
|
||||
int pathlen;
|
||||
void *handle;
|
||||
FILE *in = NULL; /* Ensure initialized */
|
||||
|
||||
#ifdef SUPPORT_LIBZ
|
||||
gzFile ingz = NULL;
|
||||
#endif
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
BZFILE *inbz2 = NULL;
|
||||
#endif
|
||||
|
||||
/* If the file name is "-" we scan stdin */
|
||||
|
||||
if (strcmp(pathname, "-") == 0)
|
||||
{
|
||||
return pcregrep(stdin,
|
||||
return pcregrep(stdin, FR_PLAIN,
|
||||
(filenames > FN_DEFAULT || (filenames == FN_DEFAULT && !only_one_at_top))?
|
||||
stdin_name : NULL);
|
||||
}
|
||||
|
||||
|
||||
/* If the file is a directory, skip if skipping or if we are recursing, scan
|
||||
each file within it, subject to any include or exclude patterns that were set.
|
||||
The scanning code is localized so it can be made system-specific. */
|
||||
@@ -1301,8 +1414,54 @@ skipping was not requested. The scan proceeds. If this is the first and only
|
||||
argument at top level, we don't show the file name, unless we are only showing
|
||||
the file name, or the filename was forced (-H). */
|
||||
|
||||
in = fopen(pathname, "r");
|
||||
if (in == NULL)
|
||||
pathlen = strlen(pathname);
|
||||
|
||||
/* Open using zlib if it is supported and the file name ends with .gz. */
|
||||
|
||||
#ifdef SUPPORT_LIBZ
|
||||
if (pathlen > 3 && strcmp(pathname + pathlen - 3, ".gz") == 0)
|
||||
{
|
||||
ingz = gzopen(pathname, "rb");
|
||||
if (ingz == NULL)
|
||||
{
|
||||
if (!silent)
|
||||
fprintf(stderr, "pcregrep: Failed to open %s: %s\n", pathname,
|
||||
strerror(errno));
|
||||
return 2;
|
||||
}
|
||||
handle = (void *)ingz;
|
||||
frtype = FR_LIBZ;
|
||||
}
|
||||
else
|
||||
#endif
|
||||
|
||||
/* Otherwise open with bz2lib if it is supported and the name ends with .bz2. */
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
if (pathlen > 4 && strcmp(pathname + pathlen - 4, ".bz2") == 0)
|
||||
{
|
||||
inbz2 = BZ2_bzopen(pathname, "rb");
|
||||
handle = (void *)inbz2;
|
||||
frtype = FR_LIBBZ2;
|
||||
}
|
||||
else
|
||||
#endif
|
||||
|
||||
/* Otherwise use plain fopen(). The label is so that we can come back here if
|
||||
an attempt to read a .bz2 file indicates that it really is a plain file. */
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
PLAIN_FILE:
|
||||
#endif
|
||||
{
|
||||
in = fopen(pathname, "r");
|
||||
handle = (void *)in;
|
||||
frtype = FR_PLAIN;
|
||||
}
|
||||
|
||||
/* All the opening methods set errno when they fail. */
|
||||
|
||||
if (handle == NULL)
|
||||
{
|
||||
if (!silent)
|
||||
fprintf(stderr, "pcregrep: Failed to open %s: %s\n", pathname,
|
||||
@@ -1310,10 +1469,50 @@ if (in == NULL)
|
||||
return 2;
|
||||
}
|
||||
|
||||
rc = pcregrep(in, (filenames > FN_DEFAULT ||
|
||||
/* Now grep the file */
|
||||
|
||||
rc = pcregrep(handle, frtype, (filenames > FN_DEFAULT ||
|
||||
(filenames == FN_DEFAULT && !only_one_at_top))? pathname : NULL);
|
||||
|
||||
/* Close in an appropriate manner. */
|
||||
|
||||
#ifdef SUPPORT_LIBZ
|
||||
if (frtype == FR_LIBZ)
|
||||
gzclose(ingz);
|
||||
else
|
||||
#endif
|
||||
|
||||
/* If it is a .bz2 file and the result is 2, it means that the first attempt to
|
||||
read failed. If the error indicates that the file isn't in fact bzipped, try
|
||||
again as a normal file. */
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
if (frtype == FR_LIBBZ2)
|
||||
{
|
||||
if (rc == 2)
|
||||
{
|
||||
int errnum;
|
||||
const char *err = BZ2_bzerror(inbz2, &errnum);
|
||||
if (errnum == BZ_DATA_ERROR_MAGIC)
|
||||
{
|
||||
BZ2_bzclose(inbz2);
|
||||
goto PLAIN_FILE;
|
||||
}
|
||||
else if (!silent)
|
||||
fprintf(stderr, "pcregrep: Failed to read %s using bzlib: %s\n",
|
||||
pathname, err);
|
||||
}
|
||||
BZ2_bzclose(inbz2);
|
||||
}
|
||||
else
|
||||
#endif
|
||||
|
||||
/* Normal file close */
|
||||
|
||||
fclose(in);
|
||||
|
||||
/* Pass back the yield from pcregrep(). */
|
||||
|
||||
return rc;
|
||||
}
|
||||
|
||||
@@ -1334,7 +1533,8 @@ for (op = optionlist; op->one_char != 0; op++)
|
||||
if (op->one_char > 0) fprintf(stderr, "%c", op->one_char);
|
||||
}
|
||||
fprintf(stderr, "] [long options] [pattern] [files]\n");
|
||||
fprintf(stderr, "Type `pcregrep --help' for more information.\n");
|
||||
fprintf(stderr, "Type `pcregrep --help' for more information and the long "
|
||||
"options.\n");
|
||||
return rc;
|
||||
}
|
||||
|
||||
@@ -1353,9 +1553,23 @@ option_item *op;
|
||||
printf("Usage: pcregrep [OPTION]... [PATTERN] [FILE1 FILE2 ...]\n");
|
||||
printf("Search for PATTERN in each FILE or standard input.\n");
|
||||
printf("PATTERN must be present if neither -e nor -f is used.\n");
|
||||
printf("\"-\" can be used as a file name to mean STDIN.\n\n");
|
||||
printf("Example: pcregrep -i 'hello.*world' menu.h main.c\n\n");
|
||||
printf("\"-\" can be used as a file name to mean STDIN.\n");
|
||||
|
||||
#ifdef SUPPORT_LIBZ
|
||||
printf("Files whose names end in .gz are read using zlib.\n");
|
||||
#endif
|
||||
|
||||
#ifdef SUPPORT_LIBBZ2
|
||||
printf("Files whose names end in .bz2 are read using bzlib2.\n");
|
||||
#endif
|
||||
|
||||
#if defined SUPPORT_LIBZ || defined SUPPORT_LIBBZ2
|
||||
printf("Other files and the standard input are read as plain files.\n\n");
|
||||
#else
|
||||
printf("All files are read as plain files, without any interpretation.\n\n");
|
||||
#endif
|
||||
|
||||
printf("Example: pcregrep -i 'hello.*world' menu.h main.c\n\n");
|
||||
printf("Options:\n");
|
||||
|
||||
for (op = optionlist; op->one_char != 0; op++)
|
||||
@@ -1363,8 +1577,7 @@ for (op = optionlist; op->one_char != 0; op++)
|
||||
int n;
|
||||
char s[4];
|
||||
if (op->one_char > 0) sprintf(s, "-%c,", op->one_char); else strcpy(s, " ");
|
||||
printf(" %s --%s%n", s, op->long_name, &n);
|
||||
n = 30 - n;
|
||||
n = 30 - printf(" %s --%s", s, op->long_name);
|
||||
if (n < 1) n = 1;
|
||||
printf("%.*s%s\n", n, " ", op->help_text);
|
||||
}
|
||||
@@ -1389,7 +1602,9 @@ handle_option(int letter, int options)
|
||||
{
|
||||
switch(letter)
|
||||
{
|
||||
case N_FOFFSETS: file_offsets = TRUE; break;
|
||||
case N_HELP: help(); exit(0);
|
||||
case N_LOFFSETS: line_offsets = number = TRUE; break;
|
||||
case 'c': count_only = TRUE; break;
|
||||
case 'F': process_options |= PO_FIXED_STRINGS; break;
|
||||
case 'H': filenames = FN_FORCE; break;
|
||||
@@ -1826,6 +2041,19 @@ if (both_context > 0)
|
||||
if (before_context == 0) before_context = both_context;
|
||||
}
|
||||
|
||||
/* Only one of --only-matching, --file-offsets, or --line-offsets is permitted.
|
||||
However, the latter two set the only_matching flag. */
|
||||
|
||||
if ((only_matching && (file_offsets || line_offsets)) ||
|
||||
(file_offsets && line_offsets))
|
||||
{
|
||||
fprintf(stderr, "pcregrep: Cannot mix --only-matching, --file-offsets "
|
||||
"and/or --line-offsets\n");
|
||||
exit(usage(2));
|
||||
}
|
||||
|
||||
if (file_offsets || line_offsets) only_matching = TRUE;
|
||||
|
||||
/* If a locale has not been provided as an option, see if the LC_CTYPE or
|
||||
LC_ALL environment variable is set, and if so, use it. */
|
||||
|
||||
@@ -2063,7 +2291,7 @@ if (include_pattern != NULL)

 if (i >= argc)
 {
-rc = pcregrep(stdin, (filenames > FN_DEFAULT)? stdin_name : NULL);
+rc = pcregrep(stdin, FR_PLAIN, (filenames > FN_DEFAULT)? stdin_name : NULL);
 goto EXIT;
 }

@@ -42,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE.
 functions. */


-#include <config.h>
+#include "config.h"


 /* Ensure that the PCREPOSIX_EXP_xxx macros are set appropriately for
@@ -55,12 +55,11 @@ previously been set. */
 # define PCREPOSIX_EXP_DEFN __declspec(dllexport)
 #endif

-#include <pcre.h>
+#include "pcre.h"
 #include "pcre_internal.h"
 #include "pcreposix.h"


-
 /* Table to translate PCRE compile time error codes into POSIX error codes. */

 static const int eint[] = {
@@ -123,7 +122,9 @@ static const int eint[] = {
 REG_INVARG, /* inconsistent NEWLINE options */
 REG_BADPAT, /* \g is not followed followed by an (optionally braced) non-zero number */
 REG_BADPAT, /* (?+ or (?- must be followed by a non-zero number */
-REG_BADPAT /* number is too big */
+REG_BADPAT, /* number is too big */
+REG_BADPAT, /* subpattern name expected */
+REG_BADPAT /* digit expected after (?+ */
 };

 /* Table of texts corresponding to POSIX error codes */

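The `eint` hunk above grows the translation table by two entries, one for each new compile-time error. Such a table is indexed by error number, so it has to stay the same length as the list of error codes. The sketch below illustrates that constraint with a bounds-checked lookup. All `DEMO_` names and their numeric values are assumptions made for this example and do not match the real PCRE or POSIX constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the POSIX wrapper's result codes. */
enum { DEMO_REG_ASSERT = 1, DEMO_REG_BADPAT = 2, DEMO_REG_INVARG = 3 };

/* Hypothetical compile-time error numbers; DEMO_ERR_COUNT must stay last. */
enum {
    DEMO_ERR_NEWLINE_OPTIONS,
    DEMO_ERR_NUMBER_TOO_BIG,
    DEMO_ERR_NAME_EXPECTED,
    DEMO_ERR_DIGIT_EXPECTED,
    DEMO_ERR_COUNT
};

/* One entry per compile-time error, in the same order as the enum above. */
static const int demo_eint[] = {
    DEMO_REG_INVARG,  /* inconsistent NEWLINE options */
    DEMO_REG_BADPAT,  /* number is too big */
    DEMO_REG_BADPAT,  /* subpattern name expected */
    DEMO_REG_BADPAT   /* digit expected after (?+ */
};

/* Translate a compile-time error number, falling back to a generic code when
   the number lies outside the table. */
static int demo_posix_error(int compile_error)
{
    size_t entries = sizeof(demo_eint) / sizeof(demo_eint[0]);

    /* Catches a table that was not extended when an error code was added. */
    assert(entries == (size_t)DEMO_ERR_COUNT);

    if (compile_error < 0 || (size_t)compile_error >= entries)
        return DEMO_REG_ASSERT;
    return demo_eint[compile_error];
}
```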
@@ -358,10 +358,13 @@ after the binary zero
|
||||
./testdata/grepinput:597:after the binary zero
|
||||
---------------------------- Test 42 ------------------------------
|
||||
595:before
|
||||
595:zero
|
||||
596:zero
|
||||
597:after
|
||||
597:zero
|
||||
---------------------------- Test 43 ------------------------------
|
||||
595:before
|
||||
595:zero
|
||||
596:zero
|
||||
597:zero
|
||||
---------------------------- Test 44 ------------------------------
|
||||
@@ -385,3 +388,17 @@ PUT NEW DATA ABOVE THIS LINE.
|
||||
---------------------------- Test 49 ------------------------------
|
||||
---------------------------- Test 50 ------------------------------
|
||||
over the lazy dog.
|
||||
---------------------------- Test 51 ------------------------------
|
||||
fox [1;31mjumps[00m
|
||||
---------------------------- Test 52 ------------------------------
|
||||
36972,6
|
||||
36990,4
|
||||
37024,4
|
||||
37066,5
|
||||
37083,4
|
||||
---------------------------- Test 53 ------------------------------
|
||||
595:15,6
|
||||
595:33,4
|
||||
596:28,4
|
||||
597:15,5
|
||||
597:32,4
|
||||
|
||||
@@ -3421,11 +3421,6 @@
|
||||
/((?m)^b)/
|
||||
a\nb\nc\n
|
||||
|
||||
/(?(1)a|b)/
|
||||
|
||||
/(?(1)b|a)/
|
||||
a
|
||||
|
||||
/(x)?(?(1)a|b)/
|
||||
*** Failers
|
||||
a
|
||||
@@ -4030,4 +4025,15 @@
|
||||
/( (?(1)0|)* )/x
|
||||
abcd
|
||||
|
||||
/[[:abcd:xyz]]/
|
||||
a]
|
||||
:]
|
||||
|
||||
/[abc[:x\]pqr]/
|
||||
a
|
||||
[
|
||||
:
|
||||
]
|
||||
p
|
||||
|
||||
/ End of testinput1 /
|
||||
|
||||
@@ -398,8 +398,6 @@
|
||||
|
||||
/(?(1?)a|b)/
|
||||
|
||||
/(?(1)a|b|c)/
|
||||
|
||||
/[a[:xyz:/
|
||||
|
||||
/(?<=x+)y/
|
||||
@@ -568,15 +566,15 @@
|
||||
|
||||
/ab\d+/I
|
||||
|
||||
/a(?(1)b)/I
|
||||
/a(?(1)b)(.)/I
|
||||
|
||||
/a(?(1)bag|big)/I
|
||||
/a(?(1)bag|big)(.)/I
|
||||
|
||||
/a(?(1)bag|big)*/I
|
||||
/a(?(1)bag|big)*(.)/I
|
||||
|
||||
/a(?(1)bag|big)+/I
|
||||
/a(?(1)bag|big)+(.)/I
|
||||
|
||||
/a(?(1)b..|b..)/I
|
||||
/a(?(1)b..|b..)(.)/I
|
||||
|
||||
/ab\d{0}e/I
|
||||
|
||||
@@ -977,13 +975,13 @@
|
||||
|
||||
/()a/I
|
||||
|
||||
/(?(1)ab|ac)/I
|
||||
/(?(1)ab|ac)(.)/I
|
||||
|
||||
/(?(1)abz|acz)/I
|
||||
/(?(1)abz|acz)(.)/I
|
||||
|
||||
/(?(1)abz)/I
|
||||
/(?(1)abz)(.)/I
|
||||
|
||||
/(?(1)abz)123/I
|
||||
/(?(1)abz)(1)23/I
|
||||
|
||||
/(a)+/I
|
||||
|
||||
@@ -1999,7 +1997,7 @@ a random value. /Ix
|
||||
|
||||
/a/<any><crlf>
|
||||
|
||||
/^a\Rb/
|
||||
/^a\Rb/<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -2009,7 +2007,7 @@ a random value. /Ix
|
||||
** Failers
|
||||
a\n\rb
|
||||
|
||||
/^a\R*b/
|
||||
/^a\R*b/<bsr_unicode>
|
||||
ab
|
||||
a\nb
|
||||
a\rb
|
||||
@@ -2020,7 +2018,7 @@ a random value. /Ix
|
||||
a\n\rb
|
||||
a\n\r\x85\x0cb
|
||||
|
||||
/^a\R+b/
|
||||
/^a\R+b/<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -2032,7 +2030,7 @@ a random value. /Ix
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/^a\R{1,3}b/
|
||||
/^a\R{1,3}b/<bsr_unicode>
|
||||
a\nb
|
||||
a\n\rb
|
||||
a\n\r\x85b
|
||||
@@ -2044,7 +2042,7 @@ a random value. /Ix
|
||||
a\n\n\n\rb
|
||||
a\r
|
||||
|
||||
/^a[\R]b/
|
||||
/^a[\R]b/<bsr_unicode>
|
||||
aRb
|
||||
** Failers
|
||||
a\nb
|
||||
@@ -2190,8 +2188,8 @@ a random value. /Ix
|
||||
|
||||
/((?(-2)a))/BZ
|
||||
|
||||
/^(?(+1)X|Y)/BZ
|
||||
Y
|
||||
/^(?(+1)X|Y)(.)/BZ
|
||||
Y!
|
||||
|
||||
/(foo)\Kbar/
|
||||
foobar
|
||||
@@ -2464,4 +2462,131 @@ a random value. /Ix
|
||||
a\r\nb
|
||||
a\x85b
|
||||
|
||||
/a\Rb/I<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x85b
|
||||
a\x0bb
|
||||
|
||||
/a\Rb/I<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x85b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x85b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/a\R?b/I<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x85b
|
||||
a\x0bb
|
||||
|
||||
/a\R?b/I<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x85b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x85b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/a\R{2,4}b/I<bsr_anycrlf>
|
||||
a\r\n\nb
|
||||
a\n\r\rb
|
||||
a\r\n\r\n\r\n\r\nb
|
||||
** Failers
|
||||
a\x85\85b
|
||||
a\x0b\0bb
|
||||
|
||||
/a\R{2,4}b/I<bsr_unicode>
|
||||
a\r\rb
|
||||
a\n\n\nb
|
||||
a\r\n\n\r\rb
|
||||
a\x85\85b
|
||||
a\x0b\0bb
|
||||
** Failers
|
||||
a\r\r\r\r\rb
|
||||
a\x85\85b\<bsr_anycrlf>
|
||||
a\x0b\0bb\<bsr_anycrlf>
|
||||
|
||||
/(*BSR_ANYCRLF)a\Rb/I
|
||||
a\nb
|
||||
a\rb
|
||||
|
||||
/(*BSR_UNICODE)a\Rb/I
|
||||
a\x85b
|
||||
|
||||
/(*BSR_ANYCRLF)(*CRLF)a\Rb/I
|
||||
a\nb
|
||||
a\rb
|
||||
|
||||
/(*CRLF)(*BSR_UNICODE)a\Rb/I
|
||||
a\x85b
|
||||
|
||||
/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I
|
||||
|
||||
/(?<a>)(?&)/
|
||||
|
||||
/(?<abc>)(?&a)/
|
||||
|
||||
/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/
|
||||
|
||||
/(?+-a)/
|
||||
|
||||
/(?-+a)/
|
||||
|
||||
/(?(-1))/
|
||||
|
||||
/(?(+10))/
|
||||
|
||||
/(?(10))/
|
||||
|
||||
/(?(+2))()()/
|
||||
|
||||
/(?(2))()()/
|
||||
|
||||
/\k''/
|
||||
|
||||
/\k<>/
|
||||
|
||||
/\k{}/
|
||||
|
||||
/(?P=)/
|
||||
|
||||
/(?P>)/
|
||||
|
||||
/(?!\w)(?R)/
|
||||
|
||||
/(?=\w)(?R)/
|
||||
|
||||
/(?<!\w)(?R)/
|
||||
|
||||
/(?<=\w)(?R)/
|
||||
|
||||
/[[:foo:]]/
|
||||
|
||||
/[[:1234:]]/
|
||||
|
||||
/[[:f\oo:]]/
|
||||
|
||||
/[[: :]]/
|
||||
|
||||
/[[:...:]]/
|
||||
|
||||
/[[:l\ower:]]/
|
||||
|
||||
/[[:abc\:]]/
|
||||
|
||||
/[abc[:x\]pqr:]]/
|
||||
|
||||
/[[:a\dz:]]/
|
||||
|
||||
/ End of testinput2 /
|
||||
|
||||
@@ -535,4 +535,76 @@
|
||||
/\W{2}/8g
|
||||
+\x{a3}==
|
||||
|
||||
/\S/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
|
||||
/[\S]/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
|
||||
/\D/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
|
||||
/[\D]/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
|
||||
/\W/8g
|
||||
\x{2442}\x{2435}\x{2441}\x{2442}
|
||||
|
||||
/[\W]/8g
|
||||
\x{2442}\x{2435}\x{2441}\x{2442}
|
||||
|
||||
/[\S\s]*/8
|
||||
abc\n\r\x{442}\x{435}\x{441}\x{442}xyz
|
||||
|
||||
/[\x{41f}\S]/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
|
||||
/.[^\S]./8g
|
||||
abc def\x{442}\x{443}xyz\npqr
|
||||
|
||||
/.[^\S\n]./8g
|
||||
abc def\x{442}\x{443}xyz\npqr
|
||||
|
||||
/[[:^alnum:]]/8g
|
||||
+\x{2442}
|
||||
|
||||
/[[:^alpha:]]/8g
|
||||
+\x{2442}
|
||||
|
||||
/[[:^ascii:]]/8g
|
||||
A\x{442}
|
||||
|
||||
/[[:^blank:]]/8g
|
||||
A\x{442}
|
||||
|
||||
/[[:^cntrl:]]/8g
|
||||
A\x{442}
|
||||
|
||||
/[[:^digit:]]/8g
|
||||
A\x{442}
|
||||
|
||||
/[[:^graph:]]/8g
|
||||
\x19\x{e01ff}
|
||||
|
||||
/[[:^lower:]]/8g
|
||||
A\x{422}
|
||||
|
||||
/[[:^print:]]/8g
|
||||
\x{19}\x{e01ff}
|
||||
|
||||
/[[:^punct:]]/8g
|
||||
A\x{442}
|
||||
|
||||
/[[:^space:]]/8g
|
||||
A\x{442}
|
||||
|
||||
/[[:^upper:]]/8g
|
||||
a\x{442}
|
||||
|
||||
/[[:^word:]]/8g
|
||||
+\x{2442}
|
||||
|
||||
/[[:^xdigit:]]/8g
|
||||
M\x{442}
|
||||
|
||||
/ End of testinput4 /
|
||||
|
||||
@@ -312,7 +312,7 @@ can't tell the difference.) --/
|
||||
/abc.$/mgx8<any>
|
||||
abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9
|
||||
|
||||
/^a\Rb/8
|
||||
/^a\Rb/8<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -324,7 +324,7 @@ can't tell the difference.) --/
|
||||
** Failers
|
||||
a\n\rb
|
||||
|
||||
/^a\R*b/8
|
||||
/^a\R*b/8<bsr_unicode>
|
||||
ab
|
||||
a\nb
|
||||
a\rb
|
||||
@@ -335,7 +335,7 @@ can't tell the difference.) --/
|
||||
a\n\rb
|
||||
a\n\r\x{85}\x0cb
|
||||
|
||||
/^a\R+b/8
|
||||
/^a\R+b/8<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -347,7 +347,7 @@ can't tell the difference.) --/
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/^a\R{1,3}b/8
|
||||
/^a\R{1,3}b/8<bsr_unicode>
|
||||
a\nb
|
||||
a\n\rb
|
||||
a\n\r\x{85}b
|
||||
@@ -417,4 +417,48 @@ can't tell the difference.) --/
|
||||
\x{7fffffff}
|
||||
\x{7fffffff}\?
|
||||
|
||||
/a\Rb/I8<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
|
||||
/a\Rb/I8<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/a\R?b/I8<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
|
||||
/a\R?b/I8<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/.*a.*=.b.*/8<ANY>
|
||||
QQQ\x{2029}ABCaXYZ=!bPQR
|
||||
** Failers
|
||||
a\x{2029}b
|
||||
\x61\xe2\x80\xa9\x62
|
||||
|
||||
/[[:a\x{100}b:]]/8
|
||||
|
||||
/ End of testinput5 /
|
||||
|
||||
@@ -832,4 +832,79 @@ was broken in all cases./
|
||||
|
||||
/(\p{Yi}{0,3}+\277)*/
|
||||
|
||||
/^[\p{Arabic}]/8
|
||||
\x{60e}
|
||||
\x{656}
|
||||
\x{657}
|
||||
\x{658}
|
||||
\x{659}
|
||||
\x{65a}
|
||||
\x{65b}
|
||||
\x{65c}
|
||||
\x{65d}
|
||||
\x{65e}
|
||||
\x{66a}
|
||||
\x{6e9}
|
||||
\x{6ef}
|
||||
\x{6fa}
|
||||
** Failers
|
||||
\x{600}
|
||||
\x{650}
|
||||
\x{651}
|
||||
\x{652}
|
||||
\x{653}
|
||||
\x{654}
|
||||
\x{655}
|
||||
\x{65f}
|
||||
|
||||
/^\p{Cyrillic}/8
|
||||
\x{1d2b}
|
||||
|
||||
/^\p{Common}/8
|
||||
\x{589}
|
||||
\x{60c}
|
||||
\x{61f}
|
||||
\x{964}
|
||||
\x{965}
|
||||
\x{970}
|
||||
|
||||
/^\p{Inherited}/8
|
||||
\x{64b}
|
||||
\x{654}
|
||||
\x{655}
|
||||
\x{200c}
|
||||
** Failers
|
||||
\x{64a}
|
||||
\x{656}
|
||||
|
||||
/^\p{Shavian}/8
|
||||
\x{10450}
|
||||
\x{1047f}
|
||||
|
||||
/^\p{Deseret}/8
|
||||
\x{10400}
|
||||
\x{1044f}
|
||||
|
||||
/^\p{Osmanya}/8
|
||||
\x{10480}
|
||||
\x{1049d}
|
||||
\x{104a0}
|
||||
\x{104a9}
|
||||
** Failers
|
||||
\x{1049e}
|
||||
\x{1049f}
|
||||
\x{104aa}
|
||||
|
||||
/\p{Zl}{2,3}+/8BZ
|
||||
\xe2\x80\xa8\xe2\x80\xa8
|
||||
\x{2028}\x{2028}\x{2028}
|
||||
|
||||
/\p{Zl}/8BZ
|
||||
|
||||
/\p{Lu}{3}+/8BZ
|
||||
|
||||
/\pL{2}+/8BZ
|
||||
|
||||
/\p{Cc}{2}+/8BZ
|
||||
|
||||
/ End of testinput6 /
|
||||
|
||||
@@ -4156,7 +4156,7 @@
|
||||
/abc.$/mgx<any>
|
||||
abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7\x{2028} abc8\x{2029} abc9
|
||||
|
||||
/^a\Rb/
|
||||
/^a\Rb/<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -4166,7 +4166,7 @@
|
||||
** Failers
|
||||
a\n\rb
|
||||
|
||||
/^a\R*b/
|
||||
/^a\R*b/<bsr_unicode>
|
||||
ab
|
||||
a\nb
|
||||
a\rb
|
||||
@@ -4177,7 +4177,7 @@
|
||||
a\n\rb
|
||||
a\n\r\x85\x0cb
|
||||
|
||||
/^a\R+b/
|
||||
/^a\R+b/<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -4189,7 +4189,7 @@
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/^a\R{1,3}b/
|
||||
/^a\R{1,3}b/<bsr_unicode>
|
||||
a\nb
|
||||
a\n\rb
|
||||
a\n\r\x85b
|
||||
@@ -4201,7 +4201,7 @@
|
||||
a\n\n\n\rb
|
||||
a\r
|
||||
|
||||
/^a[\R]b/
|
||||
/^a[\R]b/<bsr_unicode>
|
||||
aRb
|
||||
** Failers
|
||||
a\nb
|
||||
@@ -4310,4 +4310,59 @@
|
||||
/(\r|\n)A/<crlf>
|
||||
\r\nA
|
||||
|
||||
/a\Rb/I<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x85b
|
||||
a\x0bb
|
||||
|
||||
/a\Rb/I<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x85b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x85b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/a\R?b/I<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x85b
|
||||
a\x0bb
|
||||
|
||||
/a\R?b/I<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x85b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x85b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/a\R{2,4}b/I<bsr_anycrlf>
|
||||
a\r\n\nb
|
||||
a\n\r\rb
|
||||
a\r\n\r\n\r\n\r\nb
|
||||
** Failers
|
||||
a\x85\85b
|
||||
a\x0b\0bb
|
||||
|
||||
/a\R{2,4}b/I<bsr_unicode>
|
||||
a\r\rb
|
||||
a\n\n\nb
|
||||
a\r\n\n\r\rb
|
||||
a\x85\85b
|
||||
a\x0b\0bb
|
||||
** Failers
|
||||
a\r\r\r\r\rb
|
||||
a\x85\85b\<bsr_anycrlf>
|
||||
a\x0b\0bb\<bsr_anycrlf>
|
||||
|
||||
/ End of testinput7 /
|
||||
|
||||
@@ -543,7 +543,7 @@
|
||||
/abc.$/mgx8<any>
|
||||
abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9
|
||||
|
||||
/^a\Rb/8
|
||||
/^a\Rb/8<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -555,7 +555,7 @@
|
||||
** Failers
|
||||
a\n\rb
|
||||
|
||||
/^a\R*b/8
|
||||
/^a\R*b/8<bsr_unicode>
|
||||
ab
|
||||
a\nb
|
||||
a\rb
|
||||
@@ -566,7 +566,7 @@
|
||||
a\n\rb
|
||||
a\n\r\x{85}\x0cb
|
||||
|
||||
/^a\R+b/8
|
||||
/^a\R+b/8<bsr_unicode>
|
||||
a\nb
|
||||
a\rb
|
||||
a\r\nb
|
||||
@@ -578,7 +578,7 @@
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/^a\R{1,3}b/8
|
||||
/^a\R{1,3}b/8<bsr_unicode>
|
||||
a\nb
|
||||
a\n\rb
|
||||
a\n\r\x{85}b
|
||||
@@ -628,4 +628,40 @@
|
||||
** Failers
|
||||
\x09\x{200a}\x{a0}\x{2028}\x0b
|
||||
|
||||
/a\Rb/I8<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
|
||||
/a\Rb/I8<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/a\R?b/I8<bsr_anycrlf>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
** Failers
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
|
||||
/a\R?b/I8<bsr_unicode>
|
||||
a\rb
|
||||
a\nb
|
||||
a\r\nb
|
||||
a\x{85}b
|
||||
a\x0bb
|
||||
** Failers
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
|
||||
/ End of testinput 8 /
|
||||
|
||||
@@ -5551,12 +5551,6 @@ No match
|
||||
0: b
|
||||
1: b
|
||||
|
||||
/(?(1)a|b)/
|
||||
|
||||
/(?(1)b|a)/
|
||||
a
|
||||
0: a
|
||||
|
||||
/(x)?(?(1)a|b)/
|
||||
*** Failers
|
||||
No match
|
||||
@@ -6593,4 +6587,22 @@ No match
|
||||
0:
|
||||
1:
|
||||
|
||||
/[[:abcd:xyz]]/
|
||||
a]
|
||||
0: a]
|
||||
:]
|
||||
0: :]
|
||||
|
||||
/[abc[:x\]pqr]/
|
||||
a
|
||||
0: a
|
||||
[
|
||||
0: [
|
||||
:
|
||||
0: :
|
||||
]
|
||||
0: ]
|
||||
p
|
||||
0: p
|
||||
|
||||
/ End of testinput1 /
|
||||
|
||||
@@ -109,7 +109,7 @@ Failed: missing ) at offset 4
|
||||
Failed: missing ) after comment at offset 7
|
||||
|
||||
/(?z)abc/
|
||||
Failed: unrecognized character after (? at offset 2
|
||||
Failed: unrecognized character after (? or (?- at offset 2
|
||||
|
||||
/.*b/I
|
||||
Capturing subpattern count = 0
|
||||
@@ -166,7 +166,6 @@ Starting byte set: a b c d
|
||||
|
||||
/(a|[^\dZ])/IS
|
||||
Capturing subpattern count = 1
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -311,7 +310,7 @@ No match
|
||||
No match
|
||||
|
||||
/ab(?z)cd/
|
||||
Failed: unrecognized character after (? at offset 4
|
||||
Failed: unrecognized character after (? or (?- at offset 4
|
||||
|
||||
/^abc|def/I
|
||||
Capturing subpattern count = 0
|
||||
@@ -403,7 +402,6 @@ Failed: missing terminating ] for character class at offset 4
|
||||
/[^aeiou ]{3,}/I
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -948,26 +946,23 @@ Failed: missing ) at offset 4
|
||||
Failed: unrecognized character after (?< at offset 3
|
||||
|
||||
/a(?{)b/
|
||||
Failed: unrecognized character after (? at offset 3
|
||||
Failed: unrecognized character after (? or (?- at offset 3
|
||||
|
||||
/a(?{{})b/
|
||||
Failed: unrecognized character after (? at offset 3
|
||||
Failed: unrecognized character after (? or (?- at offset 3
|
||||
|
||||
/a(?{}})b/
|
||||
Failed: unrecognized character after (? at offset 3
|
||||
Failed: unrecognized character after (? or (?- at offset 3
|
||||
|
||||
/a(?{"{"})b/
|
||||
Failed: unrecognized character after (? at offset 3
|
||||
Failed: unrecognized character after (? or (?- at offset 3
|
||||
|
||||
/a(?{"{"}})b/
|
||||
Failed: unrecognized character after (? at offset 3
|
||||
Failed: unrecognized character after (? or (?- at offset 3
|
||||
|
||||
/(?(1?)a|b)/
|
||||
Failed: malformed number or name after (?( at offset 4
|
||||
|
||||
/(?(1)a|b|c)/
|
||||
Failed: conditional group contains more than two branches at offset 10
|
||||
|
||||
/[a[:xyz:/
|
||||
Failed: missing terminating ] for character class at offset 8
|
||||
|
||||
@@ -1440,7 +1435,6 @@ Need char = 'a'
|
||||
/"([^\\"]+|\\.)*"/I
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
First char = '"'
|
||||
Need char = '"'
|
||||
@@ -1602,32 +1596,32 @@ No options
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
|
||||
/a(?(1)b)/I
|
||||
Capturing subpattern count = 0
|
||||
/a(?(1)b)(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
First char = 'a'
|
||||
No need char
|
||||
|
||||
/a(?(1)bag|big)/I
|
||||
Capturing subpattern count = 0
|
||||
/a(?(1)bag|big)(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
First char = 'a'
|
||||
Need char = 'g'
|
||||
|
||||
/a(?(1)bag|big)*/I
|
||||
Capturing subpattern count = 0
|
||||
/a(?(1)bag|big)*(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
First char = 'a'
|
||||
No need char
|
||||
|
||||
/a(?(1)bag|big)+/I
|
||||
Capturing subpattern count = 0
|
||||
/a(?(1)bag|big)+(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
First char = 'a'
|
||||
Need char = 'g'
|
||||
|
||||
/a(?(1)b..|b..)/I
|
||||
Capturing subpattern count = 0
|
||||
/a(?(1)b..|b..)(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
@@ -1716,7 +1710,6 @@ Study returned NULL
|
||||
/Ix
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1746,7 +1739,6 @@ No match
|
||||
/\( ( (?>[^()]+) | (?R) )* \) /Ixg
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1762,7 +1754,6 @@ Need char = ')'
|
||||
/\( (?: (?>[^()]+) | (?R) ) \) /Ix
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1782,7 +1773,6 @@ No match
|
||||
/\( (?: (?>[^()]+) | (?R) )? \) /Ix
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1794,7 +1784,6 @@ Need char = ')'
|
||||
/\( ( (?>[^()]+) | (?R) )* \) /Ix
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1805,7 +1794,6 @@ Need char = ')'
|
||||
/\( ( ( (?>[^()]+) | (?R) )* ) \) /Ix
|
||||
Capturing subpattern count = 2
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1817,7 +1805,6 @@ Need char = ')'
|
||||
/\( (123)? ( ( (?>[^()]+) | (?R) )* ) \) /Ix
|
||||
Capturing subpattern count = 3
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1835,7 +1822,6 @@ Need char = ')'
|
||||
/\( ( (123)? ( (?>[^()]+) | (?R) )* ) \) /Ix
|
||||
Capturing subpattern count = 3
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1853,7 +1839,6 @@ Need char = ')'
|
||||
/\( (((((((((( ( (?>[^()]+) | (?R) )* )))))))))) \) /Ix
|
||||
Capturing subpattern count = 11
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1874,7 +1859,6 @@ Need char = ')'
|
||||
/\( ( ( (?>[^()<>]+) | ((?>[^()]+)) | (?R) )* ) \) /Ix
|
||||
Capturing subpattern count = 3
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1887,7 +1871,6 @@ Need char = ')'
|
||||
/\( ( ( (?>[^()]+) | ((?R)) )* ) \) /Ix
|
||||
Capturing subpattern count = 3
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -1919,12 +1902,11 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
[\x00-/:-@[-`{-\xff]
|
||||
[\x00-/:-@[-`{-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -1946,12 +1928,11 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
[\x00-@[-`{-\xff]
|
||||
[\x00-@[-`{-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -1973,7 +1954,6 @@ Starting byte set: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -1982,7 +1962,7 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
[\x80-\xff]
|
||||
[\x80-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
@@ -2008,12 +1988,11 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
[\x00-\x08\x0a-\x1f!-\xff]
|
||||
[\x00-\x08\x0a-\x1f!-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -2035,7 +2014,6 @@ Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -2114,7 +2092,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -2162,7 +2139,7 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
[ -~\x80-\xff]
|
||||
[ -~\x80-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
@@ -2175,12 +2152,11 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
[\x00-/12:-\xff]
|
||||
[\x00-/12:-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -2189,12 +2165,11 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
[\x00-\x08\x0a-\x1f!-\xff]
|
||||
[\x00-\x08\x0a-\x1f!-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -2758,7 +2733,7 @@ No need char
|
||||
/[\S]/DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\x00-\x08\x0b\x0e-\x1f!-\xff]
|
||||
[\x00-\x08\x0b\x0e-\x1f!-\xff] (neg)
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
@@ -3083,7 +3058,6 @@ Need char = 'b'
|
||||
/([^()]++|\([^()]*\))+/I
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -3094,7 +3068,6 @@ No need char
|
||||
/\(([^()]++|\([^()]+\))+\)/I
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
First char = '('
|
||||
Need char = ')'
|
||||
@@ -3295,7 +3268,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -3308,7 +3280,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -3316,7 +3287,6 @@ No need char
|
||||
/< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >/Ix
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '<'
|
||||
Need char = '>'
|
||||
@@ -3468,26 +3438,26 @@ No options
|
||||
No first char
|
||||
Need char = 'a'
|
||||
|
||||
/(?(1)ab|ac)/I
|
||||
Capturing subpattern count = 0
|
||||
/(?(1)ab|ac)(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
First char = 'a'
|
||||
No need char
|
||||
|
||||
/(?(1)abz|acz)/I
|
||||
Capturing subpattern count = 0
|
||||
/(?(1)abz|acz)(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
First char = 'a'
|
||||
Need char = 'z'
|
||||
|
||||
/(?(1)abz)/I
|
||||
Capturing subpattern count = 0
|
||||
/(?(1)abz)(.)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/(?(1)abz)123/I
|
||||
Capturing subpattern count = 0
|
||||
/(?(1)abz)(1)23/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
No first char
|
||||
Need char = '3'
|
||||
@@ -3531,7 +3501,6 @@ Starting byte set: a b
|
||||
|
||||
/[^a]/I
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -3991,7 +3960,6 @@ Failed: recursive call could loop indefinitely at offset 16
|
||||
|
||||
/^([^()]|\((?1)*\))*$/I
|
||||
Capturing subpattern count = 1
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
@@ -4011,7 +3979,6 @@ No match
|
||||
|
||||
/^>abc>([^()]|\((?1)*\))*<xyz<$/I
|
||||
Capturing subpattern count = 1
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored
|
||||
No first char
|
||||
Need char = '<'
|
||||
@@ -4139,7 +4106,6 @@ No match
|
||||
/((< (?: (?(R) \d++ | [^<>]*+) | (?2)) * >))/Ix
|
||||
Capturing subpattern count = 2
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
First char = '<'
|
||||
Need char = '>'
|
||||
@@ -5958,7 +5924,6 @@ Matched, but too many substrings
|
||||
/[^()]*(?:\((?R)\)[^()]*)*/I
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -5972,7 +5937,6 @@ No need char
|
||||
/[^()]*(?:\((?>(?R))\)[^()]*)*/I
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -5984,7 +5948,6 @@ No need char
|
||||
/[^()]*(?:\((?R)\))*[^()]*/I
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -5996,7 +5959,6 @@ No need char
|
||||
/(?:\((?R)\))*[^()]*/I
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -6010,7 +5972,6 @@ No need char
|
||||
/(?:\((?R)\))|[^()]*/I
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -6205,6 +6166,7 @@ Named capturing subpatterns:
|
||||
A 2
|
||||
A 3
|
||||
Options: anchored dupnames
|
||||
Duplicate name status changes
|
||||
No first char
|
||||
No need char
|
||||
a1b\CA
|
||||
@@ -7913,7 +7875,7 @@ No match
|
||||
/a/<any><crlf>
|
||||
Failed: inconsistent NEWLINE options at offset 0
|
||||
|
||||
/^a\Rb/
|
||||
/^a\Rb/<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\rb
|
||||
@@ -7931,7 +7893,7 @@ No match
|
||||
a\n\rb
|
||||
No match
|
||||
|
||||
/^a\R*b/
|
||||
/^a\R*b/<bsr_unicode>
|
||||
ab
|
||||
0: ab
|
||||
a\nb
|
||||
@@ -7951,7 +7913,7 @@ No match
|
||||
a\n\r\x85\x0cb
|
||||
0: a\x0a\x0d\x85\x0cb
|
||||
|
||||
/^a\R+b/
|
||||
/^a\R+b/<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\rb
|
||||
@@ -7973,7 +7935,7 @@ No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/^a\R{1,3}b/
|
||||
/^a\R{1,3}b/<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\n\rb
|
||||
@@ -7995,7 +7957,7 @@ No match
|
||||
a\r
|
||||
No match
|
||||
|
||||
/^a[\R]b/
|
||||
/^a[\R]b/<bsr_unicode>
|
||||
aRb
|
||||
0: aRb
|
||||
** Failers
|
||||
@@ -8343,7 +8305,7 @@ Failed: reference to non-existent subpattern at offset 6
|
||||
/((?(-2)a))/BZ
|
||||
Failed: reference to non-existent subpattern at offset 7
|
||||
|
||||
/^(?(+1)X|Y)/BZ
|
||||
/^(?(+1)X|Y)(.)/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
^
|
||||
@@ -8353,11 +8315,15 @@ Failed: reference to non-existent subpattern at offset 7
|
||||
Alt
|
||||
Y
|
||||
Ket
|
||||
CBra 1
|
||||
Any
|
||||
Ket
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Y
|
||||
0: Y
|
||||
Y!
|
||||
0: Y!
|
||||
1: !
|
||||
|
||||
/(foo)\Kbar/
|
||||
foobar
|
||||
@@ -9168,4 +9134,255 @@ No match
|
||||
a\x85b
|
||||
No match
|
||||
|
||||
/a\Rb/I<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
** Failers
|
||||
No match
|
||||
a\x85b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\Rb/I<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
a\x85b
|
||||
0: a\x85b
|
||||
a\x0bb
|
||||
0: a\x0bb
|
||||
** Failers
|
||||
No match
|
||||
a\x85b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
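
Note on the two `\R` test blocks above: the expected output suggests that with `bsr_anycrlf` the `\R` escape matches only CR, LF, or CRLF, while with `bsr_unicode` it also matches VT (`\x0b`) and NEL (`\x85`). The following is a minimal sketch of that difference, not PCRE's implementation. It assumes `\R` can be approximated by a plain alternation, and `BSR_ANYCRLF`, `BSR_UNICODE` and `compile_with_bsr` are hypothetical names introduced only for this illustration. PCRE treats `\R` as an atomic group, which this sketch does not reproduce.

```python
import re

# Assumed expansions of \R for illustration only (not copied from the PCRE sources).
BSR_ANYCRLF = r"(?:\r\n|\n|\r)"
BSR_UNICODE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"


def compile_with_bsr(pattern, bsr):
    """Replace each literal \\R in `pattern` with the chosen alternation."""
    return re.compile(pattern.replace(r"\R", bsr))


anycrlf = compile_with_bsr(r"a\Rb", BSR_ANYCRLF)
unicode_bsr = compile_with_bsr(r"a\Rb", BSR_UNICODE)

# Same subjects as the test blocks: CR, LF, CRLF, NEL, VT.
for subject in ["a\rb", "a\nb", "a\r\nb", "a\x85b", "a\x0bb"]:
    print(repr(subject), bool(anycrlf.search(subject)), bool(unicode_bsr.search(subject)))
```

The first three subjects match under both settings; the last two match only under the Unicode setting, which mirrors the `No match` lines in the `bsr_anycrlf` block.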
|
||||
|
||||
/a\R?b/I<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
** Failers
|
||||
No match
|
||||
a\x85b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\R?b/I<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
a\x85b
|
||||
0: a\x85b
|
||||
a\x0bb
|
||||
0: a\x0bb
|
||||
** Failers
|
||||
No match
|
||||
a\x85b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/a\R{2,4}b/I<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\r\n\nb
|
||||
0: a\x0d\x0a\x0ab
|
||||
a\n\r\rb
|
||||
0: a\x0a\x0d\x0db
|
||||
a\r\n\r\n\r\n\r\nb
|
||||
0: a\x0d\x0a\x0d\x0a\x0d\x0a\x0d\x0ab
|
||||
** Failers
|
||||
No match
|
||||
a\x85\85b
|
||||
No match
|
||||
a\x0b\0bb
|
||||
No match
|
||||
|
||||
/a\R{2,4}b/I<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\r\rb
|
||||
0: a\x0d\x0db
|
||||
a\n\n\nb
|
||||
0: a\x0a\x0a\x0ab
|
||||
a\r\n\n\r\rb
|
||||
0: a\x0d\x0a\x0a\x0d\x0db
|
||||
a\x85\85b
|
||||
No match
|
||||
a\x0b\0bb
|
||||
No match
|
||||
** Failers
|
||||
No match
|
||||
a\r\r\r\r\rb
|
||||
No match
|
||||
a\x85\85b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0b\0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/(*BSR_ANYCRLF)a\Rb/I
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\rb
|
||||
0: a\x0db
|
||||
|
||||
/(*BSR_UNICODE)a\Rb/I
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\x85b
|
||||
0: a\x85b
|
||||
|
||||
/(*BSR_ANYCRLF)(*CRLF)a\Rb/I
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf
|
||||
Forced newline sequence: CRLF
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\rb
|
||||
0: a\x0db
|
||||
|
||||
/(*CRLF)(*BSR_UNICODE)a\Rb/I
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode
|
||||
Forced newline sequence: CRLF
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\x85b
|
||||
0: a\x85b
|
||||
|
||||
/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf
|
||||
Forced newline sequence: CR
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
|
||||
/(?<a>)(?&)/
|
||||
Failed: subpattern name expected at offset 9
|
||||
|
||||
/(?<abc>)(?&a)/
|
||||
Failed: reference to non-existent subpattern at offset 12
|
||||
|
||||
/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/
|
||||
Failed: reference to non-existent subpattern at offset 32
|
||||
|
||||
/(?+-a)/
|
||||
Failed: digit expected after (?+ at offset 3
|
||||
|
||||
/(?-+a)/
|
||||
Failed: unrecognized character after (? or (?- at offset 3
|
||||
|
||||
/(?(-1))/
|
||||
Failed: reference to non-existent subpattern at offset 6
|
||||
|
||||
/(?(+10))/
|
||||
Failed: reference to non-existent subpattern at offset 7
|
||||
|
||||
/(?(10))/
|
||||
Failed: reference to non-existent subpattern at offset 6
|
||||
|
||||
/(?(+2))()()/
|
||||
|
||||
/(?(2))()()/
|
||||
|
||||
/\k''/
|
||||
Failed: subpattern name expected at offset 3
|
||||
|
||||
/\k<>/
|
||||
Failed: subpattern name expected at offset 3
|
||||
|
||||
/\k{}/
|
||||
Failed: subpattern name expected at offset 3
|
||||
|
||||
/(?P=)/
|
||||
Failed: subpattern name expected at offset 4
|
||||
|
||||
/(?P>)/
|
||||
Failed: subpattern name expected at offset 4
|
||||
|
||||
/(?!\w)(?R)/
|
||||
Failed: recursive call could loop indefinitely at offset 9
|
||||
|
||||
/(?=\w)(?R)/
|
||||
Failed: recursive call could loop indefinitely at offset 9
|
||||
|
||||
/(?<!\w)(?R)/
|
||||
Failed: recursive call could loop indefinitely at offset 10
|
||||
|
||||
/(?<=\w)(?R)/
|
||||
Failed: recursive call could loop indefinitely at offset 10
|
||||
|
||||
/[[:foo:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/[[:1234:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/[[:f\oo:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/[[: :]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/[[:...:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/[[:l\ower:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/[[:abc\:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/[abc[:x\]pqr:]]/
|
||||
Failed: unknown POSIX class name at offset 6
|
||||
|
||||
/[[:a\dz:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/ End of testinput2 /
|
||||
|
||||
+131
@@ -938,4 +938,135 @@ No match
|
||||
0: +\x{a3}
|
||||
0: ==
|
||||
|
||||
/\S/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
0: \x{442}
|
||||
0: \x{435}
|
||||
0: \x{441}
|
||||
0: \x{442}
|
||||
|
||||
/[\S]/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
0: \x{442}
|
||||
0: \x{435}
|
||||
0: \x{441}
|
||||
0: \x{442}
|
||||
|
||||
/\D/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
0: \x{442}
|
||||
0: \x{435}
|
||||
0: \x{441}
|
||||
0: \x{442}
|
||||
|
||||
/[\D]/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
0: \x{442}
|
||||
0: \x{435}
|
||||
0: \x{441}
|
||||
0: \x{442}
|
||||
|
||||
/\W/8g
|
||||
\x{2442}\x{2435}\x{2441}\x{2442}
|
||||
0: \x{2442}
|
||||
0: \x{2435}
|
||||
0: \x{2441}
|
||||
0: \x{2442}
|
||||
|
||||
/[\W]/8g
|
||||
\x{2442}\x{2435}\x{2441}\x{2442}
|
||||
0: \x{2442}
|
||||
0: \x{2435}
|
||||
0: \x{2441}
|
||||
0: \x{2442}
|
||||
|
||||
/[\S\s]*/8
|
||||
abc\n\r\x{442}\x{435}\x{441}\x{442}xyz
|
||||
0: abc\x{0a}\x{0d}\x{442}\x{435}\x{441}\x{442}xyz
|
||||
|
||||
/[\x{41f}\S]/8g
|
||||
\x{442}\x{435}\x{441}\x{442}
|
||||
0: \x{442}
|
||||
0: \x{435}
|
||||
0: \x{441}
|
||||
0: \x{442}
|
||||
|
||||
/.[^\S]./8g
|
||||
abc def\x{442}\x{443}xyz\npqr
|
||||
0: c d
|
||||
0: z\x{0a}p
|
||||
|
||||
/.[^\S\n]./8g
|
||||
abc def\x{442}\x{443}xyz\npqr
|
||||
0: c d
|
||||
|
||||
/[[:^alnum:]]/8g
|
||||
+\x{2442}
|
||||
0: +
|
||||
0: \x{2442}
|
||||
|
||||
/[[:^alpha:]]/8g
|
||||
+\x{2442}
|
||||
0: +
|
||||
0: \x{2442}
|
||||
|
||||
/[[:^ascii:]]/8g
|
||||
A\x{442}
|
||||
0: \x{442}
|
||||
|
||||
/[[:^blank:]]/8g
|
||||
A\x{442}
|
||||
0: A
|
||||
0: \x{442}
|
||||
|
||||
/[[:^cntrl:]]/8g
|
||||
A\x{442}
|
||||
0: A
|
||||
0: \x{442}
|
||||
|
||||
/[[:^digit:]]/8g
|
||||
A\x{442}
|
||||
0: A
|
||||
0: \x{442}
|
||||
|
||||
/[[:^graph:]]/8g
|
||||
\x19\x{e01ff}
|
||||
0: \x{19}
|
||||
0: \x{e01ff}
|
||||
|
||||
/[[:^lower:]]/8g
|
||||
A\x{422}
|
||||
0: A
|
||||
0: \x{422}
|
||||
|
||||
/[[:^print:]]/8g
|
||||
\x{19}\x{e01ff}
|
||||
0: \x{19}
|
||||
0: \x{e01ff}
|
||||
|
||||
/[[:^punct:]]/8g
|
||||
A\x{442}
|
||||
0: A
|
||||
0: \x{442}
|
||||
|
||||
/[[:^space:]]/8g
|
||||
A\x{442}
|
||||
0: A
|
||||
0: \x{442}
|
||||
|
||||
/[[:^upper:]]/8g
|
||||
a\x{442}
|
||||
0: a
|
||||
0: \x{442}
|
||||
|
||||
/[[:^word:]]/8g
|
||||
+\x{2442}
|
||||
0: +
|
||||
0: \x{2442}
|
||||
|
||||
/[[:^xdigit:]]/8g
|
||||
M\x{442}
|
||||
0: M
|
||||
0: \x{442}
|
||||
|
||||
/ End of testinput4 /
|
||||
|
||||
+97
-11
@@ -364,7 +364,6 @@ No match
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: anchored utf8
|
||||
No first char
|
||||
No need char
|
||||
@@ -387,7 +386,6 @@ No match
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@@ -655,7 +653,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -668,7 +665,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@@ -792,7 +788,6 @@ Need char = 191
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
@@ -805,7 +800,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@@ -942,7 +936,6 @@ Need char = 'z'
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Contains explicit CR or LF match
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = 'z'
|
||||
@@ -1314,7 +1307,7 @@ Failed: missing terminating ] for character class at offset 15
|
||||
0: abc8
|
||||
0: abc9
|
||||
|
||||
/^a\Rb/8
|
||||
/^a\Rb/8<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\rb
|
||||
@@ -1336,7 +1329,7 @@ No match
|
||||
a\n\rb
|
||||
No match
|
||||
|
||||
/^a\R*b/8
|
||||
/^a\R*b/8<bsr_unicode>
|
||||
ab
|
||||
0: ab
|
||||
a\nb
|
||||
@@ -1356,7 +1349,7 @@ No match
|
||||
a\n\r\x{85}\x0cb
|
||||
0: a\x{0a}\x{0d}\x{85}\x{0c}b
|
||||
|
||||
/^a\R+b/8
|
||||
/^a\R+b/8<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\rb
|
||||
@@ -1378,7 +1371,7 @@ No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/^a\R{1,3}b/8
|
||||
/^a\R{1,3}b/8<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\n\rb
|
||||
@@ -1522,4 +1515,97 @@ Error -10
|
||||
\x{7fffffff}\?
|
||||
No match
|
||||
|
||||
/a\Rb/I8<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\Rb/I8<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
a\x{85}b
|
||||
0: a\x{85}b
|
||||
a\x0bb
|
||||
0: a\x{0b}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/a\R?b/I8<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\R?b/I8<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
a\x{85}b
|
||||
0: a\x{85}b
|
||||
a\x0bb
|
||||
0: a\x{0b}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/.*a.*=.b.*/8<ANY>
|
||||
QQQ\x{2029}ABCaXYZ=!bPQR
|
||||
0: ABCaXYZ=!bPQR
|
||||
** Failers
|
||||
No match
|
||||
a\x{2029}b
|
||||
No match
|
||||
\x61\xe2\x80\xa9\x62
|
||||
No match
|
||||
|
||||
/[[:a\x{100}b:]]/8
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/ End of testinput5 /
|
||||
|
||||
+157
@@ -1522,4 +1522,161 @@ No match
|
||||
|
||||
/(\p{Yi}{0,3}+\277)*/
|
||||
|
||||
/^[\p{Arabic}]/8
|
||||
\x{60e}
|
||||
0: \x{60e}
|
||||
\x{656}
|
||||
0: \x{656}
|
||||
\x{657}
|
||||
0: \x{657}
|
||||
\x{658}
|
||||
0: \x{658}
|
||||
\x{659}
|
||||
0: \x{659}
|
||||
\x{65a}
|
||||
0: \x{65a}
|
||||
\x{65b}
|
||||
0: \x{65b}
|
||||
\x{65c}
|
||||
0: \x{65c}
|
||||
\x{65d}
|
||||
0: \x{65d}
|
||||
\x{65e}
|
||||
0: \x{65e}
|
||||
\x{66a}
|
||||
0: \x{66a}
|
||||
\x{6e9}
|
||||
0: \x{6e9}
|
||||
\x{6ef}
|
||||
0: \x{6ef}
|
||||
\x{6fa}
|
||||
0: \x{6fa}
|
||||
** Failers
|
||||
No match
|
||||
\x{600}
|
||||
No match
|
||||
\x{650}
|
||||
No match
|
||||
\x{651}
|
||||
No match
|
||||
\x{652}
|
||||
No match
|
||||
\x{653}
|
||||
No match
|
||||
\x{654}
|
||||
No match
|
||||
\x{655}
|
||||
No match
|
||||
\x{65f}
|
||||
No match
|
||||
|
||||
/^\p{Cyrillic}/8
|
||||
\x{1d2b}
|
||||
0: \x{1d2b}
|
||||
|
||||
/^\p{Common}/8
|
||||
\x{589}
|
||||
0: \x{589}
|
||||
\x{60c}
|
||||
0: \x{60c}
|
||||
\x{61f}
|
||||
0: \x{61f}
|
||||
\x{964}
|
||||
0: \x{964}
|
||||
\x{965}
|
||||
0: \x{965}
|
||||
\x{970}
|
||||
0: \x{970}
|
||||
|
||||
/^\p{Inherited}/8
|
||||
\x{64b}
|
||||
0: \x{64b}
|
||||
\x{654}
|
||||
0: \x{654}
|
||||
\x{655}
|
||||
0: \x{655}
|
||||
\x{200c}
|
||||
0: \x{200c}
|
||||
** Failers
|
||||
No match
|
||||
\x{64a}
|
||||
No match
|
||||
\x{656}
|
||||
No match
|
||||
|
||||
/^\p{Shavian}/8
|
||||
\x{10450}
|
||||
0: \x{10450}
|
||||
\x{1047f}
|
||||
0: \x{1047f}
|
||||
|
||||
/^\p{Deseret}/8
|
||||
\x{10400}
|
||||
0: \x{10400}
|
||||
\x{1044f}
|
||||
0: \x{1044f}
|
||||
|
||||
/^\p{Osmanya}/8
|
||||
\x{10480}
|
||||
0: \x{10480}
|
||||
\x{1049d}
|
||||
0: \x{1049d}
|
||||
\x{104a0}
|
||||
0: \x{104a0}
|
||||
\x{104a9}
|
||||
0: \x{104a9}
|
||||
** Failers
|
||||
No match
|
||||
\x{1049e}
|
||||
No match
|
||||
\x{1049f}
|
||||
No match
|
||||
\x{104aa}
|
||||
No match
|
||||
|
||||
/\p{Zl}{2,3}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Zl {2}
|
||||
prop Zl ?+
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
\xe2\x80\xa8\xe2\x80\xa8
|
||||
0: \x{2028}\x{2028}
|
||||
\x{2028}\x{2028}\x{2028}
|
||||
0: \x{2028}\x{2028}\x{2028}
|
||||
|
||||
/\p{Zl}/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Zl
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\p{Lu}{3}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Lu {3}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\pL{2}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop L {2}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\p{Cc}{2}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Cc {2}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/ End of testinput6 /
|
||||
|
||||
+129
-5
@@ -6824,7 +6824,7 @@ No match
|
||||
0: abc6
|
||||
0: abc9
|
||||
|
||||
/^a\Rb/
|
||||
/^a\Rb/<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\rb
|
||||
@@ -6842,7 +6842,7 @@ No match
|
||||
a\n\rb
|
||||
No match
|
||||
|
||||
/^a\R*b/
|
||||
/^a\R*b/<bsr_unicode>
|
||||
ab
|
||||
0: ab
|
||||
a\nb
|
||||
@@ -6862,7 +6862,7 @@ No match
|
||||
a\n\r\x85\x0cb
|
||||
0: a\x0a\x0d\x85\x0cb
|
||||
|
||||
/^a\R+b/
|
||||
/^a\R+b/<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\rb
|
||||
@@ -6884,7 +6884,7 @@ No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/^a\R{1,3}b/
|
||||
/^a\R{1,3}b/<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\n\rb
|
||||
@@ -6906,7 +6906,7 @@ No match
|
||||
a\r
|
||||
No match
|
||||
|
||||
/^a[\R]b/
|
||||
/^a[\R]b/<bsr_unicode>
|
||||
aRb
|
||||
0: aRb
|
||||
** Failers
|
||||
@@ -7088,4 +7088,128 @@ No match
|
||||
\r\nA
|
||||
0: \x0aA
|
||||
|
||||
/a\Rb/I<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
** Failers
|
||||
No match
|
||||
a\x85b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\Rb/I<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
a\x85b
|
||||
0: a\x85b
|
||||
a\x0bb
|
||||
0: a\x0bb
|
||||
** Failers
|
||||
No match
|
||||
a\x85b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/a\R?b/I<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
** Failers
|
||||
No match
|
||||
a\x85b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\R?b/I<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x0db
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
a\r\nb
|
||||
0: a\x0d\x0ab
|
||||
a\x85b
|
||||
0: a\x85b
|
||||
a\x0bb
|
||||
0: a\x0bb
|
||||
** Failers
|
||||
No match
|
||||
a\x85b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/a\R{2,4}b/I<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\r\n\nb
|
||||
0: a\x0d\x0a\x0ab
|
||||
a\n\r\rb
|
||||
0: a\x0a\x0d\x0db
|
||||
a\r\n\r\n\r\n\r\nb
|
||||
0: a\x0d\x0a\x0d\x0a\x0d\x0a\x0d\x0ab
|
||||
** Failers
|
||||
No match
|
||||
a\x85\85b
|
||||
No match
|
||||
a\x0b\0bb
|
||||
No match
|
||||
|
||||
/a\R{2,4}b/I<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\r\rb
|
||||
0: a\x0d\x0db
|
||||
a\n\n\nb
|
||||
0: a\x0a\x0a\x0ab
|
||||
a\r\n\n\r\rb
|
||||
0: a\x0d\x0a\x0a\x0d\x0db
|
||||
a\x85\85b
|
||||
No match
|
||||
a\x0b\0bb
|
||||
No match
|
||||
** Failers
|
||||
No match
|
||||
a\r\r\r\r\rb
|
||||
No match
|
||||
a\x85\85b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0b\0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/ End of testinput7 /
|
||||
|
||||
+84
-4
@@ -1052,7 +1052,7 @@ No match
|
||||
0: abc8
|
||||
0: abc9
|
||||
|
||||
/^a\Rb/8
|
||||
/^a\Rb/8<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\rb
|
||||
@@ -1074,7 +1074,7 @@ No match
|
||||
a\n\rb
|
||||
No match
|
||||
|
||||
/^a\R*b/8
|
||||
/^a\R*b/8<bsr_unicode>
|
||||
ab
|
||||
0: ab
|
||||
a\nb
|
||||
@@ -1094,7 +1094,7 @@ No match
|
||||
a\n\r\x{85}\x0cb
|
||||
0: a\x{0a}\x{0d}\x{85}\x{0c}b
|
||||
|
||||
/^a\R+b/8
|
||||
/^a\R+b/8<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\rb
|
||||
@@ -1116,7 +1116,7 @@ No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/^a\R{1,3}b/8
|
||||
/^a\R{1,3}b/8<bsr_unicode>
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\n\rb
|
||||
@@ -1204,4 +1204,84 @@ No match
|
||||
\x09\x{200a}\x{a0}\x{2028}\x0b
|
||||
No match
|
||||
|
||||
/a\Rb/I8<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\Rb/I8<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
a\x{85}b
|
||||
0: a\x{85}b
|
||||
a\x0bb
|
||||
0: a\x{0b}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/a\R?b/I8<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_anycrlf utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b
|
||||
No match
|
||||
a\x0bb
|
||||
No match
|
||||
|
||||
/a\R?b/I8<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Options: bsr_unicode utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
a\rb
|
||||
0: a\x{0d}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
a\r\nb
|
||||
0: a\x{0d}\x{0a}b
|
||||
a\x{85}b
|
||||
0: a\x{85}b
|
||||
a\x0bb
|
||||
0: a\x{0b}b
|
||||
** Failers
|
||||
No match
|
||||
a\x{85}b\<bsr_anycrlf>
|
||||
No match
|
||||
a\x0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/ End of testinput 8 /
|
||||
|
||||
+35
-15
@@ -539,7 +539,8 @@ static const cnode ucp_table[] = {
|
||||
{ 0x21000293, 0x14000000 },
|
||||
{ 0x21000294, 0x1c000000 },
|
||||
{ 0x21800295, 0x1400001a },
|
||||
{ 0x218002b0, 0x18000011 },
|
||||
{ 0x218002b0, 0x18000008 },
|
||||
{ 0x098002b9, 0x18000008 },
|
||||
{ 0x098002c2, 0x60000003 },
|
||||
{ 0x098002c6, 0x1800000b },
|
||||
{ 0x098002d2, 0x6000000d },
|
||||
@@ -1039,15 +1040,18 @@ static const cnode ucp_table[] = {
|
||||
{ 0x198005f3, 0x54000001 },
|
||||
{ 0x09800600, 0x04000003 },
|
||||
{ 0x0000060b, 0x5c000000 },
|
||||
{ 0x0980060c, 0x54000001 },
|
||||
{ 0x0900060c, 0x54000000 },
|
||||
{ 0x0000060d, 0x54000000 },
|
||||
{ 0x0080060e, 0x68000001 },
|
||||
{ 0x00800610, 0x30000005 },
|
||||
{ 0x0900061b, 0x54000000 },
|
||||
{ 0x0080061e, 0x54000001 },
|
||||
{ 0x0000061e, 0x54000000 },
|
||||
{ 0x0900061f, 0x54000000 },
|
||||
{ 0x00800621, 0x1c000019 },
|
||||
{ 0x09000640, 0x18000000 },
|
||||
{ 0x00800641, 0x1c000009 },
|
||||
{ 0x1b80064b, 0x30000013 },
|
||||
{ 0x1b80064b, 0x3000000a },
|
||||
{ 0x00800656, 0x30000008 },
|
||||
{ 0x09800660, 0x34000009 },
|
||||
{ 0x0080066a, 0x54000003 },
|
||||
{ 0x0080066e, 0x1c000001 },
|
||||
@@ -1074,7 +1078,8 @@ static const cnode ucp_table[] = {
|
||||
{ 0x31000711, 0x30000000 },
|
||||
{ 0x31800712, 0x1c00001d },
|
||||
{ 0x31800730, 0x3000001a },
|
||||
{ 0x3180074d, 0x1c000020 },
|
||||
{ 0x3180074d, 0x1c000002 },
|
||||
{ 0x00800750, 0x1c00001d },
|
||||
{ 0x37800780, 0x1c000025 },
|
||||
{ 0x378007a6, 0x3000000a },
|
||||
{ 0x370007b1, 0x1c000000 },
|
||||
@@ -1460,7 +1465,10 @@ static const cnode ucp_table[] = {
|
||||
{ 0x1f0017dd, 0x30000000 },
|
||||
{ 0x1f8017e0, 0x34000009 },
|
||||
{ 0x1f8017f0, 0x3c000009 },
|
||||
{ 0x25801800, 0x54000005 },
|
||||
{ 0x25801800, 0x54000001 },
|
||||
{ 0x09801802, 0x54000001 },
|
||||
{ 0x25001804, 0x54000000 },
|
||||
{ 0x09001805, 0x54000000 },
|
||||
{ 0x25001806, 0x44000000 },
|
||||
{ 0x25801807, 0x54000003 },
|
||||
{ 0x2580180b, 0x30000002 },
|
||||
@@ -1513,14 +1521,20 @@ static const cnode ucp_table[] = {
|
||||
{ 0x3d801b61, 0x68000009 },
|
||||
{ 0x3d801b6b, 0x30000008 },
|
||||
{ 0x3d801b74, 0x68000008 },
|
||||
{ 0x21801d00, 0x1400002b },
|
||||
{ 0x21801d2c, 0x18000035 },
|
||||
{ 0x21801d62, 0x14000015 },
|
||||
{ 0x21801d00, 0x14000025 },
|
||||
{ 0x13801d26, 0x14000004 },
|
||||
{ 0x0c001d2b, 0x14000000 },
|
||||
{ 0x21801d2c, 0x18000030 },
|
||||
{ 0x13801d5d, 0x18000004 },
|
||||
{ 0x21801d62, 0x14000003 },
|
||||
{ 0x13801d66, 0x14000004 },
|
||||
{ 0x21801d6b, 0x1400000c },
|
||||
{ 0x0c001d78, 0x18000000 },
|
||||
{ 0x21801d79, 0x14000003 },
|
||||
{ 0x21001d7d, 0x14000ee6 },
|
||||
{ 0x21801d7e, 0x1400001c },
|
||||
{ 0x21801d9b, 0x18000024 },
|
||||
{ 0x21801d9b, 0x18000023 },
|
||||
{ 0x13001dbf, 0x18000000 },
|
||||
{ 0x1b801dc0, 0x3000000a },
|
||||
{ 0x1b801dfe, 0x30000001 },
|
||||
{ 0x21001e00, 0x24000001 },
|
||||
@@ -1982,7 +1996,9 @@ static const cnode ucp_table[] = {
|
||||
{ 0x13001ffc, 0x2000fff7 },
|
||||
{ 0x13801ffd, 0x60000001 },
|
||||
{ 0x09802000, 0x7400000a },
|
||||
{ 0x0980200b, 0x04000004 },
|
||||
{ 0x0900200b, 0x04000000 },
|
||||
{ 0x1b80200c, 0x04000001 },
|
||||
{ 0x0980200e, 0x04000001 },
|
||||
{ 0x09802010, 0x44000005 },
|
||||
{ 0x09802016, 0x54000001 },
|
||||
{ 0x09002018, 0x50000000 },
|
||||
@@ -2615,7 +2631,8 @@ static const cnode ucp_table[] = {
|
||||
{ 0x090030a0, 0x44000000 },
|
||||
{ 0x1d8030a1, 0x1c000059 },
|
||||
{ 0x090030fb, 0x54000000 },
|
||||
{ 0x098030fc, 0x18000002 },
|
||||
{ 0x090030fc, 0x18000000 },
|
||||
{ 0x1d8030fd, 0x18000001 },
|
||||
{ 0x1d0030ff, 0x1c000000 },
|
||||
{ 0x03803105, 0x1c000027 },
|
||||
{ 0x17803131, 0x1c00005d },
|
||||
@@ -2630,7 +2647,8 @@ static const cnode ucp_table[] = {
|
||||
{ 0x0980322a, 0x68000019 },
|
||||
{ 0x09003250, 0x68000000 },
|
||||
{ 0x09803251, 0x3c00000e },
|
||||
{ 0x17803260, 0x6800001f },
|
||||
{ 0x17803260, 0x6800001d },
|
||||
{ 0x0980327e, 0x68000001 },
|
||||
{ 0x09803280, 0x3c000009 },
|
||||
{ 0x0980328a, 0x68000026 },
|
||||
{ 0x098032b1, 0x3c00000e },
|
||||
@@ -2678,7 +2696,8 @@ static const cnode ucp_table[] = {
|
||||
{ 0x1900fb3e, 0x1c000000 },
|
||||
{ 0x1980fb40, 0x1c000001 },
|
||||
{ 0x1980fb43, 0x1c000001 },
|
||||
{ 0x1980fb46, 0x1c00006b },
|
||||
{ 0x1980fb46, 0x1c000009 },
|
||||
{ 0x0080fb50, 0x1c000061 },
|
||||
{ 0x0080fbd3, 0x1c00016a },
|
||||
{ 0x0900fd3e, 0x58000000 },
|
||||
{ 0x0900fd3f, 0x48000000 },
|
||||
@@ -2944,7 +2963,8 @@ static const cnode ucp_table[] = {
|
||||
{ 0x0d01044d, 0x1400ffd8 },
|
||||
{ 0x0d01044e, 0x1400ffd8 },
|
||||
{ 0x0d01044f, 0x1400ffd8 },
|
||||
{ 0x2e810450, 0x1c00004d },
|
||||
{ 0x2e810450, 0x1c00002f },
|
||||
{ 0x2c810480, 0x1c00001d },
|
||||
{ 0x2c8104a0, 0x34000009 },
|
||||
{ 0x0b810800, 0x1c000005 },
|
||||
{ 0x0b010808, 0x1c000000 },
|
||||
|
||||
@@ -37,9 +37,9 @@ function recurse($path)
|
||||
|
||||
if ($file[0] === '.' ||
|
||||
$file === 'CVS' ||
|
||||
substr_compare($file, '.lo', -3, 3) == 0 ||
|
||||
substr_compare($file, '.loT', -4, 4) == 0 ||
|
||||
substr_compare($file, '.o', -2, 2) == 0) continue;
|
||||
@substr_compare($file, '.lo', -3, 3) === 0 ||
|
||||
@substr_compare($file, '.loT', -4, 4) === 0 ||
|
||||
@substr_compare($file, '.o', -2, 2) === 0) continue;
|
||||
|
||||
$file = "$path/$file";
|
||||
|
||||
|
||||