Adding two `zend_long`s may overflow, and casting `size_t` to
`zend_long` may truncate; we can avoid this here by enforcing unsigned
arithmetic.
Closes GH-7240.
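The idea of staying in unsigned arithmetic can be sketched as follows. This is a minimal illustration, not the code from this change; `range_fits()` and its parameters are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of php-src): returns true if the range
 * [offset, offset + length) lies within a buffer of `size` bytes. */
static bool range_fits(size_t offset, size_t length, size_t size)
{
    /* offset + length is never computed, so nothing can wrap around;
     * size - offset is safe because offset <= size is checked first. */
    return offset <= size && length <= size - offset;
}
```

Because every operand is a `size_t`, no value is cast to a signed type, and the subtraction is only evaluated once it is known not to underflow.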
`tolower()` returns an `int`, so we must not convert to `char` which
may be `signed` and as such may be subject to overflow (actually,
implementation defined behavior).
Closes GH-6007
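A small sketch of the pattern, assuming a case-insensitive byte comparison is what is needed; `bytes_equal_nocase()` is a hypothetical helper, not the function touched by this change.

```c
#include <ctype.h>

/* Hypothetical helper (not part of php-src): compares two bytes
 * case-insensitively according to the current locale. */
static int bytes_equal_nocase(char a, char b)
{
    /* tolower() expects a value representable as unsigned char (or EOF),
     * so cast the argument; keep the result as int instead of narrowing
     * it back to a possibly signed char. */
    int la = tolower((unsigned char) a);
    int lb = tolower((unsigned char) b);

    return la == lb;
}
```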
When normalizing tags to check whether they are contained in the set
of allowable tags, we must not strip slashes, unless they come
immediately after the opening `<`, or immediately before the closing
`>`.
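The rule can be illustrated with a simplified sketch. `normalize_tag()` is a hypothetical helper and not the php-src implementation; it omits case folding, and keeping interior slashes verbatim is an assumption made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the php-src code): writes the normalized form
 * of `tag` (e.g. "<a>") into `out`. Returns 0 on success, -1 if the input
 * is not of the form "<...>" or `out` is too small. */
static int normalize_tag(const char *tag, char *out, size_t out_size)
{
    size_t len = strlen(tag);
    size_t start = 1, end, i, n = 0;

    if (len < 2 || tag[0] != '<' || tag[len - 1] != '>' || out_size < 3) {
        return -1;
    }
    end = len - 1; /* index of the closing '>' */

    /* A slash immediately after '<' is stripped: "</a>" becomes "<a>". */
    if (start < end && tag[start] == '/') {
        start++;
    }

    out[n++] = '<';
    for (i = start; i < end; i++) {
        unsigned char c = (unsigned char) tag[i];

        if (isspace(c)) {
            break; /* the tag name ends where attributes begin */
        }
        if (c == '/' && i + 1 == end) {
            break; /* a slash immediately before '>' is stripped */
        }
        if (n + 2 >= out_size) {
            return -1; /* no room left for this byte, '>' and NUL */
        }
        out[n++] = (char) c; /* any other slash is kept as-is */
    }
    out[n++] = '>';
    out[n] = '\0';
    return 0;
}
```

With this sketch, `<a/b>` stays `<a/b>` and therefore does not match an allowed `<ab>`, while `</a>` and `<br/>` still normalize to `<a>` and `<br>`.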
When the strip tags state machine was flattened, an `if` statement was
mistakenly turned into an `else if`. We fix this, and also simplify the
code a bit while at it.
On this benchmark:
    function simple_string_escape() {
        $a = "test'asd'asd'' asd\'\"asdfasdf";
        for($i=0; $i<512; $i++) {
            $a .= chr($i%256);
        }
        for ($i = 0; $i < 100000; $i++) {
            if ($a === stripslashes(addslashes($a)))
                $a .= chr($i%256);
            else {
                echo "error at i=".$i."\n";
                return;
            }
        }
    }
the execution time goes from 21.619s to 8.139s (165% speedup) on an A1 Graviton instance.
When removing the characters that need escaping, i.e., this benchmark:
    function simple_string() {
        $a = "testasdasd asdasdfasdf";
        for ($i = 0; $i < 10000; $i++) {
            if ($a === stripslashes(addslashes($a)))
                $a .= "test dedeasdf";
            else {
                echo "error at i=".$i."\n";
                return;
            }
        }
    }
the execution time goes from 2.932s down to 0.516s (468% speedup) on an A1 Graviton instance.
The strcoll function is defined in the C89 standard and should always
be available on today's systems via the <string.h> header.
https://port70.net/~nsz/c/c89/c89-draft.html#4.11.4.3
- Also remove the SKIPIF strcoll check in the test
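As a sketch of what unconditional use looks like, the following calls strcoll without any `HAVE_STRCOLL`-style guard. The helper names `compare_collated()` and `sort_collated()` are hypothetical and only serve the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical qsort() comparator: orders strings by the collation
 * rules of the current LC_COLLATE locale. */
static int compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    /* strcoll() is declared in <string.h>; no feature check is needed. */
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sorts an array of C strings in place. */
static void sort_collated(const char **items, size_t count)
{
    qsort(items, count, sizeof *items, compare_collated);
}
```

In the default "C" locale, strcoll orders strings the same way strcmp does.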
On some recent Windows systems, ext\pcre\tests\locales.phpt fails,
because 'pt_PT' is accepted by `setlocale()`, but not properly
supported by the ctype functions, which are used internally by PCRE2 to
build the localized character tables.
Since there appears to be no way to properly check whether a given
locale is fully supported, but we want to minimize BC impact, we filter
out typical Unix locale names, except for a few cases which have
already been properly supported on Windows. This way code like
setlocale(LC_ALL, 'de_DE.UTF-8', 'de_DE', 'German_Germany.1252');
should work as it does on older Windows systems.
It should be noted that the locale names causing trouble are not (yet)
documented as valid names anyway, see
<https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=vs-2019>.
RFC: https://wiki.php.net/rfc/tostring_exceptions
Also convert some recoverable fatal errors related to object-to-string
conversion into Error exceptions.
Improve exception safety of internal code performing string
conversions.
The execution time goes from 4.388s down to 0.563s on a Graviton A1
instance for the benchmark:
    function reverse_strings() {
        $a = "foo";
        for ($i = 0; $i < 100000; $i++) {
            strrev($a);
            $a .= "o";
        }
    }
Checking for the presence of the strerror function is no longer needed,
since it is part of the C89 standard [1] and all current systems can be
safely assumed to have it.
The check in configure.ac and the symbol defined on Windows are left in
place until the file library (libmagic) is updated.
[1]: https://port70.net/~nsz/c/c89/c89-draft.html
The `<locale.h>` header file, setlocale, and localeconv are part of the
C89 standard [1] and can be used unconditionally on current systems.
Since PHP 7.4 requires at least C89, the `HAVE_LOCALE_H`,
`HAVE_SETLOCALE`, and `HAVE_LOCALECONV` symbols defined by Autoconf in
configure.ac [2] can be omitted and the code simplified.
The bundled libmagic (file) has already been patched upstream in version
5.35 and later. Until that patch is also applied in php-src, the check
for the locale.h header is left in configure.ac and in the Windows
headers definition file.
[1] https://port70.net/~nsz/c/c89/c89-draft.html#4.4
[2] https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/headers.m4
Omit the bundled libmagic files
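A sketch of code that relies on these functions without `HAVE_LOCALE_H`, `HAVE_SETLOCALE`, or `HAVE_LOCALECONV` guards; `current_decimal_point()` is a hypothetical helper used only for illustration.

```c
#include <locale.h>

/* Hypothetical helper (not part of php-src): returns the first byte of
 * the decimal point string of the current LC_NUMERIC locale. */
static char current_decimal_point(void)
{
    /* localeconv() is declared in <locale.h> by C89; no guard needed. */
    struct lconv *lc = localeconv();

    return lc->decimal_point[0];
}
```

After `setlocale(LC_ALL, "C")`, the helper returns `'.'`, since the "C" locale defines the decimal point as ".".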