mirror of https://github.com/php/php-src.git synced 2026-04-22 07:28:09 +02:00

130660 Commits

Author SHA1 Message Date
Alex Dowad 7f44559516 mb_str{i,}pos does not match illegal byte sequences against occurrences of mb_substitute_char
In GitHub issue 9613, it was reported that mb_strpos wrongly matches the
character '?' against any invalid string, even when the character '?'
clearly does not appear in the invalid string. This behavior has existed
at least since PHP 5.2.

The reason for the behavior is that mb_strpos internally converts the
haystack and needle to UTF-8 before performing a search. When converting
to UTF-8, regardless of the setting of mb_substitute_character, libmbfl
would use '?' as an error marker for invalid byte sequences. Once those
invalid input sequences were replaced with '?', then naturally, they
would match against occurrences of the actual character '?' (when it
appeared as a 'normal' character, not as an error marker). This would
happen regardless of whether the error was in the haystack and '?' was
used in the needle, or whether the error was in the needle and '?' was
used in the haystack.
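Here is a minimal C sketch of that failure mode. It assumes a simplified UTF-16BE-to-UTF-8 converter that writes '?' for every invalid sequence, as the old internal libmbfl path is described as doing. The function names are hypothetical and this is not libmbfl's code. Because it uses `strstr`, it also assumes the input contains no U+0000. After both strings are converted, a lone surrogate and a real '?' can no longer be told apart:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write code point cp to out as UTF-8; returns the number of bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Lossy UTF-16BE -> NUL-terminated UTF-8: every invalid sequence becomes
 * '?', exactly like a genuine U+003F. out needs at least 2*len + 1 bytes. */
static void utf16be_to_utf8_lossy(const unsigned char *in, size_t len, char *out)
{
    unsigned char *o = (unsigned char *)out;
    size_t i = 0;
    while (i + 1 < len) {
        uint32_t w = ((uint32_t)in[i] << 8) | in[i + 1];
        i += 2;
        if (w >= 0xD800 && w <= 0xDBFF) {            /* high surrogate */
            if (i + 1 < len) {
                uint32_t w2 = ((uint32_t)in[i] << 8) | in[i + 1];
                if (w2 >= 0xDC00 && w2 <= 0xDFFF) {  /* valid pair */
                    i += 2;
                    o += put_utf8(0x10000 + ((w - 0xD800) << 10) + (w2 - 0xDC00), o);
                    continue;
                }
            }
            *o++ = '?';                              /* error marker */
        } else if (w >= 0xDC00 && w <= 0xDFFF) {     /* lone low surrogate */
            *o++ = '?';
        } else {
            o += put_utf8(w, o);
        }
    }
    if (i < len)                                     /* dangling odd byte */
        *o++ = '?';
    *o = '\0';
}

/* Convert both strings, then search bytewise: nonzero if needle occurs. */
static int lossy_contains(const unsigned char *hay, size_t hay_len,
                          const unsigned char *needle, size_t needle_len)
{
    char h[64], n[64];
    assert(2 * hay_len + 1 <= sizeof h && 2 * needle_len + 1 <= sizeof n);
    utf16be_to_utf8_lossy(hay, hay_len, h);
    utf16be_to_utf8_lossy(needle, needle_len, n);
    return strstr(h, n) != NULL;
}
```

With the inputs from the PHP example below ("\x00?" and "\xDB\x00"), the search matches in both directions.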

Why would libmbfl use '?' rather than the mb_substitute_character set
by the user? Remember that libmbfl was originally a separate library
which was imported into the PHP codebase. mb_substitute_character is an
mbstring API function, not something built into libmbfl. When mbstring
would call into libmbfl, it would provide the error replacement
character to libmbfl as a parameter. However, when libmbfl would perform
conversion operations internally, and not because of a direct call from
mbstring, it would use its own error replacement character.

Example:

    <?php
    $questionMark = "\x00?";
    $badUTF16 = "\xDB\x00"; // half of a surrogate pair
    echo mb_strpos($questionMark, $badUTF16, 0, 'UTF-16BE'), "\n";
    echo mb_strpos($badUTF16, $questionMark, 0, 'UTF-16BE'), "\n";

Incidentally, this behavior does not occur if the text encoding is
UTF-8, because no conversion is needed in that case.

mb_stripos had a similar issue, but instead of always using '?' as an
error marker internally, it would use the selected
mb_substitute_character. So, for example, if the mb_substitute_character
was '%', then occurrences of '%' in the haystack would match invalid
bytes in the needle, and vice versa.

Example:

    <?php
    mb_substitute_character(0x25); // '%'
    $percent = "\x00%";
    $badUTF16 = "\xDB\x00"; // half of a surrogate pair
    echo mb_stripos($percent, $badUTF16, 0, 'UTF-16BE'), "\n";
    echo mb_stripos($badUTF16, $percent, 0, 'UTF-16BE'), "\n";

This behavior (of mb_stripos) still occurs even if the text encoding is
UTF-8, because case folding is still needed to make the search
case-insensitive.

It is not hard to think of scenarios where these strange and unintuitive
behaviors could cause security vulnerabilities. In the discussion on
GH issue 9613, Christoph Becker suggested that mb_str{i,}pos should
simply refuse to operate on invalid strings. However, this would almost
certainly break existing production code.

This commit mitigates the problem in a less intrusive way: it ensures
that while invalid haystacks can match invalid needles (even if the
specific invalid bytes are different), invalid bytes in the haystack
will never match '?' OR occurrences of the mb_substitute_character in
the needle, and vice versa.
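One way to get the behavior described above, sketched in C under the assumption that decoding produces 32-bit code points: every invalid sequence maps to a hypothetical `ERROR_MARKER` value outside the Unicode range. Error markers then compare equal to each other, but they never equal '?' or any substitute character. This illustrates the rule only; it is not the actual mbstring implementation, and it returns -1 where mb_strpos would return false.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error marker. It lies outside the Unicode range, so it can
 * never compare equal to '?' (U+003F) or to any substitute character. */
#define ERROR_MARKER 0xFFFFFFFFu

/* Decode UTF-16BE into code points; invalid sequences become ERROR_MARKER.
 * out needs at least len/2 + 1 entries. Returns the number of code points. */
static size_t utf16be_decode(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    while (i + 1 < len) {
        uint32_t w = ((uint32_t)in[i] << 8) | in[i + 1];
        i += 2;
        if (w >= 0xD800 && w <= 0xDBFF && i + 1 < len) {
            uint32_t w2 = ((uint32_t)in[i] << 8) | in[i + 1];
            if (w2 >= 0xDC00 && w2 <= 0xDFFF) {       /* valid surrogate pair */
                i += 2;
                out[n++] = 0x10000 + ((w - 0xD800) << 10) + (w2 - 0xDC00);
                continue;
            }
        }
        /* Any unpaired surrogate is invalid; everything else is a BMP char. */
        out[n++] = (w >= 0xD800 && w <= 0xDFFF) ? ERROR_MARKER : w;
    }
    if (i < len)                                      /* dangling odd byte */
        out[n++] = ERROR_MARKER;
    return n;
}

/* Index (in code points) of the first occurrence of needle in hay, or -1. */
static long cp_find(const uint32_t *hay, size_t hn, const uint32_t *needle, size_t nn)
{
    for (size_t i = 0; i + nn <= hn; i++) {
        size_t j = 0;
        while (j < nn && hay[i + j] == needle[j])
            j++;
        if (j == nn)
            return (long)i;
    }
    return -1;
}

static long utf16be_find(const unsigned char *hay, size_t hay_len,
                         const unsigned char *needle, size_t needle_len)
{
    uint32_t h[32], n[32];
    assert(hay_len / 2 + 1 <= 32 && needle_len / 2 + 1 <= 32);
    size_t hn = utf16be_decode(hay, hay_len, h);
    size_t nn = utf16be_decode(needle, needle_len, n);
    return cp_find(h, hn, n, nn);
}
```

Under this scheme, a lone high surrogate no longer matches '?'. It still matches a different invalid sequence, such as a lone low surrogate.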

This does represent a backwards compatibility break, but a small one.
Since it mitigates a potential security problem, I believe this is
appropriate.

Closes GH-9613.
2022-12-18 15:31:20 +02:00
Alex Dowad 744ca16e73 Speed boost for mb_stripos (when not using UTF-8)
Instead of case-folding a string and then converting it to UTF-8 as a
separate operation, why not convert it to UTF-8 at the same time as
we fold case?

For non-UTF-8 encodings, this typically makes mb_stripos about 2x
faster.
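The fused approach can be sketched in C for a single-byte encoding. This is an illustration under stated assumptions, not mbstring's code. It assumes ISO-8859-1 input and applies only the simple lowercase mapping for Latin-1 letters, whereas real Unicode case folding has extra cases (for example, U+00B5 folds to U+03BC). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simple lowercase mapping for one ISO-8859-1 code point.
 * 0xD7 (multiplication sign) sits inside the uppercase block but is not a letter. */
static unsigned lower_latin1(unsigned c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
        return c + 0x20;
    return c;
}

/* One pass: read a Latin-1 byte, fold its case, and emit UTF-8 right away,
 * instead of folding into a temporary Latin-1 buffer and converting that
 * buffer to UTF-8 in a second pass.
 * out needs at least 2*len + 1 bytes. Returns bytes written (excluding NUL). */
static size_t latin1_fold_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    assert(out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned c = lower_latin1(in[i]);
        if (c < 0x80) {
            out[o++] = (char)c;
        } else {                      /* U+0080..U+00FF need two UTF-8 bytes */
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the Latin-1 bytes for "AbÉ" come out as the UTF-8 bytes for "abé", and the input is traversed only once.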
2022-12-18 15:31:20 +02:00
Niels e288438373 Remove unnecessary check of p in phpdbg_trim (#10122)
The condition checks whether p is non-NULL. But if it were NULL, the function
would crash in later code, so the check is useless.
It seems like *p was intended, but that would be redundant as well, because
isspace returns false for '\0'.
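To show why neither check adds anything, here is a hypothetical trim helper in C (not phpdbg's actual code). The pointer is dereferenced on the first loop iteration anyway. The scanning loop also stops at the terminator by itself, because `isspace('\0')` is false.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trim helper. The caller must pass a non-NULL, NUL-terminated
 * string; an "if (p)" guard would be pointless because p is dereferenced
 * unconditionally below. */
static char *trim_in_place(char *str)
{
    assert(str != NULL);
    char *p = str;
    /* No "*p != '\0'" test needed: isspace('\0') is false, so the loop
     * stops at the terminator on its own. */
    while (isspace((unsigned char)*p))
        p++;
    char *end = p + strlen(p);
    while (end > p && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return p;
}
```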
2022-12-18 03:19:10 +01:00
Ilija Tovilo 6d9d2eb355 Optimize JMP[N]Z_EX to BOOL instead of QM_ASSIGN (#10108)
&& and || should always evaluate to a boolean instead of the lhs/rhs.

This optimization never gets triggered for any of our tests.
Additionally, even if triggered, this instruction gets optimized away
because the else branch of the JMP instruction will overwrite the tmp
value.
2022-12-17 12:47:02 +01:00
Arnaud Le Blanc 027add9e1b [ci skip] UPGRADING 2022-12-16 18:14:22 +01:00
Arnaud Le Blanc 0ff4a9accd [ci skip] UPGRADING 2022-12-16 18:12:28 +01:00
Arnaud Le Blanc a11c8a3039 Limit stack size (#9104) 2022-12-16 17:44:26 +01:00
Máté Kocsis dc54e04ed4 Merge branch 'PHP-8.2'
* PHP-8.2:
  Only include the default constructor for non-abstract class synopses
2022-12-16 17:03:22 +01:00
Máté Kocsis d832125b8e Only include the default constructor for non-abstract class synopses 2022-12-16 17:02:35 +01:00
Christoph M. Becker 416420b362 [ci skip] Remove duplicated NEWS entry 2022-12-16 14:45:00 +01:00
Christoph M. Becker cea0fc04d1 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-10112: LDAP\Connection::__construct() refers to ldap_create()
2022-12-16 14:38:09 +01:00
Christoph M. Becker 018fbd0a68 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-10112: LDAP\Connection::__construct() refers to ldap_create()
2022-12-16 14:37:39 +01:00
Christoph M. Becker b8ac2071b8 Fix GH-10112: LDAP\Connection::__construct() refers to ldap_create()
There is no `ldap_create()`, but rather `ldap_connect()`.

Closes GH-10115.
2022-12-16 14:36:30 +01:00
Máté Kocsis 8afc55870e Merge branch 'PHP-8.2'
* PHP-8.2:
  Replace another root XML element format to the "canonical" one
  Remove the superfluous closing parentheses from class synopsis page includes
  Always include the constructor on the class manual pages
  Backport methodsynopsis role attributes changes from master
2022-12-16 13:21:39 +01:00
Máté Kocsis 6aa5e58414 Backport methodsynopsis role attributes changes from master
Commits https://github.com/php/php-src/commit/93605f286d11876da44d2ecd41c13d7e3f0aae66 and https://github.com/php/php-src/commit/d6651426f405342f74cdfe930448912ef68e23c4
2022-12-16 13:18:12 +01:00
Máté Kocsis 0fc60fab72 Always include the constructor on the class manual pages 2022-12-16 13:18:12 +01:00
Máté Kocsis b4df038cee Remove the superfluous closing parentheses from class synopsis page includes 2022-12-16 13:18:12 +01:00
Máté Kocsis 60cf9fbee0 Replace another root XML element format to the "canonical" one 2022-12-16 13:18:12 +01:00
Alex Dowad b9cd1cdb4f Implement mb_substr_count using fast text conversion filters
The performance gain from this change depends on the text encoding and
input string size. For very small strings, other overheads tend to swamp
the performance gains to some extent, such that the speedup is less than
2x. For medium-length strings (~100 bytes or so), the speedup is
typically around 2.5x.

The greatest performance gains are for UTF-8 strings which have already
been marked as valid (using the GC flags on the zend_string object);
for those, the speedup is more than 10x in many cases.

The previous implementation first converted the haystack and needle to
wchars, then searched for matches between the two sequences of wchars.
Because we use -1 as an error marker when converting to wchars, error
markers from invalid byte sequences in the haystack would match error
markers from invalid byte sequences in the needle, even if the specific
invalid byte sequence was different. I am not sure whether this behavior
is really desirable or not, but in any case, this new implementation
follows the same behavior so as not to cause BC breaks.
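The matching rule described above can be sketched in C. The sketch assumes both strings are already decoded to 32-bit wchars and that every invalid sequence decodes to the same error value. The names are hypothetical. Counting non-overlapping occurrences and requiring a non-empty needle are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Error marker for invalid byte sequences, mirroring the "-1" described
 * above (stored here as an unsigned 32-bit value). */
#define WCHAR_ERROR ((uint32_t)-1)

/* Count non-overlapping occurrences of needle in hay, both already decoded
 * to wchars. Since every invalid sequence becomes the same WCHAR_ERROR
 * value, an error in the needle matches any error in the haystack,
 * whatever the original bytes were. */
static size_t count_wchar_matches(const uint32_t *hay, size_t hn,
                                  const uint32_t *needle, size_t nn)
{
    size_t count = 0, i = 0;
    assert(nn > 0);            /* this sketch requires a non-empty needle */
    while (i + nn <= hn) {
        size_t j = 0;
        while (j < nn && hay[i + j] == needle[j])
            j++;
        if (j == nn) {
            count++;
            i += nn;           /* skip past the match: no overlaps */
        } else {
            i++;
        }
    }
    return count;
}
```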
2022-12-15 07:54:26 +02:00
Tim Düsterhus f9a1a90380 Add Randomizer::nextFloat() and Randomizer::getFloat() (#9679)
* random: Add Randomizer::nextFloat()

* random: Check that doubles are IEEE-754 in Randomizer::nextFloat()

* random: Add Randomizer::nextFloat() tests

* random: Add Randomizer::getFloat() implementing the y-section algorithm

The algorithm is published in:

Drawing Random Floating-Point Numbers from an Interval. Frédéric
Goualard, ACM Trans. Model. Comput. Simul., 32:3, 2022.
https://doi.org/10.1145/3503512

* random: Implement getFloat_gamma() optimization

see https://github.com/php/php-src/pull/9679/files#r994668327

* random: Add Random\IntervalBoundary

* random: Split the implementation of γ-section into its own file

* random: Add tests for Randomizer::getFloat()

* random: Fix γ-section for 32-bit systems

* random: Replace check for __STDC_IEC_559__ by compile-time check for DBL_MANT_DIG

* random: Drop nextFloat_spacing.phpt

* random: Optimize Randomizer::getFloat() implementation

* random: Reject non-finite parameters in Randomizer::getFloat()

* random: Add NEWS/UPGRADING for Randomizer’s float functionality
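For context, the C sketch below shows a common way to turn 64 random bits into a uniformly spaced double in [0, 1). It includes a compile-time `DBL_MANT_DIG` guard similar to the one mentioned in the list. Whether `Randomizer::nextFloat()` uses exactly this mapping is an assumption here. The γ-section algorithm behind `getFloat()` is not shown, and the function name is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Compile-time guard: the conversion below assumes a 53-bit significand
 * (IEEE-754 binary64). */
#if DBL_MANT_DIG != 53
#error "this sketch assumes IEEE-754 double precision"
#endif

/* Keep the top 53 of 64 random bits and scale by 2^-53. Every result is an
 * exact multiple of 2^-53 in [0, 1); 1.0 itself can never be produced. */
static double bits_to_unit_double(uint64_t r)
{
    double d = (double)(r >> 11) * 0x1.0p-53;
    assert(d >= 0.0 && d < 1.0);
    return d;
}
```

Discarding the low 11 bits means every value of `r >> 11` is exactly representable, so no rounding can push the result up to 1.0.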
2022-12-14 17:48:47 +01:00
Tim Düsterhus 284f61ee22 [ci skip] Fix typo in unserialize() function name in NEWS
see dd8de1e726
2022-12-14 17:43:43 +01:00
Pierrick Charron 2f119c3008 Merge branch 'PHP-8.2'
* PHP-8.2:
  PHP-8.2 is now for PHP 8.2.2-dev
2022-12-13 19:31:11 -05:00
Pierrick Charron 002d54db9f PHP-8.2 is now for PHP 8.2.2-dev 2022-12-13 19:29:29 -05:00
George Peter Banyard 4a365132e7 Merge branch 'PHP-8.2'
* PHP-8.2:
  Add a new imap_is_open() function to check that a connection object is still valid
2022-12-13 23:48:48 +00:00
George Peter Banyard 52a891aeaa Add a new imap_is_open() function to check that a connection object is still valid 2022-12-13 23:48:03 +00:00
Christoph M. Becker f8ff105420 Merge branch 'PHP-8.2'
* PHP-8.2:
  shmget() with IPC_CREAT must not create 0 size SHM
2022-12-13 19:43:47 +01:00
Christoph M. Becker 4631e9de2b shmget() with IPC_CREAT must not create 0 size SHM
The recently committed fix for GH-9944 only catered to that indirectly,
namely because in that case a zero-size mapping could not be created
with `CreateFileMapping()`. As of PHP 8.2.0, the mappings of the actual
SHM and the info segment have been merged, so creating a zero-size SHM
would be possible unless we explicitly prohibit this.
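A hypothetical C argument check for a `shmget()` emulation shows what the explicit rejection looks like. The names `check_shm_create_args` and `SKETCH_IPC_CREAT` are made up for this sketch; the real constant is `IPC_CREAT` from `<sys/ipc.h>`. POSIX `shmget()` likewise fails with `EINVAL` when the requested size is below the system-imposed minimum.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for IPC_CREAT; the real value comes from <sys/ipc.h>. */
#define SKETCH_IPC_CREAT 01000

/* Hypothetical check for an emulation in which segment data and its info
 * header share one mapping. That mapping is never empty (the header alone
 * has a nonzero size), so a zero size would not fail by itself and must
 * be rejected explicitly when a new segment is requested. */
static int check_shm_create_args(size_t size, int flags)
{
    if ((flags & SKETCH_IPC_CREAT) && size == 0) {
        errno = EINVAL;
        return -1;
    }
    return 0;   /* attaching to an existing segment with size 0 stays allowed */
}
```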
2022-12-13 19:43:13 +01:00
Christoph M. Becker b593b53910 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix Windows shmget() wrt. IPC_PRIVATE
2022-12-13 15:51:07 +01:00
Christoph M. Becker 9089e15940 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix Windows shmget() wrt. IPC_PRIVATE
2022-12-13 15:49:55 +01:00
Tyson Andre 7a983e281c Fix Windows shmget() wrt. IPC_PRIVATE
Fixes #9944

https://man7.org/linux/man-pages/man2/shmget.2.html notes

   The name choice IPC_PRIVATE was perhaps unfortunate, IPC_NEW
   would more clearly show its function.

Closes GH-9946.
2022-12-13 15:46:40 +01:00
Christoph M. Becker 2ca03be46f Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-9949: Partial content on incomplete POST request
2022-12-13 15:25:39 +01:00
Christoph M. Becker 87c2f5b5a2 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-9949: Partial content on incomplete POST request
2022-12-13 15:24:07 +01:00
Christoph M. Becker aef7d810d3 Fix GH-9949: Partial content on incomplete POST request
`ap_get_brigade()` may fail for different reasons, and we must not
pretend that a partially read POST payload is fine; instead we report
a content length of zero, which matches all other `read_post()` callbacks
of bundled SAPIs.
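The all-or-nothing rule can be sketched in C with a hypothetical chunk source standing in for Apache's bucket brigade. None of these names come from the actual SAPI code. If any read fails part-way, the whole payload is discarded and a length of zero is reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input source. read_chunk() returns the number of bytes
 * copied (0 at end of input) or -1 on error. */
typedef struct {
    const char *data;
    size_t len, pos;
    size_t fail_at;   /* simulate an I/O error once pos reaches this offset */
} chunk_source;

static long read_chunk(chunk_source *src, char *buf, size_t max)
{
    if (src->pos >= src->fail_at)
        return -1;
    size_t n = src->len - src->pos;
    if (n > max) n = max;
    if (n > 4) n = 4;              /* deliver the body in small chunks */
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (long)n;
}

/* Read up to max bytes of request body into buf. On any failure, report
 * zero instead of returning the partial payload. */
static size_t read_post_all_or_nothing(chunk_source *src, char *buf, size_t max)
{
    assert(buf != NULL);
    size_t total = 0;
    for (;;) {
        long n = read_chunk(src, buf + total, max - total);
        if (n < 0)
            return 0;              /* failure: discard what was read */
        if (n == 0)
            return total;          /* end of input (or buffer full) */
        total += (size_t)n;
    }
}
```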

Closes GH-10059.
2022-12-13 15:21:42 +01:00
Niels 3ab18d4d14 Change if (stack) check to an assertion (#10090)
The code checks if stack is a NULL pointer. Below that, the
stack->next pointer is updated unconditionally. Therefore a call with a
NULL pointer will crash, even though the if (stack) check seems to show
the intent that it is valid to call the function with NULL.
The function is not meant to be called with NULL, so just ZEND_ASSERT
instead.
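A minimal C illustration of the pattern, using a hypothetical list node and the standard `assert()` in place of `ZEND_ASSERT`. The precondition is stated once, rather than being half-enforced by an `if` that the rest of the function ignores.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;
} node;

/* Hypothetical helper: link item directly after stack. stack->next is
 * written unconditionally, so NULL is never a valid argument; the
 * assertion documents and checks that instead of a misleading guard. */
static void push_after(node *stack, node *item)
{
    assert(stack != NULL);
    item->next = stack->next;
    stack->next = item;
}
```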
2022-12-13 13:16:52 +01:00
Frederik Bosch c5ab72773d [skip ci] Change status of BCMath into Maintained (#10089)
It might not have a primary maintainer, but it is maintained.
2022-12-13 07:35:58 +00:00
David Carlier 3fb7198034 intl extension, follow up on #10006 for numfmt_set_pattern
Closes GH-10073.
2022-12-12 19:54:13 +00:00
George Peter Banyard fa3bbf078a Fix borked Windows tests after 3be2b0d0d8 2022-12-12 16:12:10 +00:00
George Peter Banyard 3be2b0d0d8 Add CLEAN section to some IO tests (#10081)
* Add CLEAN sections to file_(get|put)_contents() tests

* Add CLEAN sections to file() tests
2022-12-12 14:53:32 +00:00
Alex Dowad e36c600a31 Optimize SJIS-Mobile#SOFTBANK decoder for speed
From my microbenchmarks, the new decoder makes encoding conversion
from SJIS-Mobile#SOFTBANK about 15-40% faster.
2022-12-12 16:28:49 +02:00
Alex Dowad 6bf0c44f48 Optimize SJIS-Mobile#KDDI decoder for speed
From my microbenchmarks, the new decoder makes encoding conversion
from SJIS-Mobile#KDDI about 30-50% faster.
2022-12-12 16:28:49 +02:00
Alex Dowad 43cdfa3190 Optimize SJIS-Mobile#DOCOMO decoder for speed
From my microbenchmarks, the new decoder makes encoding conversion
from SJIS-Mobile#DOCOMO about 15-20% faster.
2022-12-12 16:28:49 +02:00
Alex Dowad 4ebfddfad4 Move mobile variants of SJIS into mbfilter_sjis.c 2022-12-12 16:28:49 +02:00
Alex Dowad 005e49e552 Optimize MacJapanese decoder for speed
On longer MacJapanese strings, conversion speed is boosted by 60-80%.
On medium-length strings, conversion speed is boosted around 20-30%.
For very short strings, there is no appreciable difference.
2022-12-12 16:28:49 +02:00
Alex Dowad 4072a76e3f Move MacJapanese implementation into mbfilter_sjis.c 2022-12-12 16:28:49 +02:00
Alex Dowad b3d197d688 Optimize SJIS decoder for speed
While benchmarking the new implementation of mb_substr, I found it was
slower than the old one only when the selected encoding was SJIS.
Investigation showed that the new text conversion filter for SJIS
was a touch slower than the old one.

With this optimization, the new SJIS decoder is about 20% faster than
the old one.
2022-12-12 16:28:49 +02:00
Alex Dowad 0c0774f5b4 Use fast text conversion filters for mb_strpos, mb_stripos, mb_substr, etc
This boosts the performance of mb_strpos, mb_stripos, mb_strrpos,
mb_strripos, mb_strstr, mb_stristr, mb_strrchr, and mb_strrichr when
used on non-UTF-8 strings. mb_substr is also faster.

With UTF-8 input, there is no appreciable difference in performance for
mb_strpos, mb_stripos, mb_strrpos, etc. This is expected, since the only
real difference here (aside from shorter and simpler code) is that the
new text conversion code is used when converting non-UTF-8 input strings
to UTF-8. (This is done because internally, mb_strpos, etc. work only
on UTF-8 text.)
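Working only on UTF-8 keeps the search itself simple. Here is a minimal C sketch that assumes valid, NUL-terminated UTF-8 input; the function names are hypothetical and this is not mbstring's implementation. It searches by bytes, then converts the byte offset of the match into a character offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of UTF-8 characters in the first n bytes of s: count every byte
 * that is not a continuation byte (10xxxxxx). Assumes valid UTF-8. */
static size_t utf8_chars_before(const char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* Character offset of needle in haystack, or -1. A valid UTF-8 needle
 * starts with a lead byte, and lead bytes only occur at character
 * boundaries, so a plain byte search never yields a mid-character match. */
static long utf8_strpos(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    if (hit == NULL)
        return -1;
    return (long)utf8_chars_before(haystack, (size_t)(hit - haystack));
}
```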

For ASCII, speed is boosted by 30-65%. For other legacy text encodings,
the degree of performance improvement will depend on how slow the
legacy conversion code was.

One other minor but notable difference is that strings encoded using
UTF-8 variants from Japanese mobile vendors (SoftBank, KDDI, Docomo)
will not undergo encoding conversion but will be processed "as is". It
is expected that this will result in a large performance boost for
such input strings; but realistically, the number of users who work
with such strings is probably minute.

I was not originally planning to include mb_substr in this commit, but
fuzzing of the reimplemented mb_strstr revealed that mb_substr needed
to be reimplemented, too; using the old mbfl_substr, which was based
on the old text conversion filters, in combination with functions which
use the new text conversion filters caused bugs.

The performance boost for mb_substr varies from 10%-500%, depending
on the encoding and input string used.
2022-12-12 16:28:49 +02:00
Ilija Tovilo b96b88b669 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix compilation on RHEL 7 ppc64le (gcc 4.8)
2022-12-11 17:30:56 +01:00
Mattias Ellert a83923044c Fix compilation on RHEL 7 ppc64le (gcc 4.8)
Fixes GH-10077
Closes GH-10078
2022-12-11 17:30:31 +01:00
David Carlier 91e70a4e6b Merge branch 'PHP-8.2' 2022-12-10 14:14:20 +00:00
David Carlier 8a221e2763 fix litespeed SAPI build warnings.
- helpers only called on linux anyway.
- proper C calls prototypes.

Closes GH-10068.
2022-12-10 14:13:30 +00:00