archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-04-22 07:28:09 +02:00

Author	SHA1	Message	Date
Alex Dowad	7f44559516	mb_str{i,}pos does not match illegal byte sequences against occurrences of mb_substitute_char In GitHub issue 9613, it was reported that mb_strpos wrongly matches the character '?' against any invalid string, even when the character '?' clearly does not appear in the invalid string. This behavior has existed at least since PHP 5.2. The reason for the behavior is that mb_strpos internally converts the haystack and needle to UTF-8 before performing a search. When converting to UTF-8, regardless of the setting of mb_substitute_character, libmbfl would use '?' as an error marker for invalid byte sequences. Once those invalid input sequences were replaced with '?', then naturally, they would match against occurrences of the actual character '?' (when it appeared as a 'normal' character, not as an error marker). This would happen regardless of whether the error was in the haystack and '?' was used in the needle, or whether the error was in the needle and '?' was used in the haystack. Why would libmbfl use '?' rather than the mb_substitute_character set by the user? Remember that libmbfl was originally a separate library which was imported into the PHP codebase. mb_substitute_character is an mbstring API function, not something built into libmbfl. When mbstring would call into libmbfl, it would provide the error replacement character to libmbfl as a parameter. However, when libmbfl would perform conversion operations internally, and not because of a direct call from mbstring, it would use its own error replacement character. Example: <?php $questionMark = "\x00?"; $badUTF16 = "\xDB\x00"; // half of a surrogate pair echo mb_strpos($questionMark, $badUTF16, 0, 'UTF-16BE'), "\n"; echo mb_strpos($badUTF16, $questionMark, 0, 'UTF-16BE'), "\n"; Incidentally, this behavior does not occur if the text encoding is UTF-8, because no conversion is needed in that case. mb_stripos had a similar issue, but instead of always using '?' as an error marker internally, it would use the selected mb_substitute_character. So, for example, if the mb_substitute_character was '%', then occurrences of '%' in the haystack would match invalid bytes in the needle, and vice versa. Example: <?php mb_substitute_character(0x25); // '%' $percent = "\x00%"; $badUTF16 = "\xDB\x00"; // half of a surrogate pair echo mb_stripos($percent, $badUTF16, 0, 'UTF-16BE'), "\n"; echo mb_stripos($badUTF16, $percent, 0, 'UTF-16BE'), "\n"; This behavior (of mb_stripos) still occurs even if the text encoding is UTF-8, because case folding is still needed to make the search case-insensitive. It is not hard to think of scenarios where these strange and unintuitive behaviors could cause security vulnerabilities. In the discussion on GH issue 9613, Christoph Becker suggested that mb_str{i,}pos should simply refuse to operate on invalid strings. However, this would almost certainly break existing production code. This commit mitigates the problem in a less intrusive way: it ensures that while invalid haystacks can match invalid needles (even if the specific invalid bytes are different), invalid bytes in the haystack will never match '?' OR occurrences of the mb_substitute_character in the needle, and vice versa. This does represent a backwards compatibility break, but a small one. Since it mitigates a potential security problem, I believe this is appropriate. Closes GH-9613.	2022-12-18 15:31:20 +02:00
Alex Dowad	744ca16e73	Speed boost for mb_stripos (when not using UTF-8) Instead of case-folding a string and then converting it to UTF-8 as a separate operation, why not convert it to UTF-8 at the same time as we fold case? For non-UTF-8 encodings, this typically makes mb_stripos about 2x faster.	2022-12-18 15:31:20 +02:00
Niels	e288438373	Remove unnecessary check of p in phpdbg_trim (#10122) The check checks whether p is non-NULL. But if it were NULL the function would crash in later code, so the check is useless. It seems like *p was intended, but that is redundant as well because isspace would return false on '\0'.	2022-12-18 03:19:10 +01:00
Ilija Tovilo	6d9d2eb355	Optimize JMP[N]Z_EX to BOOL instead of QM_ASSIGN (#10108) && and || should always evaluate to a boolean instead of the lhs/rhs. This optimization never gets triggered for any of our tests. Additionally, even if triggered this instruction gets optimized away because the else branch of the JMP instruction will overwrite the tmp value.	2022-12-17 12:47:02 +01:00
Arnaud Le Blanc	027add9e1b	[ci skip] UPGRADING	2022-12-16 18:14:22 +01:00
Arnaud Le Blanc	0ff4a9accd	[ci skip] UPGRADING	2022-12-16 18:12:28 +01:00
Arnaud Le Blanc	a11c8a3039	Limit stack size (#9104)	2022-12-16 17:44:26 +01:00
Máté Kocsis	dc54e04ed4	Merge branch 'PHP-8.2' * PHP-8.2: Only include the default constructor for non-abstract class synopses	2022-12-16 17:03:22 +01:00
Máté Kocsis	d832125b8e	Only include the default constructor for non-abstract class synopses	2022-12-16 17:02:35 +01:00
Christoph M. Becker	416420b362	[ci skip] Remove duplicated NEWS entry	2022-12-16 14:45:00 +01:00
Christoph M. Becker	cea0fc04d1	Merge branch 'PHP-8.2' * PHP-8.2: Fix GH-10112: LDAP\Connection::__construct() refers to ldap_create()	2022-12-16 14:38:09 +01:00
Christoph M. Becker	018fbd0a68	Merge branch 'PHP-8.1' into PHP-8.2 * PHP-8.1: Fix GH-10112: LDAP\Connection::__construct() refers to ldap_create()	2022-12-16 14:37:39 +01:00
Christoph M. Becker	b8ac2071b8	Fix GH-10112: LDAP\Connection::__construct() refers to ldap_create() There is no `ldap_create()`, but rather `ldap_connect()`. Closes GH-10115.	2022-12-16 14:36:30 +01:00
Máté Kocsis	8afc55870e	Merge branch 'PHP-8.2' * PHP-8.2: Replace another root XML element format to the "canonical" one Remove the superfluous closing parentheses from class synopsis page includes Always include the constructor on the class manual pages Backport methodsynopsis role attributes changes from master	2022-12-16 13:21:39 +01:00
Máté Kocsis	6aa5e58414	Backport methodsynopsis role attributes changes from master Commits https://github.com/php/php-src/commit/93605f286d11876da44d2ecd41c13d7e3f0aae66 and https://github.com/php/php-src/commit/d6651426f405342f74cdfe930448912ef68e23c4	2022-12-16 13:18:12 +01:00
Máté Kocsis	0fc60fab72	Always include the constructor on the class manual pages	2022-12-16 13:18:12 +01:00
Máté Kocsis	b4df038cee	Remove the superfluous closing parentheses from class synopsis page includes	2022-12-16 13:18:12 +01:00
Máté Kocsis	60cf9fbee0	Replace another root XML element format to the "canonical" one	2022-12-16 13:18:12 +01:00
Alex Dowad	b9cd1cdb4f	Implement mb_substr_count using fast text conversion filters The performance gain from this change depends on the text encoding and input string size. For very small strings, other overheads tend to swamp the performance gains to some extent, such that the speedup is less than 2x. For medium-length strings (~100 bytes or so), the speedup is typically around 2.5x. The greatest performance gains are for UTF-8 strings which have already been marked as valid (using the GC flags on the zend_string object); for those, the speedup is more than 10x in many cases. The previous implementation first converted the haystack and needle to wchars, then searched for matches between the two sequences of wchars. Because we use -1 as an error marker when converting to wchars, error markers from invalid byte sequences in the haystack would match error markers from invalid byte sequences in the needle, even if the specific invalid byte sequence was different. I am not sure whether this behavior is really desirable or not, but anyways, this new implementation follows the same behavior so as not to cause BC breaks.	2022-12-15 07:54:26 +02:00
Tim Düsterhus	f9a1a90380	Add Randomizer::nextFloat() and Randomizer::getFloat() (#9679) * random: Add Randomizer::nextFloat() * random: Check that doubles are IEEE-754 in Randomizer::nextFloat() * random: Add Randomizer::nextFloat() tests * random: Add Randomizer::getFloat() implementing the y-section algorithm The algorithm is published in: Drawing Random Floating-Point Numbers from an Interval. Frédéric Goualard, ACM Trans. Model. Comput. Simul., 32:3, 2022. https://doi.org/10.1145/3503512 * random: Implement getFloat_gamma() optimization see https://github.com/php/php-src/pull/9679/files#r994668327 * random: Add Random\IntervalBoundary * random: Split the implementation of γ-section into its own file * random: Add tests for Randomizer::getFloat() * random: Fix γ-section for 32-bit systems * random: Replace check for __STDC_IEC_559__ by compile-time check for DBL_MANT_DIG * random: Drop nextFloat_spacing.phpt * random: Optimize Randomizer::getFloat() implementation * random: Reject non-finite parameters in Randomizer::getFloat() * random: Add NEWS/UPGRADING for Randomizer’s float functionality	2022-12-14 17:48:47 +01:00
Tim Düsterhus	284f61ee22	[ci skip] Fix typo in `unserialize()` function name in NEWS see `dd8de1e726`	2022-12-14 17:43:43 +01:00
Pierrick Charron	2f119c3008	Merge branch 'PHP-8.2' * PHP-8.2: PHP-8.2 is now for PHP 8.2.2-dev	2022-12-13 19:31:11 -05:00
Pierrick Charron	002d54db9f	PHP-8.2 is now for PHP 8.2.2-dev	2022-12-13 19:29:29 -05:00
George Peter Banyard	4a365132e7	Merge branch 'PHP-8.2' * PHP-8.2: Add a new imap_is_open() function to check that a connection object is still valid	2022-12-13 23:48:48 +00:00
George Peter Banyard	52a891aeaa	Add a new imap_is_open() function to check that a connection object is still valid	2022-12-13 23:48:03 +00:00
Christoph M. Becker	f8ff105420	Merge branch 'PHP-8.2' * PHP-8.2: shmget() with IPC_CREAT must not create 0 size SHM	2022-12-13 19:43:47 +01:00
Christoph M. Becker	4631e9de2b	shmget() with IPC_CREAT must not create 0 size SHM The recently committed fix for GH-9944 did only indirectly cater to that, namely because in this case `CreateFileMapping()` with a zero size couldn't be created. As of PHP 8.2.0, the mappings of the actual SHM and the info segment have been merged, so creating a zero size SHM would be possible unless we explicitly prohibit this.	2022-12-13 19:43:13 +01:00
Christoph M. Becker	b593b53910	Merge branch 'PHP-8.2' * PHP-8.2: Fix Windows shmget() wrt. IPC_PRIVATE	2022-12-13 15:51:07 +01:00
Christoph M. Becker	9089e15940	Merge branch 'PHP-8.1' into PHP-8.2 * PHP-8.1: Fix Windows shmget() wrt. IPC_PRIVATE	2022-12-13 15:49:55 +01:00
Tyson Andre	7a983e281c	Fix Windows shmget() wrt. IPC_PRIVATE Fixes #9944 https://man7.org/linux/man-pages/man2/shmget.2.html notes: "The name choice IPC_PRIVATE was perhaps unfortunate, IPC_NEW would more clearly show its function." Closes GH-9946.	2022-12-13 15:46:40 +01:00
Christoph M. Becker	2ca03be46f	Merge branch 'PHP-8.2' * PHP-8.2: Fix GH-9949: Partial content on incomplete POST request	2022-12-13 15:25:39 +01:00
Christoph M. Becker	87c2f5b5a2	Merge branch 'PHP-8.1' into PHP-8.2 * PHP-8.1: Fix GH-9949: Partial content on incomplete POST request	2022-12-13 15:24:07 +01:00
Christoph M. Becker	aef7d810d3	Fix GH-9949: Partial content on incomplete POST request `ap_get_brigade()` may fail for different reasons, and we must not pretend that a partially read POST payload is fine; instead we report a content length of zero, which matches all other `read_post()` callbacks of bundled SAPIs. Closes GH-10059.	2022-12-13 15:21:42 +01:00
Niels	3ab18d4d14	Change if (stack) check to an assertion (#10090) The code checks if stack is a NULL pointer. Below that `if`, the stack->next pointer is updated unconditionally. Therefore a call with a NULL pointer will crash, even though the if (stack) check seems to show the intent that it is valid to call the function with NULL. The function is not meant to be called with NULL, so just ZEND_ASSERT instead.	2022-12-13 13:16:52 +01:00
Frederik Bosch	c5ab72773d	[skip ci] Change status of BCMath into Maintained (#10089) It might not have a primary maintainer, but it is maintained.	2022-12-13 07:35:58 +00:00
David Carlier	3fb7198034	intl extension, follow up on #10006 for numfmt_set_pattern Closes GH-10073.	2022-12-12 19:54:13 +00:00
George Peter Banyard	fa3bbf078a	Fix borked Windows tests after `3be2b0d0d8`	2022-12-12 16:12:10 +00:00
George Peter Banyard	3be2b0d0d8	Add CLEAN section to some IO tests (#10081) * Add CLEAN sections to file_(get|put)_contents() tests * Add CLEAN sections to file() tests	2022-12-12 14:53:32 +00:00
Alex Dowad	e36c600a31	Optimize SJIS-Mobile#SOFTBANK decoder for speed From my microbenchmarks, the new decoder makes encoding conversion from SJIS-Mobile#SOFTBANK about 15-40% faster.	2022-12-12 16:28:49 +02:00
Alex Dowad	6bf0c44f48	Optimize SJIS-Mobile#KDDI decoder for speed From my microbenchmarks, the new decoder makes encoding conversion from SJIS-Mobile#KDDI about 30-50% faster.	2022-12-12 16:28:49 +02:00
Alex Dowad	43cdfa3190	Optimize SJIS-Mobile#DOCOMO decoder for speed From my microbenchmarks, the new decoder makes encoding conversion from SJIS-Mobile#DOCOMO about 15-20% faster.	2022-12-12 16:28:49 +02:00
Alex Dowad	4ebfddfad4	Move mobile variants of SJIS into mbfilter_sjis.c	2022-12-12 16:28:49 +02:00
Alex Dowad	005e49e552	Optimize MacJapanese decoder for speed On longer MacJapanese strings, conversion speed is boosted by 60-80%. On medium-length strings, conversion speed is boosted around 20-30%. For very short strings, there is no appreciable difference.	2022-12-12 16:28:49 +02:00
Alex Dowad	4072a76e3f	Move MacJapanese implementation into mbfilter_sjis.c	2022-12-12 16:28:49 +02:00
Alex Dowad	b3d197d688	Optimize SJIS decoder for speed While benchmarking the new implementation of mb_substr, I found it was slower than the old one only when the selected encoding was SJIS. Investigation showed that the new text conversion filter for SJIS was a touch slower than the old one. With this optimization, the new SJIS decoder is about 20% faster than the old one.	2022-12-12 16:28:49 +02:00
Alex Dowad	0c0774f5b4	Use fast text conversion filters for mb_strpos, mb_stripos, mb_substr, etc This boosts the performance of mb_strpos, mb_stripos, mb_strrpos, mb_strripos, mb_strstr, mb_stristr, mb_strrchr, and mb_strrichr when used on non-UTF-8 strings. mb_substr is also faster. With UTF-8 input, there is no appreciable difference in performance for mb_strpos, mb_stripos, mb_strrpos, etc. This is expected, since the only real difference here (aside from shorter and simpler code) is that the new text conversion code is used when converting non-UTF-8 input strings to UTF-8. (This is done because internally, mb_strpos, etc. work only on UTF-8 text.) For ASCII, speed is boosted by 30-65%. For other legacy text encodings, the degree of performance improvement will depend on how slow the legacy conversion code was. One other minor, but notable difference is that strings encoded using UTF-8 variants from Japanese mobile vendors (SoftBank, KDDI, Docomo) will not undergo encoding conversion but will be processed "as is". It is expected that this will result in a large performance boost for such input strings; but realistically, the number of users who work with such strings is probably minute. I was not originally planning to include mb_substr in this commit, but fuzzing of the reimplemented mb_strstr revealed that mb_substr needed to be reimplemented, too; using the old mbfl_substr, which was based on the old text conversion filters, in combination with functions which use the new text conversion filters caused bugs. The performance boost for mb_substr varies from 10%-500%, depending on the encoding and input string used.	2022-12-12 16:28:49 +02:00
Ilija Tovilo	b96b88b669	Merge branch 'PHP-8.2' * PHP-8.2: Fix compilation on RHEL 7 ppc64le (gcc 4.8)	2022-12-11 17:30:56 +01:00
Mattias Ellert	a83923044c	Fix compilation on RHEL 7 ppc64le (gcc 4.8) Fixes GH-10077 Closes GH-10078	2022-12-11 17:30:31 +01:00
David Carlier	91e70a4e6b	Merge branch 'PHP-8.2'	2022-12-10 14:14:20 +00:00
David Carlier	8a221e2763	fix litespeed SAPI build warnings. - helpers only called on linux anyway. - proper C calls prototypes. Closes GH-10068.	2022-12-10 14:13:30 +00:00

130660 Commits