* random: Randomizer::getFloat(): Fix check for empty open intervals
The check for invalid parameters for the IntervalBoundary::OpenOpen variant was
incorrect: if two consecutive doubles are passed as parameters, the resulting
interval is empty, which leads to a uint64 underflow in the γ-section
implementation.
Instead of checking whether `$min < $max`, we must check that there is at least
one more double between `$min` and `$max`, i.e. it must hold that:
nextafter($min, $max) != $max
Instead of duplicating the comparatively complicated and expensive `nextafter`
logic for a rare error case, we return `NAN` from the γ-section implementation
when the parameters result in an empty interval and would thus underflow.
This allows us to reliably detect this specific error case *after* the fact,
without modifying the engine state. It also provides reliable error reporting
for other internal functions that might use the γ-section implementation. A
userland sketch of the fixed behavior follows below.
* random: γ-section: Also check that min is smaller than max
This extends the empty-interval check in the γ-section implementation with a
check that min is actually the smaller of the two parameters.
* random: Use PHP_FLOAT_EPSILON in getFloat_error.phpt
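For illustration, a minimal userland sketch of the fixed behavior; the exact
exception class is an assumption on my part, and the bounds mirror the
PHP_FLOAT_EPSILON approach used in getFloat_error.phpt:

```php
<?php
use Random\Randomizer;
use Random\IntervalBoundary;

$r = new Randomizer();

// A valid open interval: at least one double lies strictly between the bounds.
var_dump($r->getFloat(0.0, 1.0, IntervalBoundary::OpenOpen));

// The fixed error case: $min and $max are consecutive doubles, so the open
// interval ($min, $max) contains no double at all and must be rejected
// instead of underflowing inside the γ-section implementation.
$min = 1.0;
$max = 1.0 + PHP_FLOAT_EPSILON; // the double immediately following 1.0

try {
    $r->getFloat($min, $max, IntervalBoundary::OpenOpen);
} catch (ValueError $e) { // assumed to surface as a ValueError
    echo $e->getMessage(), "\n";
}
```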
Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
This version replaces SPACEs before the meridiem (AM/PM) marker with NARROW
NO-BREAK SPACEs. Thus, we split the affected test cases as usual.
(cherry picked from commit 8dd51b462d)
Fixes GH-10262.
A copy of this piece of code exists in zend_jit_compile_side_trace(), but
there the leak bug does not exist.
The bug has existed since both copies of this piece of code were added in
commit 4bf2d09ede.
One small piece of this was obtained from Stack Overflow. According to
Stack Overflow's Terms of Service, all user-contributed code on SO is
provided under a Creative Commons license. I believe this license is
compatible with the code being included in PHP.
Benchmarking results (UTF-8 only, for strings which have already been
checked using mb_check_encoding):
For very short (0-5 byte) strings, mb_strlen is 12% faster.
The speedup grows with the length of the input string; for strings around
100KB, mb_strlen is 23 times faster.
Currently the 'fast' code is gated behind a GC flag check which ensures
it is only used on strings which have already been checked for UTF-8
validity. This is because the accelerated code will return different
results on some invalid UTF-8 strings.
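A userland sketch of how that gating plays out; the flag itself is internal,
and the claim that a successful mb_check_encoding() call marks the string as
known-valid is an assumption based on the description above:

```php
<?php
// Roughly 100KB of valid UTF-8 (5 chars x 3 bytes x 6,700 repeats).
$s = str_repeat("こんにちは", 6_700);

if (mb_check_encoding($s, 'UTF-8')) {
    // On an already-validated string, mb_strlen() may take the
    // accelerated path; this is where the ~23x figure applies.
    var_dump(mb_strlen($s, 'UTF-8'));
}
```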
zend_get_property_guard previously assumed that at least "str" has a
pre-computed hash. This is not always the case: for example, when a string is
created by bitwise operations, its hash is not set. Instead of forcing a
computation of the hashes, drop the hash comparison.
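A hypothetical sketch of the situation (not the actual fuzzer input from
GH-10254):

```php
<?php
// A property name produced by a bitwise string operation reaches the
// property guard without a pre-computed hash.
class C {
    public function __get($name) {
        // Re-entering the property fetch for the same name exercises the guard.
        return $this->$name ?? 'fallback';
    }
}

$name = "a" | "b"; // bitwise OR on strings yields "c", with no cached hash
var_dump((new C())->$name); // string(8) "fallback"
```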
Closes GH-10254
Co-authored-by: Changochen <changochen1@gmail.com>
Signed-off-by: George Peter Banyard <girgias@php.net>
I like the asm which gcc -O3 generates on this modified code...
and guess what: my CPU likes it too!
(The asm is noticeably tighter, without any extra operations in the
path which dispatches to the code for decoding a 1-byte, 2-byte,
3-byte, or 4-byte character. It's just CMP, conditional jump, CMP,
conditional jump, CMP, conditional jump.
...Though I was admittedly impressed to see gcc could implement the
boolean expression `c >= 0xC2 && c <= 0xDF` with just 3 instructions:
add, CMP, then conditional jump. It's the classic unsigned range-check
trick: subtract 0xC2, then a single unsigned comparison against
0xDF - 0xC2 covers both bounds. Pretty slick stuff there, guys.)
Benchmark results:
UTF-8, short - to UTF-16LE faster by 7.36% (0.0001 vs 0.0002)
UTF-8, short - to UTF-16BE faster by 6.24% (0.0001 vs 0.0002)
UTF-8, medium - to UTF-16BE faster by 4.56% (0.0003 vs 0.0003)
UTF-8, medium - to UTF-16LE faster by 4.00% (0.0003 vs 0.0003)
UTF-8, long - to UTF-16BE faster by 1.02% (0.0215 vs 0.0217)
UTF-8, long - to UTF-16LE faster by 1.01% (0.0209 vs 0.0211)
MacJapanese has a somewhat unusual feature: when mapped to Unicode, many of
its characters map to sequences of several codepoints.
Add test cases demonstrating how mb_str_split and mb_substr behave in
this situation.
When adding these tests, I found the behavior of mb_substr was wrong
due to an inconsistency between the string "length" as measured by
mb_strlen and the number of native MacJapanese characters which
mb_substr would count when iterating over the string using the
mblen_table. This has been fixed.
I believe that mb_strstr will also return wrong results in some cases
for MacJapanese. I still need to come up with unit tests which
demonstrate the problem and figure out how to fix it.
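For reference, a sketch of the kind of consistency these tests check. The
strings here are ordinary 1:1 characters; the multi-codepoint byte sequences
from the actual tests are not reproduced, since that would mean guessing at
the mapping tables:

```php
<?php
// Ordinary 1:1 characters round-tripped into MacJapanese.
$mac = mb_convert_encoding('株式会社', 'SJIS-mac', 'UTF-8');

// mb_strlen, mb_str_split, and mb_substr should agree on character
// boundaries, whether they use the decoder or the mblen_table:
var_dump(mb_strlen($mac, 'SJIS-mac'));              // int(4)
var_dump(count(mb_str_split($mac, 1, 'SJIS-mac'))); // int(4)
var_dump(mb_substr($mac, 2, 1, 'SJIS-mac')
    === mb_str_split($mac, 1, 'SJIS-mac')[2]);      // bool(true)
```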
Various mbstring legacy text encodings have what is called an 'mblen_table':
a table which gives the length of a multi-byte character using a lookup on
the first byte value. Several mbstring functions have a 'fast path' which uses
this table when it is available.
However, it turns out that iterating through a string using the mblen_table
is surprisingly slow. I found that deleting this 'fast path' from mb_strlen
makes it a few percent slower on very small strings (0-5 bytes), but yields
very large performance gains on medium to long input strings.
Part of the reason is that our text decoding filters are so much faster now.
Here are some benchmarks:
EUC-KR, short (0-5 chars) - master faster by 11.90% (0.0000 vs 0.0000)
EUC-JP, short (0-5 chars) - master faster by 10.88% (0.0000 vs 0.0000)
BIG-5, short (0-5 chars) - master faster by 10.66% (0.0000 vs 0.0000)
UTF-8, short (0-5 chars) - master faster by 8.91% (0.0000 vs 0.0000)
CP936, short (0-5 chars) - master faster by 6.27% (0.0000 vs 0.0000)
UHC, short (0-5 chars) - master faster by 5.38% (0.0000 vs 0.0000)
SJIS, short (0-5 chars) - master faster by 5.20% (0.0000 vs 0.0000)
UTF-8, medium (~100 chars) - new faster by 127.51% (0.0004 vs 0.0002)
UTF-8, long (~10000 chars) - new faster by 87.94% (0.0319 vs 0.0170)
UTF-8, very long (~100000 chars) - new faster by 88.25% (0.3199 vs 0.1699)
SJIS, medium (~100 chars) - new faster by 208.89% (0.0004 vs 0.0001)
SJIS, long (~10000 chars) - new faster by 253.57% (0.0319 vs 0.0090)
CP936, medium (~100 chars) - new faster by 126.08% (0.0004 vs 0.0002)
CP936, long (~10000 chars) - new faster by 200.48% (0.0319 vs 0.0106)
EUC-KR, medium (~100 chars) - new faster by 146.71% (0.0004 vs 0.0002)
EUC-KR, long (~10000 chars) - new faster by 212.05% (0.0319 vs 0.0102)
EUC-JP, medium (~100 chars) - new faster by 186.68% (0.0004 vs 0.0001)
EUC-JP, long (~10000 chars) - new faster by 295.37% (0.0320 vs 0.0081)
BIG-5, medium (~100 chars) - new faster by 173.07% (0.0004 vs 0.0001)
BIG-5, long (~10000 chars) - new faster by 269.19% (0.0319 vs 0.0086)
UHC, medium (~100 chars) - new faster by 196.99% (0.0004 vs 0.0001)
UHC, long (~10000 chars) - new faster by 256.39% (0.0323 vs 0.0091)
This does raise the question: is using the 'mblen_table' worthwhile for
other mbstring functions, such as mb_str_split? The answer is yes, it is
worthwhile; you see, while mb_strlen only needs to decode the input string
and not re-encode it, mb_str_split (when implemented using the conversion
filters) needs to both decode the string and then re-encode it. This means
that there is more potential to gain performance by using the 'mblen_table'.
Benchmarking shows that in a few cases mb_str_split becomes faster when the
'mblen_table fast path' is deleted, but in the majority of cases it becomes
slower.
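For reference, a minimal harness sketch matching the shape of the numbers
above (an assumption on my part, not the actual benchmark script):

```php
<?php
// Time $iters invocations of $fn and return the total in seconds.
function bench(callable $fn, int $iters = 40_000): float {
    $start = hrtime(true);
    for ($i = 0; $i < $iters; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$long = str_repeat('テスト文字列', 1_667); // ~10,000 characters of UTF-8
printf("mb_strlen:    %.4f\n", bench(fn() => mb_strlen($long, 'UTF-8')));
printf("mb_str_split: %.4f\n", bench(fn() => mb_str_split($long, 1, 'UTF-8')));
```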
ctx can never be NULL in these functions, because they are dispatched
virtually by looking up their entries in ctx. Furthermore, two of these
checks never actually worked, because ctx was dereferenced before it was
NULL-checked.
If, for whatever reason, `random_fd` has been assigned file descriptor `0`, it
previously failed to be closed during module shutdown, thus leaking the
descriptor.
As a performance optimization, mbstring implements some functions using
tables which give the (byte) length of a multi-byte character using a
lookup based on the value of the first byte. These tables are called
`mblen_table`.
For many years, the mblen_table for SJIS has had '2' in position 0x80.
That is wrong; it should have been '1'. Reasons:
For SJIS, SJIS-2004, and mobile variants of SJIS, 0x80 has never been
treated as the first byte of a 2-byte character. It has always been
treated as a single erroneous byte. On the other hand, 0x80 is a valid
character in MacJapanese... but a 1-byte character, not a 2-byte one.
The same applies to bytes 0xFD-0xFF; these are 1-byte characters in
MacJapanese, and in other SJIS variants, they are not valid (as the first
byte of a character).
Thanks to the GitHub user 'youkidearitai' for finding this problem.
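The practical consequence, sketched from userland (assuming current mapping
tables):

```php
<?php
// 0x80 is a single 1-byte character in MacJapanese, and a lone invalid
// byte (never a 2-byte lead) in plain SJIS.
var_dump(mb_strlen("\x80", 'SJIS-mac'));     // int(1)
var_dump(mb_check_encoding("\x80", 'SJIS')); // bool(false)
```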
This boosts the speed of BIG5 encoding conversion by just 1-2%.
I tried various other tweaks to the BIG5 decoding routine to see if
I could make it faster at the cost of using a larger conversion table,
but at least on the machine I am using for benchmarking, these other
changes just made things slower.
This gives a 25% speed boost for conversion operations on long strings
(~10,000 codepoints). For shorter strings, the speed boost is smaller (as the
input gets smaller, it is increasingly swamped by the overhead of entering
and exiting the conversion function).
When benchmarking string conversion speed, we are measuring not only
the speed of the decoder, but also the time which it takes to re-encode
the string in another encoding like UTF-8 or UTF-16. So the performance
increase for functions which only need to decode but not re-encode the
input string will be much more than 25%.
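To make the decode-only vs. decode-plus-re-encode distinction concrete, a
sketch using BIG-5 as an arbitrary example encoding (the commit's figures are
for the decoder, not for these specific calls):

```php
<?php
// ~10,000 characters of Big5-encodable text.
$big5 = mb_convert_encoding(str_repeat('中文字串', 2_500), 'BIG-5', 'UTF-8');

// mb_strlen() only decodes, so a faster decoder helps almost in full:
var_dump(mb_strlen($big5, 'BIG-5')); // int(10000)

// mb_convert_encoding() decodes *and* re-encodes; the decoder speedup is
// diluted by the (unchanged) cost of the encoding step:
$utf16 = mb_convert_encoding($big5, 'UTF-16BE', 'BIG-5');
var_dump(strlen($utf16)); // int(20000), 2 bytes per BMP codepoint
```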
As with CP936, iterating over the PUA table and looking for matches in
it was a significant bottleneck for GB18030 decoding (though not as
severe a bottleneck as for CP936, since more is involved in GB18030
decoding than CP936 decoding).
Here are some benchmark results after optimizing out that bottleneck:
GB18030, medium - to UTF-16BE - faster by 60.71% (0.0007 vs 0.0017)
GB18030, medium - to UTF-8 - faster by 59.88% (0.0007 vs 0.0017)
GB18030, long - to UTF-8 - faster by 44.91% (0.0669 vs 0.1214)
GB18030, long - to UTF-16BE - faster by 43.05% (0.0672 vs 0.1181)
GB18030, short - to UTF-8 - faster by 27.22% (0.0003 vs 0.0004)
GB18030, short - to UTF-16BE - faster by 26.98% (0.0003 vs 0.0004)
(The 'short' test strings had 0-5 codepoints each, 'medium' ~100
codepoints, and 'long' ~10,000 codepoints. For each benchmark, the
test harness cycled through all the test strings 40,000 times.)
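For reference, the shape of the measured operation from the 'long' rows above
(the test strings here are assumptions, not the actual benchmark inputs):

```php
<?php
// ~10,000 codepoints of GB18030-encodable text.
$gb = mb_convert_encoding(str_repeat('汉字测试', 2_500), 'GB18030', 'UTF-8');

// The measured operation: GB18030 decode plus UTF-16BE re-encode.
$out = mb_convert_encoding($gb, 'UTF-16BE', 'GB18030');
var_dump(strlen($out)); // int(20000), 2 bytes per BMP codepoint
```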