mirror of https://github.com/php/php-src.git synced 2026-03-24 00:02:20 +01:00
Commit Graph

130833 Commits

Author SHA1 Message Date
Max Kellermann
aa1cd02a43 Zend/zend_fibers: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
308fd311ea ext/{standard,json,random,...}: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann
16203b53e1 main: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann
738fb5ca54 Zend/zend_smart_str: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
9fdbefacd3 main/s[np]printf: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
cd4a7c1d90 Zend/zend_ini: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
928685eba2 Zend/zend_signal: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
01e5ffc85c UPGRADING.INTERNALS: mention the header cleanups 2023-01-10 14:19:03 +00:00
Tim Düsterhus
13b82eef84 random: Randomizer::getFloat(): Fix check for empty open intervals (#10185)
* random: Randomizer::getFloat(): Fix check for empty open intervals

The check for invalid parameters for the IntervalBoundary::OpenOpen variant was
not correct: If two consecutive doubles are passed as parameters, the resulting
interval is empty, resulting in an uint64 underflow in the γ-section
implementation.

Instead of checking whether `$min < $max`, we must check that there is at least
one more double between `$min` and `$max`, i.e. it must hold that:

	nextafter($min, $max) != $max

Instead of duplicating the comparatively complicated and expensive `nextafter`
logic for a rare error case we instead return `NAN` from the γ-section
implementation when the parameters result in an empty interval and thus underflow.

This allows us to reliably detect this specific error case *after* the fact,
but without modifying the engine state. It also provides reliable error
reporting for other internal functions that might use the γ-section
implementation.

* random: γ-section: Also check that min is smaller than max

This extends the empty-interval check in the γ-section implementation with a
check that min is actually the smaller of the two parameters.

* random: Use PHP_FLOAT_EPSILON in getFloat_error.phpt
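
The condition `nextafter($min, $max) != $max` asks whether any double lies strictly between the two bounds. Below is a minimal sketch of the same test without a `nextafter` call, assuming IEEE 754 binary64 doubles. The names `ordered_bits` and `open_interval_is_nonempty` are hypothetical and not part of php-src, and this is not how the commit itself detects the case (it returns `NAN` from the γ-section implementation instead).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Map a double to an integer whose ordering matches the ordering of the
 * doubles themselves (assumes IEEE 754 binary64; callers filter out NaN). */
static int64_t ordered_bits(double d)
{
    int64_t i;
    memcpy(&i, &d, sizeof i);
    /* Negative doubles are stored sign-magnitude; mirror them below zero. */
    return i < 0 ? INT64_MIN - i : i;
}

/* True when at least one double lies strictly between min and max,
 * i.e. the equivalent of: min < max && nextafter(min, max) != max. */
static bool open_interval_is_nonempty(double min, double max)
{
    if (!(min < max)) {
        return false; /* also rejects NaN: every comparison with NaN is false */
    }
    /* Adjacent doubles differ by exactly 1 in the ordered representation. */
    return ordered_bits(min) + 1 < ordered_bits(max);
}
```

For example, `open_interval_is_nonempty(1.0, 1.0 + 0x1p-52)` is false because the two arguments are consecutive doubles, which is exactly the case that previously led to the underflow.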

Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
2023-01-10 10:16:33 +01:00
Christoph M. Becker
4280431050 Merge branch 'PHP-8.2'
* PHP-8.2:
  Adapt ext/intl tests for ICU 72.1
2023-01-09 14:10:42 +01:00
Christoph M. Becker
435dc5ef1c Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Adapt ext/intl tests for ICU 72.1
2023-01-09 14:09:43 +01:00
Christoph M. Becker
a9e7b90cc2 Adapt ext/intl tests for ICU 72.1
This version replaces SPACEs before the meridian with NARROW NO-BREAK
SPACEs.  Thus, we split the affected test cases as usual.

(cherry picked from commit 8dd51b462d)

Fixes GH-10262.
2023-01-09 14:08:40 +01:00
Dmitry Stogov
ce861373b9 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix incorrect optimization of ASSIGN_OP may lead to incorrect result (sub assign -> pre dec conversion for null values)
2023-01-09 13:53:35 +03:00
Dmitry Stogov
9abc2108fa Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix incorrect optimization of ASSIGN_OP may lead to incorrect result (sub assign -> pre dec conversion for null values)
2023-01-09 13:53:19 +03:00
Dmitry Stogov
4d4a53beee Fix incorrect optimization of ASSIGN_OP may lead to incorrect result (sub assign -> pre dec conversion for null values) 2023-01-09 13:51:57 +03:00
Dmitry Stogov
f8b9312709 Merge branch 'PHP-8.2'
* PHP-8.2:
  ext/opcache/jit/zend_jit_trace: fix memory leak in _compile_root_trace() (#10146)
2023-01-09 09:51:12 +03:00
Dmitry Stogov
d13b3b6aa7 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  ext/opcache/jit/zend_jit_trace: fix memory leak in _compile_root_trace() (#10146)
2023-01-09 09:51:00 +03:00
Max Kellermann
bcc5d268f6 ext/opcache/jit/zend_jit_trace: fix memory leak in _compile_root_trace() (#10146)
A copy of this piece of code exists in zend_jit_compile_side_trace(),
but there, the leak bug does not exist.

This bug exists since both copies of this piece of code were added in
commit 4bf2d09ede
2023-01-09 09:50:30 +03:00
Alex Dowad
b4cbaabd9b Add fast SSE2-based implementation of mb_strlen for known-valid UTF-8 strings
One small piece of this was obtained from Stack Overflow. According to
Stack Overflow's Terms of Service, all user-contributed code on SO is
provided under a Creative Commons license. I believe this license is
compatible with the code being included in PHP.

Benchmarking results (UTF-8 only, for strings which have already been
checked using mb_check_encoding):

For very short (0-5 byte) strings, mb_strlen is 12% faster.
The speedup gets greater and greater on longer input strings; for
strings around 100KB, mb_strlen is 23 times faster.

Currently the 'fast' code is gated behind a GC flag check which ensures
it is only used on strings which have already been checked for UTF-8
validity. This is because the accelerated code will return different
results on some invalid UTF-8 strings.
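
To illustrate why validity matters here: for a valid UTF-8 string, the number of codepoints equals the number of bytes that are not continuation bytes (`10xxxxxx`). The scalar sketch below shows that counting rule only; it is not the SSE2 code from this commit, and the function name `utf8_count_codepoints` is hypothetical.

```c
#include <stddef.h>

/* Count codepoints in a string that is assumed to be valid UTF-8 by
 * counting every byte that is NOT a continuation byte (0x80-0xBF). */
static size_t utf8_count_codepoints(const unsigned char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

On invalid input this rule diverges from a decoder that reports one error marker per bad byte: a lone `0x80` byte contributes zero to the count above, which is the kind of discrepancy the validity flag guards against.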
2023-01-09 07:50:40 +02:00
Christoph M. Becker
60102c3228 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix recently introduced gh10251.phpt
2023-01-08 18:28:34 +01:00
Christoph M. Becker
6faeb9571d Fix recently introduced gh10251.phpt
As of PHP 8.2.0, creation of dynamic properties is deprecated, so we
slap a `AllowDynamicProperties` attribute on the class.
2023-01-08 18:07:21 +01:00
George Peter Banyard
3b8327a4e3 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-10251: Assertion `(flag & (1<<3)) == 0' failed.
  Fix GH-9710: phpdbg memory leaks by option "-h"
2023-01-08 16:12:21 +00:00
George Peter Banyard
e308dc0635 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-10251: Assertion `(flag & (1<<3)) == 0' failed.
  Fix GH-9710: phpdbg memory leaks by option "-h"
2023-01-08 16:11:46 +00:00
Niels Dossche
d03025bf59 Fix GH-10251: Assertion `(flag & (1<<3)) == 0' failed.
zend_get_property_guard previously assumed that at least "str" has a
pre-computed hash. This is not always the case, for example when a
string is created by bitwise operations, its hash is not set. Instead of
forcing a computation of the hashes, drop the hash comparison.

Closes GH-10254

Co-authored-by: Changochen <changochen1@gmail.com>

Signed-off-by: George Peter Banyard <girgias@php.net>
2023-01-08 16:09:59 +00:00
Niels Dossche
8ff2b6abb2 Fix GH-9710: phpdbg memory leaks by option "-h"
Closes GH-10237

Signed-off-by: George Peter Banyard <girgias@php.net>
2023-01-08 16:07:00 +00:00
Alex Dowad
092ad3e462 Optimize branch structure of UTF-8 decoder routine
I like the asm which gcc -O3 generates on this modified code...
and guess what: my CPU likes it too!

(The asm is noticeably tighter, without any extra operations in the
path which dispatches to the code for decoding a 1-byte, 2-byte,
3-byte, or 4-byte character. It's just CMP, conditional jump, CMP,
conditional jump, CMP, conditional jump.

...Though I was admittedly impressed to see gcc could implement the
boolean expression `c >= 0xC2 && c <= 0xDF` with just 3 instructions:
add, CMP, then conditional jump. Pretty slick stuff there, guys.)
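
A two-sided range test like this can be folded into a single unsigned comparison by subtracting the lower bound first, which would match an add/CMP/jump sequence. The sketch below shows that general technique in C; it is an assumption about what the compiler did, not a transcript of the generated asm, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Two-comparison form, as written in the source expression. */
static bool is_2byte_lead_simple(unsigned char c)
{
    return c >= 0xC2 && c <= 0xDF;
}

/* Single-comparison form: after subtracting 0xC2, values below the range
 * wrap around to large unsigned numbers and fail the <= test. */
static bool is_2byte_lead_folded(unsigned char c)
{
    return (unsigned char)(c - 0xC2) <= (0xDF - 0xC2);
}
```

Both functions agree for all 256 possible byte values.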

Benchmark results:

UTF-8, short - to UTF-16LE  faster by 7.36% (0.0001 vs 0.0002)
UTF-8, short - to UTF-16BE  faster by 6.24% (0.0001 vs 0.0002)
UTF-8, medium - to UTF-16BE faster by 4.56% (0.0003 vs 0.0003)
UTF-8, medium - to UTF-16LE faster by 4.00% (0.0003 vs 0.0003)
UTF-8, long - to UTF-16BE   faster by 1.02% (0.0215 vs 0.0217)
UTF-8, long - to UTF-16LE   faster by 1.01% (0.0209 vs 0.0211)
2023-01-08 17:27:19 +02:00
Alex Dowad
d8b5b9fa55 Add unit tests for mb_str_split/mb_substr on MacJapanese encoding
MacJapanese has a somewhat unusual feature that when mapped to
Unicode, many characters map to sequences of several codepoints.
Add test cases demonstrating how mb_str_split and mb_substr behave in
this situation.

When adding these tests, I found the behavior of mb_substr was wrong
due to an inconsistency between the string "length" as measured by
mb_strlen and the number of native MacJapanese characters which
mb_substr would count when iterating over the string using the
mblen_table. This has been fixed.

I believe that mb_strstr will also return wrong results in some cases
for MacJapanese. I still need to come up with unit tests which
demonstrate the problem and figure out how to fix it.
2023-01-08 17:23:47 +02:00
Alex Dowad
cca4ca6d3d Remove 'fast path' using mblen_table from mb_get_strlen (it's actually a slow path)
Various mbstring legacy text encodings have what is called an 'mblen_table';
a table which gives the length of a multi-byte character using a lookup on
the first byte value. Several mbstring functions have a 'fast path' which uses
this table when it is available.

However, it turns out that iterating through a string using the mblen_table
is surprisingly slow. I found that by deleting this 'fast path' from mb_strlen,
while mb_strlen becomes a few percent slower on very small strings (0-5 bytes),
very large performance gains can be achieved on medium to long input strings.

Part of the reason for this is that our text decoding filters are so much
faster now.
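
For readers unfamiliar with the technique, here is a minimal sketch of what iterating with a length table looks like. The table is for a made-up double-byte encoding, not a real mbstring table, and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical table for a toy encoding: bytes 0x81-0xFE start a 2-byte
 * character, every other byte is a complete 1-byte character. */
static void init_toy_mblen_table(unsigned char table[256])
{
    for (int b = 0; b < 256; b++) {
        table[b] = (b >= 0x81 && b <= 0xFE) ? 2 : 1;
    }
}

/* Count characters by looking up each character's byte length from its
 * first byte and skipping ahead by that amount. */
static size_t count_chars_with_table(const unsigned char *s, size_t len,
                                     const unsigned char table[256])
{
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        i += table[s[i]]; /* each step depends on the previous lookup */
        count++;
    }
    return count;
}
```

Each lookup depends on the position produced by the previous one, so the loop is inherently serial; that may be one reason it benchmarks worse than expected, though the commit message does not state the cause.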

Here are some benchmarks:

    EUC-KR, short (0-5 chars)        - master faster by 11.90% (0.0000 vs 0.0000)
    EUC-JP, short (0-5 chars)        - master faster by 10.88% (0.0000 vs 0.0000)
    BIG-5, short (0-5 chars)         - master faster by 10.66% (0.0000 vs 0.0000)
    UTF-8, short (0-5 chars)         - master faster by 8.91% (0.0000 vs 0.0000)
    CP936, short (0-5 chars)         - master faster by 6.27% (0.0000 vs 0.0000)
    UHC, short (0-5 chars)           - master faster by 5.38% (0.0000 vs 0.0000)
    SJIS, short (0-5 chars)          - master faster by 5.20% (0.0000 vs 0.0000)

    UTF-8, medium (~100 chars)       - new faster by 127.51% (0.0004 vs 0.0002)
    UTF-8, long (~10000 chars)       - new faster by 87.94% (0.0319 vs 0.0170)
    UTF-8, very long (~100000 chars) - new faster by 88.25% (0.3199 vs 0.1699)

    SJIS, medium (~100 chars)        - new faster by 208.89% (0.0004 vs 0.0001)
    SJIS, long (~10000 chars)        - new faster by 253.57% (0.0319 vs 0.0090)

    CP936, medium (~100 chars)       - new faster by 126.08% (0.0004 vs 0.0002)
    CP936, long (~10000 chars)       - new faster by 200.48% (0.0319 vs 0.0106)

    EUC-KR, medium (~100 chars)      - new faster by 146.71% (0.0004 vs 0.0002)
    EUC-KR, long (~10000 chars)      - new faster by 212.05% (0.0319 vs 0.0102)

    EUC-JP, medium (~100 chars)      - new faster by 186.68% (0.0004 vs 0.0001)
    EUC-JP, long (~10000 chars)      - new faster by 295.37% (0.0320 vs 0.0081)

    BIG-5, medium (~100 chars)       - new faster by 173.07% (0.0004 vs 0.0001)
    BIG-5, long (~10000 chars)       - new faster by 269.19% (0.0319 vs 0.0086)

    UHC, medium (~100 chars)         - new faster by 196.99% (0.0004 vs 0.0001)
    UHC, long (~10000 chars)         - new faster by 256.39% (0.0323 vs 0.0091)

This does raise the question: is using the 'mblen_table' worthwhile for
other mbstring functions, such as mb_str_split? The answer is yes, it
is worthwhile; you see, while mb_strlen only needs to decode the input
string but not re-encode it, when mb_str_split is implemented using
the conversion filters, it needs to both decode the string and then
re-encode it. This means that there is more potential to gain
performance by using the 'mblen_table'. Benchmarking shows that in a
few cases, mb_str_split becomes faster when the 'mblen_table fast path'
is deleted, but in the majority of cases, it becomes slower.
2023-01-08 17:23:47 +02:00
Niels
58d741c042 Remove unnecessary NULL-checks on ctx (#10256)
ctx can never be NULL in these functions because they are dispatched
virtually by looking up their entries in ctx. Furthermore, 2 of these
checks never actually worked because ctx was dereferenced before ctx was
NULL-checked.
2023-01-08 12:09:20 +01:00
Tim Düsterhus
5f42a46405 Merge branch 'PHP-8.2'
* PHP-8.2:
  random: Fix check before closing `random_fd` (#10247)
2023-01-07 14:03:26 +01:00
Tim Düsterhus
32f503e4e3 random: Fix check before closing random_fd (#10247)
If, for whatever reason, the random_fd has been assigned file descriptor `0` it
previously failed to close during module shutdown, thus leaking the descriptor.
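
A minimal sketch of this class of bug, assuming `-1` is the "not open" sentinel (the usual POSIX convention). The function names are hypothetical and the exact condition in php-src may be written differently.

```c
#include <stdbool.h>

/* Buggy form: treats descriptor 0 as "nothing to close". */
static bool should_close_buggy(int fd)
{
    return fd > 0;
}

/* Fixed form: 0 is a valid descriptor; only negative values mean "not open". */
static bool should_close_fixed(int fd)
{
    return fd >= 0;
}
```

With descriptor `0`, the buggy check skips the close while the fixed check performs it; both agree for `-1` and for positive descriptors.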
2023-01-07 14:03:13 +01:00
George Peter Banyard
1b3e1755af Merge branch 'PHP-8.2'
* PHP-8.2:
  Move test for GH-10200 to the simplexml extension test directory
2023-01-07 03:08:13 +00:00
Niels Dossche
df96346f9c Move test for GH-10200 to the simplexml extension test directory
Closes GH-10252

Signed-off-by: George Peter Banyard <girgias@php.net>
2023-01-07 03:07:37 +00:00
David CARLIER
84af629e7e follow-up on GH-10238. (#10243)
fixes based on feedback.
2023-01-06 18:03:59 +00:00
David Carlier
69d49e4dd7 posix adding posix_pathconf.
to get configuration variables from a directory/file.
Closes GH-10238.
2023-01-06 14:59:02 +00:00
Alex Dowad
3ab72a4357 Merge branch 'PHP-8.2'
* PHP-8.2:
  Use different mblen_table for different SJIS variants
  Correct entry for 0x80,0xFD-FF in SJIS multi-byte character length table
2023-01-06 14:34:10 +02:00
Alex Dowad
1751f34cfa Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Use different mblen_table for different SJIS variants
  Correct entry for 0x80,0xFD-FF in SJIS multi-byte character length table
2023-01-06 14:13:21 +02:00
Alex Dowad
3152b7b26f Use different mblen_table for different SJIS variants 2023-01-06 14:09:43 +02:00
Marcos Marcolin
6f785b033d chore: remove semicolon left over.
Closes GH-10236.
2023-01-06 11:14:22 +01:00
Dennis Buteyn
d0e3919458 Close GH-10217: Use strlen() for determining the class_name length
Closes GH-10231.
2023-01-05 17:16:21 +01:00
George Peter Banyard
5033f6fcaa Merge branch 'PHP-8.2'
* PHP-8.2:
  Add missing EXTENSIONS section to test file gh10200
2023-01-05 13:10:49 +00:00
George Peter Banyard
de633c31dd Add missing EXTENSIONS section to test file gh10200 2023-01-05 13:10:28 +00:00
Alex Dowad
d104481af8 Correct entry for 0x80,0xFD-FF in SJIS multi-byte character length table
As a performance optimization, mbstring implements some functions using
tables which give the (byte) length of a multi-byte character using a
lookup based on the value of the first byte. These tables are called
`mblen_table`.

For many years, the mblen_table for SJIS has had '2' in position 0x80.
That is wrong; it should have been '1'. Reasons:

For SJIS, SJIS-2004, and mobile variants of SJIS, 0x80 has never been
treated as the first byte of a 2-byte character. It has always been
treated as a single erroneous byte. On the other hand, 0x80 is a valid
character in MacJapanese... but a 1-byte character, not a 2-byte one.

The same applies to bytes 0xFD-FF; these are 1-byte characters in
MacJapanese, and in other SJIS variants, they are not valid (as the
first byte of a character).
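
As an illustration of the corrected entries, the sketch below computes a character length from the first byte, assuming the standard Shift_JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC. It is a simplified stand-in, not the actual mbstring table, which may differ between SJIS variants; the function name is hypothetical.

```c
/* Byte length of a character given its first byte, under the assumed
 * lead-byte ranges. 0x80 and 0xFD-0xFF fall outside both ranges and are
 * therefore treated as 1-byte (valid in MacJapanese, erroneous elsewhere). */
static unsigned char sjis_len_from_first_byte(unsigned char b)
{
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC)) {
        return 2;
    }
    return 1;
}
```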

Thanks to the GitHub user 'youkidearitai' for finding this problem.
2023-01-05 14:05:39 +02:00
Alex Dowad
204694cc71 Optimize out more checks from hot path for BIG5 decoding
This boosts the speed of BIG5 encoding conversion by just 1-2%.

I tried various other tweaks to the BIG5 decoding routine to see if
I could make it faster at the cost of using a larger conversion table,
but at least on the machine I am using for benchmarking, these other
changes just made things slower.
2023-01-05 08:05:05 +02:00
Alex Dowad
d75c78b0c8 Optimize out checks in hot path for SJIS decoding
This gives about a 20% speed boost when converting SJIS to some other
encoding.
2023-01-05 08:04:58 +02:00
Alex Dowad
9c283850fb Optimize out another bounds check in BIG5 decoder
This gives about a 9% speed boost for BIG5 encoding conversion.
(Not as much as I was hoping!)
2023-01-05 08:04:51 +02:00
George Peter Banyard
5c64cf58f7 [ci skip] Add UPGRADING entry for posix changes 2023-01-04 20:00:05 +00:00
Alex Dowad
e837a8800b Optimize another check out of hot path for UHC decoding
This gives about another 8-9% speed boost to UHC decoding.
2023-01-04 21:58:27 +02:00
Alex Dowad
a76658b329 Optimize out bounds check in UHC decoder
This gives a 25% speed boost for conversion operations on long strings
(~10,000 codepoints). For shorter strings, the speed boost is less
(as the input gets smaller, it is progressively swamped more and more
by the overhead of entering and exiting the conversion function).

When benchmarking string conversion speed, we are measuring not only
the speed of the decoder, but also the time which it takes to re-encode
the string in another encoding like UTF-8 or UTF-16. So the performance
increase for functions which only need to decode but not re-encode the
input string will be much more than 25%.
2023-01-04 21:58:27 +02:00
Alex Dowad
ffbddc4848 Optimize conversion of GB18030 to Unicode
As with CP936, iterating over the PUA table and looking for matches in
it was a significant bottleneck for GB18030 decoding (though not as
severe a bottleneck as for CP936, since more is involved in GB18030
decoding than CP936 decoding).

Here are some benchmark results after optimizing out that bottleneck:

    GB18030, medium - to UTF-16BE - faster by 60.71% (0.0007 vs 0.0017)
    GB18030, medium - to UTF-8    - faster by 59.88% (0.0007 vs 0.0017)
    GB18030, long - to UTF-8      - faster by 44.91% (0.0669 vs 0.1214)
    GB18030, long - to UTF-16BE   - faster by 43.05% (0.0672 vs 0.1181)
    GB18030, short - to UTF-8     - faster by 27.22% (0.0003 vs 0.0004)
    GB18030, short - to UTF-16BE  - faster by 26.98% (0.0003 vs 0.0004)

(The 'short' test strings had 0-5 codepoints each, 'medium' ~100
codepoints, and 'long' ~10,000 codepoints. For each benchmark, the
test harness cycled through all the test strings 40,000 times.)
2023-01-04 21:58:27 +02:00