1
0
mirror of https://github.com/php/php-src.git synced 2026-04-21 23:18:13 +02:00
Commit Graph

130854 Commits

Author SHA1 Message Date
Max Kellermann 4831e48708 Zend/zend_system_id: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann cd985de190 ext/standard/md5: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann 9521d21681 main/php_globals.h: add missing include for PHPAPI 2023-01-12 15:12:45 +00:00
Max Kellermann d6136151e9 Zend/zend_build.h: include php_config.h
Without this, the macros ZTS, ZEND_DEBUG and PHP_COMPILER_ID may be
unavailable.
2023-01-12 15:12:45 +00:00
Jakub Zelenka da4775f071 Merge branch 'PHP-8.2' 2023-01-12 13:55:47 +00:00
Jakub Zelenka 1b48a5c802 Fix ASAN reported leak in FPM config test
This happens because config test does not shutdown SAPI.

In addition this commit also fixes few failures when running FPM tests
under root.

Closes GH-10296
2023-01-12 13:52:33 +00:00
Alex Dowad 4427b2e1ab Mark UTF-8 strings emitted by mbstring functions as valid UTF-8
We now have a couple of mbstring functions which have fast paths for
strings marked as 'valid UTF-8'. Later, we may likely have more. So
that these fast paths can be used more frequently, mark UTF-8 strings
emitted by mbstring as 'valid UTF-8'. This is always a correct thing
to do, because mbstring never returns invalid UTF-8 as the result of
a conversion (or similar) operation.

Internally, we do have a conversion mode which deliberately emits
invalid UTF-8 in some cases. (This is done to prevent unwanted matches
when we are converting strings to UTF-8 before performing matching
operations on them.) For such strings, don't set the 'valid UTF-8' flag.
It probably wouldn't hurt anything to set it, because strings generated
using that special conversion mode should *never* be returned to
userland, and I don't think we do anything with them which cares about
the IS_STR_VALID_UTF8 flag... but still, it would likely cause
confusion for developers.
2023-01-11 17:08:27 +02:00
Tim Düsterhus e7c0f4e816 random: Rely on free(NULL) being safe for random status freeing (#10246)
* random: Rely on `free(NULL)` being safe for random status freeing

* random: Restructure `php_random_status_free()` to not early-return
2023-01-10 18:46:57 +01:00
George Peter Banyard d7f624258d Merge branch 'PHP-8.2'
* PHP-8.2:
  fix: indirect_return compilation warning
2023-01-10 15:23:44 +00:00
George Peter Banyard c936c02119 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  fix: indirect_return compilation warning
2023-01-10 15:23:35 +00:00
Kévin Dunglas 55514a1119 fix: indirect_return compilation warning
Closes GH-10274

Signed-off-by: George Peter Banyard <girgias@php.net>
2023-01-10 15:23:15 +00:00
Derick Rethans cc4e958932 Merge branch 'PHP-8.2' 2023-01-10 15:16:42 +00:00
Derick Rethans f340854a30 Merge branch 'PHP-8.1' into PHP-8.2 2023-01-10 15:16:32 +00:00
Derick Rethans d12ba111e0 Fixed GH-10218: DateTimeZone fails to parse time zones that contain the "+" character 2023-01-10 15:15:49 +00:00
David Carlier 61cf7d49ab posix_pathconf throwing ValueError on empty path 2023-01-10 15:03:11 +00:00
Max Kellermann ecc880f491 Zend/zend_execute: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann 588a07f737 Zend/zend_multibyte: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann f377e15751 Zend/zend_ptr_stack: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann b4ba16fe18 Zend/zend_object_handlers: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann 694ec1deea Zend/zend_{operators,variables}: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann 6b34de8eba sapi/*: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann aa1cd02a43 Zend/zend_fibers: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann 308fd311ea ext/{standard,json,random,...}: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann 16203b53e1 main: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann 738fb5ca54 Zend/zend_smart_str: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann 9fdbefacd3 main/s[np]printf: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann cd4a7c1d90 Zend/zend_ini: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann 928685eba2 Zend/zend_signal: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann 01e5ffc85c UPGRADING.INTERNALS: mention the header cleanups 2023-01-10 14:19:03 +00:00
Tim Düsterhus 13b82eef84 random: Randomizer::getFloat(): Fix check for empty open intervals (#10185)
* random: Randomizer::getFloat(): Fix check for empty open intervals

The check for invalid parameters for the IntervalBoundary::OpenOpen variant was
not correct: If two consecutive doubles are passed as parameters, the resulting
interval is empty, resulting in an uint64 underflow in the γ-section
implementation.

Instead of checking whether `$min < $max`, we must check that there is at least
one more double between `$min` and `$max`, i.e. it must hold that:

	nextafter($min, $max) != $max

Instead of duplicating the comparatively complicated and expensive `nextafter`
logic for a rare error case we instead return `NAN` from the γ-section
implementation when the parameters result in an empty interval and thus underflow.

This allows us to reliably detect this specific error case *after* the fact,
but without modifying the engine state. It also provides reliable error
reporting for other internal functions that might use the γ-section
implementation.

* random: γ-section: Also check that that min is smaller than max

This extends the empty-interval check in the γ-section implementation with a
check that min is actually the smaller of the two parameters.

* random: Use PHP_FLOAT_EPSILON in getFloat_error.phpt

Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
2023-01-10 10:16:33 +01:00
Christoph M. Becker 4280431050 Merge branch 'PHP-8.2'
* PHP-8.2:
  Adapt ext/intl tests for ICU 72.1
2023-01-09 14:10:42 +01:00
Christoph M. Becker 435dc5ef1c Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Adapt ext/intl tests for ICU 72.1
2023-01-09 14:09:43 +01:00
Christoph M. Becker a9e7b90cc2 Adapt ext/intl tests for ICU 72.1
This version replaces SPACEs before the meridian with NARROW NO-BREAK
SPACEs.  Thus, we split the affected test cases as usual.

(cherry picked from commit 8dd51b462d)

Fixes GH-10262.
2023-01-09 14:08:40 +01:00
Dmitry Stogov ce861373b9 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix incorrect optimization of ASSIGN_OP may lead to incorrect result (sub assign -> pre dec conversion for null values)
2023-01-09 13:53:35 +03:00
Dmitry Stogov 9abc2108fa Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix incorrect optimization of ASSIGN_OP may lead to incorrect result (sub assign -> pre dec conversion for null values)
2023-01-09 13:53:19 +03:00
Dmitry Stogov 4d4a53beee Fix incorrect optimization of ASSIGN_OP may lead to incorrect result (sub assign -> pre dec conversion for null values) 2023-01-09 13:51:57 +03:00
Dmitry Stogov f8b9312709 Merge branch 'PHP-8.2'
* PHP-8.2:
  ext/opcache/jit/zend_jit_trace: fix memory leak in _compile_root_trace() (#10146)
2023-01-09 09:51:12 +03:00
Dmitry Stogov d13b3b6aa7 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  ext/opcache/jit/zend_jit_trace: fix memory leak in _compile_root_trace() (#10146)
2023-01-09 09:51:00 +03:00
Max Kellermann bcc5d268f6 ext/opcache/jit/zend_jit_trace: fix memory leak in _compile_root_trace() (#10146)
A copy of this piece of code exists in zend_jit_compile_side_trace(),
but there, the leak bug does not exist.

This bug exists since both copies of this piece of code were added in
commit 4bf2d09ede
2023-01-09 09:50:30 +03:00
Alex Dowad b4cbaabd9b Add fast SSE2-based implementation of mb_strlen for known-valid UTF-8 strings
One small piece of this was obtained from Stack Overflow. According to
Stack Overflow's Terms of Service, all user-contributed code on SO is
provided under a Creative Commons license. I believe this license is
compatible with the code being included in PHP.

Benchmarking results (UTF-8 only, for strings which have already been
checked using mb_check_encoding):

For very short (0-5 byte) strings, mb_strlen is 12% faster.
The speedup gets greater and greater on longer input strings; for
strings around 100KB, mb_strlen is 23 times faster.

Currently the 'fast' code is gated behind a GC flag check which ensures
it is only used on strings which have already been checked for UTF-8
validity. This is because the accelerated code will return different
results on some invalid UTF-8 strings.
2023-01-09 07:50:40 +02:00
Christoph M. Becker 60102c3228 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix recently introduced gh10251.phpt
2023-01-08 18:28:34 +01:00
Christoph M. Becker 6faeb9571d Fix recently introduced gh10251.phpt
As of PHP 8.2.0, creation of dynamic properties is deprecated, so we
slap a `AllowDynamicProperties` attribute on the class.
2023-01-08 18:07:21 +01:00
George Peter Banyard 3b8327a4e3 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-10251: Assertion `(flag & (1<<3)) == 0' failed.
  Fix GH-9710: phpdbg memory leaks by option "-h"
2023-01-08 16:12:21 +00:00
George Peter Banyard e308dc0635 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-10251: Assertion `(flag & (1<<3)) == 0' failed.
  Fix GH-9710: phpdbg memory leaks by option "-h"
2023-01-08 16:11:46 +00:00
Niels Dossche d03025bf59 Fix GH-10251: Assertion `(flag & (1<<3)) == 0' failed.
zend_get_property_guard previously assumed that at least "str" has a
pre-computed hash. This is not always the case, for example when a
string is created by bitwise operations, its hash is not set. Instead of
forcing a computation of the hashes, drop the hash comparison.

Closes GH-10254

Co-authored-by: Changochen <changochen1@gmail.com>

Signed-off-by: George Peter Banyard <girgias@php.net>
2023-01-08 16:09:59 +00:00
Niels Dossche 8ff2b6abb2 Fix GH-9710: phpdbg memory leaks by option "-h"
Closes GH-10237

Signed-off-by: George Peter Banyard <girgias@php.net>
2023-01-08 16:07:00 +00:00
Alex Dowad 092ad3e462 Optimize branch structure of UTF-8 decoder routine
I like the asm which gcc -O3 generates on this modified code...
and guess what: my CPU likes it too!

(The asm is noticeably tighter, without any extra operations in the
path which dispatches to the code for decoding a 1-byte, 2-byte,
3-byte, or 4-byte character. It's just CMP, conditional jump, CMP,
conditional jump, CMP, conditional jump.

...Though I was admittedly impressed to see gcc could implement the
boolean expression `c >= 0xC2 && c <= 0xDF` with just 3 instructions:
add, CMP, then conditional jump. Pretty slick stuff there, guys.)

Benchmark results:

UTF-8, short - to UTF-16LE  faster by 7.36% (0.0001 vs 0.0002)
UTF-8, short - to UTF-16BE  faster by 6.24% (0.0001 vs 0.0002)
UTF-8, medium - to UTF-16BE faster by 4.56% (0.0003 vs 0.0003)
UTF-8, medium - to UTF-16LE faster by 4.00% (0.0003 vs 0.0003)
UTF-8, long - to UTF-16BE   faster by 1.02% (0.0215 vs 0.0217)
UTF-8, long - to UTF-16LE   faster by 1.01% (0.0209 vs 0.0211)
2023-01-08 17:27:19 +02:00
Alex Dowad d8b5b9fa55 Add unit tests for mb_str_split/mb_substr on MacJapanese encoding
MacJapanese has a somewhat unusual feature that when mapped to
Unicode, many characters map to sequences of several codepoints.
Add test cases demonstrating how mb_str_split and mb_substr behave in
this situation.

When adding these tests, I found the behavior of mb_substr was wrong
due to an inconsistency between the string "length" as measured by
mb_strlen and the number of native MacJapanese characters which
mb_substr would count when iterating over the string using the
mblen_table. This has been fixed.

I believe that mb_strstr will also return wrong results in some cases
for MacJapanese. I still need to come up with unit tests which
demonstrate the problem and figure out how to fix it.
2023-01-08 17:23:47 +02:00
Alex Dowad cca4ca6d3d Remove 'fast path' using mblen_table from mb_get_strlen (it's actually a slow path)
Various mbstring legacy text encodings have what is called an 'mblen_table';
a table which gives the length of a multi-byte character using a lookup on
the first byte value. Several mbstring functions have a 'fast path' which uses
this table when it is available.

However, it turns out that iterating through a string using the mblen_table
is surprisingly slow. I found that by deleting this 'fast path' from mb_strlen,
while mb_strlen becomes a few percent slower on very small strings (0-5 bytes),
very large performance gains can be achieved on medium to long input strings.

Part of the reason for this is because our text decoding filters are so much
faster now.

Here are some benchmarks:

    EUC-KR, short (0-5 chars)        - master faster by 11.90% (0.0000 vs 0.0000)
    EUC-JP, short (0-5 chars)        - master faster by 10.88% (0.0000 vs 0.0000)
    BIG-5, short (0-5 chars)         - master faster by 10.66% (0.0000 vs 0.0000)
    UTF-8, short (0-5 chars)         - master faster by 8.91% (0.0000 vs 0.0000)
    CP936, short (0-5 chars)         - master faster by 6.27% (0.0000 vs 0.0000)
    UHC, short (0-5 chars)           - master faster by 5.38% (0.0000 vs 0.0000)
    SJIS, short (0-5 chars)          - master faster by 5.20% (0.0000 vs 0.0000)

    UTF-8, medium (~100 chars)       - new faster by 127.51% (0.0004 vs 0.0002)
    UTF-8, long (~10000 chars)       - new faster by 87.94% (0.0319 vs 0.0170)
    UTF-8, very long (~100000 chars) - new faster by 88.25% (0.3199 vs 0.1699)

    SJIS, medium (~100 chars)        - new faster by 208.89% (0.0004 vs 0.0001)
    SJIS, long (~10000 chars)        - new faster by 253.57% (0.0319 vs 0.0090)

    CP936, medium (~100 chars)       - new faster by 126.08% (0.0004 vs 0.0002)
    CP936, long (~10000 chars)       - new faster by 200.48% (0.0319 vs 0.0106)

    EUC-KR, medium (~100 chars)      - new faster by 146.71% (0.0004 vs 0.0002)
    EUC-KR, long (~10000 chars)      - new faster by 212.05% (0.0319 vs 0.0102)

    EUC-JP, medium (~100 chars)      - new faster by 186.68% (0.0004 vs 0.0001)
    EUC-JP, long (~10000 chars)      - new faster by 295.37% (0.0320 vs 0.0081)

    BIG-5, medium (~100 chars)       - new faster by 173.07% (0.0004 vs 0.0001)
    BIG-5, long (~10000 chars)       - new faster by 269.19% (0.0319 vs 0.0086)

    UHC, medium (~100 chars)         - new faster by 196.99% (0.0004 vs 0.0001)
    UHC, long (~10000 chars)         - new faster by 256.39% (0.0323 vs 0.0091)

This does raise the question: is using the 'mblen_table' worthwhile for
other mbstring functions, such as mb_str_split? The answer is yes, it
is worthwhile; you see, while mb_strlen only needs to decode the input
string but not re-encode it, when mb_str_split is implemented using
the conversion filters, it needs to both decode the string and then
re-encode it. This means that there is more potential to gain
performance by using the 'mblen_table'. Benchmarking shows that in a
few cases, mb_str_split becomes faster when the 'mblen_table fast path'
is deleted, but in the majority of cases, it becomes slower.
2023-01-08 17:23:47 +02:00
Niels 58d741c042 Remove unnecessary NULL-checks on ctx (#10256)
ctx can never be zero in these functions because they are dispatched
virtually by looking up their entries in ctx. Furthermore, 2 of these
checks never actually worked because ctx was dereferenced before ctx was
NULL-checked.
2023-01-08 12:09:20 +01:00