mirror of https://github.com/php/php-src.git synced 2026-04-17 21:11:02 +02:00
63190 Commits

Author SHA1 Message Date
Jorg Adam Sowa
77ee92a50c Remove unnecessary usage of CONST_CS
Closes GH-9685.
2022-11-28 17:12:07 +01:00
Thomas PIRAS
289822d3ad Add a proper error message for ffi load
We call dlerror when a library failed to load properly.

Closes GH-9913.
2022-11-28 16:19:54 +01:00
Alex Dowad
0109aa62ec Simplify decoding filter for UTF-8
When decoding a 3-byte UTF-8 code unit, redundant checks for overlong
code unit and for illegal codepoints from U+D800-DFFF were included.
Both of these conditions are caught by the line which reads:

    if ((c2 & 0xC0) != 0x80 || (c == 0xE0 && c2 < 0xA0) || (c == 0xED && c2 >= 0xA0)) {

As such, there is no reason to check for the same error conditions again.

Likewise, when decoding a 4-byte UTF-8 code unit, there was a
redundant check for overlong code unit. That was already caught by the
line which reads:

    if ((c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90)) {
2022-11-28 17:04:00 +02:00
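To see why a single range check on the second byte makes the later checks redundant, here is a minimal sketch in C. The function names are hypothetical and this is not the mbstring source. The 4-byte condition mirrors the line quoted in the commit message; the 3-byte condition (lead bytes 0xE0 and 0xED) is an assumption derived from the UTF-8 encoding rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is c2 an acceptable second byte after lead byte c
 * of a 3-byte sequence (c in 0xE0..0xEF)? */
bool utf8_3byte_second_ok(uint8_t c, uint8_t c2)
{
    if ((c2 & 0xC0) != 0x80) return false;     /* not a continuation byte */
    if (c == 0xE0 && c2 < 0xA0) return false;  /* overlong: value below U+0800 */
    if (c == 0xED && c2 >= 0xA0) return false; /* surrogate: U+D800..U+DFFF */
    return true;
}

/* Hypothetical helper: same idea for a 4-byte sequence (c in 0xF0..0xF4),
 * using the condition quoted in the commit message. */
bool utf8_4byte_second_ok(uint8_t c, uint8_t c2)
{
    if ((c2 & 0xC0) != 0x80) return false;     /* not a continuation byte */
    if (c == 0xF0 && c2 < 0x90) return false;  /* overlong: value below U+10000 */
    if (c == 0xF4 && c2 >= 0x90) return false; /* value above U+10FFFF */
    return true;
}
```

Once the second byte passes these checks, the decoded value cannot be overlong or fall in the surrogate range, so checking the assembled codepoint again adds nothing.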
Tim Düsterhus
50e32015ae Merge branch 'PHP-8.2'
* PHP-8.2:
  [ci skip] random: Trim trailing whitespace in randomizer.c
2022-11-27 19:11:32 +01:00
Tim Düsterhus
350883db06 [ci skip] random: Trim trailing whitespace in randomizer.c
To keep the diff cleaner for future changes, such as #9664.
2022-11-27 19:10:45 +01:00
Jakub Zelenka
eb83e0206c Merge branch 'PHP-8.2' 2022-11-25 14:08:17 +00:00
Jakub Zelenka
c8d8bf7c59 Merge branch 'PHP-8.1' into PHP-8.2 2022-11-25 14:07:41 +00:00
Jakub Zelenka
500b28ad04 Fix GH-10000: Test failures when OpenSSL compiled with no-dsa 2022-11-25 14:02:03 +00:00
Arnaud Le Blanc
1cba98ebe9 Merge branch 'PHP-8.2'
* PHP-8.2:
  [ci skip] NEWS
  [ci skip] NEWS
  Do not resolve constants on non-linked class during preloading (#9975)
2022-11-25 14:37:55 +01:00
Arnaud Le Blanc
5563535e97 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  [ci skip] NEWS
  Do not resolve constants on non-linked class during preloading (#9975)
2022-11-25 14:11:52 +01:00
Arnaud Le Blanc
91b3b58f71 Do not resolve constants on non-linked class during preloading (#9975)
Fixes GH-9968
2022-11-25 14:02:45 +01:00
Jakub Zelenka
d526773d20 Merge branch 'PHP-8.2' 2022-11-25 12:51:23 +00:00
Jakub Zelenka
c022ce92fb Merge branch 'PHP-8.1' into PHP-8.2 2022-11-25 12:50:38 +00:00
Jakub Zelenka
ce57221376 Fix GH-9064: PHP fails to build if openssl was built with no-ec 2022-11-25 12:49:12 +00:00
Jakub Zelenka
ce58ae5e79 Merge branch 'PHP-8.2' 2022-11-24 18:30:57 +00:00
Jakub Zelenka
3d90a24e93 Fix GH-9997: OpenSSL engine clean up segfault 2022-11-24 18:29:44 +00:00
George Peter Banyard
32d3cae19f Handle trampolines correctly in new FCC API + usages (#9877) 2022-11-22 17:12:53 +00:00
Alex Dowad
0e540ed739 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix mangled kana output for JIS encoding
2022-11-22 15:50:43 +02:00
Alex Dowad
8f84192403 Fix mangled kana output for JIS encoding
For JIS encoding, hiragana and katakana can be input in multiple forms.
One form uses JISX 0201 escape sequences. Another is called 'GR-invoked'
kana.

In the context of ISO-2022 encoding, bytes with a zero bit in the MSB
are called "GL" (or "graphics left") and those with the MSB set are
called "GR" (or "graphics right"). Regarding the variants of
ISO-2022-JP which are called "JIS7" and "JIS8", Wikipedia states:

"Other, older variants known as JIS7 and JIS8 build directly on the
7-bit and 8-bit encodings defined by JIS X 0201 and allow use of JIS X
0201 kana from G1 without escape sequences, using Shift Out and Shift
In or setting the eighth bit (GR-invoked), respectively."

In harmony with this, we have always accepted bytes from 0xA3-0xDF and
decoded them to the corresponding hiragana/katakana. However, at some
point I accidentally broke output for these kana. You can see the
problem in 3v4l.org by running this program:

    <?php
    echo bin2hex(mb_convert_encoding("\xA3", 'JIS', 'JIS'));

The results are:

    Output for 8.2rc1 - rc3
    1b244200231b2842
    Output for 7.4.0 - 7.4.33, 8.0.1 - 8.0.25, 8.1.12
    1b2849231b2842
    Output for 8.1.0 - 8.1.11
    1b284923

You can see that from 8.1.0 - 8.1.11, there was a missing escape
sequence at the end. That was caused because the flush functions were
not being called properly, and has already been fixed. However, this
also shows that the output for 8.2rc1-rc3 is completely invalid.
It is trying to output a JISX 0208 sequence, but with 0x00 as one of
the JISX 0208 bytes, which is illegal.

Add the missing code which will make the new text conversion filters
behave the same as the old ones when outputting hiragana/katakana in
JIS encoding.
2022-11-22 15:49:19 +02:00
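The expected output `1b2849231b2842` for the input byte 0xA3 can be reproduced with a small sketch in C. The function name is hypothetical and this is not the mbstring filter itself; it only shows the byte layout: ESC ( I selects JIS X 0201 kana, the kana byte is written with its eighth bit cleared, and ESC ( B switches back to ASCII at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the 7-bit JIS form of one GR-invoked kana byte
 * into out (which must hold at least 7 bytes). Returns the number of bytes
 * written, or 0 if gr is outside the JIS X 0201 kana range 0xA1..0xDF. */
size_t jis_encode_gr_kana(uint8_t gr, uint8_t *out)
{
    size_t n = 0;

    if (gr < 0xA1 || gr > 0xDF) {
        return 0;
    }
    /* ESC ( I : switch to JIS X 0201 kana */
    out[n++] = 0x1B; out[n++] = '('; out[n++] = 'I';
    /* The kana itself, with the eighth bit cleared */
    out[n++] = gr & 0x7F;
    /* ESC ( B : return to ASCII so the string ends in the default mode */
    out[n++] = 0x1B; out[n++] = '('; out[n++] = 'B';
    return n;
}
```

For 0xA3 this writes the bytes 1b 28 49 23 1b 28 42, matching the output listed for 7.4.0 - 7.4.33, 8.0.1 - 8.0.25 and 8.1.12. Dropping the final three bytes gives the truncated 8.1.0 - 8.1.11 output.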
George Peter Banyard
b97d2b6dae Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-9883 SplFileObject::__toString() reads next line
2022-11-22 12:26:48 +00:00
George Peter Banyard
6e87485d3c Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-9883 SplFileObject::__toString() reads next line
2022-11-22 12:26:03 +00:00
George Peter Banyard
6fbf81c674 Fix GH-9883 SplFileObject::__toString() reads next line
We need to override the __toString magic method for SplFileObject, similarly to how DirectoryIterator overrides it.
Moreover, the custom cast handler is useless since we define __toString methods, so use the standard one instead.

Closes GH-9912
2022-11-22 12:21:14 +00:00
Dmitry Stogov
ff85649431 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix a memory leak in tracing JIT when the same closure is called through Closure::call() and natively.
2022-11-22 12:29:36 +03:00
Dmitry Stogov
f97f805275 Merge branch 'closure_call' into PHP-8.2
* closure_call:
  Fix a memory leak in tracing JIT when the same closure is called through Closure::call() and natively.
2022-11-22 12:29:20 +03:00
Dmitry Stogov
45cb3f917a Fix a memory leak in tracing JIT when the same closure is called through Closure::call() and natively.
Closure::call() makes a temporary copy of the original closure function, modifies its
scope, resets the ZEND_ACC_CLOSURE flag and calls it through zend_call_function().
As a result, the same function may be called with and without the
ZEND_ACC_CLOSURE flag, which confuses the JIT and may lead to a memory leak or
even worse memory errors.

The patch allocates a "fake" closure object and keeps the ZEND_ACC_CLOSURE flag
so that the function always behaves in the same way.
2022-11-21 17:41:16 +03:00
Alex Dowad
3e743e9ba1 Merge branch 'PHP-8.2'
* PHP-8.2:
  For UTF-7, flag unnecessary extra trailing byte in Base64 section as error
2022-11-21 14:49:55 +02:00
Alex Dowad
a618682373 For UTF-7, flag unnecessary extra trailing byte in Base64 section as error
This bug was found when I was fuzzing a patch related to mb_strpos.
In some cases, the legacy text conversion code for UTF-7 (and
UTF7-IMAP) would correctly recognize an error for a Base64-encoded
section which was not correctly padded with zero bits, but the new
(and faster) text conversion code would not.

Specifically, if the input string ended abruptly after the 4th or 7th
byte of a Base64-encoded section, the new conversion code would
confirm that the trailing padding bits from the previous byte (3rd or
6th) were zeroes, but would not check whether the 4th or 7th byte
itself encoded any non-zero bits. The legacy conversion code did
perform this check and would treat the input string as invalid.

Actually, even if the 4th or 7th byte does encode only (padding) zero
bits, this is still a problem, because there is no reason to have a
4th (or 7th) byte in that case. The UTF-7 string should have ended
on the previous byte instead.

Apply the same fix for both UTF-7 and UTF7-IMAP.
2022-11-21 14:49:01 +02:00
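The rule about the 4th and 7th bytes follows from bit counting: each Base64 character carries 6 bits and each UTF-16 code unit needs 16. The sketch below, in C, is a minimal illustration under that assumption; the function names are hypothetical and it is not the mbstring code. It accepts a Base64 section only if fewer than 6 bits are left over at the end and all of them are zero.

```c
#include <stddef.h>

/* Value of one character of the standard Base64 alphabet, or -1. */
static int b64_value(char ch)
{
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
    if (ch >= '0' && ch <= '9') return ch - '0' + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    return -1;
}

/* Hypothetical check: s holds only the characters between '+' and '-'.
 * Returns 1 if the section ends cleanly, 0 otherwise. */
int utf7_base64_section_ok(const char *s)
{
    unsigned int acc = 0; /* bits not yet consumed by a 16-bit code unit */
    int nbits = 0;        /* how many bits acc currently holds */

    for (; *s != '\0'; s++) {
        int v = b64_value(*s);
        if (v < 0) {
            return 0;
        }
        acc = (acc << 6) | (unsigned int)v;
        nbits += 6;
        if (nbits >= 16) {
            /* One full UTF-16 code unit is complete; keep only the rest. */
            nbits -= 16;
            acc &= (1u << nbits) - 1u;
        }
    }
    /* 6 or more leftover bits mean the last character was unnecessary;
     * fewer than 6 are padding and must all be zero. */
    return nbits < 6 && acc == 0;
}
```

With this rule, "AGE" (one code unit plus 2 zero padding bits) is accepted, "AGEA" is rejected because the 4th character leaves 8 unused bits, and "AGF" is rejected because its padding bits are not zero.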
David CARLIER
3660bc31de opcache: fix creation of W+X pages on FreeBSD 13.1 and above.
By default, the system allows these mappings, but an admin can disable them system-wide.
However, the procctl API allows this to be controlled per process.

Closes GH-9896.
2022-11-18 19:22:00 +00:00
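A per-process opt-in of this kind might look like the sketch below in C. The wrapper name is hypothetical and this is not the opcache code; it assumes the `PROC_WXMAP_CTL` command and `PROC_WX_MAPPINGS_PERMIT` value documented in FreeBSD's procctl(2), and compiles to a no-op on other systems.

```c
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/procctl.h>
#endif

/* Hypothetical wrapper: ask the kernel to permit mappings that are both
 * writable and executable for the current process only.
 * Returns 0 on success (or when there is nothing to do), -1 on failure. */
int permit_wx_mappings_for_this_process(void)
{
#if defined(__FreeBSD__) && defined(PROC_WXMAP_CTL)
    int arg = PROC_WX_MAPPINGS_PERMIT;
    return procctl(P_PID, getpid(), PROC_WXMAP_CTL, &arg);
#else
    /* Not FreeBSD, or a FreeBSD release without this control. */
    return 0;
#endif
}
```

Such a call would have to happen before the process tries to create the W+X mapping, since it only affects mappings made afterwards.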
Alex Dowad
b1954f5fd6 Use fast text conversion filters to implement mb_convert_variables 2022-11-18 10:19:07 +02:00
Ilija Tovilo
adfdfb2e1e Improvements in modifier parsing (#9926)
Use a shared non-terminal for all class modifiers. This avoids conflicts when
adding modifiers that are only valid for certain targets. This change is
necessary for asymmetric visibility but might be useful for other future
additions.

Closes GH-9926
2022-11-17 16:20:27 +01:00
George Peter Banyard
bab9e349cb Change conditional check in disk_free_space() test
As the notion of free space is fuzzy on some filesystems (such as BTRFS),
we check that the disk space after adding a file is less than or equal to what it was before.

This closes Bug #80629
2022-11-16 12:58:24 +00:00
George Peter Banyard
dbf54e1a8b Use zend_result return type instead of inaccurate ones 2022-11-16 12:58:24 +00:00
George Peter Banyard
726d595ec7 Remove code for OS2
The last release of OS2 was in December 2001, so 20 years ago. Remove this effectively dead code.
2022-11-16 12:58:24 +00:00
Tim Düsterhus
dd8de1e726 Promote unserialize() notices to warning (#9629)
* Unserialize: Migrate "Unexpected end of serialized data" to E_WARNING

* Unserialize: Migrate "Error at offset %d of %d bytes" to E_WARNING

* Unserialize: Migrate "%s is returned from __sleep() multiple times" to E_WARNING

* Add NEWS for “Promote unserialize() notices to warning”
2022-11-15 19:36:38 +01:00
Alex Dowad
d0d834429f Cache UTF-8-validity status of strings in GC flags
The PCRE extension is already doing this. The flag is set when a string
is determined to be valid UTF-8, and cleared in
zend_string_forget_hash_val.

We might as well make good use of it in mbstring as well.

This should result in a negligible slowdown for non-UTF-8 strings,
bad UTF-8 strings, and good UTF-8 strings which are checked only once.
However, when microbenchmarking this change using a variety of text
encodings and string lengths, I found that in most of these cases,
the 'new' code was a few percent faster. In a couple of cases, the 'old'
code was a few percent faster. This was not a result of sampling error,
since I could reproduce these test results repeatedly, and even when
running a large number of iterations. Both the new and old code
were compiled with -O3 -march=native. My (unproved) hypothesis is that
although the new code appears to only add one more conditional branch,
the compiler may emit slightly different code from before (perhaps
with different register allocation and so on), and this may cause some
cases to run slightly faster and others to run slightly slower. I have
not disassembled the old and new binaries to see if an examination of
the emitted assembly code would support this hypothesis.

For good UTF-8 strings which are checked repeatedly, the speedup is
about 40% even for strings 1-5 bytes in length. For ~100 byte strings,
it is ~700%, and for ~10000 byte strings, it is ~80000%.

I tried fuzzing MBString's php_mb_check_encoding function and
pcre2lib's valid_utf function to see if I could find any cases where
their output would be different. After running the fuzzer for a couple
of minutes, it had tried more than 1 million test cases without finding
any where the output was different. Therefore, it appears that
MBString's UTF-8 validation is compatible with PCRE's.
2022-11-15 19:14:35 +02:00
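The caching idea can be illustrated with a small self-contained sketch in C. All names here are hypothetical stand-ins, not the Zend API: a flag bit on the string records a successful validation, later checks return early, and replacing the string's contents clears the flag (the commit does the clearing in zend_string_forget_hash_val).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STR_FLAG_VALID_UTF8 0x1u /* hypothetical flag bit */

struct cached_str {
    uint32_t flags;
    size_t len;
    const unsigned char *val;
};

unsigned long full_scans = 0; /* counts how often the full scan runs */

/* Plain UTF-8 validation over the whole buffer. */
static bool scan_utf8(const unsigned char *p, size_t len)
{
    const unsigned char *e = p + len;
    while (p < e) {
        unsigned char c = *p++, lo = 0x80, hi = 0xBF;
        size_t need;
        if (c < 0x80) continue;
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
            if (c == 0xE0) lo = 0xA0; /* reject overlong forms */
            if (c == 0xED) hi = 0x9F; /* reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
            if (c == 0xF0) lo = 0x90; /* reject overlong forms */
            if (c == 0xF4) hi = 0x8F; /* reject values above U+10FFFF */
        } else return false;
        if ((size_t)(e - p) < need) return false;
        if (*p < lo || *p > hi) return false;
        p++;
        for (size_t i = 1; i < need; i++, p++)
            if ((*p & 0xC0) != 0x80) return false;
    }
    return true;
}

/* Only a positive result is cached, so invalid strings are rescanned. */
bool is_valid_utf8_cached(struct cached_str *s)
{
    if (s->flags & STR_FLAG_VALID_UTF8) return true;
    full_scans++;
    if (!scan_utf8(s->val, s->len)) return false;
    s->flags |= STR_FLAG_VALID_UTF8;
    return true;
}

/* Changing the contents must drop the cached result. */
void cached_str_set(struct cached_str *s, const unsigned char *val, size_t len)
{
    s->val = val;
    s->len = len;
    s->flags &= ~STR_FLAG_VALID_UTF8;
}
```

The extra cost for a string checked only once is the single flag test at the top, which is consistent with the negligible slowdown described in the commit message, while repeated checks of a valid string skip the scan entirely.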
David Carlier
da47547809 Merge branch 'PHP-8.2' 2022-11-15 12:31:33 +00:00
David Carlier
65782fbbe8 Merge branch 'PHP-8.1' into PHP-8.2 2022-11-15 12:30:04 +00:00
David Carlier
a4298c14c1 Fix GH-9932: Discards further characters for session name.
As those characters are converted, it is better to make the caller aware that the name is inadequate.
Closes GH-9940.
2022-11-15 12:27:44 +00:00
Alex Dowad
0ff9df9677 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix regression test for GH-9535 on PHP-8.2+
2022-11-14 11:47:06 +02:00
Alex Dowad
d3933e0b6c Fix regression test for GH-9535 on PHP-8.2+
Some of the legacy text encodings which were used in this regression
test are deprecated in PHP-8.2+. The deprecation warnings break the
expected output. Since using these encodings in mbstring is now
deprecated, I think there is little point in keeping them in this test.
So they are now removed from it.

Further, in 219fff376b, I made a change to avoid a situation where the
legacy UTF7-IMAP conversion code gets stuck in a wrong state when its
attempt to emit a character fails. When a Base64-encoded section of
input ended with -, the previous code would FIRST emit a character if
necessary (using the CK or "check" macro, which causes the function to
return immediately if the downstream filter function returns an error
code), and THEN update its own state to indicate that it is now in
ASCII rather than Base64 mode.

If the downstream filter function returned an error code, the CK macro
would then cause the UTF7-IMAP filter function to return immediately
WITHOUT setting its own state to indicate that the Base64-encoded
section was done.

I fixed this by updating the filter state as needed BEFORE calling CK...
but I missed updating the filter state in the case where the Base64
section ends normally and there is no need to emit anything.

Again, in 6d525a425e, I modified the legacy conversion code for
ISO-2022-KR to try to comply more closely with the RFC for this
text encoding. The RFC states that before any occurrence of 'Shift In'
or 'Shift Out' codes in an ISO-2022-KR string, a special escape
sequence must appear at least ONCE, at the beginning of a line.
The previous code did not comply with this requirement. I made it
comply by always emitting this escape sequence at the beginning of
the first line.

Since mb_strcut (wrongly) determines when it has consumed enough of
the input string by looking at the length of its output in bytes, this
extra escape sequence makes mb_strcut consume 4 bytes less of an
ISO-2022-KR string than would otherwise be the case. When this
strange behavior of mb_strcut is fixed, this test will have to be
adjusted to restore the previous expected outputs for ISO-2022-KR.
2022-11-14 11:46:12 +02:00
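The ordering problem with the CK macro described in the UTF7-IMAP paragraphs above can be shown with a minimal sketch in C. The struct, the function names and the macro body are assumptions made for illustration, not the mbstring definitions; the only property relied on is the one the commit message states, namely that CK returns immediately when the downstream call reports an error.

```c
/* Assumed shape of the macro: bail out of the enclosing function on error. */
#define CK(statement) do { if ((statement) < 0) return -1; } while (0)

/* Hypothetical filter with a single piece of state. */
struct filter {
    int in_base64;      /* 1 while inside a Base64 section */
    int (*emit)(int c); /* downstream filter function */
};

/* Old ordering: emit first, update state afterwards. If emit fails,
 * the state update is skipped and the filter stays in Base64 mode. */
int end_base64_state_last(struct filter *f, int pending)
{
    CK(f->emit(pending));
    f->in_base64 = 0;
    return 0;
}

/* Fixed ordering: update state first, so an early return cannot
 * leave the filter in the wrong mode. */
int end_base64_state_first(struct filter *f, int pending)
{
    f->in_base64 = 0;
    CK(f->emit(pending));
    return 0;
}

/* A downstream function that always reports an error. */
int failing_emit(int c)
{
    (void)c;
    return -1;
}
```

With `failing_emit` as the downstream function, the first variant returns -1 and leaves `in_base64` set, while the second returns -1 with `in_base64` already cleared.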
Dmitry Stogov
a6a80d8ab2 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix memory leak
2022-11-14 12:36:00 +03:00
Dmitry Stogov
6cbc91151a Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix memory leak
2022-11-14 12:35:49 +03:00
Dmitry Stogov
a8bd342397 Fix memory leak
Fixes oss-fuzz #53143
2022-11-14 12:35:09 +03:00
Alex Dowad
50f87d36e0 Merge branch 'PHP-8.2'
* PHP-8.2:
  [ci skip] NEWS
  Fix GH-9535 (unintended behavior change for mb_strcut in PHP 8.1)
2022-11-13 14:44:04 +02:00
Alex Dowad
79ae3090e0 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  [ci skip] NEWS
  Fix GH-9535 (unintended behavior change for mb_strcut in PHP 8.1)
2022-11-13 14:42:57 +02:00
NathanFreeman
fa0401b0b5 Fix GH-9535 (unintended behavior change for mb_strcut in PHP 8.1)
The existing implementation of mb_strcut extracts part of a
multi-byte encoded string by pulling out raw bytes and then running
them through a conversion filter to ensure that the output is valid
in the requested encoding.

If the conversion filter emits error markers when doing the final
'flush' operation which ends the conversion of the extracted bytes,
these error markers may (in some cases) be included in the output.
The conversion operation does not respect the value of
mb_substitute_character; rather, it always uses '?' as an error marker.
So this issue manifests itself as unwanted '?' characters being
inserted into the output.

This issue has existed for a long time, but became noticeable in PHP
8.1 because for at least some of the supported text encodings, mbstring
is now more strict about emitting error markers when strings end in an
illegal state.

The simplest fix is to suppress error markers during the final flush
operation.

While working on a fix for this problem, another problem with mb_strcut
was discovered; since it decides when to stop consuming bytes from
the input by looking at the byte length of its OUTPUT, anything which
causes extra bytes to be emitted to the output may cause mb_strcut to
not consume all the bytes in the requested range.

The one case where we DO emit extra output bytes is for encodings
which have a selectable mode, like ISO-2022-JP; if a string in such
an encoding ends in a mode which is not the default, we emit an ending
escape sequence which changes back to the default mode. This is done
so that concatenating strings in such encodings is safe.

However, as mentioned, this can cause the output of mb_strcut to be
shorter than it logically should be. This bug has existed for a long
time, and fixing it now will be a BC break, so we may not fix it right
away.

Therefore, tests for THIS fix which don't pass because of that OTHER
bug have been split out into a separate test file (gh9535b.phpt), and
that file has been marked XFAIL.
2022-11-13 14:37:55 +02:00
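The second mb_strcut problem, stopping on output length rather than input position, can be modelled with a toy sketch in C. This is a deliberately simplified model with hypothetical names, not the mb_strcut implementation: every input byte yields one output byte, and `extra_output` stands for bytes such as an escape sequence that the converter emits on its own.

```c
#include <stddef.h>

/* Toy model: consume input bytes until the OUTPUT reaches `budget` bytes.
 * Returns how many input bytes were consumed. */
size_t bytes_consumed(size_t input_len, size_t budget, size_t extra_output)
{
    size_t out = extra_output; /* bytes already emitted by the converter */
    size_t in = 0;

    while (in < input_len && out < budget) {
        in++;  /* read one input byte ... */
        out++; /* ... and emit one output byte for it */
    }
    return in;
}
```

In this model, a budget of 10 bytes consumes 10 input bytes when nothing extra is emitted, but only 6 when a 4-byte escape sequence is emitted first, which is the kind of shortfall the commit message describes.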
Arnaud Le Blanc
9575968acc Merge branch 'PHP-8.2'
* PHP-8.2:
  [ci skip] NEWS
  [ci skip] NEWS
  Fix GH-9298: remove all registered signal handlers in pcntl RSHUTDOWN
2022-11-13 11:10:10 +01:00
Arnaud Le Blanc
d8fc1af809 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  [ci skip] NEWS
  Fix GH-9298: remove all registered signal handlers in pcntl RSHUTDOWN
2022-11-13 11:05:28 +01:00
Erki Aring
5ecbb1b39d Fix GH-9298: remove all registered signal handlers in pcntl RSHUTDOWN 2022-11-13 10:57:58 +01:00
David Carlier
e0e347b4a8 Fix GH-9923: Add the SIGINFO constant in pcntl for systems supporting it.
Closes #9938
2022-11-12 19:37:32 +00:00