1
0
mirror of https://github.com/php/php-src.git synced 2026-04-12 10:33:11 +02:00
Commit Graph

129384 Commits

Author SHA1 Message Date
Pierrick Charron
212ebeed68 Making PHP 8.2.0 beta3 php-8.2.0beta3 2022-08-16 11:23:58 -04:00
Alex Dowad
93207535fa Add test to exercise _php_mb_encoding_handler_ex with multiple possible input encodings
Thanks to Kamil Tekiela for pointing out that there was no test
case for this.
2022-08-16 16:43:38 +02:00
Alex Dowad
d617fcaae2 Fix legacy text conversion filter for 'HTML-ENTITIES'
Because this routine used a signed char buffer to hold the bytes
in a (possible) HTML entity, any bytes with the MSB set would
be sign-extended when converting to int; for example, 0x86 would
become 0xFFFFFF86 (or -121).

Codepoints with huge values, like 0xFFFFFF86, are not valid and
if any were passed to the output filter, it would treat them
as errors and emit error markers.
2022-08-16 16:43:27 +02:00
Alex Dowad
d9269becca Fix problems with ISO-2022-KR conversion
• The legacy conversion code did not emit an error marker if an
  escape sequence was truncated.

• BOTH old and new conversion code would shift from KSC5601
  (KS X 1001) mode to ASCII mode on an invalid escape sequence.
  This doesn't make any sense.
2022-08-16 16:43:27 +02:00
Alex Dowad
bfccdbd858 SJIS-Mobile#SOFTBANK string can end immediately after special escape sequence
SJIS-Mobile#SOFTBANK text encoding supports special escape sequences,
which shift the decoder into a mode where each single byte represents
an emoji. To get out of this mode, a 0xF (SHIFT OUT) byte can be
used.

After one of these special escape sequences, the new conversion
code expected to see at least one more byte. However, there doesn't
seem to be any particular reason why it should be treated as an
error condition if a string ends abruptly after one of these
escapes. Well, the escape sequence is useless in that case, but
it is a complete and valid escape sequence.

The legacy conversion code did allow a string to end immediately
after one of these escape sequences. Amend the new code to allow
the same.
2022-08-16 16:43:27 +02:00
Alex Dowad
983a29d3c0 Legacy conversion code for '7bit' to '8bit' inserts error markers
The use of a special 'vtbl' for converting between '7bit' and
'8bit' text meant that '7bit' text would not be converted to
wchars before going to '8bit'. This meant that the special
value MBFL_BAD_INPUT, which we use to flag an erroneous byte
sequence in input text (and which is required by functions
like mb_check_encoding), would pass directly to the output,
instead of being converted to the error marker specified
by mb_substitute_character.

This issue dates back to the time when I removed the mbfl
'identify filters' and made encoding validity checking and
encoding detection rely only on the conversion filters.
2022-08-16 16:43:27 +02:00
Alex Dowad
f3c8efd711 In legacy text conversion filters, reset filter state in 'flush' function
Up until now, I believed that mbstring had been designed such
that (legacy) text conversion filter objects should not be
re-used after the 'flush' function is called to complete a
text conversion operation.

However, it turns out that the implementation of
_php_mb_encoding_handler_ex DID re-use filter objects
after flush. That means that functions which were based on
_php_mb_encoding_handler_ex, including mb_parse_str and
php_mb_post_handler, would break in some cases; state left
over from converting one substring (perhaps a variable name)
would affect the results of converting another substring
(perhaps the value of the same variable), and could cause
extraneous characters to get inserted into the output.

All this code should be deleted soon, but fixing it helps me
to avoid spurious failures when fuzzing the new/old code to
look for differences in behavior.
2022-08-16 16:43:27 +02:00
Alex Dowad
18e526cb51 Fix legacy text conversion filter for SJIS-2004
EUC-JP-2004 includes special byte sequences starting with 0x8E
for kana. The legacy output routine for EUC-JP-2004 emits
these sequences if the value of the output variable `s` is
between 0x80 and 0xFF.

Since the same routine was also used for SJIS-2004 and
ISO-2022-JP-2004, before 8a915ed26c, the same 0x8E sequences
would be emitted when converting to those text encodings as well.
But that is completely wrong. 0x8E 0x__ does not mean the same
in SJIS-2004 or ISO-2022-JP-2004 as it does in EUC-JP-2004.

Therefore, in 8a915ed26c, I fixed the legacy conversion routine
by checking whether the output encoding is EUC-JP-2004 or not.
If it's not, and `s` is 0x80-0xFF, I made it emit an error.

Well, it turns out that single bytes with values from 0xA1
to 0xDF are meaningful in SJIS-2004. To emit these bytes when
appropriate, I had to amend the legacy conversion routine again.

(For clarity, this does NOT mean reverting to the behavior prior
to 8a915ed26c. We were right not to emit sequences starting with
0x8E in SJIS-2004. But in SJIS-2004, we *do* sometimes need to
emit single bytes from 0xA1-0xDF.)
2022-08-16 16:43:27 +02:00
Alex Dowad
3517a70f93 Fix legacy text conversion filter for CP50220
CP50220 converts some codepoints which represent kana
(hiragana/katakana) to a different form. This is the only difference
between CP50220 and CP50221 (which doesn't perform such conversion).
In some cases, this conversion means collapsing two codepoints to
a single output byte sequence. Since the legacy text conversion
filters only worked a byte at a time, the legacy filter had to
cache a byte, then wait until it was called again with the next
byte to compare the cached byte with the following one.

That was all fine, but it didn't work as intended when there were
errors (invalid byte sequences) in the input. Our code (both old
and new) for emitting error markers recursively calls the same
conversion filter. When the old CP50220 filter was called
recursively, the logic for managing cached bytes did not behave
as intended. As a result, the error markers could be reordered
with other characters in the output.

I used an ugly hack to fix this in 6938e3512; when making a
recursive call to emit an error marker, temporarily swap out
`filter->filter_function` to bypass the byte-caching code,
so the error marker immediately goes through to the output.

This worked, but I overlooked the fact that the very same
problem can occur if an invalid byte sequence is detected
*in the flush function*. Apply the same (ugly) fix.
2022-08-16 16:43:27 +02:00
Alex Dowad
4b370330d4 Ensure that Base64 output always wraps lines in the same manner as legacy implementation
The legacy Base64 conversion code in mbstring automatically
wrapped the output to 72 columns, and the new code imitates
this behavior. Frankly, I'm not sure if this is a good idea
or not (people could easily manually wrap it if they want to),
but have stuck with this behavior for backwards compatibility.

However, fuzzing revealed one case where we were not wrapping
to 72 columns; if the input string is not a multiple of 3
characters, meaning that the output must be padded, and the
point where we must add the final (padded) output happens to
be just beyond 72 columns.
2022-08-16 16:43:27 +02:00
Alex Dowad
c6bd08530e Adjust number of error markers emitted for truncated ISO-2022-JP escape sequence
Fuzzing revealed a small difference between the number of error
markers which the legacy ISO-2022-JP and JIS7/8 conversion code
emitted for truncated escape sequences and those emitted by the
new code. The behavior of the old code seems more reasonable
here, so we will imitate it.
2022-08-16 16:43:27 +02:00
Alex Dowad
128768a450 Adjust number of error markers emitted for truncated UTF-8 code units
In 04e59c916f, I amended the UTF-8 conversion code, so that when given
invalid input, it would emit a number of errors markers harmonizing
with the WHATWG's specification of the standard UTF-8 decoding
algorithm. (Which, gentle reader of commit logs, you can find online
at https://encoding.spec.whatwg.org/#utf-8-decoder.) However, the code
in 04e59c916f was faulty in the case that a truncated UTF-8 code unit
starts with 0xF1.

Then, in dc1ba61d09, when making a small refactoring to a different
part of the UTF-8 conversion code, I inexplicably broke part of the
working code, causing the same fault which was already present with
truncated UTF-8 code units starting with 0xF1 to also occur with
0xF2 and 0xF3 as well. I don't remember what inane thoughts I was
thinking when I pulled off this feat of utter mental confusion.

None of these cases were covered by unit tests, by the way.

Thankfully, my trusty fuzzer picked up on this when testing the
new implementation of mb_parse_str (since the legacy UTF-8
conversion filter did not suffer from the same problem, and I was
fuzzing to find any differences in behavior between the old and
new implementations).

Fortuitously, the fuzzer also picked up another issue which was
present in 04e59c916f. I was emitting only one error marker for
truncated code units starting with 0xE0 or 0xED, in cases where
the WHATWG standard indicates two should be emitted. Examples
are 0xE0 0x9F <END OF STRING> or 0xED 0xA0 <END OF STRING>.

Code units starting with 0xE0-0xED should have 3 bytes. If the
first byte is 0xE0, the second MUST be 0xA0 or greater. (Otherwise,
the codepoint could have fit in a two-byte code unit.) And if the
first byte is 0xED, the second MUST be 0x9F or less. According to
the WHATWG algorithm, step 4, if the second byte is outside the
legal range, then the decoder should emit an error... AND
reprocess the out-of-range byte. The reprocessing will then
cause another error. That's why the decoder should indicate two
errors and not one.
2022-08-16 16:43:27 +02:00
Alex Dowad
a4656895dd Imitate legacy behavior when converting non-encodings using mbstring
Fuzzing revealed that something was missed here when making the new
encoding conversion code match the behavior of the old code. In the
next major release of PHP, support for these non-encodings will be
dropped, but in the meantime, it is better to match the legacy
behavior.
2022-08-16 16:43:27 +02:00
Alex Dowad
88d13491de Make control flow in mb_wchar_to_cp50220 a bit clearer 2022-08-16 16:43:26 +02:00
Alex Dowad
8df515555b Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
Alex Dowad
aeccb139c3 Use new encoding conversion filters for mb_parse_str and php_mb_post_handler
When micro-benchmarking on relatively short ASCII strings, the new
implementation was about 30% faster than the old one.
2022-08-16 16:43:26 +02:00
Máté Kocsis
98e5c4e3a3 Declare ext/sockets constants in stubs (#9349) 2022-08-16 13:18:31 +02:00
Máté Kocsis
0c4c9fb93b Declare et/zip constants in stubs (#9146) 2022-08-16 10:18:24 +02:00
David CARLIER
8e2d4e6b52 random left rotates annotating as const. (#9346) 2022-08-16 05:31:58 +01:00
David Carlier
db64c1cb70 zend introduce const GNUC attribute. sub optimisation where there is no pointers, nor particular memory layout, thread local/volatile ... involved. usage concealed for now into little pack helpers.
Closes #9326.
2022-08-15 19:49:24 +01:00
Christoph M. Becker
ac9cbb7174 Merge branch 'PHP-8.1'
* PHP-8.1:
  Correct IntlDateFormatter::formatObject params
2022-08-15 18:10:54 +02:00
twosee
fa83e37e73 [ci skip] Add missing NEWS entry for GH-9324 2022-08-15 23:59:58 +08:00
Christoph M. Becker
306da80f56 Merge branch 'PHP-8.0' into PHP-8.1
* PHP-8.0:
  Correct IntlDateFormatter::formatObject params
2022-08-15 17:58:52 +02:00
Gert de Pagter
05ed47ef12 Correct IntlDateFormatter::formatObject params
Closes GH-9341.
2022-08-15 17:56:34 +02:00
twosee
7eba683e2c Fix unexpected deprecated dynamic property warning (#9324)
Occurred when exit() with uncaught exception in finally block.
2022-08-15 20:54:39 +08:00
Ilija Tovilo
0f29436a2f Remove useless UNEXPECTED around RETURN_VALUE_USED in specialized RETVAL handler (#9329)
This can lead to funny code like UNEXPECTED(1) which is non-sensical.
2022-08-15 14:21:02 +02:00
Tim Düsterhus
3b48a2044d Replace RuntimeException in Randomizer::nextInt() by RandomException (#9305)
* Replace RuntimeException in Randomizer::nextInt() by RandomException

* Add ext/random/tests/03_randomizer/nextint_error.phpt
2022-08-15 11:06:03 +02:00
Jakub Zelenka
4b7a98754d Merge branch 'PHP-8.1'
[skip-ci]
2022-08-14 14:19:28 +01:00
Jakub Zelenka
7f64a8d59f [skip ci] Add missing NEWS entry for GH-8409 fix to PHP-8.1 branch (skip-ci) 2022-08-14 14:18:28 +01:00
twosee
ef39adb638 Merge branch 'PHP-8.1'
* PHP-8.1:
  Re-fix GH-8409: SSL handshake timeout persistent connections hanging
  Revert "Fix GH-8409: SSL handshake timeout persistent connections hanging"
2022-08-14 20:15:35 +08:00
twosee
14d71957ca Merge branch 'PHP-8.0' into PHP-8.1
* PHP-8.0:
  Re-fix GH-8409: SSL handshake timeout persistent connections hanging
2022-08-14 20:14:57 +08:00
twosee
b8d07451d4 Re-fix GH-8409: SSL handshake timeout persistent connections hanging
This fix is another solution to replace d0527427be, use zend_try and zend_catch to make sure persistent stream will be released when error occurred.

Closes GH-9332.
2022-08-14 20:13:36 +08:00
Jakub Zelenka
897ca85d33 Revert "Fix GH-8409: SSL handshake timeout persistent connections hanging"
This reverts commit d0527427be.

This patch makes Swoole/Swow can not work anymore, because Coroutine will yield to another one during socket operation, EG(record_errors) assertion will always fail, and zend_begin_record_errors() was only used during compile time before.
Note: zend_emit_recorded_errors() and the typo fix are reserved.
2022-08-14 19:41:06 +08:00
Tim Düsterhus
b825756317 Update expires format for session cookie (#9304)
* Update expires format for session cookie

see GH-9200
see 15e3fcb468

* Add ext/session/tests/gh9200.phpt
2022-08-12 19:52:04 +02:00
Jakub Zelenka
438f692e92 Merge branch 'PHP-8.1' 2022-08-12 17:12:28 +01:00
Jakub Zelenka
d0527427be Fix GH-8409: SSL handshake timeout persistent connections hanging
This is not actually related to SSL handshake but stream socket creation
which does not clean errors if the error handler is set. This fix
prevents emitting errors until the stream is freed.
2022-08-12 17:09:24 +01:00
Christoph M. Becker
e885831670 Merge branch 'PHP-8.1'
* PHP-8.1:
  Fix GH-9309: Segfault when connection is used after imap_close()
2022-08-12 16:25:46 +02:00
Christoph M. Becker
71c22efae7 Fix GH-9309: Segfault when connection is used after imap_close()
We actually need to check whether `php_imap_object.imap_stream` is
`NULL` to detect that the connection has already been closed.

Closes GH-9313.
2022-08-12 16:24:30 +02:00
David CARLIER
393577ced9 reallocarray using proper inline facility to check overflow on windows. (#9300) 2022-08-12 12:08:03 +01:00
Ilija Tovilo
c809a213f2 Fix run-tests.php --no-progress flag for non-parallel testing 2022-08-12 12:51:13 +02:00
Ilija Tovilo
1f6baa776b Show function name when dumping fake closure (#9306)
Fixes GH-8962
2022-08-12 12:22:55 +02:00
Ilija Tovilo
98bdb7f99b Make pestr[n]dup infallible (#9295)
Fixes GH-9128
Closes GH-9295
2022-08-12 12:21:14 +02:00
Christoph M. Becker
1094a859ad Merge branch 'PHP-8.1'
* PHP-8.1:
  Fix GH-9296: `ksort` behaves incorrectly on arrays with mixed keys
2022-08-12 11:38:21 +02:00
Christoph M. Becker
7908aae30c Merge branch 'PHP-8.0' into PHP-8.1
* PHP-8.0:
  Fix GH-9296: `ksort` behaves incorrectly on arrays with mixed keys
2022-08-12 11:36:24 +02:00
Denis Vaksman
cd1aed8edd Fix GH-9296: ksort behaves incorrectly on arrays with mixed keys
The comparator function used at ksort in SORT_REGULAR mode
need to be consistent with basic comparison rules. These rules
were changed in PHP-8.0 for numeric strings, but comparator
used at ksort kept the old behaviour. It leads to inconsistent
situations, when after ksort the first key is GREATER than some
of the next ones by according to the basic comparison operators.

Closes GH-9293.
2022-08-12 11:32:23 +02:00
Ilija Tovilo
0028c242f0 Add --[no-]progress option to run-tests.php (#9255)
Previously, adding the -g argument would disable progress, even locally.
Now it needs to be disabled explicitly.
2022-08-11 20:58:15 +02:00
Derick Rethans
0fc9fd9fd6 Merged pull request #9297 2022-08-11 16:27:28 +01:00
Derick Rethans
a6a5d46704 Simplify and move check for too high expiry time, which you can't reach on 32bit systems 2022-08-11 16:27:25 +01:00
Derick Rethans
15e3fcb468 Fixed GH-9200: setcookie has an obsolete expires date format 2022-08-11 16:27:25 +01:00
Derick Rethans
9dc6ee995f Merge branch 'PHP-8.1' 2022-08-11 16:26:46 +01:00