Commit 6c25413 added the flag ZEND_JIT_EXIT_INVALIDATE which resets
the trace handlers in zend_jit_trace_exit(), but forgot to consider
that on ZEND_JIT_TRACE_STOP_LINK, this changed handler gets passed to
zend_jit_find_trace(), causing it to fail, either by returning 0
(results in bogus data) or by aborting due to ZEND_UNREACHABLE(). In
either case, this crashes the PHP process.
I'm not quite sure how to fix this multi-threading problem properly;
my suggestion is to just fail the zend_jit_trace() call. After all,
the whole ZEND_JIT_EXIT_INVALIDATE fix was about reloading modified
scripts, so there's probably no point in this pending zend_jit_trace()
call.
having tigher control on ACK delays, difference is the setting
is `volatile` as it can be turned off by the kernel if not set
explicitally set otherwise on the socket.
Closes GH-10145.
I way want to confirm different on mbstring PHP 8.1 or newer and
PHP 8.0 or older, but when I port to PHP 8.0 from PHP 8.1 or newer
phpt files, it stopped die() function when test failed. I want to
make a list, so I don't want to stop it.
If you execute full test, set $testFailedLimit to -1 in
encoding_tests.inc.
In GitHub issue 9613, it was reported that mb_strpos wrongly matches the
character '?' against any invalid string, even when the character '?'
clearly does not appear in the invalid string. This behavior has existed
at least since PHP 5.2.
The reason for the behavior is that mb_strpos internally converts the
haystack and needle to UTF-8 before performing a search. When converting
to UTF-8, regardless of the setting of mb_substitute_character, libmbfl
would use '?' as an error marker for invalid byte sequences. Once those
invalid input sequences were replaced with '?', then naturally, they
would match against occurrences of the actual character '?' (when it
appeared as a 'normal' character, not as an error marker). This would
happen regardless of whether the error was in the haystack and '?' was
used in the needle, or whether the error was in the needle and '?' was
used in the haystack.
Why would libmbfl use '?' rather than the mb_substitute_character set
by the user? Remember that libmbfl was originally a separate library
which was imported into the PHP codebase. mb_substitute_character is an
mbstring API function, not something built into libmbfl. When mbstring
would call into libmbfl, it would provide the error replacement
character to libmbfl as a parameter. However, when libmbfl would perform
conversion operations internally, and not because of a direct call from
mbstring, it would use its own error replacement character.
Example:
<?php
$questionMark = "\x00?";
$badUTF16 = "\xDB\x00"; // half of a surrogate pair
echo mb_strpos($questionMark, $badUTF16, 0, 'UTF-16BE'), "\n";
echo mb_strpos($badUTF16, $questionMark, 0, 'UTF-16BE'), "\n";
Incidentally, this behavior does not occur if the text encoding is
UTF-8, because no conversion is needed in that case.
mb_stripos had a similar issue, but instead of always using '?' as an
error marker internally, it would use the selected
mb_substitute_character. So, for example, if the mb_substitute_character
was '%', then occurrences of '%' in the haystack would match invalid
bytes in the needle, and vice versa.
Example:
<?php
mb_substitute_character(0x25); // '%'
$percent = "\x00%";
$badUTF16 = "\xDB\x00"; // half of a surrogate pair
echo mb_stripos($percent, $badUTF16, 0, 'UTF-16BE'), "\n";
echo mb_stripos($badUTF16, $percent, 0, 'UTF-16BE'), "\n";
This behavior (of mb_stripos) still occurs even if the text encoding is
UTF-8, because case folding is still needed to make the search
case-insensitive.
It is not hard to think of scenarios where these strange and unintuitive
behaviors could cause security vulnerabilities. In the discussion on
GH issue 9613, Christoph Becker suggested that mb_str{i,}pos should
simply refuse to operate on invalid strings. However, this would almost
certainly break existing production code.
This commit mitigates the problem in a less intrusive way: it ensures
that while invalid haystacks can match invalid needles (even if the
specific invalid bytes are different), invalid bytes in the haystack
will never match '?' OR occurrences of the mb_substitute_character in
the needle, and vice versa.
This does represent a backwards compatibility break, but a small one.
Since it mitigates a potential security problem, I believe this is
appropriate.
Closes GH-9613.
Instead of case-folding a string and then converting it to UTF-8 as a
separate operation, why not convert it to UTF-8 at the same time as
we fold case?
For non-UTF-8 encodings, this typically makes mb_stripos about 2x
faster.
The performance gain from this change depends on the text encoding and
input string size. For very small strings, other overheads tend to swamp
the performance gains to some extent, such that the speedup is less than
2x. For medium-length strings (~100 bytes or so), the speedup is
typically around 2.5x.
The greatest performance gains are for UTF-8 strings which have already
been marked as valid (using the GC flags on the zend_string object);
for those, the speedup is more than 10x in many cases.
The previous implementation first converted the haystack and needle to
wchars, then searched for matches between the two sequences of wchars.
Because we use -1 as an error marker when converting to wchars, error
markers from invalid byte sequences in the haystack would match error
markers from invalid byte sequences in the needle, even if the specific
invalid byte sequence was different. I am not sure whether this behavior
is really desirable or not, but anyways, this new implementation
follows the same behavior so as not to cause BC breaks.
* random: Add Randomizer::nextFloat()
* random: Check that doubles are IEEE-754 in Randomizer::nextFloat()
* random: Add Randomizer::nextFloat() tests
* random: Add Randomizer::getFloat() implementing the y-section algorithm
The algorithm is published in:
Drawing Random Floating-Point Numbers from an Interval. Frédéric
Goualard, ACM Trans. Model. Comput. Simul., 32:3, 2022.
https://doi.org/10.1145/3503512
* random: Implement getFloat_gamma() optimization
see https://github.com/php/php-src/pull/9679/files#r994668327
* random: Add Random\IntervalBoundary
* random: Split the implementation of γ-section into its own file
* random: Add tests for Randomizer::getFloat()
* random: Fix γ-section for 32-bit systems
* random: Replace check for __STDC_IEC_559__ by compile-time check for DBL_MANT_DIG
* random: Drop nextFloat_spacing.phpt
* random: Optimize Randomizer::getFloat() implementation
* random: Reject non-finite parameters in Randomizer::getFloat()
* random: Add NEWS/UPGRADING for Randomizer’s float functionality