* random: Randomizer::getFloat(): Fix check for empty open intervals
The check for invalid parameters for the IntervalBoundary::OpenOpen variant was
incorrect: if two consecutive doubles are passed as parameters, the resulting
interval is empty, which leads to a uint64 underflow in the γ-section
implementation.
Instead of checking whether `$min < $max`, we must check that there is at least
one more double between `$min` and `$max`, i.e. it must hold that:
nextafter($min, $max) != $max
Instead of duplicating the comparatively complicated and expensive `nextafter`
logic for a rare error case, we return `NAN` from the γ-section implementation
when the parameters result in an empty interval and would thus underflow.
This allows us to reliably detect this specific error case *after* the fact,
without modifying the engine state. It also provides reliable error reporting
for other internal functions that might use the γ-section implementation. A
userland sketch of the fixed behavior follows below.
* random: γ-section: Also check that min is smaller than max
This extends the empty-interval check in the γ-section implementation with a
check that min is actually the smaller of the two parameters.
* random: Use PHP_FLOAT_EPSILON in getFloat_error.phpt
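For illustration, a minimal userland sketch of the fixed behavior; the exact
exception class is an assumption on my part, and the bounds mirror the
PHP_FLOAT_EPSILON approach used in getFloat_error.phpt:

```php
<?php
use Random\Randomizer;
use Random\IntervalBoundary;

$r = new Randomizer();

// A valid open interval: at least one double lies strictly between the bounds.
var_dump($r->getFloat(0.0, 1.0, IntervalBoundary::OpenOpen));

// The fixed error case: $min and $max are consecutive doubles, so the open
// interval ($min, $max) contains no double at all and must be rejected
// instead of underflowing inside the γ-section implementation.
$min = 1.0;
$max = 1.0 + PHP_FLOAT_EPSILON; // the double immediately following 1.0

try {
    $r->getFloat($min, $max, IntervalBoundary::OpenOpen);
} catch (ValueError $e) { // assumed to surface as a ValueError
    echo $e->getMessage(), "\n";
}
```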
Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
This version replaces SPACEs before the meridiem (AM/PM) marker with NARROW
NO-BREAK SPACEs. Thus, we split the affected test cases as usual.
(cherry picked from commit 8dd51b462d)
Fixes GH-10262.
A copy of this piece of code exists in zend_jit_compile_side_trace(), but
there the leak bug does not exist.
The bug has existed since both copies of this piece of code were added in
commit 4bf2d09ede.
One small piece of this was obtained from Stack Overflow. According to
Stack Overflow's Terms of Service, all user-contributed code on SO is
provided under a Creative Commons license. I believe this license is
compatible with the code being included in PHP.
Benchmarking results (UTF-8 only, for strings which have already been
checked using mb_check_encoding):
For very short (0-5 byte) strings, mb_strlen is 12% faster.
The speedup grows with the length of the input string; for strings around
100KB, mb_strlen is 23 times faster.
Currently the 'fast' code is gated behind a GC flag check which ensures
it is only used on strings which have already been checked for UTF-8
validity. This is because the accelerated code will return different
results on some invalid UTF-8 strings.
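A userland sketch of how that gating plays out; the flag itself is internal,
and the claim that a successful mb_check_encoding() call marks the string as
known-valid is an assumption based on the description above:

```php
<?php
// Roughly 100KB of valid UTF-8 (5 chars x 3 bytes x 6,700 repeats).
$s = str_repeat("こんにちは", 6_700);

if (mb_check_encoding($s, 'UTF-8')) {
    // On an already-validated string, mb_strlen() may take the
    // accelerated path; this is where the ~23x figure applies.
    var_dump(mb_strlen($s, 'UTF-8'));
}
```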
zend_get_property_guard previously assumed that at least "str" has a
pre-computed hash. This is not always the case: for example, when a string is
created by bitwise operations, its hash is not set. Instead of forcing a
computation of the hashes, drop the hash comparison.
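A hypothetical sketch of the situation (not the actual fuzzer input from
GH-10254):

```php
<?php
// A property name produced by a bitwise string operation reaches the
// property guard without a pre-computed hash.
class C {
    public function __get($name) {
        // Re-entering the property fetch for the same name exercises the guard.
        return $this->$name ?? 'fallback';
    }
}

$name = "a" | "b"; // bitwise OR on strings yields "c", with no cached hash
var_dump((new C())->$name); // string(8) "fallback"
```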
Closes GH-10254
Co-authored-by: Changochen <changochen1@gmail.com>
Signed-off-by: George Peter Banyard <girgias@php.net>
I like the asm which gcc -O3 generates on this modified code...
and guess what: my CPU likes it too!
(The asm is noticeably tighter, without any extra operations in the
path which dispatches to the code for decoding a 1-byte, 2-byte,
3-byte, or 4-byte character. It's just CMP, conditional jump, CMP,
conditional jump, CMP, conditional jump.
...Though I was admittedly impressed to see gcc could implement the
boolean expression `c >= 0xC2 && c <= 0xDF` with just 3 instructions:
add, CMP, then conditional jump. It's the classic unsigned range-check
trick: subtract 0xC2, then a single unsigned comparison against
0xDF - 0xC2 covers both bounds. Pretty slick stuff there, guys.)
Benchmark results:
UTF-8, short - to UTF-16LE faster by 7.36% (0.0001 vs 0.0002)
UTF-8, short - to UTF-16BE faster by 6.24% (0.0001 vs 0.0002)
UTF-8, medium - to UTF-16BE faster by 4.56% (0.0003 vs 0.0003)
UTF-8, medium - to UTF-16LE faster by 4.00% (0.0003 vs 0.0003)
UTF-8, long - to UTF-16BE faster by 1.02% (0.0215 vs 0.0217)
UTF-8, long - to UTF-16LE faster by 1.01% (0.0209 vs 0.0211)
MacJapanese has a somewhat unusual feature: when mapped to Unicode, many of
its characters map to sequences of several codepoints.
Add test cases demonstrating how mb_str_split and mb_substr behave in
this situation.
When adding these tests, I found the behavior of mb_substr was wrong
due to an inconsistency between the string "length" as measured by
mb_strlen and the number of native MacJapanese characters which
mb_substr would count when iterating over the string using the
mblen_table. This has been fixed.
I believe that mb_strstr will also return wrong results in some cases
for MacJapanese. I still need to come up with unit tests which
demonstrate the problem and figure out how to fix it.
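For reference, a sketch of the kind of consistency these tests check. The
strings here are ordinary 1:1 characters; the multi-codepoint byte sequences
from the actual tests are not reproduced, since that would mean guessing at
the mapping tables:

```php
<?php
// Ordinary 1:1 characters round-tripped into MacJapanese.
$mac = mb_convert_encoding('株式会社', 'SJIS-mac', 'UTF-8');

// mb_strlen, mb_str_split, and mb_substr should agree on character
// boundaries, whether they use the decoder or the mblen_table:
var_dump(mb_strlen($mac, 'SJIS-mac'));              // int(4)
var_dump(count(mb_str_split($mac, 1, 'SJIS-mac'))); // int(4)
var_dump(mb_substr($mac, 2, 1, 'SJIS-mac')
    === mb_str_split($mac, 1, 'SJIS-mac')[2]);      // bool(true)
```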
Various mbstring legacy text encodings have what is called an 'mblen_table':
a table which gives the length of a multi-byte character using a lookup on
the first byte value. Several mbstring functions have a 'fast path' which uses
this table when it is available.
However, it turns out that iterating through a string using the mblen_table
is surprisingly slow. I found that deleting this 'fast path' from mb_strlen
makes it a few percent slower on very small strings (0-5 bytes), but yields
very large performance gains on medium to long input strings.
Part of the reason is that our text decoding filters are so much faster now.
Here are some benchmarks:
EUC-KR, short (0-5 chars) - master faster by 11.90% (0.0000 vs 0.0000)
EUC-JP, short (0-5 chars) - master faster by 10.88% (0.0000 vs 0.0000)
BIG-5, short (0-5 chars) - master faster by 10.66% (0.0000 vs 0.0000)
UTF-8, short (0-5 chars) - master faster by 8.91% (0.0000 vs 0.0000)
CP936, short (0-5 chars) - master faster by 6.27% (0.0000 vs 0.0000)
UHC, short (0-5 chars) - master faster by 5.38% (0.0000 vs 0.0000)
SJIS, short (0-5 chars) - master faster by 5.20% (0.0000 vs 0.0000)
UTF-8, medium (~100 chars) - new faster by 127.51% (0.0004 vs 0.0002)
UTF-8, long (~10000 chars) - new faster by 87.94% (0.0319 vs 0.0170)
UTF-8, very long (~100000 chars) - new faster by 88.25% (0.3199 vs 0.1699)
SJIS, medium (~100 chars) - new faster by 208.89% (0.0004 vs 0.0001)
SJIS, long (~10000 chars) - new faster by 253.57% (0.0319 vs 0.0090)
CP936, medium (~100 chars) - new faster by 126.08% (0.0004 vs 0.0002)
CP936, long (~10000 chars) - new faster by 200.48% (0.0319 vs 0.0106)
EUC-KR, medium (~100 chars) - new faster by 146.71% (0.0004 vs 0.0002)
EUC-KR, long (~10000 chars) - new faster by 212.05% (0.0319 vs 0.0102)
EUC-JP, medium (~100 chars) - new faster by 186.68% (0.0004 vs 0.0001)
EUC-JP, long (~10000 chars) - new faster by 295.37% (0.0320 vs 0.0081)
BIG-5, medium (~100 chars) - new faster by 173.07% (0.0004 vs 0.0001)
BIG-5, long (~10000 chars) - new faster by 269.19% (0.0319 vs 0.0086)
UHC, medium (~100 chars) - new faster by 196.99% (0.0004 vs 0.0001)
UHC, long (~10000 chars) - new faster by 256.39% (0.0323 vs 0.0091)
This does raise the question: is using the 'mblen_table' worthwhile for
other mbstring functions, such as mb_str_split? The answer is yes, it is
worthwhile; you see, while mb_strlen only needs to decode the input string
and not re-encode it, mb_str_split (when implemented using the conversion
filters) needs to both decode the string and then re-encode it. This means
that there is more potential to gain performance by using the 'mblen_table'.
Benchmarking shows that in a few cases mb_str_split becomes faster when the
'mblen_table fast path' is deleted, but in the majority of cases it becomes
slower.
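For reference, a minimal harness sketch matching the shape of the numbers
above (an assumption on my part, not the actual benchmark script):

```php
<?php
// Time $iters invocations of $fn and return the total in seconds.
function bench(callable $fn, int $iters = 40_000): float {
    $start = hrtime(true);
    for ($i = 0; $i < $iters; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$long = str_repeat('テスト文字列', 1_667); // ~10,000 characters of UTF-8
printf("mb_strlen:    %.4f\n", bench(fn() => mb_strlen($long, 'UTF-8')));
printf("mb_str_split: %.4f\n", bench(fn() => mb_str_split($long, 1, 'UTF-8')));
```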
ctx can never be NULL in these functions, because they are dispatched
virtually by looking up their entries in ctx. Furthermore, two of these
checks never actually worked, because ctx was dereferenced before it was
NULL-checked.
If, for whatever reason, `random_fd` has been assigned file descriptor `0`, it
previously failed to be closed during module shutdown, thus leaking the
descriptor.
As a performance optimization, mbstring implements some functions using
tables which give the (byte) length of a multi-byte character using a
lookup based on the value of the first byte. These tables are called
`mblen_table`.
For many years, the mblen_table for SJIS has had '2' in position 0x80.
That is wrong; it should have been '1'. Reasons:
For SJIS, SJIS-2004, and mobile variants of SJIS, 0x80 has never been
treated as the first byte of a 2-byte character. It has always been
treated as a single erroneous byte. On the other hand, 0x80 is a valid
character in MacJapanese... but a 1-byte character, not a 2-byte one.
The same applies to bytes 0xFD-0xFF; these are 1-byte characters in
MacJapanese, and in other SJIS variants, they are not valid (as the first
byte of a character).
Thanks to the GitHub user 'youkidearitai' for finding this problem.
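The practical consequence, sketched from userland (assuming current mapping
tables):

```php
<?php
// 0x80 is a single 1-byte character in MacJapanese, and a lone invalid
// byte (never a 2-byte lead) in plain SJIS.
var_dump(mb_strlen("\x80", 'SJIS-mac'));     // int(1)
var_dump(mb_check_encoding("\x80", 'SJIS')); // bool(false)
```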
This boosts the speed of BIG5 encoding conversion by just 1-2%.
I tried various other tweaks to the BIG5 decoding routine to see if
I could make it faster at the cost of using a larger conversion table,
but at least on the machine I am using for benchmarking, these other
changes just made things slower.
This gives a 25% speed boost for conversion operations on long strings
(~10,000 codepoints). For shorter strings, the speed boost is smaller (as the
input gets smaller, it is increasingly swamped by the overhead of entering
and exiting the conversion function).
When benchmarking string conversion speed, we are measuring not only
the speed of the decoder, but also the time which it takes to re-encode
the string in another encoding like UTF-8 or UTF-16. So the performance
increase for functions which only need to decode but not re-encode the
input string will be much more than 25%.
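To make the decode-only vs. decode-plus-re-encode distinction concrete, a
sketch using BIG-5 as an arbitrary example encoding (the commit's figures are
for the decoder, not for these specific calls):

```php
<?php
// ~10,000 characters of Big5-encodable text.
$big5 = mb_convert_encoding(str_repeat('中文字串', 2_500), 'BIG-5', 'UTF-8');

// mb_strlen() only decodes, so a faster decoder helps almost in full:
var_dump(mb_strlen($big5, 'BIG-5')); // int(10000)

// mb_convert_encoding() decodes *and* re-encodes; the decoder speedup is
// diluted by the (unchanged) cost of the encoding step:
$utf16 = mb_convert_encoding($big5, 'UTF-16BE', 'BIG-5');
var_dump(strlen($utf16)); // int(20000), 2 bytes per BMP codepoint
```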
As with CP936, iterating over the PUA table and looking for matches in
it was a significant bottleneck for GB18030 decoding (though not as
severe a bottleneck as for CP936, since more is involved in GB18030
decoding than CP936 decoding).
Here are some benchmark results after optimizing out that bottleneck:
GB18030, medium - to UTF-16BE - faster by 60.71% (0.0007 vs 0.0017)
GB18030, medium - to UTF-8 - faster by 59.88% (0.0007 vs 0.0017)
GB18030, long - to UTF-8 - faster by 44.91% (0.0669 vs 0.1214)
GB18030, long - to UTF-16BE - faster by 43.05% (0.0672 vs 0.1181)
GB18030, short - to UTF-8 - faster by 27.22% (0.0003 vs 0.0004)
GB18030, short - to UTF-16BE - faster by 26.98% (0.0003 vs 0.0004)
(The 'short' test strings had 0-5 codepoints each, 'medium' ~100
codepoints, and 'long' ~10,000 codepoints. For each benchmark, the
test harness cycled through all the test strings 40,000 times.)
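For reference, the shape of the measured operation from the 'long' rows above
(the test strings here are assumptions, not the actual benchmark inputs):

```php
<?php
// ~10,000 codepoints of GB18030-encodable text.
$gb = mb_convert_encoding(str_repeat('汉字测试', 2_500), 'GB18030', 'UTF-8');

// The measured operation: GB18030 decode plus UTF-16BE re-encode.
$out = mb_convert_encoding($gb, 'UTF-16BE', 'GB18030');
var_dump(strlen($out)); // int(20000), 2 bytes per BMP codepoint
```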