mirror of https://github.com/php/php-src.git synced 2026-03-24 08:12:21 +01:00
archived-php-src/ext/mbstring
Alex Dowad 1f0cf133db Add fast mb_strcut implementation for UTF-8
The old implementation runs through the entire string to pick out the
part which should be returned by mb_strcut. This creates significant
performance overhead. The new specialized implementation of mb_strcut
for UTF-8 usually only examines a few bytes around the starting and
ending cut points, meaning it generally runs in constant time.

For UTF-8 strings just a few bytes long, the new implementation is
around 10% faster (according to microbenchmarks which I ran locally).
For strings around 10,000 bytes in length, it is 50-300x faster.
(Yes, that is 300x and not 300%.)

The new implementation behaves identically to the old one on VALID
UTF-8 strings; a fuzzer was used to help ensure this is the case.
On invalid UTF-8 strings, there is a difference: in some cases, the
old implementation will pass invalid byte sequences through unchanged,
while in others it will remove them. The new implementation has
behavior which is perhaps slightly more predictable: it simply backs
up the starting and ending cut points to the preceding "starter
byte" (one which is not a UTF-8 continuation byte).
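The cut-point adjustment described above can be sketched in a few lines of C. This is a minimal illustration and not the code from this commit: the helper names (`is_continuation_byte`, `back_up_to_starter`, `utf8_cut_range`) are hypothetical, and the real mbstring implementation may handle additional cases. It only shows why the work depends on the bytes near the two cut points rather than on the length of the string.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation_byte(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Move pos backwards until it lands on a "starter byte" (any byte that is
 * not a continuation byte) or reaches the start of the string. */
static size_t back_up_to_starter(const unsigned char *s, size_t pos)
{
    while (pos > 0 && is_continuation_byte(s[pos])) {
        pos--;
    }
    return pos;
}

/* Hypothetical helper: given a requested byte offset `from` and byte count
 * `length`, compute the half-open byte range [*out_from, *out_to) that
 * does not begin or end in the middle of a multi-byte character. */
static void utf8_cut_range(const unsigned char *s, size_t len,
                           size_t from, size_t length,
                           size_t *out_from, size_t *out_to)
{
    if (from > len) {
        from = len;
    }
    /* Clamp the end point to the string length (also avoids overflow). */
    size_t to = (length > len - from) ? len : from + length;

    /* Only the bytes at and just before each cut point are examined. */
    if (from < len) {
        from = back_up_to_starter(s, from);
    }
    if (to < len) {
        to = back_up_to_starter(s, to);
    }
    *out_from = from;
    *out_to = to;
}
```

For the 4-byte string "aéz" (bytes `61 C3 A9 7A`), requesting offset 2 and length 2 starts inside "é", so the start backs up to offset 1 and the resulting range covers "éz". On valid UTF-8 each backward scan moves at most 3 bytes, which is consistent with the roughly constant running time claimed in the commit message.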
2023-10-04 09:10:38 +02:00