archived-php-src/ext/mbstring at 8f8b5ba082f8d522eceedba10012ac083c0a8376

php/archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-03-24 08:12:21 +01:00

Alex Dowad 1f0cf133db Add fast mb_strcut implementation for UTF-8

The old implementation runs through the entire string to pick out the
part which should be returned by mb_strcut. This creates significant
performance overhead. The new specialized implementation of mb_strcut
for UTF-8 usually only examines a few bytes around the starting and
ending cut points, meaning it generally runs in constant time.

For UTF-8 strings just a few bytes long, the new implementation is
around 10% faster (according to microbenchmarks which I ran locally).
For strings around 10,000 bytes in length, it is 50-300x faster.
(Yes, that is 300x and not 300%.)

The new implementation behaves identically to the old one on VALID
UTF-8 strings; a fuzzer was used to help ensure this is the case.
On invalid UTF-8 strings, there is a difference: in some cases, the
old implementation will pass invalid byte sequences through unchanged,
while in others it will remove them. The new implementation has
behavior which is perhaps slightly more predictable: it simply backs
up the starting and ending cut points to the preceding "starter
byte" (one which is not a UTF-8 continuation byte).
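The cut-point adjustment described above can be sketched roughly as follows. This is an illustration only, not the code from mbstring.c: the names is_continuation, back_up_to_starter and utf8_strcut_sketch are hypothetical, and the choice to count the byte limit from the adjusted start is an assumption.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: move pos backward until it rests on a
 * "starter byte" (any non-continuation byte) or reaches offset 0.
 * A position equal to len (end of string) is left unchanged. */
static size_t back_up_to_starter(const unsigned char *s, size_t len, size_t pos)
{
    while (pos > 0 && pos < len && is_continuation(s[pos])) {
        pos--;
    }
    return pos;
}

/* Hypothetical sketch of a byte-oriented cut for UTF-8.
 * from and max_bytes are byte counts. Assumption: max_bytes is
 * counted from the adjusted start, not from the requested one.
 * Only bytes near the two cut points are inspected. */
static void utf8_strcut_sketch(const unsigned char *s, size_t len,
                               size_t from, size_t max_bytes,
                               size_t *out_start, size_t *out_len)
{
    if (from > len) {
        from = len;
    }
    size_t start = back_up_to_starter(s, len, from);

    /* Clamp the tentative end to the string length without overflow. */
    size_t end = (max_bytes > len - start) ? len : start + max_bytes;
    end = back_up_to_starter(s, len, end);

    *out_start = start;
    *out_len = end - start;
}
```

On valid UTF-8 each backward scan stops after at most three continuation bytes, so the work does not depend on the length of the string. On invalid input with a long run of continuation bytes the scan can go further, which fits the "usually" and "generally" wording in the commit message.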

2023-10-04 09:10:38 +02:00

Add fast mb_strcut implementation for UTF-8

2023-10-04 09:10:38 +02:00

Add test cases for mb_strcut

2023-10-04 09:10:25 +02:00

Optimize mb_str{,im}width for performance

2021-09-29 18:19:01 +02:00

common_codepoints.txt

Improve mb_detect_encoding accuracy for text containing vowels with macrons

2023-08-25 12:09:55 +02:00

config.m4

Combine CJK encoding conversion code in a single source file

2023-05-20 21:27:48 -07:00

config.w32

Combine CJK encoding conversion code in a single source file

2023-05-20 21:27:48 -07:00

CREDITS

…

gen_rare_cp_bitvec.php

Mark globals as const (#10303)

2023-01-23 13:46:58 +00:00

mb_gpc.c

Take order of candidate encodings into account when guessing text encoding

2023-05-16 07:01:07 -07:00

mb_gpc.h

Remove unused 'to_language' and 'from_language' struct fields

2022-08-16 16:43:26 +02:00

mbstring_arginfo.h

[RFC] Implement mb_str_pad() (#11284)

2023-06-20 21:22:04 +02:00

mbstring.c

Add fast mb_strcut implementation for UTF-8

2023-10-04 09:10:38 +02:00

mbstring.h

Take order of candidate encodings into account when guessing text encoding

2023-05-16 07:01:07 -07:00

mbstring.stub.php

[RFC] Implement mb_str_pad() (#11284)

2023-06-20 21:22:04 +02:00

php_mbregex.c

Reduce memory allocated by var_export, json_encode, serialize, and other (#8902)

2022-07-08 14:47:46 +02:00

php_mbregex.h

Declare ext/mbstring constants in stubs (#8798)

2022-06-23 17:34:08 +02:00

php_onig_compat.h

…

php_unicode.c

Implement conditional casing for Greek letter sigma when title-casing text

2023-01-12 17:41:11 +02:00

php_unicode.h

Speed boost for mb_stripos (when not using UTF-8)

2022-12-18 15:31:20 +02:00

rare_cp_bitvec.h

Improve mb_detect_encoding accuracy for text containing vowels with macrons

2023-08-25 12:09:55 +02:00

unicode_data.h

Update Unicode tables to 14.0.0

2021-09-20 09:58:20 +02:00