mirror of https://github.com/php/php-src.git synced 2026-03-24 00:02:20 +01:00

Use fast path in more cases when doing case folding with mb_convert_case

mbstring's Unicode case conversion is table-driven, using Minimal Perfect Hash tables.
However, for small codepoint values, we bypass the hashtable lookup and just use
hard-coded conversion logic (i.e. adding or subtracting 0x20 from the appropriate
ASCII range).
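
As a rough illustration of the split described above, here is a minimal sketch in C. It is not mbstring's actual code: `sketch_tolower` and `slow_table_lower` are hypothetical names, and the stand-in "table" covers only two codepoints so that the example compiles on its own.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hash-table lookup. A real implementation
 * would consult generated Unicode data; this stub knows two codepoints. */
static uint32_t slow_table_lower(uint32_t code)
{
	switch (code) {
	case 0xC0:  return 0xE0;  /* LATIN CAPITAL LETTER A WITH GRAVE */
	case 0x391: return 0x3B1; /* GREEK CAPITAL LETTER ALPHA */
	default:    return code;
	}
}

/* Lowercase one codepoint, bypassing the table for small values. */
static uint32_t sketch_tolower(uint32_t code)
{
	if (code < 0x80) {
		/* Fast path: 'A'..'Z' become 'a'..'z' by adding 0x20 */
		if (code >= 0x41 && code <= 0x5A) {
			return code + 0x20;
		}
		return code; /* every other ASCII codepoint maps to itself */
	}
	return slow_table_lower(code);
}
```

Calling `sketch_tolower(0x41)` takes the fast path, while `sketch_tolower(0xC0)` falls through to the lookup.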

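For upcasing and downcasing, we had already widened the conditional which sends
execution down this fast path, so that it covers as many codepoint values as
possible. However, for case folding, this had not been done: the fast path was
only taken for codepoints below 0x80, although the first codepoint after ASCII
with a distinct case-folded form is 0xB5 (MICRO SIGN).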
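
The effect of moving the threshold can be sketched as follows. This is a simplified model, not the real function: `sketch_tofold`, `slow_table_fold`, and the `table_lookups` counter are hypothetical, the threshold is passed as a parameter only so that both behaviours can be compared, and the ISO-8859-9 special case for 0x49 is left out.

```c
#include <stdint.h>

/* Counts calls into the slow path, purely for illustration */
static unsigned table_lookups = 0;

/* Hypothetical stand-in for the table-driven case-folding lookup */
static uint32_t slow_table_fold(uint32_t code)
{
	table_lookups++;
	if (code == 0xB5) {
		return 0x3BC; /* MICRO SIGN folds to GREEK SMALL LETTER MU */
	}
	if (code >= 0xC0 && code <= 0xDE && code != 0xD7) {
		return code + 0x20; /* Latin-1 capital letters */
	}
	return code;
}

/* Case-fold one codepoint; values below fast_path_limit skip the table. */
static uint32_t sketch_tofold(uint32_t code, uint32_t fast_path_limit)
{
	if (code < fast_path_limit) {
		if (code >= 0x41 && code <= 0x5A) {
			return code + 0x20;
		}
		return code;
	}
	return slow_table_fold(code);
}
```

With a limit of 0x80, `sketch_tofold(0xA0, 0x80)` (NO-BREAK SPACE) increments `table_lookups`; with a limit of 0xB5 it returns the same value without touching the table.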

This will give a small performance boost for case-folding Unicode text which
includes non-breaking spaces, symbols like ¥ or ™, or accented Latin
characters (used in many European languages).
Alex Dowad
2026-01-10 09:58:51 +09:00
parent 51b1aa160d
commit 79b52042e3

@@ -180,7 +180,9 @@ static unsigned php_unicode_totitle_raw(unsigned code, const mbfl_encoding *enc)
 static unsigned php_unicode_tofold_raw(unsigned code, const mbfl_encoding *enc)
 {
-	if (code < 0x80) {
+	/* After the ASCII characters, the first codepoint with an special case-folded version
+	 * is 0xB5 (MICRO SIGN) */
+	if (code < 0xB5) {
 		/* Fast path for ASCII */
 		if (code >= 0x41 && code <= 0x5A) {
 			if (UNEXPECTED(enc == &mbfl_encoding_8859_9 && code == 0x49)) {