As with CP936, iterating over the PUA table and looking for matches in
it was a significant bottleneck for GB18030 decoding (though not as
severe a bottleneck as for CP936, since more is involved in GB18030
decoding than CP936 decoding).
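
This message does not spell out how the table scan was removed. As a rough illustration only, one common way to avoid a per-character linear scan is to reject out-of-range codes with a cheap bounds check and binary-search the rest. Everything in the sketch below is hypothetical: the names pua_entry, pua_table, pua_lookup_linear and pua_lookup_fast are invented for this example, and the table values are placeholders rather than real GB18030 mappings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: legacy double-byte code -> Unicode codepoint. */
typedef struct {
    uint16_t code;
    uint32_t unicode;
} pua_entry;

/* Placeholder data, sorted by 'code'. NOT the real GB18030 PUA table. */
static const pua_entry pua_table[] = {
    {0x1000, 0xE000},
    {0x1005, 0xE001},
    {0x1010, 0xE002},
    {0x1020, 0xE003},
};

#define PUA_TABLE_LEN (sizeof(pua_table) / sizeof(pua_table[0]))

/* Slow path: visits every entry for each decoded character. */
static uint32_t pua_lookup_linear(uint16_t code)
{
    for (size_t i = 0; i < PUA_TABLE_LEN; i++) {
        if (pua_table[i].code == code)
            return pua_table[i].unicode;
    }
    return 0; /* 0 means "no PUA mapping" in this sketch */
}

/* Faster path: bounds check first, then binary search over the sorted table. */
static uint32_t pua_lookup_fast(uint16_t code)
{
    size_t lo = 0, hi = PUA_TABLE_LEN;

    /* Most input never falls inside the table's range, so bail out early. */
    if (code < pua_table[0].code || code > pua_table[PUA_TABLE_LEN - 1].code)
        return 0;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pua_table[mid].code == code)
            return pua_table[mid].unicode;
        if (pua_table[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

With a sketch like this, the common case (a code outside the table's range) costs two comparisons instead of a full pass over the table.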
Here are some benchmark results after optimizing out that bottleneck:
GB18030, medium - to UTF-16BE - faster by 60.71% (0.0007 vs 0.0017)
GB18030, medium - to UTF-8 - faster by 59.88% (0.0007 vs 0.0017)
GB18030, long - to UTF-8 - faster by 44.91% (0.0669 vs 0.1214)
GB18030, long - to UTF-16BE - faster by 43.05% (0.0672 vs 0.1181)
GB18030, short - to UTF-8 - faster by 27.22% (0.0003 vs 0.0004)
GB18030, short - to UTF-16BE - faster by 26.98% (0.0003 vs 0.0004)
(The 'short' test strings had 0-5 codepoints each, 'medium' ~100
codepoints, and 'long' ~10,000 codepoints. For each benchmark, the
test harness cycled through all the test strings 40,000 times.)