archived-php-src/ext/mbstring/tests/bug69079.phpt at ecb698b21dc27de088cc2e03436d5ff5e7e48391

mirror of https://github.com/php/php-src.git synced 2026-04-18 21:41:22 +02:00

Nikita Popov a8a9e93e9a Revert/fix mb_substitute_character() codepoint checks

The introduced checks did not treat "non-Unicode" encodings correctly,
because they treated the passed integer as encoded in the internal
encoding in that case, while in actuality the substitute character
is always a Unicode codepoint.

Additionally, checking the codepoint against the internal encoding
is not correct in any case, because the substitution character must
be mapped in the *target* encoding of the conversion, which does
not necessarily coincide with the internal encoding (the internal
encoding is the default *source* encoding, not *target* encoding).

This reverts the checks back to simple range checks, but in a way
that still resolves #69079: characters outside the Basic
Multilingual Plane are now accepted and surrogate codepoints are
rejected. A distinction between UTF-8 and non-UTF-8 encodings is
not made for surrogate checks (as in the original patch), as
surrogates are always illegal on their own. Specifying a surrogate
as substitution character would only make sense if you could
specify a substitution string with more than one character --
however we do not support that.
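
The "simple range checks" described above can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
code from the commit: the names is_valid_substitute_codepoint and
check_substitute_examples are hypothetical, and the sketch covers only the
two rules the message states (accept codepoints outside the Basic
Multilingual Plane, reject surrogates). How the real function treats other
edge values is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the php-src function. It treats the input as a
 * Unicode codepoint regardless of any internal encoding and accepts the
 * whole codespace up to U+10FFFF except the surrogate range. */
bool is_valid_substitute_codepoint(uint32_t cp)
{
    if (cp > 0x10FFFF) {
        return false; /* beyond the Unicode codespace */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return false; /* surrogates are illegal on their own */
    }
    return true;
}

/* Usage sketch: a few representative inputs. */
void check_substitute_examples(void)
{
    assert(is_valid_substitute_codepoint(0x3F));      /* '?' inside the BMP */
    assert(is_valid_substitute_codepoint(0x1F600));   /* outside the BMP */
    assert(!is_valid_substitute_codepoint(0xD800));   /* lone surrogate */
    assert(!is_valid_substitute_codepoint(0x110000)); /* out of range */
}
```

Note that the check takes no encoding argument: whether the codepoint can
actually be represented is a question for the target encoding at
conversion time, not for this validation step.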

2017-08-03 21:12:41 +02:00

1011 B
