mirror of https://github.com/php/php-src.git synced 2026-03-27 01:32:22 +01:00
archived-php-src/ext/mbstring/tests/bug69079.phpt
Nikita Popov a8a9e93e9a Revert/fix mb_substitute_character() codepoint checks
The introduced checks did not treat "non-Unicode" encodings correctly,
because they treated the passed integer as encoded in the internal
encoding in that case, while in actuality the substitute character
is always a Unicode codepoint.

Additionally, checking the codepoint against the internal encoding
is not correct in any case, because the substitution character must
be mapped in the *target* encoding of the conversion, which does
not necessarily coincide with the internal encoding (the internal
encoding is the default *source* encoding, not *target* encoding).

This reverts the checks back to simple range checks, but in a way
that still resolves #69079: Characters outside the Basic
Multilingual Plane are now accepted and Surrogate Codepoints are
rejected. A distinction between UTF-8 and non-UTF-8 encodings is
not made for surrogate checks (as in the original patch), as
surrogates are always illegal on their own. Specifying a surrogate
as substitution character would only make sense if you could
specify a substitution string with more than one character --
however we do not support that.
2017-08-03 21:12:41 +02:00
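
The range check described in the commit message can be sketched in userland PHP roughly as follows. This is an illustrative approximation, not the C code in mbstring: the function name `is_plausible_substitute_codepoint` is hypothetical, and accepting U+0000 as the lower bound is an assumption, since the commit message does not say how zero is handled.

```php
<?php
// Hypothetical helper (not part of mbstring): approximates the "simple
// range check" from the commit message. The argument is always read as a
// Unicode codepoint, regardless of the internal encoding.
function is_plausible_substitute_codepoint(int $cp): bool
{
    // Reject values outside the Unicode codespace (above U+10FFFF).
    // Assumption: zero is accepted here; mbstring may treat it differently.
    if ($cp < 0 || $cp > 0x10FFFF) {
        return false;
    }
    // Surrogates (U+D800..U+DFFF) are illegal on their own.
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        return false;
    }
    // Whether the codepoint is mapped in the target encoding is not
    // checked, because the target encoding is not known at this point.
    return true;
}
```

Under this sketch, `0x1F600` and `0x50aa` pass, `0xD800` fails as a surrogate, and `0x8fa1ef` fails because it lies above U+10FFFF, which is consistent with the warnings the test below expects.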

--TEST--
Bug #69079 (enhancement for mb_substitute_character)
--SKIPIF--
<?php extension_loaded('mbstring') or die('skip mbstring not available'); ?>
--FILE--
<?php
mb_internal_encoding('UTF-8');
var_dump(mb_substitute_character(0x1F600));
var_dump(bin2hex(mb_scrub("\xff")));
mb_substitute_character(0x3f); // Reset to '?', as the next call will fail
var_dump(mb_substitute_character(0xD800)); // Surrogate (illegal)
var_dump(bin2hex(mb_scrub("\xff")));
mb_internal_encoding('EUC-JP-2004');
mb_substitute_character(0x63); // Set to 'c' (0x63), as the next call will fail
mb_substitute_character(0x8fa1ef); // EUC-JP-2004 encoding of U+50AA (illegal)
var_dump(bin2hex(mb_scrub("\x8d")));
mb_substitute_character(0x50aa);
var_dump(bin2hex(mb_scrub("\x8d")));
?>
--EXPECTF--
bool(true)
string(8) "f09f9880"

Warning: mb_substitute_character(): Unknown character in %s on line %d
bool(false)
string(2) "3f"

Warning: mb_substitute_character(): Unknown character in %s on line %d
string(2) "63"
string(6) "8fa1ef"