mirror of
https://github.com/php/php-src.git
synced 2026-03-27 01:32:22 +01:00
The introduced checks did not treat "non-Unicode" encodings correctly: in that case they interpreted the passed integer as a character encoded in the internal encoding, whereas the substitute character is always a Unicode codepoint. Additionally, checking the codepoint against the internal encoding is not correct in any case, because the substitute character must be mapped in the *target* encoding of the conversion, which does not necessarily coincide with the internal encoding (the internal encoding is the default *source* encoding, not the *target* encoding).

This reverts the checks back to simple range checks, but in a way that still resolves #69079: characters outside the Basic Multilingual Plane are now accepted and surrogate codepoints are rejected. Unlike in the original patch, no distinction between UTF-8 and non-UTF-8 encodings is made for the surrogate check, as surrogates are always illegal on their own. Specifying a surrogate as the substitute character would only make sense if a substitute string of more than one character could be specified, which is not supported.
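The range check described above can be sketched in userland PHP. This is only an illustration of the stated rules (any codepoint up to U+10FFFF is accepted, surrogates are rejected); the function name `is_plausible_substitute_codepoint` is hypothetical, and rejecting 0 is an assumption of this sketch rather than something the commit message states.

```php
<?php
// Hypothetical userland mirror of the range check described in the commit
// message. Not php-src code; the name and the handling of 0 are assumptions.
function is_plausible_substitute_codepoint(int $cp): bool
{
    // Reject values outside the Unicode code space (U+0001..U+10FFFF here).
    if ($cp <= 0 || $cp > 0x10FFFF) {
        return false;
    }
    // Reject surrogates: they are never valid on their own.
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        return false;
    }
    // The target encoding is unknown at this point, so no mapping check is done.
    return true;
}

var_dump(is_plausible_substitute_codepoint(0x1F600));  // bool(true):  outside the BMP
var_dump(is_plausible_substitute_codepoint(0xD800));   // bool(false): surrogate
var_dump(is_plausible_substitute_codepoint(0x8FA1EF)); // bool(false): above U+10FFFF
var_dump(is_plausible_substitute_codepoint(0x50AA));   // bool(true)
```

The third call shows why an EUC-JP-2004 byte sequence passed as an integer fails a plain range check: 0x8FA1EF is larger than 0x10FFFF, so it is not a Unicode codepoint at all.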
36 lines
1011 B
PHP
--TEST--
Bug #69079 (enhancement for mb_substitute_character)
--SKIPIF--
<?php extension_loaded('mbstring') or die('skip mbstring not available'); ?>
--FILE--
<?php

mb_internal_encoding('UTF-8');
var_dump(mb_substitute_character(0x1F600));
var_dump(bin2hex(mb_scrub("\xff")));
mb_substitute_character(0x3f); // Reset to '?', as the next call will fail
var_dump(mb_substitute_character(0xD800)); // Surrogate (illegal)
var_dump(bin2hex(mb_scrub("\xff")));

mb_internal_encoding('EUC-JP-2004');

mb_substitute_character(0x63); // Reset to 'c', as the next call will fail
mb_substitute_character(0x8fa1ef); // EUC-JP-2004 encoding of U+50AA (illegal)
var_dump(bin2hex(mb_scrub("\x8d")));

mb_substitute_character(0x50aa);
var_dump(bin2hex(mb_scrub("\x8d")));

?>
--EXPECTF--
bool(true)
string(8) "f09f9880"

Warning: mb_substitute_character(): Unknown character in %s on line %d
bool(false)
string(2) "3f"

Warning: mb_substitute_character(): Unknown character in %s on line %d
string(2) "63"
string(6) "8fa1ef"
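To see where the first expected values come from, the following sketch encodes a single codepoint as UTF-8 by hand, without relying on mbstring. The helper `utf8_encode_codepoint` is a hypothetical name introduced here for illustration; it performs no validity checks (surrogates, range) and is not part of the test.

```php
<?php
// Hypothetical helper: encode one Unicode codepoint as UTF-8 bytes.
// No validation is performed; the caller is assumed to pass 0..0x10FFFF.
function utf8_encode_codepoint(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8_encode_codepoint(0x1F600)), "\n"; // f09f9880
echo bin2hex(utf8_encode_codepoint(0x3F)), "\n";    // 3f
echo bin2hex(utf8_encode_codepoint(0x63)), "\n";    // 63
```

The first line matches the `string(8) "f09f9880"` produced when U+1F600 is the substitute character under UTF-8, and `3f` and `63` are the single-byte forms of `?` and `c`.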