archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-04-17 13:01:02 +02:00

Author	SHA1	Message	Date
Nikita Popov	e53162a32b	Return false on invalid codepoint in mb_chr(), instead of returning the encoding of the current substitution character. This allows a robust check for the failure case. The substitution character (especially the default of "?") is also a valid output of mb_chr() for a valid input (for "?" that would be 0x3f), so it's a bad choice for an error value.	2017-08-03 22:36:42 +02:00
Nikita Popov	41e9ba6333	Always use Unicode codepoints in mb_ord() and mb_chr(). Previously mb_chr() had two different encoding-dependent behaviors: (1) for "Unicode-encodings" it took a Unicode codepoint and returned its encoded representation; (2) otherwise it returned a big-endian binary encoding of the passed integer. Now the input is always interpreted as a Unicode codepoint. If a big-endian binary encoding is what you want, you don't need mbstring to implement that.	2017-08-03 22:14:00 +02:00
Nikita Popov	fb9bf5b64b	Revert/fix substitution character fallback. The introduced checks were not correct in two respects: (1) it was checked whether the source encoding of the string matches the internal encoding, while the actually relevant encoding is the target encoding; (2) even if the correct encoding is used, the checks are still too conservative. Just because something is not a "Unicode-encoding" does not mean that it does not map any non-ASCII characters. I've reverted the added checks and instead adjusted mbfl_convert to first try to use the provided substitution character and, if that fails, perform the fallback to '?' at that point. This means that any codepoint mapped in the target encoding should now be correctly supported and anything else should fall back to '?'.	2017-08-03 21:53:59 +02:00
Nikita Popov	a8a9e93e9a	Revert/fix mb_substitute_character() codepoint checks. The introduced checks did not treat "non-Unicode" encodings correctly, because they treated the passed integer as encoded in the internal encoding in that case, while in actuality the substitute character is always a Unicode codepoint. Additionally, checking the codepoint against the internal encoding is not correct in any case, because the substitution character must be mapped in the target encoding of the conversion, which does not necessarily coincide with the internal encoding (the internal encoding is the default source encoding, not target encoding). This reverts the checks back to simple range checks, but in a way that still resolves #69079: characters outside the Basic Multilingual Plane are now accepted and surrogate codepoints are rejected. A distinction between UTF-8 and non-UTF-8 encodings is not made for surrogate checks (as in the original patch), as surrogates are always illegal on their own. Specifying a surrogate as substitution character would only make sense if you could specify a substitution string with more than one character; however, we do not support that.	2017-08-03 21:12:41 +02:00
Yasuo Ohgaki	087dcd9381	pull-request/1100: Request #65081 mb_chr() and mb_ord(). Added test cases and a small optimization.	2016-08-10 11:32:10 +09:00
Masaki Kagaya	2a3c08b834	fix php_mb_ord to better handle the value of MBSTRG(current_filter_illegal_substchar)	2015-03-08 02:03:45 +09:00
Masaki Kagaya	00324a379c	add test for mb_chr and mb_ord	2015-03-08 02:03:45 +09:00

7 Commits
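
The range check described in a8a9e93e9a and the failure signalling described in e53162a32b can be illustrated with a small standalone sketch. This is not code from php-src: the helper names is_unicode_scalar() and encode_utf8() are hypothetical, the encoder handles UTF-8 only, and a return value of 0 stands in for the "false" that mb_chr() returns. It only shows why a distinct failure value is more robust than returning the substitution character, and that the check accepts codepoints outside the Basic Multilingual Plane while rejecting surrogates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from php-src): a simple range check that
 * accepts every codepoint up to U+10FFFF, including those outside the
 * Basic Multilingual Plane, and rejects surrogates U+D800..U+DFFF. */
static bool is_unicode_scalar(int64_t cp)
{
    if (cp < 0 || cp > 0x10FFFF) {
        return false;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return false; /* surrogates are always illegal on their own */
    }
    return true;
}

/* Hypothetical UTF-8 encoder: writes up to 4 bytes into out and returns
 * the number of bytes written. It returns 0 for an invalid codepoint,
 * so failure can never be confused with a valid output such as "?". */
static size_t encode_utf8(int64_t cp, unsigned char out[4])
{
    if (!is_unicode_scalar(cp)) {
        return 0;
    }
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this shape, a caller can tell a genuine "?" (codepoint 0x3f, one byte written) apart from a rejected input such as 0xD800 (zero bytes written), which is the distinction the first commit in the list argues for.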