1
0
mirror of https://github.com/php/php-src.git synced 2026-04-18 21:41:22 +02:00
Commit Graph

3 Commits

Author SHA1 Message Date
Alex Dowad
4f3bd2e235 Convert U+203E (OVERLINE) to 0x8150 (FULLWIDTH MACRON) in some SJIS variants
Converting U+203E to 0x7E was especially wrong for CP932, where 0x7E
represents a tilde.

For vanilla Shift-JIS and Shift-JIS-2004, converting to 0x7E is acceptable,
since 0x7E does represent an overline/macron in those encodings.

Follow the same principle in CP51932, which is closely related to CP932.
2020-11-25 20:51:45 +02:00
Alex Dowad
d1d50c2b7a Test EUC-JP and Shift-JIS more thoroughly
Previously, the unit tests for these text encodings covered all mappings
from legacy -> Unicode, and all _reversible_ mappings from Unicode -> legacy.
However, we should also test the few Unicode -> legacy mappings which
are not reversible.
2020-11-11 11:18:58 +02:00
Alex Dowad
8f6889b20d Fix mbstring support for EUC-JP text encoding
- Don't allow control characters to appear in the middle of a multi-byte
  character. (A strange feature, or perhaps misfeature, of mbstring which is
  not present in other libraries such as iconv.)
- When checking whether string is valid, reject kuten codes which do not
  map to any character, whether converting from EUC-JP to another encoding,
  or converting another encoding which uses JIS X 0208/0212 charsets to
  EUC-JP.
- Truncated multi-byte characters are treated as an error.
2020-11-09 13:45:17 +02:00