1
0
mirror of https://github.com/php/php-src.git synced 2026-04-26 01:18:19 +02:00
Commit Graph

4 Commits

Author SHA1 Message Date
Alex Dowad c9fea7db72 Convert U+00AF (MACRON) to 0x8150 (FULLWIDTH MACRON) in some SJIS variants
Except for vanilla Shift-JIS, where 0x7E is a halfwidth overline/macron.
As for Shift-JIS-2004, it has an added character (byte sequence 0x854A)
which was defined as a halfwidth macron in JIS X 0213:2000, so we use that.
2020-11-25 20:51:45 +02:00
Alex Dowad 4f3bd2e235 Convert U+203E (OVERLINE) to 0x8150 (FULLWIDTH MACRON) in some SJIS variants
Converting U+203E to 0x7E was especially wrong for CP932, where 0x7E
represents a tilde.

For vanilla Shift-JIS and Shift-JIS-2004, converting to 0x7E is acceptable,
since 0x7E does represent an overline/macron in those encodings.

Follow the same principle in CP51932, which is closely related to CP932.
2020-11-25 20:51:45 +02:00
Alex Dowad e4ee979111 0x5C is not a Yen sign in CP932 (or CP51932)
When Microsoft created CP932 (their version of Shift-JIS), they explicitly
used bytes 0-0x7F to represent ASCII characters rather than JIS X 0201
characters.

So when converting Unicode to CP932, it is not correct to convert U+00A5
to CP932 0x5C. Fortunately, CP932 does have a multi-byte FULLWIDTH YEN SIGN
character which we can use instead.

CP51932 uses the same extended character set as CP932; while CP932 is
MicroSoft's extended version of Shift-JIS, CP51932 is their extended version
of EUC-JP. So the same reasoning applies to CP51932.
2020-11-25 20:51:45 +02:00
Alex Dowad 2759874a42 Enhance handling of CP932 text encoding
- Don't allow control characters to appear in the middle of a multi-byte
  character. (This was a strange feature of mbstring; it doesn't make much
  sense, and iconv doesn't allow it.)
- Treat truncated multi-byte characters as an error.
2020-11-25 19:52:19 +02:00