Except for vanilla Shift-JIS, where 0x7E is a halfwidth overline/macron.
As for Shift-JIS-2004, it has an added character (byte sequence 0x854A)
which was defined as a halfwidth macron in JIS X 0213:2000, so we use that.
Converting U+203E to 0x7E was especially wrong for CP932, where 0x7E
represents a tilde.
For vanilla Shift-JIS and Shift-JIS-2004, converting to 0x7E is acceptable,
since 0x7E does represent an overline/macron in those encodings.
Follow the same principle in CP51932, which is closely related to CP932.
When Microsoft created CP932 (their version of Shift-JIS), they explicitly
used bytes 0-0x7F to represent ASCII characters rather than JIS X 0201
characters.
So when converting Unicode to CP932, it is not correct to convert U+00A5
to CP932 0x5C. Fortunately, CP932 does have a multi-byte FULLWIDTH YEN SIGN
character which we can use instead.
CP51932 uses the same extended character set as CP932; while CP932 is
MicroSoft's extended version of Shift-JIS, CP51932 is their extended version
of EUC-JP. So the same reasoning applies to CP51932.
- Don't allow control characters to appear in the middle of a multi-byte
character. (This was a strange feature of mbstring; it doesn't make much
sense, and iconv doesn't allow it.)
- Treat truncated multi-byte characters as an error.