Except for vanilla Shift-JIS, where 0x7E is a halfwidth overline/macron.
As for Shift-JIS-2004, it has an added character (byte sequence 0x854A)
which was defined as a halfwidth macron in JIS X 0213:2000, so we use that.
Previously, the unit tests for these text encodings covered all mappings
from legacy -> Unicode, and all _reversible_ mappings from Unicode -> legacy.
However, we should also test the few Unicode -> legacy mappings which
are not reversible.
- Reject otherwise valid kuten codes which don't map to anything in JIS X 0208.
- Handle truncated multi-byte characters as an error.
- Convert Shift-JIS 0x7E to Unicode 0x203E (overline) as recommended by the
Unicode Consortium, and as iconv does.
- Convert Shift-JIS 0x5C to Unicode 0xA5 (yen sign) as recommended by the
Unicode Consortium, and as iconv does.
(NOTE: This will affect PHP scripts which use an internal encoding of
Shift-JIS! PHP assigns a special meaning to 0x5C, the backslash. For example,
it is used for escapes in double-quoted strings. Mapping the Shift-JIS yen
sign to the Unicode yen sign means the yen sign will not be usable for
C escapes in double-quoted strings. Japanese PHP programmers who want to
write their source code in Shift-JIS for some strange reason will have to
use the JIS X 0208 backlash or 'REVERSE SOLIDUS' character for their C
escapes.)
- Convert Unicode 0x5C (backslash) to Shift-JIS 0x815F (reverse solidus).
- Immediately handle error if first Shift-JIS byte is over 0xEF, rather than
waiting to see the next byte. (Previously, the value used was 0xFC, which is
the limit for the 2nd byte and not the 1st byte of a multi-byte character.)
- Don't allow 'control characters' to appear in the middle of a multi-byte
character.
The test case for bug 47399 is now obsolete. That test assumed that a number
of Shift-JIS byte sequences which don't map to any character were 'valid'
(because the byte values were within the legal ranges).