Converting U+203E to 0x7E was especially wrong for CP932, where 0x7E
represents a tilde.
For vanilla Shift-JIS and Shift-JIS-2004, converting to 0x7E is acceptable,
since 0x7E does represent an overline/macron in those encodings.
Follow the same principle in CP51932, which is closely related to CP932.
Previously, the unit tests for these text encodings covered all mappings
from legacy -> Unicode, and all _reversible_ mappings from Unicode -> legacy.
However, we should also test the few Unicode -> legacy mappings which
are not reversible.
- Don't allow control characters to appear in the middle of a multi-byte
character. (A strange feature, or perhaps misfeature, of mbstring which is
not present in other libraries such as iconv.)
- When checking whether string is valid, reject kuten codes which do not
map to any character, whether converting from EUC-JP to another encoding,
or converting another encoding which uses JIS X 0208/0212 charsets to
EUC-JP.
- Truncated multi-byte characters are treated as an error.