php/archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-04-18 21:41:22 +02:00

Commit Graph


Author

SHA1

Message

Date

Alex Dowad

4f3bd2e235

Convert U+203E (OVERLINE) to 0x8150 (FULLWIDTH MACRON) in some SJIS variants

Converting U+203E to 0x7E was especially wrong for CP932, where 0x7E
represents a tilde.

For vanilla Shift-JIS and Shift-JIS-2004, converting to 0x7E is acceptable,
since 0x7E does represent an overline/macron in those encodings.

Follow the same principle in CP51932, which is closely related to CP932.

2020-11-25 20:51:45 +02:00
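
The rule described in commit 4f3bd2e235 can be sketched as a small lookup. This is only an illustrative model in Python, not mbstring's actual C code: the names `OVERLINE_TARGET` and `encode_overline` are hypothetical, and the byte values are the ones quoted in the commit message. CP51932 is left out because the message does not give its target bytes.

```python
# Hypothetical model of the rule in the commit message; not mbstring's source.
OVERLINE = "\u203e"          # U+203E OVERLINE
FULLWIDTH_MACRON = "\uffe3"  # U+FFE3 FULLWIDTH MACRON

# Target bytes for U+203E per encoding, as stated in the commit message.
# Where 0x7E means tilde (CP932), the double-byte 0x8150 is used instead.
OVERLINE_TARGET = {
    "SJIS": b"\x7e",       # 0x7E is an overline/macron here, so it is acceptable
    "SJIS-2004": b"\x7e",  # same reasoning as vanilla Shift-JIS
    "CP932": b"\x81\x50",  # 0x7E is a tilde in CP932
}


def encode_overline(encoding: str) -> bytes:
    """Return the bytes this sketch would emit for U+203E in `encoding`."""
    return OVERLINE_TARGET[encoding]


def cp932_meaning(raw: bytes) -> str:
    """Decode bytes with Python's own cp932 codec to show what they stand for."""
    return raw.decode("cp932")
```

With Python's `cp932` codec, `cp932_meaning(b"\x7e")` gives a tilde and `cp932_meaning(b"\x81\x50")` gives U+FFE3, which matches the reasoning in the commit message.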

Alex Dowad

d1d50c2b7a

Test EUC-JP and Shift-JIS more thoroughly

Previously, the unit tests for these text encodings covered all mappings
from legacy -> Unicode, and all _reversible_ mappings from Unicode -> legacy.
However, we should also test the few Unicode -> legacy mappings which
are not reversible.

2020-11-11 11:18:58 +02:00
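
To make the distinction in commit d1d50c2b7a concrete, here is a minimal Python sketch that picks out the Unicode -> legacy mappings which do not round-trip. The function name and the toy tables are hypothetical; the real tests work against the full conversion tables, which are not reproduced here. The single irreversible entry in the toy data is the CP932 case from commit 4f3bd2e235.

```python
def irreversible_mappings(to_unicode: dict, from_unicode: dict) -> dict:
    """Return the Unicode -> legacy pairs whose legacy code does not map back.

    to_unicode:   legacy code   -> Unicode codepoint
    from_unicode: Unicode codepoint -> legacy code
    """
    return {
        codepoint: legacy
        for codepoint, legacy in from_unicode.items()
        if to_unicode.get(legacy) != codepoint
    }


# Toy tables (assumed for illustration only).
TO_UNICODE = {0x7E: 0x007E, 0x8150: 0xFFE3}
FROM_UNICODE = {
    0x007E: 0x7E,    # reversible: 0x7E maps back to U+007E
    0xFFE3: 0x8150,  # reversible: 0x8150 maps back to U+FFE3
    0x203E: 0x8150,  # not reversible: 0x8150 maps back to U+FFE3, not U+203E
}

ONE_WAY = irreversible_mappings(TO_UNICODE, FROM_UNICODE)
```

A test suite that only checks round trips would never exercise the entries in `ONE_WAY`, which is the gap the commit describes.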

Alex Dowad

8f6889b20d

Fix mbstring support for EUC-JP text encoding

- Don't allow control characters to appear in the middle of a multi-byte
  character. (A strange feature, or perhaps misfeature, of mbstring which is
  not present in other libraries such as iconv.)
- When checking whether string is valid, reject kuten codes which do not
  map to any character, whether converting from EUC-JP to another encoding,
  or converting another encoding which uses JIS X 0208/0212 charsets to
  EUC-JP.
- Truncated multi-byte characters are treated as an error.

2020-11-09 13:45:17 +02:00
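
Two of the checks listed in commit 8f6889b20d (no control characters inside a multi-byte character, and truncated characters are errors) can be sketched as a structural validator. This is a hedged Python illustration, not the mbstring implementation: the function name is hypothetical, it assumes the usual EUC-JP byte layout, it rejects every other byte at or above 0x80 as a lead byte, and it does not check for kuten codes that map to no character, since that needs the full JIS X 0208/0212 tables.

```python
def is_structurally_valid_eucjp(data: bytes) -> bool:
    """Check EUC-JP byte structure only (no kuten table lookup)."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            # Single-byte ASCII, including control characters on their own.
            i += 1
            continue
        if lead == 0x8E:
            # SS2: one trailing byte, half-width katakana.
            trail_ranges = [(0xA1, 0xDF)]
        elif lead == 0x8F:
            # SS3: two trailing bytes, JIS X 0212.
            trail_ranges = [(0xA1, 0xFE), (0xA1, 0xFE)]
        elif 0xA1 <= lead <= 0xFE:
            # JIS X 0208: one trailing byte.
            trail_ranges = [(0xA1, 0xFE)]
        else:
            # Assumption of this sketch: no other high byte may start a character.
            return False
        if i + len(trail_ranges) >= n:
            # Truncated multi-byte character at the end of the input.
            return False
        for offset, (low, high) in enumerate(trail_ranges, start=1):
            if not low <= data[i + offset] <= high:
                # Covers a control character in the middle of a character.
                return False
        i += 1 + len(trail_ranges)
    return True
```

For example, `b"\xa4\xa2"` passes, while `b"\xa4"` (truncated) and `b"\xa4\x0a"` (line feed where a trail byte should be) are both rejected.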

3 Commits