mirror of
https://github.com/php/php-src.git
synced 2026-03-31 04:32:19 +02:00
Many Japanese encodings, such as JIS7/8, Shift JIS, ISO-2022-JP, EUC-JP, and so on encode characters from the JIS X 0208 character set. JIS X 0208 is based on the concept of a 94x94 table, with numbered rows and columns. However, more than a thousand of the cells in that table are empty; JIS X 0208 does not actually use all 94x94=8,836 possible kuten codes. mbstring had a dubious feature whereby, if a Japanese string contained one of these 'unmapped' kuten codes, and it was being converted to another Japanese encoding which was also based on JIS X 0208, the non-existent character would be silently passed through, and the unmapped kuten code would be re-encoded using the normal encoding method of the target text encoding. Again, this _only_ happened if converting the text with the funky kuten code to a Japanese encoding. If one tried converting it to Unicode, mbstring would treat that as an error. If somebody, somewhere, made their own private extension to JIS X 0208, and used the regular Japanese encodings like Shift JIS and EUC-JP to encode this private character set, then this feature might conceivably be useful. But how likely is that? If someone is using Shift JIS, EUC-JP, ISO-2022-JP, etc. to encode a funky version of JIS X 0208 with extra characters added, then that should be treated as a separate text encoding. The code which flags such characters with MBFL_WCSPLANE_JIS0208 is retained solely for error reporting in `mbfl_filt_conv_illegal_output`.