mirror of https://github.com/php/php-src.git synced 2026-04-01 21:22:13 +02:00

Unicode -> SJIS-mac conversion doesn't reject valid codepoints after a bad transcoding hint

To give the background on this issue, here is an excerpt from JAPANESE.txt,
from the Unicode Consortium:

    Apple has defined a block of 32 corporate characters as "transcoding
    hints." These are used in combination with standard Unicode characters
    to force them to be treated in a special way for mapping to other
    encodings; they have no other effect. Sixteen of these transcoding
    hints are "grouping hints" - they indicate that the next 2-4 Unicode
    characters should be treated as a single entity for transcoding. The
    other sixteen transcoding hints are "variant tags" - they are like
    combining characters, and can follow a standard Unicode (or a sequence
    consisting of a base character and other combining characters) to
    cause it to be treated in a special way for transcoding. These always
    terminate a combining-character sequence.

    The transcoding coding hints used in this mapping table are:

    0xF860  group next 2 characters as a single entity for transcoding
    0xF861  group next 3 characters as a single entity for transcoding
    0xF862  group next 4 characters as a single entity for transcoding
    0xF87A  variant tag for "negative" (i.e. black & white reversed)
    0xF87E  variant tag for vertical form
    0xF87F  variant tag for other alternate form

    For example, the Apple addition character 0x85AB is Roman numeral
    thirteen. There is no single Unicode for this (although there are
    standard Unicodes for Roman numerals 1-12). Using the grouping hint
    0xF862 in combination with standard Unicodes, we can map this as
    0xF862+0x0058+0x0049+0x0049+0x0049 (i.e. X + I + I + I).
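
To make the grouping hints in the excerpt concrete, here is a minimal sketch
in C. The helper name `grouping_hint_length` is hypothetical and is not part
of libmbfl; it only encodes the three grouping hints listed in the excerpt.

```c
#include <stddef.h>

/* Hypothetical helper (not from libmbfl): number of codepoints which an
 * Apple grouping hint groups together, or 0 if `cp` is not one of the
 * three grouping hints listed in JAPANESE.txt. */
static int grouping_hint_length(unsigned int cp)
{
	switch (cp) {
	case 0xF860: return 2;
	case 0xF861: return 3;
	case 0xF862: return 4;
	default:     return 0;
	}
}

/* The sequence which the excerpt gives for Roman numeral thirteen
 * (Apple 0x85AB): grouping hint, then X + I + I + I. */
static const unsigned int roman_thirteen[] = {
	0xF862, 0x0058, 0x0049, 0x0049, 0x0049
};
static const size_t roman_thirteen_len =
	sizeof(roman_thirteen) / sizeof(roman_thirteen[0]);
```

With these definitions, `grouping_hint_length(roman_thirteen[0])` is 4, which
matches the four codepoints that follow the hint in the sequence.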

Our SJIS-mac conversion code recognizes some special sequences which start
with an Apple 'transcoding hint'. However, if a transcoding hint is misplaced
and is not followed by one of the expected sequences, there is no reason to
reject the codepoint which follows it as well. Instead, we can emit a single
error marker for the bad transcoding hint and then process the following
codepoint as normal.
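
The idea can be sketched with a toy filter. This is not the libmbfl code:
`toy_filter`, `toy_feed`, `TOY_ERROR` and `TOY_GROUPED` are made up for
illustration, "normal" conversion is reduced to copying the codepoint, and the
only accepted sequence is the assumed pair 0xF860 + 'A'.

```c
#include <stddef.h>

#define TOY_ERROR   0xFFFDu /* stand-in for the filter's error marker */
#define TOY_GROUPED 0xE000u /* made-up output for a recognized sequence */

typedef struct {
	int status;          /* 0 = normal, 1 = a transcoding hint is pending */
	unsigned int cache;  /* the pending hint */
	unsigned int out[16];
	size_t out_len;
} toy_filter;

static void toy_emit(toy_filter *f, unsigned int v)
{
	if (f->out_len < sizeof(f->out) / sizeof(f->out[0]))
		f->out[f->out_len++] = v;
}

static void toy_feed(toy_filter *f, unsigned int c)
{
	if (f->status == 0) {
		if (c == 0xF860 || c == 0xF861 || c == 0xF862) {
			/* Remember the hint and wait for the next codepoint */
			f->status = 1;
			f->cache = c;
			return;
		}
		toy_emit(f, c); /* stand-in for ordinary conversion */
		return;
	}

	/* A hint is pending; leave the pending state either way */
	f->status = 0;
	if (f->cache == 0xF860 && c == 0x41) {
		toy_emit(f, TOY_GROUPED);
		return;
	}

	/* Bad hint: one error marker for the hint itself, then handle `c`
	 * exactly as if the hint had never been there */
	toy_emit(f, TOY_ERROR);
	toy_feed(f, c);
}
```

Feeding 0xF862 followed by 'B' to a zero-initialized `toy_filter` leaves one
`TOY_ERROR` and then 'B' in `out`, rather than two error markers. Because the
second codepoint is fed back in from the normal state, a hint which follows a
bad hint becomes the new pending hint.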
Alex Dowad
2020-11-09 21:40:08 +02:00
parent b27a34c5a9
commit fbdcab953d

@@ -408,6 +408,7 @@ mbfl_filt_conv_wchar_sjis_mac(int c, mbfl_convert_filter *filter)
 	}
 	if (c == 0xf860 || c == 0xf861 || c == 0xf862) {
+		/* Apple 'transcoding hint' codepoints (from private use area) */
 		filter->status = 2;
 		filter->cache = c;
 		return c;
@@ -527,8 +528,9 @@ mbfl_filt_conv_wchar_sjis_mac(int c, mbfl_convert_filter *filter)
 	}
 	if (filter->status == 0) {
+		/* Didn't find any of expected codepoints after Apple transcoding hint */
 		CK(mbfl_filt_conv_illegal_output(c1, filter));
-		CK(mbfl_filt_conv_illegal_output(c, filter));
+		return mbfl_filt_conv_wchar_sjis_mac(c, filter);
 	}
 	break;