This version replaces SPACEs before the meridian with NARROW NO-BREAK
SPACEs. Thus, we split the affected test cases as usual.
(cherry picked from commit 8dd51b462d)
Fixes GH-10262.
A copy of this piece of code exists in zend_jit_compile_side_trace(),
but there, the leak bug does not exist.
This bug exists since both copies of this piece of code were added in
commit 4bf2d09ede
If, for whatever reason, the random_fd has been assigned file descriptor `0` it
previously failed to close during module shutdown, thus leaking the descriptor.
As a performance optimization, mbstring implements some functions using
tables which give the (byte) length of a multi-byte character using a
lookup based on the value of the first byte. These tables are called
`mblen_table`.
For many years, the mblen_table for SJIS has had '2' in position 0x80.
That is wrong; it should have been '1'. Reasons:
For SJIS, SJIS-2004, and mobile variants of SJIS, 0x80 has never been
treated as the first byte of a 2-byte character. It has always been
treated as a single erroneous byte. On the other hand, 0x80 is a valid
character in MacJapanese... but a 1-byte character, not a 2-byte one.
The same applies to bytes 0xFD-FF; these are 1-byte characters in
MacJapanese, and in other SJIS variants, they are not valid (as the
first byte of a character).
Thanks to the GitHub user 'youkidearitai' for finding this problem.
The issue was that passwd was empty for the issue reporter, but the test
expected passwd to be non-empty. An empty passwd can occur if there is
no (encrypted) group password set up.
The 'h' flag makes mb_convert_kana convert zenkaku hiragana to hankaku
katakana; 'k' makes it convert zenkaku katakana to hankaku katakana.
When working on the implementation of mb_convert_kana, I added some
additional checks to catch combinations of flags which do not make
sense; but there is no conflict between 'h' and 'k' (they control
conversions for two disjoint ranges of codepoints) and this combination
should not have been restricted.
Thanks to the GitHub user 'akira345' for reporting this problem.
Closes GH-10174.
Commit 6c25413183 added the flag ZEND_JIT_EXIT_INVALIDATE which
resets the trace handlers in zend_jit_trace_exit(), but forgot to
lock the shared memory section.
This could cause another worker process who still saw the
ZEND_JIT_TRACE_JITED flag to schedule ZEND_JIT_TRACE_STOP_LINK, but
when it arrived at the ZEND_JIT_DEBUG_TRACE_STOP, the handler was
already reverted by the first worker process and thus
zend_jit_find_trace() fails.
This in turn generated a bogus jump offset in the JITed code, crashing
the PHP process.
Commit 6c25413 added the flag ZEND_JIT_EXIT_INVALIDATE which resets
the trace handlers in zend_jit_trace_exit(), but forgot to consider
that on ZEND_JIT_TRACE_STOP_LINK, this changed handler gets passed to
zend_jit_find_trace(), causing it to fail, either by returning 0
(results in bogus data) or by aborting due to ZEND_UNREACHABLE(). In
either case, this crashes the PHP process.
I'm not quite sure how to fix this multi-threading problem properly;
my suggestion is to just fail the zend_jit_trace() call. After all,
the whole ZEND_JIT_EXIT_INVALIDATE fix was about reloading modified
scripts, so there's probably no point in this pending zend_jit_trace()
call.
In b5ff87ca71, I made a number of adjustments to our conversion code
for CP1252. One of the adjustments was to make the mappings match those
published by the Unicode Consortium in the file CP1252.TXT. These do
not include mappings for the CP1252 bytes 0x81, 0x8D, 0x8F, 0x90, and
0x9D.
Rostyslav Gulka reported that this caused a problem. His application
stores binary JPEG data in an MS-SQL database. When they SELECT the
binary data out of the database, it is treated as CP1252 text and
automatically converted to UTF-8. To recover the original binary
data, they then do a conversion from UTF-8 to CP1252.
Obviously, that does not work if certain CP1252 bytes do not map to
any Unicode codepoint at all.
While this is a very unusual application of text encoding conversion,
and we might choose not to support it if there was no other basis for
including those mappings, it seems that Microsoft does actually include
them in the Win32 API as "best fit" mappings. These are extra mappings
from Unicode to other text encodings, which the Win32 API function
WideCharToMultiByte uses by default unless the WC_NO_BEST_FIT_CHARS
flag was passed.
A list of these "best fit" mappings for CP1252 can be found here:
https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt