There are two issues to resolve:
1. The FCC is not refetch when trying to unregister a trampoline
2. Comparing the function pointer of trampolines is meaningless as they are reallocated, thus we need to compare the name of the function
Found while working on GH-8294
Closes GH-10033
Dialect 1 databases store and transfer `NUMERIC(15,2)` values as
doubles, which we need to cater to in `firebird_stmt_get_col()` to
avoid `ZEND_ASSUME(0)` to ever be triggered, since that may result
in undefined behavior.
Since adding a regression test would require to create a dialect 1
database, we go without it.
Closes GH-10021.
When decoding a 3-byte UTF-8 code unit, redundant checks for overlong
code unit and for illegal codepoints from U+D800-DFFF were included.
Both of these conditions are caught by the line which reads:
if ((c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90)) {
As such, there is no reason to check for the same error conditions again.
Likewise, when decoding a 4-byte UTF-8 code unit, there was a
redundant check for overlong code unit. That was already caught by the
line which reads:
if ((c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90)) {
For JIS encoding, hiragana and katakana can be input in multiple forms.
One form uses JISX 0201 escape sequences. Another is called 'GR-invoked'
kana.
In the context of ISO-2022 encoding, bytes with a zero bit in the MSB
are called "GL" (or "graphics left") and those with the MSB set are
called "GR" (or "graphics right"). Regarding the variants of
ISO-2022-JP which are called "JIS7" and "JIS8", Wikipedia states:
"Other, older variants known as JIS7 and JIS8 build directly on the
7-bit and 8-bit encodings defined by JIS X 0201 and allow use of JIS X
0201 kana from G1 without escape sequences, using Shift Out and Shift
In or setting the eighth bit (GR-invoked), respectively."
In harmony with this, we have always accepted bytes from 0xA3-0xDF and
decoded them to the corresponding hiragana/katakana. However, at some
point I accidentally broke output for these kana. You can see the
problem in 3v4l.org by running this program:
<?php
echo bin2hex(mb_convert_encoding("\xA3", 'JIS', 'JIS'));
The results are:
Output for 8.2rc1 - rc3
1b244200231b2842
Output for 7.4.0 - 7.4.33, 8.0.1 - 8.0.25, 8.1.12
1b2849231b2842
Output for 8.1.0 - 8.1.11
1b284923
You can see that from 8.1.0 - 8.1.11, there was a missing escape
sequence at the end. That was caused because the flush functions were
not being called properly, and has already been fixed. However, this
also shows that the output for 8.2rc1-rc3 is completely invalid.
It is trying to output a JISX 0208 sequence, but with 0x00 as one of
the JISX 0208 bytes, which is illegal.
Add the missing code which will make the new text conversion filters
behave the same as the old ones when outputting hiragana/katakana in
JIS encoding.
We need to overwrite the __toString magic method for SplFileObject, similarly to how DirectoryIterator overwrites it
Moreover, the custom cast handler is useless as we define __toString methods, so use the standard one instead.
Closes GH-9912
Closure::call() makes a temporary copy of original closure function, modifies its
scope, resets ZEND_ACC_CLOSURE flag and call it through zend_call_function().
As result the same function may be called with and without
ZEND_ACC_CLOSURE flag, that confuses JIT and may lead to memory leak or
even worse memory errors.
The patch allocates "fake" closure object and keep ZEND_ACC_CLOSURE flag
to always behave in the same way.
This bug was found when I was fuzzing a patch related to mb_strpos.
In some cases, the legacy text conversion code for UTF-7 (and
UTF7-IMAP) would correctly recognize an error for a Base64-encoded
section which was not correctly padded with zero bits, but the new
(and faster) text conversion code would not.
Specifically, if the input string ended abruptly after the 4th or 7th
byte of a Base64-encoded section, the new conversion code would
confirm that the trailing padding bits from the previous byte (3rd or
6th) were zeroes, but would not check whether the 4th or 7th byte
itself encoded any non-zero bits. The legacy conversion code did
perform this check and would treat the input string as invalid.
Actually, even if the 4th or 7th byte does encode only (padding) zero
bits, this is still a problem, because there is no reason to have a
4th (or 7th) byte in that case. The UTF-7 string should have ended
on the previous byte instead.
Apply the same fix for both UTF-7 and UTF7-IMAP.