A connection string may contain just a single key, but
PHP used ";" as the heuristic to detect if a string was a connection
string versus plain DSN. However, a single-key connection string
would get treated like a DSN name, i.e. "DSN=*LOCAL". This makes it
so that "=" is used, as a connection string must contain a key.
Closes GH-8748.
Even for single-character strings, this is about 50% faster for
ASCII, UTF-8, and UTF-16. For long strings, the performance gain is
enormous, since the old code would convert the ENTIRE string, just
to pick out the first codepoint.
Benchmarking reveals that this is about 8% slower for UTF-8 strings
which have a bad codepoint at the very beginning of the string.
For good strings, or those where the first bad codepoint is much
later in the string, it is significantly faster (2-3 times faster
in many cases).
In all text conversion filters which require the wchar buffer used for output
to have some minimum size, it's better to include an assertion; this will
help us to catch bugs, and will also help future readers to understand what
we expect of the function arguments.
For UTF-7 and UTF7-IMAP, these assertions were already there, but I have
added comments explaining why the minimum size is what it is.
I didn't think this through carefully enough when first writing this code,
but it's not necessary to reserve space for the 1-2 wchars which may be emitted
before exiting the function.
Why? Well, we are guaranteed that when we enter the function, there are at
least 3 spaces in the wchar buffer. The only way those can be consumed is if
wchars are emitted in the main 'while' loop, but if it does emit any wchars,
it will set 'bits' to zero at the same time, which means the final part will
not emit anything. 'bits' can be incremented again by the main loop, but the
main loop only runs while there are still at least 3 spaces in the buffer.
So basically, we are guaranteed that when the main loop terminates, either
there are 3 or more spaces remaining in the wchar buffer, or else 'bits' is
zero, or both.
Implements initial stage of accepted RFC to remove them:
https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode
Tests relating to SOAP and htmlspecialchars seem to have been
using this entirely unnecessarily, so have been fixed.
Closes GH-8726.
ZSTR_VAL can never be NULL as zend_string.val is a char[1] which will
always decay to a non-nullable pointer.
This fails with -Werror on newer gcc versions.
In d62f535caa, the legacy mbstring conversion filters for Shift-JIS
was updated to restore backwards-compatible mappings for 0x5C/0x7E.
Make the same change to the newer fast conversion filters.
According to the relevant Japan Industrial Standards Committee standards,
SJIS 0x5C is a Yen sign, and 0x7E is an overline.
However, this conflicts with the implementation of SJIS in various legacy
software (notably Microsoft products), where SJIS 0x5C and 0x7E are taken
as equivalent to the same ASCII bytes.
Prior to PHP 8.1, mbstring's implementation of SJIS handled these bytes
compatibly with Microsoft products. This was changed in PHP 8.1.0, in an
attempt to comply with the JISC specifications. However, after discussion
with various concerned Japanese developers, it seems that the historical
behavior was more useful in the majority of applications which process
SJIS-encoded text.
Since we are now treating SJIS 0x5C as equivalent to U+005C and 0x7E as
equivalent to U+007E, it does not make sense to convert U+203E (OVERLINE)
to 0x7E, nor does it make sense to convert U+00A5 (YEN SIGN) to 0x5C. Restore
the mappings for those codepoints from before PHP 8.1.0.
Fixes GH-8281.
According to the relevant Japan Industrial Standards Committee standards,
SJIS 0x5C is a Yen sign, and 0x7E is an overline.
However, this conflicts with the implementation of SJIS in various legacy
software (notably Microsoft products), where SJIS 0x5C and 0x7E are taken
as equivalent to the same ASCII bytes.
Prior to PHP 8.1, mbstring's implementation of SJIS handled these bytes
compatibly with Microsoft products. This was changed in PHP 8.1.0, in an
attempt to comply with the JISC specifications. However, after discussion
with various concerned Japanese developers, it seems that the historical
behavior was more useful in the majority of applications which process
SJIS-encoded text.
Since we are now treating SJIS 0x5C as equivalent to U+005C and 0x7E as
equivalent to U+007E, it does not make sense to convert U+203E (OVERLINE)
to 0x7E, nor does it make sense to convert U+00A5 (YEN SIGN) to 0x5C. Restore
the mappings for those codepoints from before PHP 8.1.0.
There is not much point in having two distinct file mappings; since the
info mapping is very small and of fixed size, we can put it at the
beginning of a single mapping. Besides the obvious resource savings,
that also simplifies the error handling.
Closes GH-8648.
We also fix gh8626.phpt for the firebird driver by allowing negative
status codes as well. We mark the other two test cases as xfail for
the firebird driver for now.
Closes GH-8666.