It makes no sense to compare IPv6 address ranges as strings; there are
too many different representation possibilities. Instead, we change
`_php_filter_validate_ipv6()` so that it can calculate the IP address
as integer array. We do not rely on `inet_pton()` which may not be
available everywhere, at least IPv6 support may not, but rather parse
the IP address manually. Finally, we compare the integers.
Note that this patch does not fix what we consider as reserved and
private, respectively, but merely tries to keep what we had so far.
Co-authored-by: Nikita Popov <nikita.ppv@gmail.com>
Closes GH-7476.
While technically legal, this may cause unexpected situations
(in this example, setting an FE_FREE operand to constant null)
and is suboptimal anyway. It's better to preserve the vacuous type
and drop it later (though we currently don't implement this).
The 'fast path' in the uppercase/lowercase functions for Unicode text can be used
for a slightly greater range of characters. This is not expected to have a big
impact on performance, since the number of characters which will use the 'fast path'
is only increased by about 50-60, and these are not very commonly used characters...
but still, it doesn't cost anything.
Rather than using pointers to pointers to pointers (3 levels of indirection), what
makes sense is two levels. This reduces unnecessary pointer dereference operations.
Previously, when passed an empty string, and given an encoding which
uses a variable number of bytes per character (and which doesn't have
a 'character length table'), mb_str_split would return an array
containing a single empty string, rather than an empty array.
The ISO-2022 encodings are among those which were affected by this bug.
* PHP-8.1:
Bug #81390: mb_detect_encoding should not prematurely stop processing input
mb_detect_encoding with only one candidate encoding uses mb_check_encoding
Optimize text encoding detection for speed (eliminate Unicode property lookups)
As a performance optimization, mb_detect_encoding tries to stop
processing the input string early when there is only one 'candidate'
encoding which the input string is valid in. However, the code which
keeps count of how many candidate encodings have already been rejected
was buggy. This caused mb_detect_encoding to prematurely stop
processing the input when it should have continued.
As a result, it did not notice that in the test case provided by Alec,
the input string was not valid in UTF-16.
...By just testing the input codepoints if they are within a few fixed
ranges instead. This avoids hash lookups in property tables.
From (micro-)benchmarking on my PC, this looks to be a bit less than 4x
faster than the existing code.