mirror of https://github.com/php/php-src.git synced 2026-04-19 05:51:02 +02:00
Commit Graph

130876 Commits

Author SHA1 Message Date
Alex Dowad
290efe842d Adjust code which checks if encoding is ISO-8859-9 when converting case
Instead of checking the 'encoding number' to see whether we are converting
case for ISO-8859-9 text, compare pointers.

This should free up 1 register in php_unicode_convert_case.
2023-01-12 17:41:11 +02:00
Alex Dowad
39b46a5398 Implement Unicode conditional casing rules for Greek letter sigma
The capital Greek letter sigma (Σ) should be lowercased as σ except
when it appears at the end of a word; in that case, it should be
lowercased as the special form ς.

This rule is included in the Unicode data file SpecialCasing.txt.
The condition for applying the rule is called "Final_Sigma" and is
defined in Unicode technical report 21. The rule is:

• For the special casing form to apply, the capital letter sigma must
  be preceded by 0 or more "case-ignorable" characters, preceded by
  at least 1 "cased" character.
• Further, capital sigma must NOT be followed by 0 or more
  case-ignorable characters and then at least 1 cased character.

"Case-ignorable" characters include certain punctuation marks, like
the apostrophe, as well as various accent marks. There are actually
close to 500 different case-ignorable characters, including accent marks
from Cyrillic, Hebrew, Armenian, Arabic, Syriac, Bengali, Gujarati,
Telugu, Tibetan, and many other alphabets. This category also includes
zero-width spaces, codepoints which indicate RTL/LTR text direction,
certain musical symbols, etc.

Since the rule involves scanning over "0 or more" of such
case-ignorable characters, it may be necessary to scan arbitrarily far
to the left and right of capital sigma to determine whether the special
lowercase form should be used or not. However, since we are trying to
be both memory-efficient and CPU-efficient, this implementation limits
how far to the left we will scan. Generally, we scan up to 63 characters
to the left looking for a "cased" character, but not more.

When scanning to the right, we go up to the end of the string if
necessary, even if it means scanning over thousands of characters.

In any case, it is almost impossible to imagine that natural text will
include "words" with more than 63 successive apostrophes (for example)
followed by a capital sigma.

Closes GH-8096.
2023-01-12 17:41:11 +02:00
Max Kellermann
24b311bdd7 ext/opcache/zend_shared_alloc: rename _register_xlat_entry() params
The name "new" happens to be a C++ keyword, which was my reason to
rethink those names.

The "xlat_table" is not only used to translate pointers for persisting
scripts to shared memory, but is also used to annotate pointers
(e.g. by the JIT to associate an op_array with its jit_extension).

The names "old" and "new" aren't good for that; often, there's nothing
"old" or "new" about them.  It's actually a generic lookup table, and
"old" shall be named "key" (which it is called internally already),
and "new" is renamed to simply "value".
2023-01-12 15:14:05 +00:00
Max Kellermann
b47bfd698d ext/opcache: C++ compatibility
Just in case somebody includes those headers from C++ code.  The same
already exists in other opcache headers.
2023-01-12 15:14:05 +00:00
Max Kellermann
45a128c9de Zend/zend_types: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
1eb71c3f15 Zend/zend_map_ptr: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
492523a779 Zend/zend_inference: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
c7a4633891 Zend/Optimizer/zend_call_graph: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
308adb915c Zend/Optimizer/sccp: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
cd27d5e07f Zend/Optimizer/dce: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
c5933409b4 Zend/Optimizer/scdf: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
46371f4eb3 Zend/zend_bitset: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
623e2e9fc6 ext/opcache/zend_accelerator_hash: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
e7434c1247 Zend/zend_long: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
d28d323ca2 Zend/Optimizer/zend_ssa: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
1a067b84ee Zend/Optimizer/zend_optimizer: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
a55c0c5fc3 Zend/Optimizer/zend_cfg: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
b5aeb3a4d4 Zend/zend_stream: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
f061a035e4 Zend/zend_float: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
b088575119 Zend/zend_extensions: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
b1d48774a7 Zend/zend_multiply: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
94f9a20ce6 Zend/zend_arena: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
4831e48708 Zend/zend_system_id: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
cd985de190 ext/standard/md5: include cleanup 2023-01-12 15:12:45 +00:00
Max Kellermann
9521d21681 main/php_globals.h: add missing include for PHPAPI 2023-01-12 15:12:45 +00:00
Max Kellermann
d6136151e9 Zend/zend_build.h: include php_config.h
Without this, the macros ZTS, ZEND_DEBUG and PHP_COMPILER_ID may be
unavailable.
2023-01-12 15:12:45 +00:00
Jakub Zelenka
da4775f071 Merge branch 'PHP-8.2' 2023-01-12 13:55:47 +00:00
Jakub Zelenka
1b48a5c802 Fix ASAN reported leak in FPM config test
This happens because the config test does not shut down the SAPI.

In addition, this commit fixes a few failures when running FPM tests
as root.

Closes GH-10296
2023-01-12 13:52:33 +00:00
Alex Dowad
4427b2e1ab Mark UTF-8 strings emitted by mbstring functions as valid UTF-8
We now have a couple of mbstring functions which have fast paths for
strings marked as 'valid UTF-8', and we will likely add more later. So
that these fast paths can be used more frequently, mark UTF-8 strings
emitted by mbstring as 'valid UTF-8'. This is always a correct thing
to do, because mbstring never returns invalid UTF-8 as the result of
a conversion (or similar) operation.

Internally, we do have a conversion mode which deliberately emits
invalid UTF-8 in some cases. (This is done to prevent unwanted matches
when we are converting strings to UTF-8 before performing matching
operations on them.) For such strings, don't set the 'valid UTF-8' flag.
It probably wouldn't hurt anything to set it, because strings generated
using that special conversion mode should *never* be returned to
userland, and I don't think we do anything with them which cares about
the IS_STR_VALID_UTF8 flag... but still, it would likely cause
confusion for developers.
2023-01-11 17:08:27 +02:00
Tim Düsterhus
e7c0f4e816 random: Rely on free(NULL) being safe for random status freeing (#10246)
* random: Rely on `free(NULL)` being safe for random status freeing

* random: Restructure `php_random_status_free()` to not early-return
2023-01-10 18:46:57 +01:00
George Peter Banyard
d7f624258d Merge branch 'PHP-8.2'
* PHP-8.2:
  fix: indirect_return compilation warning
2023-01-10 15:23:44 +00:00
George Peter Banyard
c936c02119 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  fix: indirect_return compilation warning
2023-01-10 15:23:35 +00:00
Kévin Dunglas
55514a1119 fix: indirect_return compilation warning
Closes GH-10274

Signed-off-by: George Peter Banyard <girgias@php.net>
2023-01-10 15:23:15 +00:00
Derick Rethans
cc4e958932 Merge branch 'PHP-8.2' 2023-01-10 15:16:42 +00:00
Derick Rethans
f340854a30 Merge branch 'PHP-8.1' into PHP-8.2 2023-01-10 15:16:32 +00:00
Derick Rethans
d12ba111e0 Fixed GH-10218: DateTimeZone fails to parse time zones that contain the "+" character 2023-01-10 15:15:49 +00:00
David Carlier
61cf7d49ab posix_pathconf throwing ValueError on empty path 2023-01-10 15:03:11 +00:00
Max Kellermann
ecc880f491 Zend/zend_execute: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
588a07f737 Zend/zend_multibyte: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
f377e15751 Zend/zend_ptr_stack: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
b4ba16fe18 Zend/zend_object_handlers: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
694ec1deea Zend/zend_{operators,variables}: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
6b34de8eba sapi/*: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann
aa1cd02a43 Zend/zend_fibers: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
308fd311ea ext/{standard,json,random,...}: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann
16203b53e1 main: add missing includes 2023-01-10 14:19:03 +00:00
Max Kellermann
738fb5ca54 Zend/zend_smart_str: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
9fdbefacd3 main/s[np]printf: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
cd4a7c1d90 Zend/zend_ini: include cleanup 2023-01-10 14:19:03 +00:00
Max Kellermann
928685eba2 Zend/zend_signal: include cleanup 2023-01-10 14:19:03 +00:00