mirror of https://github.com/php/php-src.git synced 2026-04-24 00:18:23 +02:00
Commit Graph

14106 Commits

Author SHA1 Message Date
Alex Dowad 0e7160b836 Implement mb_detect_encoding using fast text conversion filters
Regarding the optional 3rd `strict` argument to mb_detect_encoding,
the documentation states:

  Controls the behaviour when string is not valid in any of the listed encodings.
  If strict is set to false, the closest matching encoding will be returned;
  if strict is set to true, false will be returned.

(Ref: https://www.php.net/manual/en/function.mb-detect-encoding.php)

Because of bugs in the implementation, mb_detect_encoding did not always
behave according to this description when `strict` was false.
For example:

  <?php
  echo var_export(mb_detect_encoding("\xc0\x00", "UTF-8", false));
  // Before this commit, prints: false
  // After this commit, prints: 'UTF-8'

Because `strict` is false in the above example, mb_detect_encoding
should return the 'closest matching encoding', which is UTF-8, since
that is the only candidate encoding. (Incidentally, this example shows
that using mb_detect_encoding with a single candidate encoding in
non-strict mode is useless.)

The new implementation fixes this bug. It also fixes another problem
with the old implementation as regards non-strict detection mode:

The old implementation would stop processing of the input string using
a particular candidate encoding as soon as it saw an error in that
encoding, even in non-strict mode. This means that it could not really
detect the 'closest matching encoding'; rather, what it would return
in non-strict mode was 'the encoding in which the first decoding error
is furthest from the beginning of the input string'.

In non-strict mode, the new implementation continues trying to process
the input string to its end even after seeing an error. This makes it
possible to determine in which candidate encoding the string has the
smallest number of errors, i.e. the 'closest matching encoding'.
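
The selection rule described above can be sketched roughly as follows. This is an illustrative Python model, not the mbstring C code: `count_decode_errors` and `closest_encoding` are hypothetical names, Python's codecs stand in for mbstring's conversion filters, ties simply go to the first candidate listed, and `None` stands in for PHP's `false`.

```python
import codecs


def count_decode_errors(data: bytes, encoding: str) -> int:
    """Decode all of `data`, counting errors instead of stopping at the first."""
    errors = 0

    def handler(exc):
        nonlocal errors
        errors += 1
        # Substitute U+FFFD and resume decoding after the offending bytes.
        return ("\ufffd", exc.end)

    # Hypothetical handler name; re-registering on each call replaces the old one.
    codecs.register_error("count_errors_sketch", handler)
    data.decode(encoding, errors="count_errors_sketch")
    return errors


def closest_encoding(data: bytes, candidates, strict: bool = False):
    """Return the candidate with the fewest decoding errors.

    In strict mode, return None unless the best candidate is error-free.
    """
    scores = [(count_decode_errors(data, enc), enc) for enc in candidates]
    best_errors, best = min(scores, key=lambda pair: pair[0])
    if strict and best_errors > 0:
        return None
    return best


# Mirrors the PHP example: one candidate, invalid input.
print(closest_encoding(b"\xc0\x00", ["utf-8"], strict=False))
print(closest_encoding(b"\xc0\x00", ["utf-8"], strict=True))
```

With a single candidate, non-strict mode has nothing to compare against, which is why the message calls that usage useless.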

Rejecting candidate encodings as soon as it saw an error gave the old
implementation a marked performance advantage in non-strict mode;
however, the new implementation still beats it in most cases. Here are
a few sample microbenchmark results:

  UTF-8, ~100 codepoints, strict mode
  Old: 0.080s (100,000 calls)
  New: 0.026s ("       "    )

  UTF-8, ~100 codepoints, non-strict mode
  Old: 0.079s (100,000 calls)
  New: 0.033s ("       "    )

  UTF-8, ~10000 codepoints, strict mode
  Old: 6.708s (60,000 calls)
  New: 1.383s ("      "    )

  UTF-8, ~10000 codepoints, non-strict mode
  Old: 6.705s (60,000 calls)
  New: 3.044s ("      "    )

Notice that the old implementation had almost identical performance
between strict and non-strict mode, while the new suffers a significant
performance penalty for non-strict detection. This is the cost of
implementing the behavior specified in the documentation.
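
To make the size of that penalty concrete, the timings above can be turned into speedup ratios. The snippet below only does arithmetic on the figures already listed; the dictionary labels and the `speedup` helper are made up for illustration.

```python
# (old seconds, new seconds), copied from the UTF-8 results listed above.
results = {
    "UTF-8 ~100, strict": (0.080, 0.026),
    "UTF-8 ~100, non-strict": (0.079, 0.033),
    "UTF-8 ~10000, strict": (6.708, 1.383),
    "UTF-8 ~10000, non-strict": (6.705, 3.044),
}


def speedup(old: float, new: float) -> float:
    """How many times faster the new implementation is than the old one."""
    return old / new


for label, (old, new) in results.items():
    print(f"{label}: {speedup(old, new):.2f}x")
```

The ratios come out at roughly 3x and 4.9x in strict mode against roughly 2.4x and 2.2x in non-strict mode.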

A couple more sample results:

  SJIS, ~10000 codepoints, strict mode
  Old: 4.563s
  New: 1.084s

  SJIS, ~10000 codepoints, non-strict mode
  Old: 4.569s
  New: 2.863s

This is the only case I found where the new implementation loses:

  UTF-16LE, ~10000 codepoints, non-strict mode
  Old: 1.514s
  New: 2.813s

The reason is because the test strings happened to be invalid right from
the first few bytes for all the candidate encodings except for UTF-16LE;
so the old implementation would immediately reject all those encodings
and only process the entire string in UTF-16LE.

I believe mb_detect_encoding could be made much faster if we identified
good criteria for when to reject candidate encodings before reaching
the end of the input string.
2023-01-03 09:10:10 +02:00
Alex Dowad f40c3fca88 Improve mb_detect_encoding's recognition of Turkish text
Add 4 codepoints commonly used to write Turkish text to our table
of 'commonly used' Unicode codepoints. These are:

• U+011F LATIN SMALL LETTER G WITH BREVE
• U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
• U+0131 LATIN SMALL LETTER DOTLESS I
• U+015F LATIN SMALL LETTER S WITH CEDILLA
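
A rough illustration of why these additions matter, under stated assumptions: the message does not say how the table is weighed during detection, so the sketch below just counts how many decoded characters fall in a table. `BASE_TABLE` is a hypothetical stand-in (printable ASCII only), not the real table, and `common_count` is an invented helper.

```python
# The four codepoints named in the commit message.
TURKISH_ADDITIONS = {0x011F, 0x0130, 0x0131, 0x015F}

# Hypothetical stand-in for the 'commonly used' table: printable ASCII only.
BASE_TABLE = set(range(0x20, 0x7F))


def common_count(data: bytes, encoding: str, table: set) -> int:
    """Count decoded characters whose codepoint appears in `table`."""
    return sum(1 for ch in data.decode(encoding) if ord(ch) in table)


# The Turkish word "ışığı", written with escapes, encoded as ISO-8859-9.
sample = "\u0131\u015f\u0131\u011f\u0131".encode("iso8859_9")

before = common_count(sample, "iso8859_9", BASE_TABLE)
after = common_count(sample, "iso8859_9", BASE_TABLE | TURKISH_ADDITIONS)
as_latin1 = common_count(sample, "iso8859_1", BASE_TABLE | TURKISH_ADDITIONS)
print(before, after, as_latin1)
```

In this toy model the Turkish decoding only starts to look more plausible than the ISO-8859-1 decoding of the same bytes once the four codepoints are in the table.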
2022-12-30 14:22:46 +02:00
David Carlier 9c2572565a sockets adding TCP_QUICKACK constant.
Allows tighter control over ACK delays; the difference is that the setting
is `volatile`, as the kernel can turn it off unless it is explicitly set
otherwise on the socket.
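
For readers unfamiliar with the option, here is a minimal sketch of the underlying socket call, shown through Python's `socket` module instead of PHP. `enable_quickack` is a hypothetical helper; the constant is only present on platforms that define it, hence the guard.

```python
import socket


def enable_quickack(sock: socket.socket) -> bool:
    """Try to switch the socket into quick-ACK mode.

    Returns False when the platform does not define TCP_QUICKACK
    or the kernel rejects the option.
    """
    option = getattr(socket, "TCP_QUICKACK", None)
    if option is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, 1)
    except OSError:
        return False
    return True


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(enable_quickack(sock))
sock.close()
```

Because the setting is volatile, a caller that depends on it would presumably need to re-apply it, for example after each read.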

Closes GH-10145.
2022-12-22 14:50:33 +00:00
Christoph M. Becker 416420b362 [ci skip] Remove duplicated NEWS entry 2022-12-16 14:45:00 +01:00
Tim Düsterhus f9a1a90380 Add Randomizer::nextFloat() and Randomizer::getFloat() (#9679)
* random: Add Randomizer::nextFloat()

* random: Check that doubles are IEEE-754 in Randomizer::nextFloat()

* random: Add Randomizer::nextFloat() tests

* random: Add Randomizer::getFloat() implementing the y-section algorithm

The algorithm is published in:

Drawing Random Floating-Point Numbers from an Interval. Frédéric
Goualard, ACM Trans. Model. Comput. Simul., 32:3, 2022.
https://doi.org/10.1145/3503512

* random: Implement getFloat_gamma() optimization

see https://github.com/php/php-src/pull/9679/files#r994668327

* random: Add Random\IntervalBoundary

* random: Split the implementation of γ-section into its own file

* random: Add tests for Randomizer::getFloat()

* random: Fix γ-section for 32-bit systems

* random: Replace check for __STDC_IEC_559__ by compile-time check for DBL_MANT_DIG

* random: Drop nextFloat_spacing.phpt

* random: Optimize Randomizer::getFloat() implementation

* random: Reject non-finite parameters in Randomizer::getFloat()

* random: Add NEWS/UPGRADING for Randomizer’s float functionality
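
As background for the float methods, a common way to draw a uniform double in [0, 1) is to take 53 random bits and scale them by 2^-53. The sketch below shows that general technique in Python. Whether `Randomizer::nextFloat()` does exactly this is an assumption, `next_float` is a hypothetical name, and this is not the γ-section algorithm used for `getFloat()`.

```python
import random


def next_float(rng: random.Random) -> float:
    """Uniform float in [0, 1) built from the top 53 bits of a 64-bit draw."""
    bits = rng.getrandbits(64)
    # A double has a 53-bit significand, so drop the low 11 bits and
    # scale by 2**-53; the largest possible result is 1 - 2**-53.
    return (bits >> 11) * 2.0 ** -53


rng = random.Random(1234)
print(next_float(rng))
```

Every value produced this way is a multiple of 2^-53, so all outputs are equally spaced and equally likely.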
2022-12-14 17:48:47 +01:00
Tim Düsterhus 284f61ee22 [ci skip] Fix typo in unserialize() function name in NEWS
see dd8de1e726
2022-12-14 17:43:43 +01:00
David Carlier 3fb7198034 intl extension, follow up on #10006 for numfmt_set_pattern
Closes GH-10073.
2022-12-12 19:54:13 +00:00
David Carlier 6422cf6f1a intl extension: msgfmt_set_pattern add pattern format error information. 2022-12-09 17:10:51 +00:00
Tim Düsterhus b34cdc582f [ci skip] Fix json_validate() formatting in NEWS
It is expected that each entry ends with a `.`. I've removed the RFC link here,
as NEWS entries do not contain links, when looking at the past branches. The
RFC link is available in UPGRADING since the previous commit.
2022-12-09 17:58:04 +01:00
Joshua Rüsweg ac3ecd03af Add Randomizer::getBytesFromString() method (#9664)
* Add `Randomizer::getBytesFromAlphabet()` method

* Rename `getBytesFromAlphabet` to `getBytesFromString`

* [ci skip] Add NEWS/UPGRADING for Randomizer::getBytesFromString()

Co-authored-by: Tim Düsterhus <tim@bastelstu.be>
2022-12-09 17:39:13 +01:00
David CARLIER 3660bc31de opcache fixing w/x pages creation on freebsd 13.1 and above.
By default, the system allows these, but an admin can disable them system-wide.
However, the procctl API allows controlling this per process.

Closes GH-9896.
2022-11-18 19:22:00 +00:00
Ilija Tovilo 8731fb2d09 Fix caching of default params with side-effects
Fixes GH-9965
Closes GH-9935
2022-11-17 11:52:12 +01:00
Tim Düsterhus dd8de1e726 Promote unserialize() notices to warning (#9629)
* Unserialize: Migrate "Unexpected end of serialized data" to E_WARNING

* Unserialize: Migrate "Error at offset %d of %d bytes" to E_WARNING

* Unserialize: Migrate "%s is returned from __sleep() multiple times" to E_WARNING

* Add NEWS for “Promote unserialize() notices to warning”
2022-11-15 19:36:38 +01:00
David Carlier e0e347b4a8 Fix GH-9923: Add the SIGINFO constant in pcntl for system supporting it.
Closes #9938
2022-11-12 19:37:32 +00:00
Pierrick Charron 4c372ec600 [ci skip] Order NEWS sections alphabetically 2022-11-09 00:15:51 -05:00
Chen, Hu 37b84b7e32 Fiber: add shadow stack support
Shadow stack is part of Intel's Control-Flow Enforcement Technology (CET).

Whenever a function is called, the return address is pushed onto both
the regular stack and the shadow stack. When that function returns, the
return addresses are popped off both stacks and compared; if they fail
to match, a #CP (control protection) exception is raised.

With this commit, we create a shadow stack for each fiber context and
switch the shadow stack accordingly during the fcontext switch.
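
The mechanism can be modelled in a few lines. This is a toy Python simulation of the idea only; the real work is done by the CPU and the fiber context-switch code, and `ShadowStackModel` and `ControlProtectionFault` are invented names.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception raised by the CPU."""


class ShadowStackModel:
    """One regular stack plus one shadow stack, as a fiber would own."""

    def __init__(self):
        self.stack = []   # regular stack: writable by the program
        self.shadow = []  # shadow stack: only touched by call/ret

    def call(self, return_address: int) -> None:
        # A call pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self) -> int:
        # A return pops both stacks and compares the two addresses.
        address = self.stack.pop()
        expected = self.shadow.pop()
        if address != expected:
            raise ControlProtectionFault(hex(address))
        return address


# Each fiber gets its own pair of stacks; a switch selects the pair to use.
fibers = {"main": ShadowStackModel(), "worker": ShadowStackModel()}
current = fibers["worker"]
current.call(0x401000)
print(hex(current.ret()))
```

Overwriting the top of `stack` between `call` and `ret` makes `ret` raise, which is the mismatch case described above.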

Signed-off-by: Chen, Hu <hu1.chen@intel.com>

Closes GH-9283.
2022-11-07 14:48:27 +01:00
Christoph M. Becker d672c0430f Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix potential NULL pointer dereference Windows shm*() functions
2022-11-02 14:55:34 +01:00
Christoph M. Becker 79d4fdad52 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix potential NULL pointer dereference Windows shm*() functions
2022-11-02 14:54:48 +01:00
Christoph M. Becker 8bf6266e65 Merge branch 'PHP-8.0' into PHP-8.1
* PHP-8.0:
  Fix potential NULL pointer dereference Windows shm*() functions
2022-11-02 14:53:30 +01:00
Christoph M. Becker d1c9ff5642 Fix potential NULL pointer dereference Windows shm*() functions
`shm_get()` (not to be confused with `shmget()`) returns `NULL` if
reallocation fails; we need to cater to that when calling the function.

Closes GH-9872.
2022-11-02 14:51:59 +01:00
David CARLIER 4c4e72f149 socket add socket_atmark support.
Checks whether the socket is at the out-of-band mark, so that the data can be
processed accordingly (using the MSG_OOB flag on send/recv).

Closes #9846.
2022-10-31 16:38:18 +00:00
Dominic H e4a1b80a5f Match FPM status pool's expose_php with parent
If an installed php.ini turns expose_php on/off, and an FPM pool
overrides that with php_flag[expose_php]=off/on, a status pool
created with pm.status_listen in a pool config will have its expose_php
reflect the php.ini value, and not the pool config's override.

This change looks for an override set in
php_flag/php_value/php_admin_flag/php_admin_value and carries that
through.
2022-10-30 15:40:58 +00:00
Jakub Zelenka eb9cf18703 Merge branch 'PHP-8.1' into PHP-8.2 2022-10-30 11:46:06 +00:00
Jakub Zelenka 29f7c4613e Merge branch 'PHP-8.0' into PHP-8.1 2022-10-30 11:43:11 +00:00
Jakub Zelenka 1c5844aa3e Fix GH-9754: SaltStack hangs when running php-fpm 8.1.11
SaltStack uses Python subprocess and redirects stderr to stdout which is
then piped to the returned output. If php-fpm starts in daemonized mode,
it should close stderr. However a fix introduced in GH-8913 keeps stderr
around so it can be later restored. That causes the issue reported in
GH-9754. The solution is to keep stderr around only when php-fpm runs in
foreground as the issue is most likely visible only there. Basically
there is no need to restore stderr when php-fpm is daemonized.
2022-10-30 11:41:33 +00:00
Tim Düsterhus 7f0b228f48 Fix pre-PHP 8.2 compatibility for php_mt_rand_range() with MT_RAND_PHP (#9839)
* Fix pre-PHP 8.2 compatibility for php_mt_rand_range() with MT_RAND_PHP

As some left-over comments indicated:

> Legacy mode deliberately not inside php_mt_rand_range()
> to prevent other functions being affected

The broken scaler was only used for `php_mt_rand_common()`, not
`php_mt_rand_range()`. The former is only used for `mt_rand()`, whereas the
latter is used for `array_rand()` and others.

With the refactoring for the introduction of ext/random `php_mt_rand_common()`
and `php_mt_rand_range()` were accidentally unified, thus introducing a
behavioral change that was reported in FakerPHP/Faker#528.

This commit moves the checks for `MT_RAND_PHP` from the general-purpose
`range()` function back into `php_mt_rand_common()` and also into
`Randomizer::getInt()` for drop-in compatibility with `mt_rand()`.
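
To see why unifying the two code paths changed results, compare a floating-point scaler with an unbiased range function. The sketch below is an illustration in Python under assumptions: `legacy_scale` follows the general "scale by a float" approach and is not copied from php-src, `uniform_range` uses plain rejection sampling, and both names are hypothetical.

```python
MT_RAND_MAX = 0x7FFFFFFF  # largest 31-bit value


def legacy_scale(n: int, lo: int, hi: int) -> int:
    """Map a 31-bit draw `n` into [lo, hi] by floating-point scaling."""
    return lo + int((hi - lo + 1.0) * (n / (MT_RAND_MAX + 1.0)))


def uniform_range(next_u32, lo: int, hi: int) -> int:
    """Map 32-bit draws into [lo, hi] without bias via rejection sampling."""
    span = hi - lo + 1
    # Discard draws from the incomplete final block so every
    # residue modulo `span` stays equally likely.
    limit = (1 << 32) - ((1 << 32) % span)
    while True:
        n = next_u32()
        if n < limit:
            return lo + n % span


# The same raw draw lands on different results under the two mappings.
print(legacy_scale(7, 0, 9), uniform_range(lambda: 7, 0, 9))
```

Since the mappings disagree for the same raw draw, routing callers of one function through the other changes the sequence a given seed produces.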

* [ci skip] NEWS for `MT_RAND_PHP` compatibility
2022-10-28 16:52:43 +02:00
Kamil Tekiela 0db2e666a5 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Add NEWS entry for #9841
2022-10-28 11:25:51 +01:00
Kamil Tekiela bce12f4e57 Add NEWS entry for #9841 2022-10-28 11:23:37 +01:00
Kamil Tekiela 96049867d8 Add NEWS entry for #9841
Closes GH-9841
2022-10-27 18:29:17 +01:00
Remi Collet c84d7cc27e move CVEs in 8.1.12 changelog 2022-10-26 17:10:29 +02:00
Remi Collet db28ee8fd0 move CVEs in 8.0.25 changelog 2022-10-26 15:27:23 +02:00
Florian Sowade b9474bf385 Don’t report arginfo violations on fake closures (#9823) 2022-10-26 12:21:41 +02:00
Florian Sowade 56c121cea2 Initialize run time cache in PDO methods (#9818)
Without the memset the memory was uninitialized and the new test segfaulted when accessing the memory in _zend_observe_fcall_begin().
2022-10-26 12:21:41 +02:00
Pierrick Charron 4ccc414961 [ci skip] Update NEWS for PHP 8.2.0RC6 2022-10-25 13:46:14 -04:00
Stanislav Malyshev 43950c3ca5 Merge branch 'PHP-8.2' 2022-10-23 18:54:32 -06:00
Stanislav Malyshev 9855fdd21a Merge branch 'PHP-8.1' into PHP-8.2 2022-10-23 18:53:56 -06:00
Stanislav Malyshev 2caa79e963 Merge branch 'PHP-8.0' into PHP-8.1 2022-10-23 18:53:26 -06:00
Stanislav Malyshev 80ccaa3e36 Merge branch 'PHP-7.4' into PHP-8.0 2022-10-23 18:52:56 -06:00
Stanislav Malyshev 2669ed7d77 Update NEWS 2022-10-23 18:50:53 -06:00
Jakub Zelenka b732d80329 Fix bug GH-9779: stream_copy_to_stream fail when dest in append mode 2022-10-23 12:40:22 +01:00
David Carlier dbedb69f6a Merge branch 'PHP-8.1' into PHP-8.2 2022-10-23 00:46:46 +01:00
David Carlier fe06c5ef60 Merge branch 'PHP-8.0' into PHP-8.1 2022-10-23 00:46:25 +01:00
Adam Saponara 45e224cf51 Fix GH-9709: Guard against current_execute_data==NULL in is_handle_exception_set 2022-10-23 00:46:05 +01:00
Bob Weinand 5e9654be03 Fixed missing run_time_cache for preloaded arena allocated internal functions
This effectively affected all preloaded enums, leading them to possibly share a run_time_cache__ptr slot with unrelated functions. (Given that these were not set again.)
This bugfix is not accompanied by a test because the bug was hard to trigger, and whether it crashes depends heavily on whether an accidentally overlapping cache entry has been used, among other things.
2022-10-22 22:07:41 +00:00
Jakub Zelenka cb3d5a772d Merge branch 'PHP-8.1' into PHP-8.2 2022-10-22 22:14:27 +01:00
Jakub Zelenka ec844ccc3f Merge branch 'PHP-8.0' into PHP-8.1 2022-10-22 22:12:05 +01:00
Jakub Zelenka fa1b6ab5db Fix GH-8430: OpenSSL compiled with old digests does not build
Specifically no-md2, no-md4 or no-rmd160 were not supported
2022-10-22 22:11:05 +01:00
Jakub Zelenka 1ef65c1cf0 Clean up OpenSSL engine list when OpenSSL 1.0.2 used
Attempt to fix GH-8620.
2022-10-22 11:20:00 +01:00
Kévin Dunglas 9da75d0c63 fix: no-op when signal handlers are called on threads not managed by PHP (#9766) 2022-10-22 11:17:27 +02:00
Arnaud Le Blanc 6b35850139 [ci skip] NEWS 2022-10-22 10:45:09 +02:00