1
0
mirror of https://github.com/php/php-src.git synced 2026-04-27 01:48:26 +02:00
Commit Graph

132146 Commits

Author SHA1 Message Date
Alex Dowad 3ab10da758 Take order of candidate encodings into account when guessing text encoding
The documentation for mb_detect_encoding says that this function
"Detects the most likely character encoding for string `string` from an
ordered list of candidates".

Prior to 28b346bc06, mb_detect_encoding did not really attempt to
determine the "most likely" text encoding for the input string. It
would just return the first candidate encoding for which the string was
valid. In 28b346bc06, I amended this function so that it uses heuristics
to try to guess which candidate encoding is "most likely".

However, the caller did not have any way to indicate which candidate
text encoding(s) they consider to be more likely, in case the
heuristics applied are inconclusive. In the language of Bayesian
probability, there was no way for the caller to indicate their 'prior'
assignment of probabilities.

Further, the documentation for mb_detect_encoding also says that the
second parameter `encodings` is "a list of character encodings to try,
in order". The documentation clearly implies that the order of
the `encodings` argument should be significant.

Therefore, amend mb_detect_encoding so that while it still uses
heuristics to guess the most likely text encoding for the input string,
it favors those which are earlier in the list of candidate encodings.

One complication is that many callers of mb_detect_encoding use it
in this way:

    mb_detect_encoding($string, mb_list_encodings());

In a majority of cases, this is bad code; mb_detect_encoding will both
be much slower and the results will be less reliable than if a smaller
list of candidates is used. However, since such code already exists and
people are using it in production, we should not unnecessarily break it.
The order of candidate encodings obviously does not express any prior
belief of which candidates are more likely in this case, and treating
it as if it did will degrade the accuracy of the result.

Since mb_list_encodings now returns a single, immutable array on each
call, we can avoid that problem by turning off the new behavior when
we receive the array of encodings returned by mb_list_encodings.
This implementation means that if the user does this:

    $a = mb_list_encodings();
    mb_detect_encoding($string, $a);

...then the order of candidate encodings will not be considered.
However, if the user explicitly initializes their own array of all
supported legacy text encodings, then the order *will* be considered.

The other functions which also follow this new behavior are:

• mb_convert_variables
• mb_convert_encoding (when multiple candidate input encodings are
  listed)

Other places where "detection" (or really "guessing") of text encoding
may be performed include:

• mb_send_mail
• Zend engine, when determining the encoding of a PHP script
• mbstring processing of HTTP request contents, when http_input INI
  parameter is set to a list

In these cases, the new logic based on order of candidate encodings
is *not* enabled. It *might* be logical to consider the order of
candidate encodings in some or all of these cases, but I'm not sure if
that is true, so it seems wiser to avoid more behavior changes than is
necessary. Further, ever since the new encoding detection heuristics
were implemented in 28b346bc06, we have not received any complaints of
user code being broken in these areas. So I am reluctant to "fix what
isn't broken".

Well, some might say that applying the new detection heuristics
to mb_send_mail, etc. in 28b346bc06 was "fixing what wasn't broken",
but (cough cough) I don't have any comment on that...
2023-05-16 07:01:07 -07:00
Alex Dowad 97e29bed9e Use shared, immutable array for return value of mb_list_encodings
This will allow us to easily check in other mbstring functions if the
list of all supported encodings, returned by mb_list_encodings, is
passed in as input to another function.

Co-authored-by: Ilija Tovilo <ilija.tovilo@me.com>
2023-05-16 07:01:07 -07:00
George Peter Banyard 80c8ca9c8f Use uint32_t for variable storing ZEND_NUM_ARGS() 2023-05-16 11:34:41 +01:00
George Peter Banyard e35cd34bcd Fix assertion warning message when no description is provided 2023-05-16 11:33:30 +01:00
Peter Kokot aae39fe5a7 Fix #9483: Fix autoconf warnings due to old libtool (#11207) 2023-05-15 16:54:40 +01:00
Ilija Tovilo 6408fb21f6 Merge branch 'PHP-8.2'
* PHP-8.2:
  Added negative offset test for mb_strrpos
  Fix segfault in mb_strrpos/mb_strripos with ASCII encoding and negative offset
2023-05-15 10:41:22 +02:00
Randy Geraads c5a623ba5e Added negative offset test for mb_strrpos
Should expose https://github.com/php/php-src/issues/11217
2023-05-15 10:36:37 +02:00
Ilija Tovilo aa553af911 Fix segfault in mb_strrpos/mb_strripos with ASCII encoding and negative offset
We're setting the encoding from PHP_FUNCTION(mb_strpos), but mbfl_strpos would
discard it, setting it to mbfl_encoding_pass, making zend_memnrstr fail due to a
null-pointer exception.

Fixes GH-11217
Closes GH-11220
2023-05-15 10:36:37 +02:00
Ilija Tovilo 0600f513b3 Implement delayed early binding for classes without parents
Normally, we add classes without parents (and no interfaces or traits) directly
to the class map, early binding the class. However, if the same class has
already been registered, we would instead just add a ZEND_DECLARE_CLASS
instruction and let the handler throw a duplicate class declaration exception.

However, with opcache, if on the next request the files are included in the
opposite order, we won't perform early binding. To fix this, create a
ZEND_DECLARE_CLASS_DELAYED instruction instead and handle classes without
parents accordingly, skipping any linking for classes that are already linked in
delayed early binding.

Fixes GH-8846
2023-05-15 10:25:33 +02:00
Sara 6bd546462c Cacheline demote to improve performance (#11101)
Once code is emitted to JIT buffer, hint the hardware to
demote the corresponding cache lines to more distant level
so other CPUs can access them more quickly.
This gets nearly 1% performance gain on our workload.

Signed-off-by: Xue,Wang   <xue1.wang@intel.com>
Signed-off-by: Tao,Su     <tao.su@intel.com>
Signed-off-by: Hu,chen    <hu1.chen@intel.com>
2023-05-15 10:28:43 +03:00
Ilija Tovilo ac41608797 Fix -Wenum-int-mismatch warning in ext/json/php_json_encoder.h 2023-05-14 22:10:23 +02:00
Jakub Zelenka 03f64b7853 Merge branch 'PHP-8.2' 2023-05-14 12:29:19 +01:00
Jakub Zelenka 1129046482 Merge branch 'PHP-8.1' into PHP-8.2 2023-05-14 12:28:52 +01:00
Jakub Zelenka 4294e8d448 FPM: Fix memory leak for invalid primary script file handle
Closes GH-11088
2023-05-14 12:27:38 +01:00
Jakub Zelenka a225f6ab6b Merge branch 'PHP-8.2' 2023-05-13 18:54:16 +01:00
Jakub Zelenka 90553af15c Merge branch 'PHP-8.1' into PHP-8.2 2023-05-13 18:53:35 +01:00
Jakub Zelenka 5e64ead64a Fix bug #64539: FPM status - query_string not properly JSON encoded
Closes GH-11050
2023-05-13 18:43:30 +01:00
Jakub Zelenka e8a836eb39 Expose JSON internal function to escape string 2023-05-13 18:41:33 +01:00
Jakub Zelenka 7034263f6e Merge branch 'PHP-8.2' 2023-05-13 14:15:17 +01:00
Jakub Zelenka bda28eb649 Merge branch 'PHP-8.1' into PHP-8.2 2023-05-13 14:14:23 +01:00
Jakub Zelenka 102953735c Fix GH-10461: Postpone FPM child freeing in event loop
This is to prevent after free accessing of the child event that might
happen when child is killed and the message is delivered at that same
time.

Also fixes GH-10889 and properly fixes GH-8517 that was not previously
fixed correctly.
2023-05-13 14:11:42 +01:00
nielsdos 2fa8473eca Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
2023-05-12 23:42:54 +02:00
nielsdos d369a7764f Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
2023-05-12 23:40:54 +02:00
Niels Dossche 7b768485f3 Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
php_stream_read() may return less than the requested amount of bytes by
design. This patch introduces a static function for exif which reads
from the stream in a loop until all the requested bytes are read.

For the test: Co-authored-by: dotpointer

Closes GH-10924.
2023-05-12 23:37:00 +02:00
Ilija Tovilo e0af7c332d Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix delayed early binding class redeclaration error
2023-05-12 19:29:27 +02:00
Ilija Tovilo e3499130f1 Fix delayed early binding class redeclaration error
If we bind the class to the runtime slot even if we're not the ones who have
performed early binding we'll miss the redeclaration error in the
ZEND_DECLARE_CLASS_DELAYED handler.

Closes GH-11226
2023-05-12 19:29:04 +02:00
iamluc 730f32bad9 Keep the orig_path for xport stream
Closes GH-11113
2023-05-12 15:33:55 +01:00
Ilija Tovilo 8d8cfe24d3 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix potential NULL pointer access in zend_fiber_object_gc
2023-05-11 14:35:42 +02:00
Ilija Tovilo 0a04c008d0 Fix potential NULL pointer access in zend_fiber_object_gc
Accidentally introduced in GH-11208.

Fixes oss-fuzz #58795
2023-05-11 14:33:49 +02:00
Ilija Tovilo c71cf0608f Merge branch 'PHP-8.2'
* PHP-8.2:
  [skip ci] Add missing --no-progress flag to ARM build
2023-05-11 12:47:42 +02:00
Ilija Tovilo 12c30a8da3 [skip ci] Add missing --no-progress flag to ARM build 2023-05-11 12:47:25 +02:00
Ilija Tovilo 6a2d04151a Merge branch 'PHP-8.2'
* PHP-8.2:
  [skip ci] Remove NEWS entry for reverted change in PHP 8.1
2023-05-11 11:54:07 +02:00
Ilija Tovilo c47c64e398 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  [skip ci] Remove NEWS entry for reverted change in PHP 8.1
2023-05-11 11:53:59 +02:00
Ilija Tovilo ad747d93c3 [skip ci] Remove NEWS entry for reverted change in PHP 8.1 2023-05-11 11:53:18 +02:00
Ilija Tovilo 5820528c9c Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix compilation for PHP 8.1
2023-05-11 00:00:47 +02:00
Ilija Tovilo 5e962f6279 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix compilation for PHP 8.1
2023-05-11 00:00:35 +02:00
Ilija Tovilo 8f66b67ccf Fix compilation for PHP 8.1
Accidentally introduced in 175ff603c3. arData was
not part of an anonymous union.
2023-05-10 23:59:53 +02:00
kocsismate 09dd3e3daf Narrow some more return types to true 2023-05-10 19:08:15 +02:00
Bob Weinand 0787247b19 Merge branch 'PHP-8.2' 2023-05-10 16:46:33 +02:00
Bob Weinand 53558ffc71 Merge branch 'PHP-8.1' into PHP-8.2 2023-05-10 16:45:48 +02:00
Bob Weinand 975d28e278 Fix GH-11222: foreach by-ref may jump over keys during a rehash
Signed-off-by: Bob Weinand <bobwei9@hotmail.com>
2023-05-10 16:45:05 +02:00
Ilija Tovilo 7304b56f11 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix compilation error on old GCC versions
2023-05-10 11:57:19 +02:00
Ilija Tovilo 6692477406 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix compilation error on old GCC versions
2023-05-10 11:56:07 +02:00
Amedeo Baragiola 175ff603c3 Fix compilation error on old GCC versions
In older versions of GCC (<=4.5) designated initializers would not accept member
names nested inside anonymous structures. Instead, we need to use a positional
member wrapped in {}.

Fixes GH-11063
Closes GH-11212
2023-05-10 11:55:13 +02:00
nielsdos 63a84a2445 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-8426: make test fail while soap extension build
2023-05-09 19:57:02 +02:00
nielsdos 44491d17fb Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-8426: make test fail while soap extension build
2023-05-09 19:52:52 +02:00
nielsdos 6ba0b06819 Fix GH-8426: make test fail while soap extension build
If you build soap as a shared object, then these tests fail on
non-Windows, or when the PHP install hasn't been make install-ed yet,
but is executed from the development directory.

Closes GH-11211.
2023-05-09 19:48:45 +02:00
Niels Dossche acc940645e Remove unnecessary NULL assignments after ecalloc in streams (#11209)
ecalloc already zeroes the structure, so writing NULL is not necessary.
2023-05-09 19:46:45 +02:00
Ilija Tovilo 173680acd3 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix use-of-undefined in zend_fiber_object_gc of ex->call
2023-05-09 14:38:25 +02:00
Ilija Tovilo 06fe9ff0f1 Fix use-of-undefined in zend_fiber_object_gc of ex->call
ex->call is only set for user calls, we shouldn't access it here.
zend_unfinished_execution_gc_ex wouldn't actually use it for internal calls, so
it didn't cause any serious issues.

Closes GH-11208
2023-05-09 14:37:47 +02:00