mirror of https://github.com/php/php-src.git synced 2026-04-19 05:51:02 +02:00

64063 Commits

Author SHA1 Message Date
Michael Voříšek
bd03c0343e Allow CTE on more CTE safe functions (#10771) 2023-05-16 21:59:26 +02:00
Alex Dowad
7914b8cefd Use pakutoma's encoding check functions for mb_detect_encoding even in non-strict mode
In 6fc8d014df, pakutoma added specialized validity checking functions
for some legacy text encodings like ISO-2022-JP and UTF-7. These
check functions perform a more strict validity check than the encoding
conversion functions for the same text encodings. For example, the
check function for ISO-2022-JP verifies that the string ends in the
correct state required by the specification for ISO-2022-JP.

These check functions are already being used to make detection of text
encoding more accurate when 'strict' detection mode is enabled.

However, since the default is 'non-strict' detection (a bad API design
but we're stuck with it now), most users will not benefit from
pakutoma's work. I was previously reluctant to enable this new logic
for non-strict detection mode. My intention was to reduce the scope of
behavior changes, since almost *any* behavior change may affect *some*
user in a way we don't expect.

However, we definitely have users whose (production) code was broken
by the changes I made in 28b346bc06, and enabling pakutoma's check
functions for non-strict detection mode would un-break it. (See
GH-10192 as an example.) The added checks do also make sense.

In non-strict detection mode, we will not immediately reject candidate
encodings whose validity check function returns false; but they will
be much less likely to be selected. However, failure of the validity
check function is weighted less heavily than an encoding error detected
by the encoding conversion function.
2023-05-16 07:01:07 -07:00
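The weighting described in the commit above can be sketched roughly as follows. This is only an illustration, not the mbstring implementation: the `candidate` struct, the function names, and both penalty constants are hypothetical, and the commit message does not state the real weights. The sketch shows a failed validity check counting against a candidate without rejecting it, and counting less than an error reported by the conversion function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical penalty weights, chosen only for illustration; the real
 * values used by mbstring are not given in the commit message. */
enum {
    PENALTY_CONVERSION_ERROR = 100, /* error found by the conversion function */
    PENALTY_CHECK_FAILED = 30       /* validity check function returned false */
};

/* Hypothetical record of what was observed for one candidate encoding. */
typedef struct {
    const char *name;
    unsigned conversion_errors; /* errors reported while converting the input */
    bool check_passed;          /* result of the stricter validity check */
} candidate;

/* Lower score means "more likely". A failed check adds a penalty but
 * does not remove the candidate from consideration. */
unsigned candidate_score(const candidate *c)
{
    unsigned score = c->conversion_errors * PENALTY_CONVERSION_ERROR;
    if (!c->check_passed) {
        score += PENALTY_CHECK_FAILED;
    }
    return score;
}

/* Returns the candidate with the lowest score, or NULL if n == 0.
 * On a tie, the earlier candidate is kept. */
const candidate *pick_candidate(const candidate *list, size_t n)
{
    const candidate *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || candidate_score(&list[i]) < candidate_score(best)) {
            best = &list[i];
        }
    }
    return best;
}
```

With these assumed weights, a candidate that fails only its validity check still beats one with a single conversion error, but loses to a candidate with no problems at all.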
Alex Dowad
3ab10da758 Take order of candidate encodings into account when guessing text encoding
The documentation for mb_detect_encoding says that this function
"Detects the most likely character encoding for string `string` from an
ordered list of candidates".

Prior to 28b346bc06, mb_detect_encoding did not really attempt to
determine the "most likely" text encoding for the input string. It
would just return the first candidate encoding for which the string was
valid. In 28b346bc06, I amended this function so that it uses heuristics
to try to guess which candidate encoding is "most likely".

However, the caller did not have any way to indicate which candidate
text encoding(s) they consider to be more likely, in case the
heuristics applied are inconclusive. In the language of Bayesian
probability, there was no way for the caller to indicate their 'prior'
assignment of probabilities.

Further, the documentation for mb_detect_encoding also says that the
second parameter `encodings` is "a list of character encodings to try,
in order". The documentation clearly implies that the order of
the `encodings` argument should be significant.

Therefore, amend mb_detect_encoding so that while it still uses
heuristics to guess the most likely text encoding for the input string,
it favors those which are earlier in the list of candidate encodings.

One complication is that many callers of mb_detect_encoding use it
in this way:

    mb_detect_encoding($string, mb_list_encodings());

In a majority of cases, this is bad code; mb_detect_encoding will be
much slower, and its results less reliable, than if a smaller list of
candidates were used. However, since such code already exists and
people are using it in production, we should not unnecessarily break it.
The order of candidate encodings obviously does not express any prior
belief of which candidates are more likely in this case, and treating
it as if it did will degrade the accuracy of the result.

Since mb_list_encodings now returns a single, immutable array on each
call, we can avoid that problem by turning off the new behavior when
we receive the array of encodings returned by mb_list_encodings.
This implementation means that if the user does this:

    $a = mb_list_encodings();
    mb_detect_encoding($string, $a);

...then the order of candidate encodings will not be considered.
However, if the user explicitly initializes their own array of all
supported legacy text encodings, then the order *will* be considered.

The other functions which also follow this new behavior are:

• mb_convert_variables
• mb_convert_encoding (when multiple candidate input encodings are
  listed)

Other places where "detection" (or really "guessing") of text encoding
may be performed include:

• mb_send_mail
• Zend engine, when determining the encoding of a PHP script
• mbstring processing of HTTP request contents, when http_input INI
  parameter is set to a list

In these cases, the new logic based on order of candidate encodings
is *not* enabled. It *might* be logical to consider the order of
candidate encodings in some or all of these cases, but I'm not sure if
that is true, so it seems wiser to avoid more behavior changes than is
necessary. Further, ever since the new encoding detection heuristics
were implemented in 28b346bc06, we have not received any complaints of
user code being broken in these areas. So I am reluctant to "fix what
isn't broken".

Well, some might say that applying the new detection heuristics
to mb_send_mail, etc. in 28b346bc06 was "fixing what wasn't broken",
but (cough cough) I don't have any comment on that...
2023-05-16 07:01:07 -07:00
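One way to picture "favoring earlier candidates" is a small penalty that grows with list position, applied on top of the heuristic score. The sketch below is an assumption for illustration only: `pick_with_order`, `POSITION_PENALTY`, and the scores are hypothetical, and the commit message does not say how mbstring actually combines order with its heuristics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical weight per list position; illustrative only. */
#define POSITION_PENALTY 5u

/* heuristic_scores[i] is a demerit score for candidate i (lower is better).
 * When use_order is true, later candidates receive a small extra penalty,
 * so an earlier candidate wins unless a later one is clearly better.
 * Passing use_order = false models the case where the caller handed in
 * the full list of supported encodings, whose order carries no meaning.
 * Returns the index of the chosen candidate, or n if the list is empty. */
size_t pick_with_order(const unsigned *heuristic_scores, size_t n, bool use_order)
{
    size_t best = n;
    unsigned best_score = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned score = heuristic_scores[i];
        if (use_order) {
            score += (unsigned)i * POSITION_PENALTY;
        }
        if (best == n || score < best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

In this sketch the list order acts like the caller's prior: it decides close calls, while a large difference in heuristic score still overrides it.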
Alex Dowad
97e29bed9e Use shared, immutable array for return value of mb_list_encodings
This will allow us to easily check in other mbstring functions if the
list of all supported encodings, returned by mb_list_encodings, is
passed in as input to another function.

Co-authored-by: Ilija Tovilo <ilija.tovilo@me.com>
2023-05-16 07:01:07 -07:00
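The idea behind the shared array can be illustrated with a plain pointer comparison. This is a simplified analogue, not the engine code: the real list is a PHP array, and `all_encodings`, `list_encodings`, and `is_shared_encoding_list` are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the shared, immutable list of supported encodings. */
static const char *const all_encodings[] = { "UTF-8", "ISO-8859-1", "SJIS" };

/* Analogue of mb_list_encodings(): every call returns the same pointer. */
const char *const *list_encodings(void)
{
    return all_encodings;
}

/* Identity check: true only for the shared list itself, not for a
 * caller-built list that merely has the same contents. */
bool is_shared_encoding_list(const char *const *list)
{
    return list == all_encodings;
}
```

Because the check compares identity rather than contents, a list the caller builds by hand is treated as an ordinary ordered list even if it names every supported encoding.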
George Peter Banyard
80c8ca9c8f Use uint32_t for variable storing ZEND_NUM_ARGS() 2023-05-16 11:34:41 +01:00
George Peter Banyard
e35cd34bcd Fix assertion warning message when no description is provided 2023-05-16 11:33:30 +01:00
Ilija Tovilo
6408fb21f6 Merge branch 'PHP-8.2'
* PHP-8.2:
  Added negative offset test for mb_strrpos
  Fix segfault in mb_strrpos/mb_strripos with ASCII encoding and negative offset
2023-05-15 10:41:22 +02:00
Randy Geraads
c5a623ba5e Added negative offset test for mb_strrpos
Should expose https://github.com/php/php-src/issues/11217
2023-05-15 10:36:37 +02:00
Ilija Tovilo
aa553af911 Fix segfault in mb_strrpos/mb_strripos with ASCII encoding and negative offset
We're setting the encoding from PHP_FUNCTION(mb_strpos), but mbfl_strpos would
discard it, setting it to mbfl_encoding_pass, making zend_memnrstr fail due to a
null-pointer dereference.

Fixes GH-11217
Closes GH-11220
2023-05-15 10:36:37 +02:00
Ilija Tovilo
0600f513b3 Implement delayed early binding for classes without parents
Normally, we add classes without parents (and no interfaces or traits) directly
to the class map, early binding the class. However, if the same class has
already been registered, we would instead just add a ZEND_DECLARE_CLASS
instruction and let the handler throw a duplicate class declaration exception.

However, with opcache, if on the next request the files are included in the
opposite order, we won't perform early binding. To fix this, create a
ZEND_DECLARE_CLASS_DELAYED instruction instead and handle classes without
parents accordingly, skipping any linking for classes that are already linked in
delayed early binding.

Fixes GH-8846
2023-05-15 10:25:33 +02:00
Sara
6bd546462c Cacheline demote to improve performance (#11101)
Once code is emitted to the JIT buffer, hint the hardware to
demote the corresponding cache lines to a more distant level
so other CPUs can access them more quickly.
This gets nearly 1% performance gain on our workload.

Signed-off-by: Xue,Wang   <xue1.wang@intel.com>
Signed-off-by: Tao,Su     <tao.su@intel.com>
Signed-off-by: Hu,chen    <hu1.chen@intel.com>
2023-05-15 10:28:43 +03:00
Ilija Tovilo
ac41608797 Fix -Wenum-int-mismatch warning in ext/json/php_json_encoder.h 2023-05-14 22:10:23 +02:00
Jakub Zelenka
a225f6ab6b Merge branch 'PHP-8.2' 2023-05-13 18:54:16 +01:00
Jakub Zelenka
90553af15c Merge branch 'PHP-8.1' into PHP-8.2 2023-05-13 18:53:35 +01:00
Jakub Zelenka
e8a836eb39 Expose JSON internal function to escape string 2023-05-13 18:41:33 +01:00
nielsdos
2fa8473eca Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
2023-05-12 23:42:54 +02:00
nielsdos
d369a7764f Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
2023-05-12 23:40:54 +02:00
Niels Dossche
7b768485f3 Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
php_stream_read() may return less than the requested amount of bytes by
design. This patch introduces a static function for exif which reads
from the stream in a loop until all the requested bytes are read.

For the test: Co-authored-by: dotpointer

Closes GH-10924.
2023-05-12 23:37:00 +02:00
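The "read in a loop until all requested bytes are read" pattern from the commit above looks roughly like this. It is a generic sketch, not the exif patch: `read_fn`, `read_fully`, and the `chunked_source` demo reader are hypothetical names, and the callback stands in for a stream read that may legitimately return fewer bytes than requested. `ssize_t` is assumed to come from POSIX `<sys/types.h>`.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical reader callback: returns the number of bytes read (possibly
 * fewer than requested), 0 at end of stream, or a negative value on error. */
typedef ssize_t (*read_fn)(void *ctx, char *buf, size_t count);

/* Keeps calling `reader` until `count` bytes have arrived, the stream
 * ends, or an error occurs. Returns the total number of bytes read. */
size_t read_fully(read_fn reader, void *ctx, char *buf, size_t count)
{
    size_t total = 0;
    while (total < count) {
        ssize_t n = reader(ctx, buf + total, count - total);
        if (n <= 0) {
            break; /* end of stream or error: stop with what we have */
        }
        total += (size_t)n;
    }
    return total;
}

/* Demo source that hands out at most `chunk` bytes per call, imitating a
 * stream wrapper with a small chunk size. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
    size_t chunk;
} chunked_source;

ssize_t chunked_read(void *ctx, char *buf, size_t count)
{
    chunked_source *src = ctx;
    size_t remaining = src->len - src->pos;
    size_t n = count < src->chunk ? count : src->chunk;
    if (n > remaining) {
        n = remaining;
    }
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (ssize_t)n;
}
```

A caller should compare the return value of `read_fully` with the requested count; a shorter result means the stream ended or failed before enough data arrived.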
Ilija Tovilo
e0af7c332d Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix delayed early binding class redeclaration error
2023-05-12 19:29:27 +02:00
Ilija Tovilo
e3499130f1 Fix delayed early binding class redeclaration error
If we bind the class to the runtime slot even though we're not the ones who
performed early binding, we'll miss the redeclaration error in the
ZEND_DECLARE_CLASS_DELAYED handler.

Closes GH-11226
2023-05-12 19:29:04 +02:00
iamluc
730f32bad9 Keep the orig_path for xport stream
Closes GH-11113
2023-05-12 15:33:55 +01:00
kocsismate
09dd3e3daf Narrow some more return types to true 2023-05-10 19:08:15 +02:00
nielsdos
63a84a2445 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-8426: make test fail while soap extension build
2023-05-09 19:57:02 +02:00
nielsdos
44491d17fb Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-8426: make test fail while soap extension build
2023-05-09 19:52:52 +02:00
nielsdos
6ba0b06819 Fix GH-8426: make test fail while soap extension build
If you build soap as a shared object, then these tests fail on
non-Windows, or when the PHP install hasn't been make install-ed yet,
but is executed from the development directory.

Closes GH-11211.
2023-05-09 19:48:45 +02:00
Ilija Tovilo
38cf52d8aa Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix use-of-uninitialized value in phar_object.c
2023-05-08 17:07:04 +02:00
Ilija Tovilo
b71a961363 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix use-of-uninitialized value in phar_object.c
2023-05-08 17:06:57 +02:00
Ilija Tovilo
78ec64af44 Fix use-of-uninitialized value in phar_object.c
resource would stay uninitialized if the first call to zend_parse_parameters
fails, but the value is still passed to phar_add_file(). It's not used there if
cont_str is provided and so didn't cause any issues.

Closes GH-11202
2023-05-08 17:06:44 +02:00
Michael Voříšek
37e6594545 Fix gmp_long/gmp_ulong typedef warning on Windows x86 (#11112) 2023-05-07 23:30:12 +02:00
Máté Kocsis
85338569de Narrow bool return types to true when possible 2023-05-07 19:34:09 +02:00
Niels Dossche
f6e296dbb9 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-11180: hash_file() appears to be restricted to 3 arguments
2023-05-07 17:40:29 +02:00
Niels Dossche
e6730565b6 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-11180: hash_file() appears to be restricted to 3 arguments
2023-05-07 17:37:25 +02:00
Niels Dossche
baa07f3de3 Fix GH-11180: hash_file() appears to be restricted to 3 arguments
Closes GH-11198.
2023-05-07 17:33:28 +02:00
George Peter Banyard
646f54b594 ext/standard/array.c: use uint32_t instead of incorrect int type
Drive-by indentation fixes and bool usage
2023-05-07 15:01:37 +01:00
George Peter Banyard
1820c421f1 Prevent unnecessary string duplication in assert() (#11031) 2023-05-07 15:00:30 +01:00
Niels Dossche
82b05373b1 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-11160: Few tests failed building with new libxml 2.11.0
2023-05-06 23:15:57 +02:00
Niels Dossche
dc1a70c244 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-11160: Few tests failed building with new libxml 2.11.0
2023-05-06 23:10:58 +02:00
Niels Dossche
7c0dfc5cf5 Fix GH-11160: Few tests failed building with new libxml 2.11.0
It's possible to categorise the failures into 2 categories:
  - Changed error message. In this case we either duplicate the test and
    modify the error message or, if the change in error message is
    small, use the EXPECTF matchers to make the test compatible with both
    old and new versions of libxml2.
  - Missing warnings. This is caused by a change in libxml2 where the
    parser started using SAX APIs internally [1]. In this case the
    error_type passed to php_libxml_internal_error_handler() changed from
    PHP_LIBXML_ERROR to PHP_LIBXML_CTX_WARNING because it internally
    started to use the SAX handlers instead of the generic handlers.
    However, for the SAX handlers the current input stack is empty, so
    nothing is actually printed. I fixed this by falling back to a
    regular warning without a filename & line number reference, which
    mimics the old behaviour. Furthermore, this change now also shows
    an additional warning in a test which was previously hidden.

[1] 9a82b94a94

Closes GH-11162.
2023-05-06 23:10:07 +02:00
Niels Dossche
80efa76b8b Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix maximum argument count of pcntl_forkx()
2023-05-06 21:00:30 +02:00
Niels Dossche
a0e71cb811 Fix maximum argument count of pcntl_forkx()
Closes GH-11199.
2023-05-06 20:56:27 +02:00
Florian Moser
4d4b9604ca Fix GH-11054: Reset OpenSSL errors when using a PEM public key
The error happens when the PEM contains a public key, because PHP first
tries to parse it as a certificate. Parsing it as a certificate
fails, which then leads to a corresponding error being tracked by PHP on
the next call to php_openssl_store_errors().

This change introduces an error marking to be able to reset the stored
errors to the state before trying the certificate.

Closes GH-11055
2023-05-06 11:56:31 +01:00
Daniel Kesselberg
fa10dfcc81 Add PKCS7_NOOLDMIMETYPE and OPENSSL_CMS_OLDMIMETYPE
PKCS7_NOOLDMIMETYPE to use Content-Type application/pkcs7-mime
OPENSSL_CMS_OLDMIMETYPE to use Content-Type application/x-pkcs7-mime

SMIME_write_PKCS7 and SMIME_write_CMS are using SMIME_write_ASN1_ex.
The Content-Type application/x-pkcs7-mime is generated with the flag SMIME_OLDMIME (0x400).[^1]

SMIME_write_PKCS7 sets SMIME_OLDMIME by default.[^2]
SMIME_write_CMS does not.[^3]

I picked OPENSSL_CMS_OLDMIMETYPE over OPENSSL_CMS_NOOLDMIMETYPE because that's what the flag actually does.

[^1]: 9a2f78e14a/crypto/asn1/asn_mime.c (L248-L251)
[^2]: 9a2f78e14a/crypto/pkcs7/pk7_mime.c (L41-L43)
[^3]: 9a2f78e14a/crypto/cms/cms_io.c (L93)

Signed-off-by: Daniel Kesselberg <mail@danielkesselberg.de>
2023-05-06 11:12:31 +01:00
David CARLIER
f18a0384c1 ext/pgsql: fix pg_trace test when trace mode is supported. (#11191) 2023-05-06 10:02:30 +01:00
nielsdos
42aaac3525 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-10031: [Stream] STREAM_NOTIFY_PROGRESS over HTTP emitted irregularly for last chunk of data
2023-05-05 19:30:05 +02:00
nielsdos
1fc18a84d9 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-10031: [Stream] STREAM_NOTIFY_PROGRESS over HTTP emitted irregularly for last chunk of data
2023-05-05 19:28:01 +02:00
Niels Dossche
b33fbbfe3d Fix GH-10031: [Stream] STREAM_NOTIFY_PROGRESS over HTTP emitted irregularly for last chunk of data
It's possible that the server has already sent more data than just the headers.
Since the stream only accepts progress increments after the headers are
processed, the already read data is never added to the progress.
We account for this by adjusting the progress counter by the difference of
already read header data and the body.

For the test:
Co-authored-by: aetonsi <18366087+aetonsi@users.noreply.github.com>

Closes GH-10492.
2023-05-05 19:26:44 +02:00
David CARLIER
f31d253849 ext/pgsql adding PGSQL_ERRORS_SQLSTATE constant support.
Close GH-11181
2023-05-05 15:08:27 +01:00
Calvin Buckley
3af5f47ce6 http_response_code should warn if headers were already sent
This would previously fail silently. We also return false to indicate the error.

Fixes GH-10742
Closes GH-10744
2023-05-05 15:24:56 +02:00
David CARLIER
2e0f75ec14 ext/pgsql: pg_lo_read addressing the todo. (#11159) 2023-05-05 12:41:52 +01:00
Julien Quiaios
bb38ad7768 Add new test for array_fill() to cover the case when the parameter count is too large (#11184) 2023-05-05 12:36:17 +01:00