1
0
mirror of https://github.com/php/php-src.git synced 2026-04-13 19:14:16 +02:00
Commit Graph

63310 Commits

Author SHA1 Message Date
Alex Dowad
ef114f94b9 Simplify code for conversion of UHC to Unicode
I was hoping to get some performance gains here, but the performance
is just the same as before, +/- a fraction of a percent.
2023-01-04 18:18:22 +02:00
Max Kellermann
efd5ecb0f2 Zend/Optimizer/zend_inference: make several pointers const
This allows removing several deconst casts from the JIT.
2023-01-04 12:59:16 +00:00
Alex Dowad
3b5072f6f6 Use smart_str in mb_http_input rather than mbfl_memory_device
For many years, the code has contained a TODO comment indicating
that the original author had wanted to do this.

Using smart_str makes the code shorter and cleaner, and it is another
step towards removing a bunch of legacy mbstring code which will soon
be unneeded.
2023-01-03 09:10:13 +02:00
Alex Dowad
0e7160b836 Implement mb_detect_encoding using fast text conversion filters
Regarding the optional 3rd `strict` argument to mb_detect_encoding,
the documentation states:

  Controls the behaviour when string is not valid in any of the listed encodings.
  If strict is set to false, the closest matching encoding will be returned;
  if strict is set to true, false will be returned.

(Ref: https://www.php.net/manual/en/function.mb-detect-encoding.php)

Because of bugs in the implementation, mb_detect_encoding did not always
behave according to this description when `strict` was false.
For example:

  <?php
  echo var_export(mb_detect_encoding("\xc0\x00", "UTF-8", false));
  // Before this commit, prints: false
  // After this commit, prints: 'UTF-8'

Because `strict` is false in the above example, mb_detect_encoding
should return the 'closest matching encoding', which is UTF-8, since
that is the only candidate encoding. (Incidentally, this example shows
that using mb_detect_encoding with a single candidate encoding in
non-strict mode is useless.)

The new implementation fixes this bug. It also fixes another problem
with the old implementation as regards non-strict detection mode:

The old implementation would stop processing of the input string using
a particular candidate encoding as soon as it saw an error in that
encoding, even in non-strict mode. This means that it could not really
detect the 'closest matching encoding'; rather, what it would return
in non-strict mode was 'the encoding in which the first decoding error
is furthest from the beginning of the input string'.

In non-strict mode, the new implementation continues trying to process
the input string to its end even after seeing an error. This makes it
possible to determine in which candidate encoding the string has the
smallest number of errors, i.e. the 'closest matching encoding'.

Rejecting candidate encodings as soon as it saw an error gave the old
implementation a marked performance advantage in non-strict mode;
however, the new implementation still beats it in most cases. Here are
a few sample microbenchmark results:

  UTF-8, ~100 codepoints, strict mode
  Old: 0.080s (100,000 calls)
  New: 0.026s ("       "    )

  UTF-8, ~100 codepoints, non-strict mode
  Old: 0.079s (100,000 calls)
  New: 0.033s ("       "    )

  UTF-8, ~10000 codepoints, strict mode
  Old: 6.708s (60,000 calls)
  New: 1.383s ("      "    )

  UTF-8, ~10000 codepoints, non-strict mode
  Old: 6.705s (60,000 calls)
  New: 3.044s ("      "    )

Notice that the old implementation had almost identical performance
between strict and non-strict mode, while the new suffers a significant
performance penalty for non-strict detection. This is the cost of
implementing the behavior specified in the documentation.

A couple more sample results:

  SJIS, ~10000 codepoints, strict mode
  Old: 4.563s
  New: 1.084s

  SJIS, ~10000 codepoints, non-strict mode
  Old: 4.569s
  New: 2.863s

This is the only case I found where the new implementation loses:

  UTF-16LE, ~10000 codepoints, non-strict mode
  Old: 1.514s
  New: 2.813s

The reason is because the test strings happened to be invalid right from
the first few bytes for all the candidate encodings except for UTF-16LE;
so the old implementation would immediately reject all those encodings
and only process the entire string in UTF-16LE.

I believe mb_detect_encoding could be made much faster if we identified
good criteria for when to reject candidate encodings before reaching
the end of the input string.
2023-01-03 09:10:10 +02:00
Alex Dowad
953864661a Implement php_mb_zend_encoding_converter using fast text conversion filters 2023-01-03 09:02:21 +02:00
Alex Dowad
88c99afdac Implement mb_str_split using fast text conversion filters
There is no great difference between the old and new code for text
encodings which either 1) use a fixed number of bytes per codepoint or
2) for which we have an 'mblen' table which enables us to find the
length of a multi-byte character using a table lookup indexed by the
first byte value.

The big difference is for other text encodings, where we have to
actually decode the string to split it. For such text encodings,
such as ISO-2022-JP and UTF-16, I measured a speedup of 50%-120% over
the previous implementation.
2023-01-03 09:02:21 +02:00
Alex Dowad
a9a672048b Implement mb_output_handler using fast text conversion filters 2023-01-03 09:02:21 +02:00
David Carlier
2a8cecdc3d Merge branch 'PHP-8.2' 2023-01-02 16:55:54 +00:00
David Carlier
acb1af802d Merge branch 'PHP-8.1' into PHP-8.2 2023-01-02 16:55:03 +00:00
Niels Dossche
d5f0362e59 Fix GH-10202: posix_getgr(gid|nam)_basic.phpt fail
The issue was that passwd was empty for the issue reporter, but the test
expected passwd to be non-empty. An empty passwd can occur if there is
no (encrypted) group password set up.
2023-01-02 16:54:47 +00:00
Max Kellermann
10d43c40dd ext/opcache/zend_shared_alloc: change "locked" check to assertion
Calling zend_shared_alloc() without holding the lock is always a bug,
not a fatal runtime error.
2023-01-02 15:49:04 +00:00
Max Kellermann
e1a25ff2ed ext/opcache/zend_shared_alloc: add assertions on "locked" flag
Let the PHP process crash if a bug causes incorrect locking calls.
2023-01-02 15:49:04 +00:00
Christoph M. Becker
0aa1fdf28d Fix variation5-win32(-mb).phpt wrt. parallel test execution
Each test should use its own temporary filenames to avoid issues when
the tests are executed in parallel[1].  We also silence the `unlink()`
calls in the CLEAN section just in case.

And while we're at it, we also remove the erroneous comment; there is
no symlinking involved for the Windows test variants.

[1] <https://github.com/php/php-src/pull/10175#issuecomment-1366809933>

Closes GH-10189.
2022-12-30 17:47:58 +01:00
George Peter Banyard
11f6022365 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-10187: Segfault in stripslashes() with arm64
  Fix memory leak in posix_ttyname()
2022-12-30 16:43:05 +00:00
George Peter Banyard
e6c9b176d4 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Fix GH-10187: Segfault in stripslashes() with arm64
  Fix memory leak in posix_ttyname()
2022-12-30 16:42:45 +00:00
Niels Dossche
4c9375e504 Fix GH-10187: Segfault in stripslashes() with arm64
Closes GH-10188

Co-authored-by: todeveni <toni.viemero@iki.fi>
Signed-off-by: George Peter Banyard <girgias@php.net>
2022-12-30 16:40:56 +00:00
George Peter Banyard
c2b0be5570 Fix memory leak in posix_ttyname()
Closes GH-10190
2022-12-30 16:24:28 +00:00
Alex Dowad
f40c3fca88 Improve mb_detect_encoding's recognition of Turkish text
Add 4 codepoints commonly used to write Turkish text to our table
of 'commonly used' Unicode codepoints. These are:

• U+011F LATIN SMALL LETTER G WITH BREVE
• U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
• U+0131 LATIN SMALL LETTER DOTLESS I
• U+015F LATIN SMALL LETTER S WITH CEDILLA
2022-12-30 14:22:46 +02:00
Tim Düsterhus
3e48e52d93 Register parameter attributes via stub in ext/zend_test (#10183) 2022-12-29 23:17:02 +01:00
Alex Dowad
8b37c4ea5e Merge branch 'PHP-8.2'
* PHP-8.2:
  Allow 'h' and 'k' flags to be combined for mb_convert_kana
2022-12-29 20:39:22 +02:00
Alex Dowad
f7a19181d7 Allow 'h' and 'k' flags to be combined for mb_convert_kana
The 'h' flag makes mb_convert_kana convert zenkaku hiragana to hankaku
katakana; 'k' makes it convert zenkaku katakana to hankaku katakana.

When working on the implementation of mb_convert_kana, I added some
additional checks to catch combinations of flags which do not make
sense; but there is no conflict between 'h' and 'k' (they control
conversions for two disjoint ranges of codepoints) and this combination
should not have been restricted.

Thanks to the GitHub user 'akira345' for reporting this problem.

Closes GH-10174.
2022-12-29 20:38:01 +02:00
David Carlier
383053c4aa Merge branch 'PHP-8.2' 2022-12-29 12:22:21 +00:00
David Carlier
07bf42df41 Merge branch 'PHP-8.1' into PHP-8.2 2022-12-29 12:21:13 +00:00
Max Kellermann
e217138b40 ext/opcache/jit/zend_jit_trace: add missing lock for EXIT_INVALIDATE
Commit 6c25413183 added the flag ZEND_JIT_EXIT_INVALIDATE which
resets the trace handlers in zend_jit_trace_exit(), but forgot to
lock the shared memory section.

This could cause another worker process who still saw the
ZEND_JIT_TRACE_JITED flag to schedule ZEND_JIT_TRACE_STOP_LINK, but
when it arrived at the ZEND_JIT_DEBUG_TRACE_STOP, the handler was
already reverted by the first worker process and thus
zend_jit_find_trace() fails.

This in turn generated a bogus jump offset in the JITed code, crashing
the PHP process.
2022-12-29 12:20:56 +00:00
Dmitry Stogov
ca5f668f7c Added missed return 2022-12-29 12:40:46 +03:00
David Carlier
f7a28c4145 Merge branch 'PHP-8.2' 2022-12-26 21:19:23 +00:00
David Carlier
381d0ddc20 Merge branch 'PHP-8.1' into PHP-8.2 2022-12-26 21:18:31 +00:00
Max Kellermann
b26b758952 ext/opcache/jit: handle zend_jit_find_trace() failures
Commit 6c25413 added the flag ZEND_JIT_EXIT_INVALIDATE which resets
the trace handlers in zend_jit_trace_exit(), but forgot to consider
that on ZEND_JIT_TRACE_STOP_LINK, this changed handler gets passed to
zend_jit_find_trace(), causing it to fail, either by returning 0
(results in bogus data) or by aborting due to ZEND_UNREACHABLE().  In
either case, this crashes the PHP process.

I'm not quite sure how to fix this multi-threading problem properly;
my suggestion is to just fail the zend_jit_trace() call.  After all,
the whole ZEND_JIT_EXIT_INVALIDATE fix was about reloading modified
scripts, so there's probably no point in this pending zend_jit_trace()
call.
2022-12-26 21:17:19 +00:00
Dmitry Stogov
f922597b51 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix memory leak because of incorrect optimization
2022-12-26 13:22:02 +03:00
Dmitry Stogov
0464524292 Fix memory leak because of incorrect optimization
Fixes oss-fuzz #54488
2022-12-26 13:20:55 +03:00
George Peter Banyard
59f0fe5f16 Merge branch 'PHP-8.2' 2022-12-23 16:29:39 +00:00
Niels Dossche
a24659e70c Update test for changed behaviour of GMP constructor
Closed GH-10160

Signed-off-by: George Peter Banyard <girgias@php.net>
2022-12-23 16:29:14 +00:00
Ilija Tovilo
292f69b345 Merge branch 'PHP-8.2'
* PHP-8.2:
  Add a regression test for auto_globals_jit=0 with preloading on
2022-12-22 17:42:37 +01:00
Ilija Tovilo
db48f49888 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Add a regression test for auto_globals_jit=0 with preloading on
2022-12-22 17:42:27 +01:00
Niels Dossche
bbad29b9c1 Add a regression test for auto_globals_jit=0 with preloading on 2022-12-22 17:42:11 +01:00
David Carlier
9c2572565a sockets adding TCP_QUICKACK constant.
having tigher control on ACK delays, difference is the setting
is `volatile` as it can be turned off by the kernel if not set
 explicitally set otherwise on the socket.

Closes GH-10145.
2022-12-22 14:50:33 +00:00
Ilija Tovilo
08fb7f93a1 Merge branch 'PHP-8.2'
* PHP-8.2:
  Initialize ping_auto_globals_mask to prevent undefined behaviour
2022-12-22 15:00:14 +01:00
Ilija Tovilo
c714e626c8 Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Initialize ping_auto_globals_mask to prevent undefined behaviour
2022-12-22 15:00:00 +01:00
Niels Dossche
c4487b7a12 Initialize ping_auto_globals_mask to prevent undefined behaviour
Closes GH-10121
2022-12-22 14:59:24 +01:00
Niels
7b2c3c11b2 Cleanup redundant lookups in phar_object.c (#10150) 2022-12-22 13:00:28 +00:00
Arnaud Le Blanc
c46a0ce198 Merge branch 'PHP-8.2'
* PHP-8.2:
  [ci skip] NEWS
  [ci skip] NEWS
  ext/opcache/jit/zend_jit: fix inverted bailout value in zend_runtime_jit() (#10144)
2022-12-21 14:56:26 +01:00
Arnaud Le Blanc
f1c345394b Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  [ci skip] NEWS
  ext/opcache/jit/zend_jit: fix inverted bailout value in zend_runtime_jit() (#10144)
2022-12-21 14:55:36 +01:00
Max Kellermann
d3a6eedf4a ext/opcache/jit/zend_jit: fix inverted bailout value in zend_runtime_jit() (#10144)
In the "catch" block, do_bailout must be set to true, not false, or
else zend_bailout() never gets called.
2022-12-21 14:53:21 +01:00
Derick Rethans
0ec8733bf4 Merge branch 'PHP-8.2' 2022-12-20 16:07:02 +00:00
Derick Rethans
6b212b6dee Merge branch 'PHP-8.1' into PHP-8.2 2022-12-20 16:06:55 +00:00
Derick Rethans
d19a70c9a0 Fix GH-9891: DateTime modify with unixtimestamp (@) must work like setTimestamp 2022-12-20 14:41:13 +00:00
Christoph M. Becker
a23e837f16 Merge branch 'PHP-8.2'
* PHP-8.2:
  Force extension loading for new test
2022-12-19 16:17:02 +01:00
Christoph M. Becker
1abc1645dd Merge branch 'PHP-8.1' into PHP-8.2
* PHP-8.1:
  Force extension loading for new test
2022-12-19 16:15:24 +01:00
Christoph M. Becker
da5cbca23e Force extension loading for new test 2022-12-19 16:14:00 +01:00
Christoph M. Becker
0cbc49b3c2 Merge branch 'PHP-8.2'
* PHP-8.2:
  Skip newly added test on 32bit platforms
2022-12-19 16:08:57 +01:00