mirror of https://github.com/php/php-src.git synced 2026-03-30 04:02:19 +02:00
Commit Graph

122661 Commits

Author SHA1 Message Date
Alex Dowad
34ece40872 Remove useless mbstring encoding 'JIS-ms'
Microsoft invented three encodings very similar to ISO-2022-JP/JIS7/JIS8, called
CP50220, CP50221, and CP50222. All three are supported by mbstring.

Since these encodings are very similar, some code can be shared. Actually,
conversion of CP50220/1/2 to Unicode is exactly the same operation; it's when
converting from Unicode to CP50220/1/2 that some small differences arise in how
certain katakana are handled.

The most important common code was a function called `mbfl_filt_conv_wchar_jis_ms`.
The `jis_ms` part doubtless refers to the fact that these encodings are modified
versions of 'JIS' invented by 'MS'. mbstring also went a step further and exported
'JIS-ms' to userland as a separate encoding from CP50220/1/2. If users requested
'JIS-ms' conversion, they got something like CP50220/1/2, minus their special
ways of handling half-width katakana when converting from Unicode.

But... that 'encoding' is not something which actually exists in the world outside
of mbstring. CP50220/1/2 do exist in Microsoft software, but not 'JIS-ms'.

For a text encoding conversion library, inventing new variant encodings and
implementing them is not very productive. Our interest is in handling text
encodings which real people actually use for... you know, storing actual text
and things like that.
2021-01-15 21:55:41 +02:00
Alex Dowad
fcbe45de10 Remove useless mbstring encoding 'CP50220-raw'
CP50220 is a variant of ISO-2022-JP invented by Microsoft, which handles some
Unicode characters which are not representable in ISO-2022-JP by converting
them to similar characters which are representable.

What, then, is CP50220-raw? An Internet search turns up absolutely nothing.
Reference works which I consulted don't say anything about it. Other text
conversion libraries don't support it.

From looking at the code: It's just the same as CP50220, but it accepts
unmapped JIS X 0208 characters passed through from other Japanese encodings
and silently encodes them using the usual ISO-2022-JP escape sequence and
representation for JIS X 0208 characters.

It's hard to see how this could be useful. OK, let me come out and say it:
it's _not_ useful. We can confidently jettison this (mis)feature.
2021-01-15 21:55:41 +02:00
Alex Dowad
888f5d7729 CP5022{0,1,2}: treat truncated multibyte characters as error 2021-01-15 21:55:41 +02:00
Alex Dowad
2a93a8bb8c Add test suite for CP5022{0,1,2} 2021-01-15 21:55:41 +02:00
Nikita Popov
cebdad4b53 Protect against buffer overflow in xxhash unserialization
We need to make sure that memsize is < 32 bytes.

Fixes oss-fuzz #29538.
2021-01-15 17:29:33 +01:00
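The bounds check described in cebdad4b53 can be sketched roughly as follows. This is only an illustration: `sketch_hash_state` and `sketch_state_is_valid` are hypothetical names, and the struct is a simplified stand-in rather than the real xxhash state layout. The one detail taken from the commit message is that `memsize` must stay below 32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a streaming hash state (not the real xxhash struct). */
typedef struct {
    uint8_t  mem[32];  /* buffered input that has not been hashed yet */
    uint32_t memsize;  /* number of valid bytes in mem */
} sketch_hash_state;

/* Returns true if a state restored from untrusted serialized data is safe
 * to keep using: memsize must be strictly less than the buffer size (32). */
static bool sketch_state_is_valid(const sketch_hash_state *st)
{
    return st->memsize < sizeof(st->mem);
}
```

An unserialize routine would call such a check right after copying the fields in, and reject the input before any code indexes `mem` with `memsize`.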
Nikita Popov
141c4be70a Limit unserialization element count more aggressively
This is slightly more aggressive about rejecting obviously incorrect
element counts. Previously the number of elements was allowed to
match the number of characters. Now it is the number of characters
divided by two (the divisor could actually be increased further, to at least 4).

This doesn't really matter in the grand scheme of things (as it
just cuts maximum memory usage by half), but should fix
oss-fuzz #29356.
2021-01-15 17:07:51 +01:00
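As a rough sketch of the limit described in 141c4be70a: a declared element count is rejected when it exceeds half the number of characters left in the input. The function name and the inclusive boundary are assumptions made for illustration; the real check lives in the unserializer and may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if a declared element count could
 * plausibly fit in the remaining input. Each element is assumed to need
 * at least two characters, hence the division by two. */
static bool sketch_element_count_plausible(size_t elements, size_t remaining_chars)
{
    return elements <= remaining_chars / 2;
}
```

Rejecting implausible counts early avoids preallocating a large array for input that cannot possibly contain that many elements.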
Nikita Popov
21562aa98d Check for append to $GLOBALS
Fixes oss-fuzz #29389.
2021-01-15 16:58:31 +01:00
Nikita Popov
3e01f5afb1 Replace zend_bool uses with bool
We're starting to see a mix between uses of zend_bool and bool.
Replace all usages with the standard bool type everywhere.

Of course, zend_bool is retained as an alias.
2021-01-15 12:33:06 +01:00
Nikita Popov
e2c8ab7c33 Print "interned" instead of fake refcount in debug_zval_dump()
debug_zval_dump() currently prints refcount 1 for interned strings
and arrays, which does not really reflect the truth. These values
are not refcounted, so the refcount is misleading. Instead print
an "interned" tag.

Closes GH-6598.
2021-01-15 12:21:24 +01:00
Nikita Popov
869221cfb6 Build PDO OCI and OCI8 on azure
The extensions are built as shared to only check that they compile,
without running tests. The OCI8 extension does not properly SKIPIF
when no database is available.

It should be noted that if we do want to also test these, then
(apart from running a database) it will also be necessary to configure
with LIBS="-Wl,--disable-new-dtags" in order to force the use of RPATH
instead of RUNPATH, the latter of which does not affect dlopened
libraries. Using LD_LIBRARY_PATH does not mesh well with our test
suite.

Closes GH-6604.
2021-01-15 12:12:13 +01:00
Nikita Popov
daa420a0da Fix misleading indentation warning in pdo_oci 2021-01-15 11:51:43 +01:00
Nikita Popov
16cf1b915d compare_function() returns zend_result 2021-01-15 11:51:28 +01:00
Nikita Popov
058756b3bb Remove the convert_to_long_base function
This function is unused in php-src, and has somewhat dubious
semantics, especially since we switched convert_to_long to not
use strtol for the base 10 case.

If you want to convert strings from a different base, use
ZEND_STRTOL directly.
2021-01-15 10:43:26 +01:00
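For code that relied on convert_to_long_base, the suggested replacement might look like the sketch below. It uses the standard C `strtol` so that it is self-contained; inside php-src one would use `ZEND_STRTOL`, which is assumed here to follow the same calling convention. The helper name `sketch_parse_long_base` is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse str as an integer in the given base.
 * Returns false if no digits were consumed or the value is out of range. */
static bool sketch_parse_long_base(const char *str, int base, long *out)
{
    char *end;
    errno = 0;
    long value = strtol(str, &end, base);
    if (end == str || errno == ERANGE) {
        return false;
    }
    *out = value;
    return true;
}
```

Calling the parser directly makes the error handling explicit, which the removed function did not offer.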
Alex Dowad
0ec34da8e0 CP5022{0,1,2}: treat unrecognized escapes as error 2021-01-15 08:30:36 +02:00
Alex Dowad
a50607d11d CP5022{0,1,2}: use JISX0201 for U+203E (overline)
Same issue as d497c0e96f addressed for JIS7/JIS8, but for CP5022{0,1,2} this time.
2021-01-15 08:30:30 +02:00
Alex Dowad
5e5243ab65 CP5022{0,1,2}: convert Unicode codepoints in 'user' area (0xE000-E757) correctly
Unicode has a range of 'private' codepoints which individual applications can
use for their own purposes. When they were inventing CP932, Microsoft mapped
these 'private' or 'user' codepoints to twenty new rows added to the JIS X 0208
character table. (JIS X 0208 is based on a 94x94 table; MS used rows 95-114
for private characters.)

`mbfl_filt_conv_wchar_jis_ms` converted these private codepoints to rows 85-94
rather than 95-114. The code included a link to a document on the OpenGroup
web site, dating back to 1996 [1], which proposed mapping private codepoints to
these rows. However, that is not consistent with what mbstring does when
converting CP5022x to Unicode.

There seems to be a dearth of information on CP5022x on the web. However, I
did find one (Japanese-language) page on CP50221, which states that it maps
kuten codes 0x7F21-0x927E to the 'private' Unicode codepoints [2].

As a side note, using rows higher than 95 does seem to defeat one purpose of
using an ISO-2022-JP variant: ISO-2022-JP was specifically designed to be
"7-bit clean", but once you go beyond row 95, the ku codes are 0x80 and up,
so 8 bits are needed.

[1] https://web.archive.org/web/20000229180004/http://www.opengroup.or.jp/jvc/cde/ucs-conv.html
[2] https://www.wdic.org/w/WDIC/Microsoft%20Windows%20Codepage%20%3A%2050221
2021-01-15 08:26:46 +02:00
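The mapping described in 5e5243ab65 can be illustrated with a small sketch. It assumes that the private-use codepoints 0xE000-0xE757 are laid out linearly, 94 per row, starting at row 95, which is consistent with the kuten range 0x7F21-0x927E cited in [2]. The function name is hypothetical and this is not the mbstring implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: map a private-use codepoint to the two bytes of a
 * kuten code in rows 95-114. Assumes a linear, row-major layout. */
static bool sketch_pua_to_kuten(uint32_t cp, uint8_t *b1, uint8_t *b2)
{
    if (cp < 0xE000 || cp > 0xE757) {
        return false;  /* outside the 'user' area handled here */
    }
    uint32_t offset = cp - 0xE000;
    *b1 = (uint8_t)(95 + offset / 94 + 0x20);  /* row (ku) byte */
    *b2 = (uint8_t)(1 + offset % 94 + 0x20);   /* cell (ten) byte */
    return true;
}
```

Under this assumption, row 95 still yields 0x7F as its first byte, while row 96 and above yield 0x80 and up, which is the 8-bit issue noted in the commit message.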
Alex Dowad
6e9c8386cb CP5022{0,1,2}: convert characters in ku 0x2D (13th row) correctly
Essentially, CP5022{0,1,2} are to CP932 as ISO-2022-JP is to Shift-JIS.
As Shift-JIS and ISO-2022-JP both encode characters from the JIS X 0208 charset,
CP932 and CP5022x both encode characters from JIS X 0208 _plus_ extra characters
added as Microsoft vendor extensions.

Among the added characters are a number of symbols which MS put in the 13th row
of the 94x94 character table. (In JIS X 0208, that row is empty.)

mbfilter_cp50220x.c had an `if` clause which was intended to handle the
conversion of characters in that 13th row, but it was dead code, as the previous
clause was always true in those cases. The solution is to reverse the order of
those two clauses (just as they already appeared in mbfilter_cp932.c).
2021-01-15 08:26:38 +02:00
Alex Dowad
cdd0724291 Stricter handling of erroneous input when converting CP5022{0,1,2} text encoding
Don't allow escape sequences to start in the middle of a multibyte character.
Also, don't silently pass through illegal bytes which appear where the 2nd
byte of a multibyte character should be.
2021-01-15 08:25:44 +02:00
George Peter Banyard
eec11c29d7 [skip-ci] Add minimal build instruction for Fedora 2021-01-15 03:50:32 +00:00
Anna Filina
df30f09be5 Add test to verify file_get_contents error with folder
Closes GH-6600.
2021-01-14 23:49:26 +01:00
Alex Dowad
4299e2de42 JIS7/JIS8 encoding: treat truncated multibyte characters as error 2021-01-14 22:34:16 +02:00
Alex Dowad
b67e358e75 JIS7/JIS8 encoding: handle invalid 2nd byte for Kanji correctly
Previously, in ISO-2022-JP/JIS7/JIS8, if an escape sequence (starting with 0x1B)
appeared where the 2nd byte of a multibyte character should have been, mbstring
would forget all about the truncated multibyte character and happily accept the
escape sequence. However, such sequences are not legal and should be flagged as
errors.

Also, any other illegal bytes appearing where the 2nd byte of a multibyte
character was expected were just passed through quietly to the output. Fix that.

Also add a test suite for both ISO-2022-JP and JIS7/JIS8. (These are extremely
similar encodings; JIS7 and JIS8 are variants of ISO-2022-JP. mbstring's 'JIS'
is actually a combination of JIS7 _and_ JIS8, since the extensions which each
one adds to ISO-2022-JP are disjoint.)
2021-01-14 22:31:31 +02:00
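The stricter handling described in b67e358e75 boils down to classifying whatever byte shows up where the 2nd byte of a double-byte character is expected. A minimal sketch, with hypothetical names, assuming only that valid JIS X 0208 bytes lie in 0x21-0x7E and that escape sequences start with 0x1B:

```c
/* Hypothetical classification of the byte found where the 2nd byte of a
 * JIS X 0208 character is expected. */
typedef enum {
    SKETCH_SECOND_OK,         /* completes a double-byte character */
    SKETCH_SECOND_IS_ESCAPE,  /* error: escape sequence cut the character short */
    SKETCH_SECOND_INVALID     /* error: byte cannot be part of a character */
} sketch_second_byte;

static sketch_second_byte sketch_classify_second_byte(unsigned char c2)
{
    if (c2 == 0x1B) {
        return SKETCH_SECOND_IS_ESCAPE;
    }
    if (c2 >= 0x21 && c2 <= 0x7E) {
        return SKETCH_SECOND_OK;
    }
    return SKETCH_SECOND_INVALID;
}
```

Both non-OK outcomes would be reported as conversion errors rather than accepted or passed through to the output.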
Alex Dowad
d497c0e96f JIS7/JIS8 encoding: use JISX0201 for U+203E (overline)
In other legacy Japanese encodings like Shift-JIS, we are now using a specific
JIS X 0208 character for the Unicode overline (U+203E). Previously, the single
byte 0x7E was used, but an ASCII 0x7E does not represent an overline, so this
was changed.

However, JIS7/JIS8 can represent characters in the JIS X 0201 character set as
well. That character set also includes an overline character, which takes fewer
bytes to encode than the corresponding JIS X 0208 character, so we'll use it.

This is what mbstring had been doing for a long time; but it changed as a
side effect of the recent changes to how U+203E is encoded in Shift-JIS, etc.
So change it back.
2021-01-14 22:26:24 +02:00
Alex Dowad
40384da36a JIS7/JIS8 encoding: treat unrecognized escapes as error 2021-01-14 22:26:24 +02:00
Alex Dowad
c11e12ffe0 Add comment explaining why ISO-2022-JP-2004, etc strings end with ESC ( B
These encodings have multiple modes which can be selected via escape sequences.
The default starting mode is ASCII. If a string _ends_ in a different mode, we
emit a 'redundant' escape sequence to switch back to ASCII.

If the resulting string is never concatenated with other strings, that extra
escape sequence serves no purpose. But if the resulting string is concatenated
with other strings of the same encoding, it ensures that the resulting string
will be valid.
2021-01-14 22:26:24 +02:00
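To illustrate why the trailing ESC ( B from c11e12ffe0 matters, the sketch below scans an ISO-2022-JP style byte string and reports whether it ends in ASCII mode. It is a simplification with a hypothetical function name: any escape sequence other than ESC ( B is assumed to select a non-ASCII character set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the string ends in ASCII mode.
 * The default starting mode is ASCII; ESC ( B switches back to it, and
 * any other escape is assumed to select a different character set. */
static bool sketch_ends_in_ascii_mode(const unsigned char *s, size_t len)
{
    bool ascii = true;
    for (size_t i = 0; i + 2 < len; i++) {
        if (s[i] != 0x1B) {
            continue;
        }
        ascii = (s[i + 1] == '(' && s[i + 2] == 'B');
    }
    return ascii;
}
```

If a string for which this returns false is concatenated with plain ASCII text, a decoder would read the appended bytes in the wrong mode; emitting ESC ( B at the end avoids that.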
Alex Dowad
4b95fdf2ca ISO-2022-JP-2004 conversion: handle invalid characters correctly 2021-01-14 22:26:24 +02:00
Alex Dowad
e14bdc041a ISO-2022-JP-2004 conversion: treat unrecognized escapes as error 2021-01-14 22:26:24 +02:00
Alex Dowad
4d65c2a992 ISO-2022-JP-2004 conversion: represent backslash and tilde as ASCII
This issue dates back to some commits I merged recently, which made encodings
like Shift-JIS-2004 use appropriate JIS X 0208 characters to represent
backslashes and tildes, rather than single-byte characters which are used in
those encodings with a different meaning (for example, in these encodings,
0x5C is used for a halfwidth Yen sign, rather than a backslash).

There was an unintended side effect: ISO-2022-JP-2004 was also made to
represent backslashes and tildes using JIS X 0208 characters. However,
ISO-2022-JP explicitly includes ASCII as one of its selectable character sets,
and ISO-2022-JP-2004 is just an extension of ISO-2022-JP. So when converting
text to ISO-2022-JP-2004, we can convert Unicode backslashes and tildes to ASCII
rather than using the corresponding JIS X 0208 characters.
2021-01-14 22:26:24 +02:00
Nikita Popov
b429228420 Remove zend_locale_sprintf_double()
This function is unused, and also not particularly useful now that
PHP no longer prints doubles in a locale-sensitive way unless
someone really goes out of their way to force it.
2021-01-14 12:13:34 +01:00
Nikita Popov
422d1665a2 Make convert_to_*_ex simple aliases of convert_to_*
Historically, the _ex variants separated the zval first, if a
conversion was necessary. This distinction no longer makes sense
since PHP 7.

The only difference that was still left is that _ex checked whether
the type is the same first, but the usage of these macros did not
actually distinguish on whether such an inlined check is valuable
or not in a given context.

Also drop the unused convert_to_explicit_type macros.
2021-01-14 12:11:11 +01:00
Nikita Popov
1b2aba285d Remove Z_PARAM separate params where they don't make sense
Separation can only possibly make sense for array parameters
(or something that can contain arrays, like zval parameters). It
never makes sense to separate a bool.

The deref parameters are also of dubious utility, but leaving them
for now.
2021-01-14 11:58:08 +01:00
Nikita Popov
ec58a6f1b0 Remove SEPARATE_ZVAL_IF_NOT_REF() macro
This macro hasn't made sense since PHP 7. The correct pattern to
use is ZVAL_DEREF + SEPARATE_ZVAL_NOREF.
2021-01-14 11:08:44 +01:00
Nikita Popov
aa51785889 Remove SEPARATE_ARG_IF_REF macro
The name doesn't correspond to what it does at all, and all the
existing usages appear to be unnecessary.

Usage of this macro can be replaced by ZVAL_DEREF + Z_TRY_ADDREF_P.
2021-01-14 10:53:56 +01:00
Nikita Popov
cc4a247a5e Merge branch 'PHP-8.0'
* PHP-8.0:
  Fixed bug #80617: Type narrowing warning in ZEND_TYPE_INIT_CODE
2021-01-14 10:09:16 +01:00
Nikita Popov
880bf62224 Fixed bug #80617: Type narrowing warning in ZEND_TYPE_INIT_CODE 2021-01-14 10:08:22 +01:00
Nikita Popov
ad5ae0634d Merge branch 'PHP-8.0'
* PHP-8.0:
  Fixed bug #80596: Fix anonymous class union typehint errors
2021-01-14 10:04:47 +01:00
Daniil Gentili
f9fbba41b6 Fixed bug #80596: Fix anonymous class union typehint errors
Cut off the part after the null byte when resolving the class name, to
avoid cutting off a larger part later on.

Closes GH-6601.
2021-01-14 10:04:27 +01:00
sj-i
5a5f0adb2f Fix outdated comment about refcounting in array.c [ci skip]
Originally the reference count was incremented here.
PHP 7 removed the refcounting.
aa8ecbedcb (diff-9c1967d7282ea72ecea9d5dae0dab7349a34d48cc7a10ca38ff49a616f628e40L1954)

Closes GH-6603.
2021-01-14 09:52:40 +01:00
sj-i
37b94ac38a Fix #51758: delete an outdated comment from zend_object_handler.h [ci skip]
The same description was originally written in the message of a 2004 commit which fixed a bug in the pre-release simplexml.
c8c0e97982

Someone then requested that the description be recorded somewhere.
https://externals.io/message/7789

It was then added as a comment in zend_object_handler.h.
7d3215d333

At the time the comment was written, the refcount of the RHS was simply incremented before calling the write handler while processing ZEND_ASSIGN_OBJ.
c8c0e97982/Zend/zend_execute.c (L407)

In that era, the refcount of a zval could be 0 or 1 if the write handler was called from zend_API.
c8c0e97982/Zend/zend_API.c (L1058-L1170)

The original fix in simplexml was removed in 2018, because scalar types no longer have a reference counter as of PHP 7.
f7f790fcc9
4a475a4976

It seems that the original intent of this prescription was to prevent unintended modification of the RHS, and of values which share a memory location with the RHS, in assignments.

In the first place, trying to change the RHS in a write handler is unusual, IMHO. I don't think the description makes sense given how refcounts are handled today, so I simply deleted those sentences.

Because write_dimension has no return value, the note about the return value is moved to the comment for write_property only.

Closes GH-6597.
2021-01-14 09:50:00 +01:00
Dmitry Stogov
924ec32426 Merge branch 'PHP-8.0'
* PHP-8.0:
  Fixed bug #80422 (php_opcache.dll crashes when using Apache 2.4 with JIT)
2021-01-14 08:16:50 +03:00
Dmitry Stogov
3edf5c969a Fixed bug #80422 (php_opcache.dll crashes when using Apache 2.4 with JIT) 2021-01-14 08:16:27 +03:00
Adam Baratz
4affb585a8 Remove flakiness from tests 2021-01-13 19:39:41 -05:00
Nikita Popov
d8b22c56cf Fix INDIRECT elements leaked by SPL __serialize implementations 2021-01-12 15:35:19 +01:00
Dmitry Stogov
1a44599dee Always use CG(arena) for union type lists 2021-01-12 16:33:38 +03:00
Christoph M. Becker
1a0fa12753 Merge branch 'PHP-8.0'
* PHP-8.0:
  socket_create_pair() can no longer return NULL
2021-01-12 12:09:13 +01:00
Christoph M. Becker
41e9a8ebdc socket_create_pair() can no longer return NULL
Closes GH-6592.
2021-01-12 12:08:31 +01:00
Nikita Popov
13e049ecfd Merge branch 'PHP-8.0'
* PHP-8.0:
  Use arc4random_buf on macOS
2021-01-12 10:43:18 +01:00
David CARLIER
7a049cd6a4 Use arc4random_buf on macOS
macOS has used an AES-based arc4random_buf implementation since at least
macOS 10.2.

Closes GH-6591.
2021-01-12 10:42:09 +01:00
Nikita Popov
45a4d07dd0 Merge branch 'PHP-8.0'
* PHP-8.0:
  Add support for union types for internal functions
2021-01-12 10:15:13 +01:00
Nikita Popov
973138f39d Add support for union types for internal functions
This closes the last hole in the supported types for internal
function arginfo types. It's now possible to represent unions of
multiple classes. This is done by storing them as TypeA|TypeB and
PHP will then convert this into an appropriate union type list.

Closes GH-6581.
2021-01-12 10:14:41 +01:00
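The TypeA|TypeB representation mentioned in 973138f39d implies a splitting step when the string is turned into a type list. The sketch below shows only that step, using plain C strings and a hypothetical function name; the real conversion in php-src operates on its own string and type structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "TypeA|TypeB" into class-name slices.
 * Stores up to max (start, length) pairs and returns how many were found. */
static size_t sketch_split_union(const char *types, const char **starts,
                                 size_t *lens, size_t max)
{
    size_t n = 0;
    const char *p = types;
    while (n < max) {
        const char *bar = strchr(p, '|');
        starts[n] = p;
        lens[n] = bar ? (size_t)(bar - p) : strlen(p);
        n++;
        if (!bar) {
            break;  /* last name reached */
        }
        p = bar + 1;
    }
    return n;
}
```

Each resulting slice would then be resolved to one entry of the union type list.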