archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-04-26 01:18:19 +02:00

Author	SHA1	Message	Date
Alex Dowad	34ece40872	Remove useless mbstring encoding 'JIS-ms' MicroSoft invented three encodings very similar to ISO-2022-JP/JIS7/JIS8, called CP50220, CP50221, and CP50222. All three are supported by mbstring. Since these encodings are very similar, some code can be shared. Actually, conversion of CP50220/1/2 to Unicode is exactly the same operation; it's when converting from Unicode to CP50220/1/2 that some small differences arise in how certain katakana are handled. The most important common code was a function called `mbfl_filt_wchar_jis_ms`. The `jis_ms` part doubtless refers to the fact that these encodings are modified versions of 'JIS' invented by 'MS'. mbstring also went a step further and exported 'JIS-ms' to userland as a separate encoding from CP50220/1/2. If users requested 'JIS-ms' conversion, they got something like CP50220/1/2, minus their special ways of handling half-width katakana when converting from Unicode. But... that 'encoding' is not something which actually exists in the world outside of mbstring. CP50220/1/2 do exist in MicroSoft software, but not 'JIS-ms'. For a text encoding conversion library, inventing new variant encodings and implementing them is not very productive. Our interest is in handling text encodings which real people actually use for... you know, storing actual text and things like that.	2021-01-15 21:55:41 +02:00
Alex Dowad	fcbe45de10	Remove useless mbstring encoding 'CP50220-raw' CP50220 is a variant of ISO-2022-JP invented by MicroSoft, which handles some Unicode characters which are not representable in ISO-2022-JP by converting them to similar characters which are representable. What, then, is CP50220-raw? An Internet search turns up absolutely nothing. Reference works which I consulted don't say anything about it. Other text conversion libraries don't support it. From looking at the code: It's just the same as CP50220, but it accepts unmapped JIS X 0208 characters passed through from other Japanese encodings and silently encodes them using the usual ISO-2022-JP escape sequence and representation for JIS X 0208 characters. It's hard to see how this could be useful. OK, let me come out and say it: it's _not_ useful. We can confidently jettison this (mis)feature.	2021-01-15 21:55:41 +02:00
Alex Dowad	888f5d7729	CP5022{0,1,2}: treat truncated multibyte characters as error	2021-01-15 21:55:41 +02:00
Alex Dowad	2a93a8bb8c	Add test suite for CP5022{0,1,2}	2021-01-15 21:55:41 +02:00
Nikita Popov	cebdad4b53	Protect against buffer overflow in xxhash unserialization We need to make sure that memsize is < 32 bytes. Fixes oss-fuzz #29538.	2021-01-15 17:29:33 +01:00
Nikita Popov	141c4be70a	Limit unserialization element count more aggressively This is slightly more aggressive about rejecting obviously incorrect element counts. Previously the number of elements was allowed to match the number of characters. Now it is the number of characters divided by two (this can actually be increased further to at least 4). This doesn't really matter in the grand scheme of things (as it just cuts maximum memory usage by half), but should fix oss-fuzz #29356.	2021-01-15 17:07:51 +01:00
Nikita Popov	3e01f5afb1	Replace zend_bool uses with bool We're starting to see a mix between uses of zend_bool and bool. Replace all usages with the standard bool type everywhere. Of course, zend_bool is retained as an alias.	2021-01-15 12:33:06 +01:00
Nikita Popov	e2c8ab7c33	Print "interned" instead of fake refcount in debug_zval_dump() debug_zval_dump() currently prints refcount 1 for interned strings and arrays, which does not really reflect the truth. These values are not refcounted, so the refcount is misleading. Instead print an "interned" tag. Closes GH-6598.	2021-01-15 12:21:24 +01:00
Nikita Popov	daa420a0da	Fix misleading indentation warning in pdo_oci	2021-01-15 11:51:43 +01:00
Alex Dowad	0ec34da8e0	CP5022{0,1,2}: treat unrecognized escapes as error	2021-01-15 08:30:36 +02:00
Alex Dowad	a50607d11d	CP5022{0,1,2}: use JISX0201 for U+203E (overline) Same issue as `d497c0e96f` addressed for JIS7/JIS8, but for CP5022{0,1,2} this time.	2021-01-15 08:30:30 +02:00
Alex Dowad	5e5243ab65	CP5022{0,1,2}: convert Unicode codepoints in 'user' area (0xE000-E757) correctly Unicode has a range of 'private' codepoints which individual applications can use for their own purposes. When they were inventing CP932, MicroSoft mapped these 'private' or 'user' codepoints to ten new rows added to the JIS X 0208 character table. (JIS X 0208 is based on a 94x94 table; MS used rows 95-114 for private characters.) `mbfl_filt_conv_wchar_jis_ms` converted these private codepoints to rows 85-94 rather than 95-114. The code included a link to a document on the OpenGroup web site, dating back to 1996 [1], which proposed mapping private codepoints to these rows. However, that is not consistent with what mbstring does when converting CP5022x to Unicode. There seems to be a dearth of information on CP5022x on the web. However, I did find one (Japanese-language) page on CP50221, which states that it maps kuten codes 0x7F21-0x927E to the 'private' Unicode codepoints [2]. As a side note, using rows higher than 95 does seem to defeat one purpose of using an ISO-2022-JP variant: ISO-2022-JP was specifically designed to be "7-bit clean", but once you go beyond row 95, the ku codes are 0x80 and up, so 8 bits are needed. [1] https://web.archive.org/web/20000229180004/http://www.opengroup.or.jp/jvc/cde/ucs-conv.html [2] https://www.wdic.org/w/WDIC/Microsoft%20Windows%20Codepage%20%3A%2050221	2021-01-15 08:26:46 +02:00
Alex Dowad	6e9c8386cb	CP5022{0,1,2}: convert characters in ku 0x2D (13th row) correctly Essentially, CP5022{0,1,2} are to CP932 as ISO-2022-JP is to Shift-JIS. As Shift-JIS and ISO-2022-JP both encode characters from the JIS X 0208 charset, CP932 and CP5022x both encode characters from JIS X 0208 _plus_ extra characters added as MicroSoft vendor extensions. Among the added characters are a number of symbols which MS put in the 13th row of the 94x94 character table. (In JIS X 0208, that row is empty.) mbfilter_cp50220x.c had an `if` clause which was intended to handle the conversion of characters in that 13th row, but it was dead code, as the previous clause was always true in those cases. The solution is to reverse the order of those two clauses (just as they already appeared in mbfilter_cp932.c).	2021-01-15 08:26:38 +02:00
Alex Dowad	cdd0724291	Stricter handling of erroneous input when converting CP5022{0,1,2} text encoding Don't allow escape sequences to start in the middle of a multibyte character. Also, don't silently pass through illegal bytes which appear where the 2nd byte of a multibyte character should be.	2021-01-15 08:25:44 +02:00
Anna Filina	df30f09be5	Add test to verify file_get_contents error with folder Closes GH-6600.	2021-01-14 23:49:26 +01:00
Alex Dowad	4299e2de42	JIS7/JIS8 encoding: treat truncated multibyte characters as error	2021-01-14 22:34:16 +02:00
Alex Dowad	b67e358e75	JIS7/JIS8 encoding: handle invalid 2nd byte for Kanji correctly Previously, in ISO-2022-JP/JIS7/JIS8, if an escape sequence (starting with 0x1B) appeared where the 2nd byte of a multibyte character should have been, mbstring would forget all about the truncated multibyte character and happily accept the escape sequence. However, such sequences are not legal and should be flagged as errors. Also, any other illegal bytes appearing where the 2nd byte of a multibyte character was expected were just passed through quietly to the output. Fix that. Also add a test suite for both ISO-2022-JP and JIS7/JIS8. (These are extremely similar encodings; JIS7 and JIS8 are variants of ISO-2022-JP. mbstring's 'JIS' is actually a combination of JIS7 _and_ JIS8, since the extensions which each one adds to ISO-2022-JP are disjoint.)	2021-01-14 22:31:31 +02:00
Alex Dowad	d497c0e96f	JIS7/JIS8 encoding: use JISX0201 for U+203E (overline) In other legacy Japanese encodings like Shift-JIS, we are now using a specific JISX 0208 character for the Unicode overline (U+203E). Previously, the single byte 0x7E was used, but an ASCII 0x7E does not represent an overline, so this was changed. However, JIS7/JIS8 can represent characters in the JISX 0201 character set as well. That character set also includes an overline character, which takes less bytes to encode than the corresponding JISX 0208 character, so we'll use it. This is what mbstring had been doing for a long time; but it changed as a side effect of the recent changes to how U+203E is encoded in Shift-JIS, etc. So change it back.	2021-01-14 22:26:24 +02:00
Alex Dowad	40384da36a	JIS7/JIS8 encoding: treat unrecognized escapes as error	2021-01-14 22:26:24 +02:00
Alex Dowad	c11e12ffe0	Add comment explaining why ISO-2022-JP-2004, etc strings end with ESC ( B These encodings have multiple modes which can be selected via escape sequences. The default starting mode is ASCII. If a string _ends_ in a different mode, we emit a 'redundant' escape sequence to switch back to ASCII. If the resulting string is never concatenated with other strings, that extra escape sequence serves no purpose. But if the resulting string is concatenated with other strings of the same encoding, it ensures that the resulting string will be valid.	2021-01-14 22:26:24 +02:00
Alex Dowad	4b95fdf2ca	ISO-2022-JP-2004 conversion: handle invalid characters correctly	2021-01-14 22:26:24 +02:00
Alex Dowad	e14bdc041a	ISO-2022-JP-2004 conversion: treat unrecognized escapes as error	2021-01-14 22:26:24 +02:00
Alex Dowad	4d65c2a992	ISO-2022-JP-2004 conversion: represent backslash and tilde as ASCII This issue dates back to some commits I merged recently, which made encodings like Shift-JIS-2004 use appropriate JIS X 0208 characters to represent backslashes and tildes, rather than single-byte characters which are used in those encodings with a different meaning (for example, in these encodings, 0x5C is used for a halfwidth Yen sign, rather than a backslash). There was an unintended side effect: ISO-2022-JP-2004 was also made to represent backslashes and tildes using JIS X 0208 characters. However, ISO-2022-JP explicitly includes ASCII as one of its selectable character sets, and ISO-2022-JP-2004 is just an extension of ISO-2022-JP. So when converting text to ISO-2022-JP-2004, we can convert Unicode backslashes and tildes to ASCII rather than using the corresponding JIS X 0208 characters.	2021-01-14 22:26:24 +02:00
Nikita Popov	422d1665a2	Make convert_to__ex simple aliases of convert_to_ Historically, the _ex variants separated the zval first, if a conversion was necessary. This distinction no longer makes sense since PHP 7. The only difference that was still left is that _ex checked whether the type is the same first, but the usage of these macros did not actually distinguish on whether such an inlined check is valuable or not in a given context. Also drop the unused convert_to_explicit_type macros.	2021-01-14 12:11:11 +01:00
Nikita Popov	1b2aba285d	Remove Z_PARAM separate params where they don't make sense Separation can only possibly make sense for array parameters (or something that can contain arrays, like zval parameters). It never makes sense to separate a bool. The deref parameters are also of dubious utility, but leaving them for now.	2021-01-14 11:58:08 +01:00
Nikita Popov	ec58a6f1b0	Remove SEPARATE_ZVAL_IF_NOT_REF() macro This macro hasn't made sense since PHP 7. The correct pattern to use is ZVAL_DEREF + SEPARATE_ZVAL_NOREF.	2021-01-14 11:08:44 +01:00
Nikita Popov	aa51785889	Remove SEPARATE_ARG_IF_REF macro The name doesn't correspond to what it does at all, and all the existing usages appear to be unnecessary. Usage of this macro can be replaced by ZVAL_DEREF + Z_TRY_ADDREF_P.	2021-01-14 10:53:56 +01:00
sj-i	5a5f0adb2f	Fix outdated comment about refcounting in array.c [ci skip] Originally the reference count was incremented in here. PHP7 removed the refcounting. https://github.com/php/php-src/commit/aa8ecbedcb94e9e22e8fd7ffd539377e747153f7#diff-9c1967d7282ea72ecea9d5dae0dab7349a34d48cc7a10ca38ff49a616f628e40L1954 Closes GH-6603.	2021-01-14 09:52:40 +01:00
Dmitry Stogov	924ec32426	Merge branch 'PHP-8.0' * PHP-8.0: Fixed bug #80422 (php_opcache.dll crashes when using Apache 2.4 with JIT)	2021-01-14 08:16:50 +03:00
Dmitry Stogov	3edf5c969a	Fixed bug #80422 (php_opcache.dll crashes when using Apache 2.4 with JIT)	2021-01-14 08:16:27 +03:00
Adam Baratz	4affb585a8	Remove flakiness from tests	2021-01-13 19:39:41 -05:00
Nikita Popov	d8b22c56cf	Fix INDIRECT elements leaked by SPL __serialize implementations	2021-01-12 15:35:19 +01:00
Dmitry Stogov	1a44599dee	Always use CG(arena) for union type lists	2021-01-12 16:33:38 +03:00
Christoph M. Becker	1a0fa12753	Merge branch 'PHP-8.0' * PHP-8.0: socket_create_pair() can no longer return NULL	2021-01-12 12:09:13 +01:00
Christoph M. Becker	41e9a8ebdc	socket_create_pair() can no longer return NULL Closes GH-6592.	2021-01-12 12:08:31 +01:00
Nikita Popov	13e049ecfd	Merge branch 'PHP-8.0' * PHP-8.0: Use arc4random_buf on macOS	2021-01-12 10:43:18 +01:00
David CARLIER	7a049cd6a4	Use arc4random_buf on macOS macOS uses an AES based arc4random_buf implementation since at least macOS 10.2. Closes GH-6591.	2021-01-12 10:42:09 +01:00
Nikita Popov	45a4d07dd0	Merge branch 'PHP-8.0' * PHP-8.0: Add support for union types for internal functions	2021-01-12 10:15:13 +01:00
Nikita Popov	973138f39d	Add support for union types for internal functions This closes the last hole in the supported types for internal function arginfo types. It's now possible to represent unions of multiple classes. This is done by storing them as TypeA|TypeB and PHP will then convert this into an appropriate union type list. Closes GH-6581.	2021-01-12 10:14:41 +01:00
Nikita Popov	a69da19c6b	Merge branch 'PHP-8.0' * PHP-8.0: Use ephemeral port in socket_create_listen_used.phpt	2021-01-12 10:09:38 +01:00
Nikita Popov	bc0f78a2da	Use ephemeral port in socket_create_listen_used.phpt Avoid parallelism issues.	2021-01-12 10:09:30 +01:00
Nikita Popov	1f6a3e603b	Remove unused bind_hash member The last usage has been removed in `aa16aee51c`, so drop the member as well.	2021-01-12 09:52:58 +01:00
Nikita Popov	700c2189e1	Merge branch 'PHP-8.0' * PHP-8.0: Fixed bug #80545	2021-01-12 09:50:54 +01:00
Jens de Nies	94a151a018	Fixed bug #80545 This converts the remaining "non well-formed" warnings in bcmath to ValueErrors, in line with the other warning promotions that have been performed in this extension. Closes GH-80545.	2021-01-12 09:50:27 +01:00
Dmitry Stogov	aa16aee51c	Cleanup: - ZCG(bind_hash) is not used anymore - zend_accel_function_hash_copy() and zend_accel_function_hash_copy_from_shm() are the same - zend_accel_class_hash_copy() and zend_accel_class_hash_copy_from_shm() are almost the same	2021-01-12 08:54:09 +03:00
Adam Baratz	b569698095	Add MSSQL setup to Azure Pipelines build	2021-01-11 21:46:41 -05:00
Dmitry Stogov	c6b2b3b1b5	Merge branch 'PHP-8.0' * PHP-8.0: Add guard if lvalue of assignment may be a reference, but wasn't a reference during recording	2021-01-11 15:13:32 +03:00
Dmitry Stogov	35e0506a2e	Add guard if lvalue of assignment may be a reference, but wasn't a reference during recording	2021-01-11 15:12:27 +03:00
Dmitry Stogov	d9e441be42	Better CPU registers usage	2021-01-11 14:40:30 +03:00
Dmitry Stogov	59401aa2ef	Remove redundant IS_INDIRECT checks (they were necessary for $GLOBALS handling)	2021-01-11 14:39:21 +03:00
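
The note in c11e12ffe0 about ISO-2022-JP-style strings ending with ESC ( B can be illustrated outside of mbstring. The following is a minimal sketch, assuming that Python's built-in 'iso2022_jp' codec behaves like the mbstring encodings in this one respect; it is a different implementation and stands in for illustration only. The helper name encode_jp and the sample strings are hypothetical.

```python
# Illustration only: this uses Python's built-in 'iso2022_jp' codec, not mbstring.
ESC_TO_JISX0208 = b"\x1b$B"  # escape sequence selecting JIS X 0208 (two-byte mode)
ESC_TO_ASCII = b"\x1b(B"     # escape sequence selecting ASCII (the default mode)


def encode_jp(text: str) -> bytes:
    """Encode text as ISO-2022-JP (hypothetical helper for this sketch)."""
    return text.encode("iso2022_jp")


kanji = encode_jp("日本")      # ends in two-byte mode unless the encoder resets it
ascii_part = encode_jp("abc")  # plain ASCII, needs no escape sequences

# The encoder switches to JIS X 0208 for the kanji, then back to ASCII at the end.
starts_in_kanji_mode = kanji.startswith(ESC_TO_JISX0208)
ends_in_ascii_mode = kanji.endswith(ESC_TO_ASCII)

# Because the first string ends in ASCII mode, plain concatenation stays valid.
joined = kanji + ascii_part
round_trip = joined.decode("iso2022_jp")

# Without the trailing escape, the appended ASCII bytes would be read while the
# decoder is still in two-byte mode, so they are not expected to survive intact.
stripped = kanji[: -len(ESC_TO_ASCII)] + ascii_part
mis_decoded = stripped.decode("iso2022_jp", errors="replace")

print(starts_in_kanji_mode, ends_in_ascii_mode, round_trip)
```

The trailing escape costs three bytes per string, which is the 'redundant' overhead the commit message describes; in exchange, strings in the same encoding can be joined byte-for-byte.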

57993 Commits