archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-04-28 18:53:33 +02:00

Author	SHA1	Message	Date
Alex Dowad	671dcee01e	Add test for mb_str_split on UCS-2 text	2021-08-31 13:41:34 +02:00
Alex Dowad	f303fc8a9b	Use bool in mbfl_filt_conv_output_hex (rather than int)	2021-08-31 13:41:34 +02:00
Alex Dowad	776296e12f	mbstring no longer provides 'long' substitutions for erroneous input bytes Previously, mbstring had a special mode whereby it would convert erroneous input byte sequences to output like "BAD+XXXX", where "XXXX" would be the erroneous bytes expressed in hexadecimal. This mode could be enabled by calling `mb_substitute_character("long")`. However, accurately reproducing input byte sequences from the cached state of a conversion filter is often tricky, and this significantly complicates the implementation. Further, the means used for passing the erroneous bytes through to where the "BAD+XXXX" text is generated only allows for up to 3 bytes to be passed, meaning that some erroneous byte sequences are truncated anyways. More to the point, a search of publicly available PHP code indicates that nobody is really using this feature anyways. Incidentally, this feature also provided error output like "JIS+XXXX" if the input 'should have' represented a JISX 0208 codepoint, but it decodes to a codepoint which does not exist in the JISX 0208 charset. Similarly, specific error output was provided for non-existent JISX 0212 codepoints, and likewise for JISX 0213, CP932, and a few other charsets. All of that is now consigned to the flames. However, "long" error markers also include a somewhat more useful "U+XXXX" marker for Unicode codepoints which were successfully decoded from the input text, but cannot be represented in the output encoding. Those are still supported. With this change, there is no need to use a variety of special values in the high bits of a wchar to represent different types of error values. We can (and will) just use a single error value. This will be equal to -1. One complicating factor: Text conversion functions return an integer to indicate whether the conversion operation should be immediately aborted, and the magic 'abort' marker is -1. Also, almost all of these functions would return the received byte/codepoint to indicate success. That doesn't work with the new error value; if an input filter detects an error and passes -1 to the output filter, and the output filter returns it back, that would be taken to mean 'abort'. Therefore, amend all these functions to return 0 for success.	2021-08-31 13:41:34 +02:00
Go Kudo	eaac77f4e7	Fix nested namespaced typed property in gen_stub.php (#7418) Properly escape namespaced class name in property types.	2021-08-31 11:56:39 +02:00
Nikita Popov	5b2ddf5a17	Export zend_use_resource_as_offset() Use a common implementation to generate this error message, as we do so in quite a few places dealing with array keys.	2021-08-31 10:58:01 +02:00
Máté Kocsis	70f516d3e8	Make default value more explicit	2021-08-31 10:19:05 +02:00
Máté Kocsis	5256798d88	Merge branch 'PHP-8.0' * PHP-8.0: Fix default value of $flags in oci_fetch_all()	2021-08-31 10:14:19 +02:00
Máté Kocsis	26aa54e098	Fix default value of $flags in oci_fetch_all() (#7429)	2021-08-31 10:05:24 +02:00
Dmitry Stogov	dad5cfa868	Rename ZREG_FCARG1x/ZREG_FCARG1a into ZREG_FCARG1	2021-08-30 20:38:52 +03:00
Christoph M. Becker	24fe7f08b5	Merge branch 'PHP-8.0' * PHP-8.0: Fix #81400: Unterminated string in dns_get_record() results	2021-08-30 18:55:16 +02:00
Christoph M. Becker	fcbe737218	Merge branch 'PHP-7.4' into PHP-8.0 * PHP-7.4: Fix #81400: Unterminated string in dns_get_record() results	2021-08-30 18:52:40 +02:00
Christoph M. Becker	edab9ad205	Fix #81400: Unterminated string in dns_get_record() results If we assemble a zend_string manually, we need to end it with a NUL byte ourselves. We also fix the size calculation for that zend_string; there is no need for the extra byte for each part, and we don't have to multiply by two, since we're using DnsQuery_A(), not DnsQuery_W() (in which case we would have to do the character set conversion, anyway). This avoids over-allocation, and the need to explicitly set the string length. Finally, we use the proper access macro for zend_strings. Closes GH-7427.	2021-08-30 18:49:39 +02:00
Dmitry Stogov	f1f4403dc2	Fixed register allocation when ADD/SUB/MUL two references in tracing JIT The bug was introduced by `7690fa0bd8` and led to a failure in `make test TESTS="-d opcache.jit=1254 --repeat 3 ext/date/tests/bug30096.phpt"`	2021-08-30 19:41:39 +03:00
Denis Ryabov	d3a6054d44	Fix/improve handling of escaping in ini parser Quoting from UPGRADING: - A leading dollar in a quoted string can now be escaped: "\${" will now be interpreted as a string with contents `${`. - Backslashes in double quoted strings are now more consistently treated as escape characters. Previously, "foo\\" followed by something other than a newline was not considered as a terminated string. It is now interpreted as a string with contents `foo\`. However, as an exception, the string "foo\" followed by a newline will continue to be treated as a valid string with contents `foo\` rather than an unterminated string. This exception exists to support naive uses of Windows file paths as "C:\foo\". Closes GH-7420.	2021-08-30 16:59:22 +02:00
Alex Dowad	15ba73cee3	Add more tests for UTF-8 text conversion	2021-08-30 16:29:58 +02:00
Alex Dowad	51a32ccaf4	Add another test for UTF-16LE	2021-08-30 16:29:58 +02:00
Alex Dowad	7472c82c45	Add tests for UCS-4 text conversion	2021-08-30 16:29:58 +02:00
Alex Dowad	79015b23aa	Add tests for UCS-2 text encoding	2021-08-30 16:29:58 +02:00
Alex Dowad	34ef8f3ca2	Add tests for '7bit' and '8bit' text encodings in mbstring	2021-08-30 16:29:58 +02:00
Alex Dowad	97f8495e0f	UCS-4 conversion does not pass BOM through to output This is to match the way that we handle UCS-2. When a BOM is found at the beginning of a 'UCS-2' string (NOT 'UCS-2BE' or 'UCS-2LE'), we take note of the intended byte order and handle the string accordingly, but do NOT emit a BOM to the output. Rather, we just use the default byte order for the requested output encoding. Some might argue that if the input string used a BOM, and we are emitting output in a text encoding where both big-endian and little-endian byte orders are possible, we should include a BOM in the output string. To such hypothetical debaters of minutiae, I can only offer you a shoulder shrug. No reasonable program which handles UCS-2 and UCS-4 text should require a BOM. Really, the concept of the BOM is a poor idea and should not have been included in Unicode. Standardizing on a single byte order would have been much better, similar to 'network byte order' for the Internet Protocol. But this is not the place to speak at length of such things.	2021-08-30 16:29:58 +02:00
Alex Dowad	e6f1a72235	Add test suite for mobile variants of UTF-8 (and fix bugs)	2021-08-30 16:29:58 +02:00
Alex Dowad	1865576694	Add test suite for EUC-JP-WIN (or EUC-JP-MS) text encoding (and fix bugs)	2021-08-30 16:29:58 +02:00
Alex Dowad	6a693d2d33	Remove useless variable: mbfl_encoding_utf8_kddi_a_aliases	2021-08-30 16:29:58 +02:00
Alex Dowad	d4561894ea	Extraneous trailing UCS-4 bytes are treated as error	2021-08-30 16:29:58 +02:00
Alex Dowad	0de4d6872e	Add more tests for SJIS-2004 text conversion	2021-08-30 16:29:58 +02:00
Alex Dowad	c7d47cbb4c	Add more tests for SJIS text conversion	2021-08-30 16:29:58 +02:00
Alex Dowad	299690a1cf	Add more tests for ISO-2022-JP/JIS7/JIS8 text conversion	2021-08-30 16:29:58 +02:00
Alex Dowad	b2be85d11a	Add more tests for ISO-2022-JP-MS text conversion	2021-08-30 16:29:58 +02:00
Alex Dowad	ae4c956089	Add more tests for ISO-2022-JP-KDDI text conversion	2021-08-30 16:29:58 +02:00
Alex Dowad	51e0d323e4	ISO-2022-JP-MS treats truncated multi-byte chars as error Sigh. I included tests which were intended to check this case in the test suite for ISO-2022-JP-MS, but those tests were faulty and didn't actually test what they were supposed to. Fixing the tests revealed that there were still bugs in this area.	2021-08-30 16:29:58 +02:00
Alex Dowad	57a81af041	ISO-2022-JP-KDDI text conversion doesn't swallow PUA codepoints There was a bit of legacy code here which looks like the original author of mbstring intended to allow conversion of Unicode Private Use Area codepoints to ISO-2022-JP-KDDI. However, that code never worked. It set the output variable to values which were not matched by any of the 'if' clauses below, which meant that nothing was actually emitted to the output. In other words, if one tried to convert Unicode to ISO-2022-JP-KDDI, and the Unicode string contained PUA codepoints, they would be quietly 'swallowed' and disappear. I don't know what ISO-2022-JP-KDDI byte sequences the author wanted to map those PUA codepoints to, and anyways, this use case is so obscure that there is little point in worrying about it. However, it is better to remove the non-functioning code than to leave it in. This means that if one now tries to convert PUA codepoints to ISO-2022-JP-KDDI, those codepoints will be treated as erroneous rather than silently ignored.	2021-08-30 16:29:58 +02:00
Alex Dowad	51b9d7a5e1	Test behavior of 'long' illegal character markers After mb_substitute_character("long"), mbstring will respond to erroneous input by inserting 'long' error markers into the output. Depending on the situation, these error markers will either look like BAD+XXXX (for general bad input), U+XXXX (when the input is OK, but it converts to Unicode codepoints which cannot be represented in the output encoding), or an encoding-specific marker like JISX+XXXX or W932+XXXX. We have almost no tests for this feature. Add a bunch of tests to ensure that all our legacy encoding handlers work in a reasonable way when 'long' error markers are enabled.	2021-08-30 16:29:58 +02:00
Alex Dowad	f6f0506c84	Correct comment in mbfilter_ucs4.c	2021-08-30 16:29:58 +02:00
Alex Dowad	03392ecd50	Simplify code for converting UHC to Unicode	2021-08-30 16:29:58 +02:00
Alex Dowad	9363b0b5a7	Declare ARMSCII-8 conversion functions as 'static'	2021-08-30 16:29:58 +02:00
Alex Dowad	97b7fc893c	Output illegal character marker for 4-byte illegal characters > 0x7FFFFFFF Some text encodings supported by mbstring (such as UCS-4) accept 4-byte characters. When mbstring encounters an illegal byte sequence for the encoding it is using, it should emit an 'illegal character' marker, which can either be a single character like '?', an HTML hexadecimal entity, or a marker string like 'BAD+XXXX'. Because of the use of signed integers to hold 4-byte characters, illegal 4-byte sequences with a 'negative' value (one with the high bit set) were not handled correctly when emitting the illegal char marker. The result is that such illegal sequences were just skipped over (and the marker was not emitted to the output). Fix that.	2021-08-30 16:29:58 +02:00
Dmitry Stogov	7690fa0bd8	JIT: Better code for ADD/SUB/MUL and references in tracing JIT.	2021-08-30 17:02:35 +03:00
Máté Kocsis	c19e4b9997	Generate optimizer func info from stubs for ext/standard - part 3 (#7426)	2021-08-30 15:56:47 +02:00
Máté Kocsis	1bf1481a2a	Specify a few array func info entries (#7425)	2021-08-30 14:29:18 +02:00
Máté Kocsis	d5b583a61c	Merge branch 'PHP-8.0' * PHP-8.0: Use camelCase method names in OCICollection and OCILob	2021-08-30 14:09:24 +02:00
Máté Kocsis	e94731f164	Use camelCase method names in OCICollection and OCILob (#7405)	2021-08-30 14:01:12 +02:00
Dmitry Stogov	8f601be101	JIT: Allow keeping result of FETCH_CONSTANT in a CPU register	2021-08-30 14:56:51 +03:00
Máté Kocsis	8e6e9838b0	Add support for generating MAY_BE_ARRAY_OF_REF func info flag (#7416)	2021-08-30 13:50:34 +02:00
Dmitry Stogov	96c3465513	JIT: Avoid useless EX(func) load	2021-08-30 13:58:23 +03:00
Dmitry Stogov	608d568686	JIT: Avoid reloading of EX(run_time_cache)	2021-08-30 13:19:04 +03:00
Dmitry Stogov	3565d02c6d	JIT: Eliminate load of op_array->run_time_cache__ptr and use immediate value for immutable op_arrays if it's known at compile time	2021-08-30 12:26:37 +03:00
Nikita Popov	d16992afe2	Use HAVE_SYS_PARAM_H	2021-08-30 11:21:33 +02:00
David CARLIER	59255bffbb	Enable getrandom() API on solaris-ish systems (#7417) It has been available long enough to be a trustworthy source.	2021-08-30 11:17:06 +02:00
Nikita Popov	634f2e21d3	Don't expose wchar encoding to users (#7415) The "wchar" encoding isn't really an encoding -- it's what we internally use as the representation of decoded characters. In practice, it tends to behave a lot like the 8bit encoding when used from userland, because input code units end up being treated as code points. This patch removes the wchar encoding from the public encoding list and reserves it for internal use only.	2021-08-30 11:11:33 +02:00
Nikita Popov	0f7e0cf34b	str_replace() can return the original string	2021-08-30 10:23:09 +02:00
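
The return-value problem described in commit 776296e12f can be sketched in a few lines of C. This is a toy model, not mbstring's actual code: the names `WCHAR_ERROR`, `CONV_ABORT`, `output_old`, `output_new`, and `convert` are hypothetical, and the input stage simply treats any byte >= 0x80 as erroneous. Only the two conventions come from the commit message: -1 is the single error value passed between filters, and -1 as a return value means 'abort'.

```c
#include <stddef.h>

#define WCHAR_ERROR (-1)   /* hypothetical name: single error value for a decoded character */
#define CONV_ABORT  (-1)   /* hypothetical name: return value meaning "abort the conversion" */

typedef int (*output_fn)(int c, size_t *handled);

/* Old convention: echo the received value back to signal success.
 * When c is WCHAR_ERROR, the return value is -1, which looks like 'abort'. */
int output_old(int c, size_t *handled)
{
    (*handled)++;
    return c;
}

/* New convention: return 0 for success, whatever value was received. */
int output_new(int c, size_t *handled)
{
    (void)c;
    (*handled)++;
    return 0;
}

/* Toy input stage (an assumption for illustration): bytes >= 0x80 are
 * treated as erroneous and passed on as WCHAR_ERROR. Returns how many
 * values the output stage handled before the loop finished or aborted. */
size_t convert(const unsigned char *in, size_t len, output_fn out)
{
    size_t handled = 0;
    for (size_t i = 0; i < len; i++) {
        int c = (in[i] < 0x80) ? in[i] : WCHAR_ERROR;
        if (out(c, &handled) == CONV_ABORT) {
            break;  /* the caller interprets -1 as 'abort' */
        }
    }
    return handled;
}
```

For the input bytes `'a', 0xFF, 'b'`, `convert(in, 3, output_old)` stops after the second value, because the echoed -1 cannot be told apart from 'abort', whereas `convert(in, 3, output_new)` handles all three values.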

59986 Commits