archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-04-02 13:43:02 +02:00

Author	SHA1	Message	Date
Alex Dowad	6dd75478d5	Leading BOM is stripped for UTF-32 For consistency with UTF-16 and UCS-4. Also, do some code cleanup.	2020-11-11 11:18:59 +02:00
Alex Dowad	1cf12c02f0	Add test suite for SJIS-mac encoding	2020-11-11 11:18:58 +02:00
Alex Dowad	d40f9cf735	Add test suite for SJIS-2004 encoding	2020-11-11 11:18:58 +02:00
Alex Dowad	d1d50c2b7a	Test EUC-JP and Shift-JIS more thoroughly Previously, the unit tests for these text encodings covered all mappings from legacy -> Unicode, and all _reversible_ mappings from Unicode -> legacy. However, we should also test the few Unicode -> legacy mappings which are not reversible.	2020-11-11 11:18:58 +02:00
Alex Dowad	3e7acf901d	Remove mbstring identify filters mbstring had an 'identify filter' for almost every supported text encoding which was used when auto-detecting the most likely encoding for a string. It would run over the string and set a 'flag' if it saw anything which did not appear likely to be the encoding in question. One problem with this scheme was that encodings which merely appeared less likely to be the correct one were completely rejected, even if there was no better candidate. Another problem was that the 'identify filters' had a huge amount of code duplication with the 'conversion filters'. Eliminate the identify filters. Instead, when auto-detecting text encoding, use conversion filters to see whether the input string is valid in candidate encodings or not. At the same time, watch the type of codepoints which the string decodes to and mark it as less likely if non-printable characters (ESC, form feed, bell, etc.) or 'private use area' codepoints are seen. Interestingly, one old test case in which JIS text was misidentified as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed' and the JIS string is now auto-detected as JIS.	2020-11-09 13:45:17 +02:00
Alex Dowad	8f6889b20d	Fix mbstring support for EUC-JP text encoding - Don't allow control characters to appear in the middle of a multi-byte character. (A strange feature, or perhaps misfeature, of mbstring which is not present in other libraries such as iconv.) - When checking whether string is valid, reject kuten codes which do not map to any character, whether converting from EUC-JP to another encoding, or converting another encoding which uses JIS X 0208/0212 charsets to EUC-JP. - Truncated multi-byte characters are treated as an error.	2020-11-09 13:45:17 +02:00
Alex Dowad	ad7e0f16cc	Fix mbstring support for Shift-JIS - Reject otherwise valid kuten codes which don't map to anything in JIS X 0208. - Handle truncated multi-byte characters as an error. - Convert Shift-JIS 0x7E to Unicode 0x203E (overline) as recommended by the Unicode Consortium, and as iconv does. - Convert Shift-JIS 0x5C to Unicode 0xA5 (yen sign) as recommended by the Unicode Consortium, and as iconv does. (NOTE: This will affect PHP scripts which use an internal encoding of Shift-JIS! PHP assigns a special meaning to 0x5C, the backslash. For example, it is used for escapes in double-quoted strings. Mapping the Shift-JIS yen sign to the Unicode yen sign means the yen sign will not be usable for C escapes in double-quoted strings. Japanese PHP programmers who want to write their source code in Shift-JIS for some strange reason will have to use the JIS X 0208 backslash or 'REVERSE SOLIDUS' character for their C escapes.) - Convert Unicode 0x5C (backslash) to Shift-JIS 0x815F (reverse solidus). - Immediately handle error if first Shift-JIS byte is over 0xEF, rather than waiting to see the next byte. (Previously, the value used was 0xFC, which is the limit for the 2nd byte and not the 1st byte of a multi-byte character.) - Don't allow 'control characters' to appear in the middle of a multi-byte character. The test case for bug 47399 is now obsolete. That test assumed that a number of Shift-JIS byte sequences which don't map to any character were 'valid' (because the byte values were within the legal ranges).	2020-11-09 13:45:16 +02:00
Alex Dowad	cc03c54c36	Remove useless byte{2,4}{be,le} encodings from mbstring There is no meaningful difference between these and UCS-{2,4}. They are just a little bit more lax about passing errors silently. They also have no known use. Alias to UCS-{2,4} in case someone, somewhere is using them.	2020-11-09 13:45:16 +02:00
Alex Dowad	3eb8828d1a	Fix issues with mbstring encoding tests I made some mistakes on this code, which meant that not everything which should be tested was actually being tested.	2020-11-09 13:45:16 +02:00
Alex Dowad	ff953f254c	Add test suite for ARMSCII-8 encoding	2020-11-02 21:31:06 +02:00
Alex Dowad	335c1b98c2	Add test suite for KOI8-U encoding	2020-11-02 21:31:06 +02:00
Alex Dowad	9db4387f14	Add test suite for KOI8-R encoding	2020-11-02 21:31:06 +02:00
Alex Dowad	9980534a4e	Add test suite for CP850 encoding	2020-11-02 21:31:06 +02:00
Alex Dowad	0485bed4c7	Add test suite for CP866 encoding	2020-11-02 21:31:06 +02:00
Alex Dowad	0b13305ccc	Add test suite for CP1254 encoding	2020-11-02 21:31:05 +02:00
Alex Dowad	eb4151e89e	Add test suite for CP1251 encoding	2020-11-02 21:31:05 +02:00
Alex Dowad	b18b9c9ef6	Test cases for mbstring encodings are less repetitive	2020-11-02 21:31:05 +02:00
Alex Dowad	831abe2d90	Add test suite for CP1252 encoding Also remove a bogus test (bug62545.phpt) which wrongly assumed that all invalid characters in CP1251 and CP1252 should map to Unicode 0xFFFD (REPLACEMENT CHARACTER). mbstring has an interface to specify what invalid characters should be replaced with; it's called `mb_substitute_character`. If a user wants to see the Unicode 'replacement character', they can specify that using `mb_substitute_character`. But if they specify something else, we should follow that.	2020-10-30 22:13:27 +02:00
Alex Dowad	84c180d88b	Add test suite for ISO-8859-x encoding verification and conversion	2020-10-16 22:25:48 +02:00
Nikita Popov	bd2488bc49	Merge branch 'PHP-8.0' * PHP-8.0: Normalize mb_ereg() return value	2020-10-13 20:41:33 +02:00
Nikita Popov	5582490bf2	Normalize mb_ereg() return value mb_ereg()/mb_eregi() currently have an inconsistent return value based on whether the $matches parameter is passed or not: > Returns the byte length of the matched string if a match for > pattern was found in string, or FALSE if no matches were found > or an error occurred. > > If the optional parameter regs was not passed or the length of > the matched string is 0, this function returns 1. Coupling this behavior to the $matches parameter doesn't make sense -- we know the match length either way, there is no technical reason to distinguish them. However, returning the match length is not particularly useful either, especially due to the need to convert 0-length into 1-length to satisfy "truthy" checks. We could always return 1, which would kind of match the behavior of preg_match() -- however, preg_match() actually returns the number of matches, which is 0 or 1 for preg_match(), while false signals an error. However, mb_ereg() returns false both for no match and for an error. This would result in an odd 1|false return value. The patch canonicalizes mb_ereg() to always return a boolean, where true indicates a match and false indicates no match or error. This also matches the behavior of the mb_ereg_match() and mb_ereg_search() functions. This fixes the default value integrity violation in PHP 8. Closes GH-6331.	2020-10-13 20:40:55 +02:00
Alex Dowad	97beecc251	Add identify filter for UTF-16, UTF-16LE, UTF-16BE There was one faulty test in the suite which only passed before because UTF-16 had no identify filter. After this was fixed, it exposed the problem with the test.	2020-10-13 20:26:13 +02:00
Nikita Popov	cafceea742	Update mbstring parameter names Closes GH-6207.	2020-09-28 09:51:58 +02:00
Larry Garfield	94854e0dff	Standardize mbstring and string on using 'string' as a parameter name. Closes GH-6171.	2020-09-21 12:06:50 +02:00
Máté Kocsis	e950ca13ea	Consolidate the usage of "either" and "one of" in error messages Closes GH-6173	2020-09-20 19:41:47 +02:00
Nikita Popov	c5401854fc	Run tidy This should fix most of the remaining issues with tabs and spaces being mixed in tests.	2020-09-18 14:28:32 +02:00
Máté Kocsis	c37a1cd650	Promote a few remaining errors in ext/standard Closes GH-6110	2020-09-15 14:26:16 +02:00
Nikita Popov	f33fd9b7fe	Throw ValueError on null bytes in mb_send_mail() Instead of silently replacing with spaces.	2020-09-11 10:46:59 +02:00
George Peter Banyard	0444158529	Promote some warnings in MBString Regexes Closes GH-5341	2020-09-09 14:55:07 +02:00
Nikita Popov	623bf96e7e	Throw on invalid mb_http_input() type	2020-09-07 09:59:51 +02:00
Nikita Popov	d57f9e5ea4	Handle null encoding in mb_http_input()	2020-09-04 17:15:35 +02:00
Alex Dowad	73dcfb6faa	Fix typos in mbstring tests Man, I can be pedantic sometimes. Tiny little things like misspelled words just hurt me inside. So while it's not really a big deal, I couldn't leave these typos alone...	2020-09-02 20:48:22 +02:00
Alex Dowad	dc98c1346d	Additional tests for mbstring extension	2020-08-31 23:15:57 +02:00
Máté Kocsis	7aacc705d0	Add many missing closing PHP tags to tests Closes GH-5958	2020-08-09 22:03:36 +02:00
Nikita Popov	52047addc7	Only force log startup errors if display_startup_errors disabled Otherwise this results in duplicate errors. Closes GH-5941.	2020-08-05 18:17:00 +02:00
Nikita Popov	d65d3f5298	Fix bug #79108 Don't expose references in debug_backtrace() or exception traces. This is regardless of whether the argument is by-reference or not. As a side-effect of this change, exception traces may now acquire the interior value of a reference, which may be unexpected for some internal functions. This is what necessitated the change in the spl_array sort implementation.	2020-07-24 12:23:34 +02:00
Máté Kocsis	d30cd7d7e7	Review the usage of apostrophes in error messages Closes GH-5590	2020-07-10 21:05:28 +02:00
Nikita Popov	0e71446e7a	Merge branch 'PHP-7.4' * PHP-7.4: Fix bug #79787	2020-07-08 11:22:47 +02:00
Nikita Popov	77a8a709da	Merge branch 'PHP-7.3' into PHP-7.4 * PHP-7.3: Fix bug #79787	2020-07-08 11:22:18 +02:00
XXiang	3d5de7d746	Fix bug #79787 Closes GH-5807.	2020-07-08 11:20:58 +02:00
Fabien Villepinte	0c6d06ecfa	Replace EXPECTF when possible Closes GH-5779	2020-06-29 21:31:44 +02:00
Máté Kocsis	b5c7a83dca	Remove unnecessary PHPDoc-alike blocks from tests Closes GH-5759	2020-06-24 13:13:44 +02:00
Nikita Popov	5aa649cf51	Merge branch 'PHP-7.4'	2020-06-17 09:35:19 +02:00
Nikita Popov	3d6199db8a	Add mbregex skipif	2020-06-17 09:35:02 +02:00
Nikita Popov	bbe74a6e3a	Merge branch 'PHP-7.4'	2020-06-16 14:32:33 +02:00
Nikita Popov	3f2f36d5d4	Fix non-default syntax in mb_ereg_search()	2020-06-16 14:31:29 +02:00
Máté Kocsis	fbe30592d6	Improve type error messages when an object is given From now on, we always display the given object's type instead of just reporting "object". Additionally, make the format of return type errors match the format of argument errors. Closes GH-5625	2020-05-26 19:06:19 +02:00
George Peter Banyard	7dd332f110	Refactor mb_substitute_character() Using the new Fast ZPP API for string|int|null This also fixes Bug #79448 which was too disruptive to fix in PHP 7.x	2020-05-11 17:30:01 +02:00
Nikita Popov	d38f819647	Fix test file encoding The mb_http_input_pass.phpt was intended to use the same encoding as mb_http_input.phpt, not UTF-8.	2020-05-07 21:18:49 +02:00
Nikita Popov	481b7421f3	Throw warning if invalid internal_encoding ini is specified	2020-05-07 14:44:13 +02:00


625 Commits