archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-04-26 09:28:21 +02:00

Author	SHA1	Message	Date
Nikita Popov	c98714f19e	Merge branch 'PHP-7.2'	2017-08-03 21:57:35 +02:00
Nikita Popov	fb9bf5b64b	Revert/fix substitution character fallback The introduced checks were not correct in two respects: * It was checked whether the source encoding of the string matches the internal encoding, while the actually relevant encoding is the target encoding. * Even if the correct encoding is used, the checks are still too conservative. Just because something is not a "Unicode-encoding" does not mean that it does not map any non-ASCII characters. I've reverted the added checks and instead adjusted mbfl_convert to first try to use the provided substitution character and if that fails, perform the fallback to '?' at that point. This means that any codepoint mapped in the target encoding should now be correctly supported and anything else should fall back to '?'.	2017-08-03 21:53:59 +02:00
Nikita Popov	3d948d77d1	Merge branch 'PHP-7.2'	2017-08-03 21:17:26 +02:00
Nikita Popov	a8a9e93e9a	Revert/fix mb_substitute_character() codepoint checks The introduced checks did not treat "non-Unicode" encodings correctly, because they treated the passed integer as encoded in the internal encoding in that case, while in actuality the substitute character is always a Unicode codepoint. Additionally checking the codepoint against the internal encoding is not correct in any case, because the substitution character must be mapped in the target encoding of the conversion, which does not necessarily coincide with the internal encoding (the internal encoding is the default source encoding, not target encoding). This reverts the checks back to simple range checks, but in a way that still resolves #69079: Characters outside the Basic Multilingual Plane are now accepted and Surrogate Codepoints are rejected. A distinction between UTF-8 and non-UTF-8 encodings is not made for surrogate checks (as in the original patch), as surrogates are always illegal on their own. Specifying a surrogate as substitution character would only make sense if you could specify a substitution string with more than one character -- however we do not support that.	2017-08-03 21:12:41 +02:00
Nikita Popov	94fe629992	Merge branch 'PHP-7.2'	2017-08-02 18:11:17 +02:00
Nikita Popov	91240073ea	Merge branch 'PHP-7.1' into PHP-7.2	2017-08-02 18:11:12 +02:00
Nikita Popov	63607375f5	Merge branch 'PHP-7.0' into PHP-7.1	2017-08-02 18:09:09 +02:00
Fabien Villepinte	2cc1cbf2f4	Fix Bug #75001 : Wrong reflection on mb_eregi_replace	2017-08-02 18:08:42 +02:00
Anatol Belski	f9c3ee9ae8	fix c89 compat	2017-07-28 22:18:51 +02:00
Nikita Popov	f4a1d9c821	Fixed bug #65544 and #71298	2017-07-28 14:57:08 +02:00
Nikita Popov	25b6e68432	Merge branch 'PHP-7.2'	2017-07-28 13:03:35 +02:00
Nikita Popov	5d777e56e2	Merge branch 'PHP-7.1' into PHP-7.2	2017-07-28 13:03:26 +02:00
Nikita Popov	c48c638aeb	Merge branch 'PHP-7.0' into PHP-7.1	2017-07-28 13:03:02 +02:00
Nikita Popov	e3d25e78eb	Fixed bug #62934	2017-07-28 13:02:25 +02:00
Nikita Popov	582a65b06f	Implement full case mapping Implement full case mapping according to SpecialCasing.txt and also full case folding according to CaseFolding.txt (F). There are a number of caveats: * Only language-agnostic and unconditional full case mapping is implemented. The only language-agnostic conditional case mapping rule relates to Greek sigma in final position (Final_Sigma). Correctly handling this requires both arbitrary lookahead and lookbehind, which would require some larger changes to how the case mapping is implemented. This is a possible future extension. * The only language-specific handling that is implemented is for Turkish dotted/undotted Is, if the ISO-8859-9 encoding is used. This matches the previous behavior and makes sure that no codepoints not supported by the encoding are produced. A future extension would be to also handle the Turkish mappings specified by SpecialCasing.txt based on the mbfl internal language. * Full case folding is implemented, but case-insensitive mb_* operations continue to use simple case folding. The reason is that full case folding of the haystack string may change the position at which a match occurred. This would have to be mapped back into the position in the original string. * mb_convert_case() exposes both the full and the simple case mapping / folding, where full is the default. The constants are: * MB_CASE_LOWER (used by mb_strtolower) * MB_CASE_UPPER (used by mb_strtoupper) * MB_CASE_TITLE * MB_CASE_FOLD * MB_CASE_LOWER_SIMPLE * MB_CASE_UPPER_SIMPLE * MB_CASE_TITLE_SIMPLE * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)	2017-07-28 12:32:50 +02:00
Nikita Popov	9ac7c1e71d	Use case-folding for case insensitive comparisons Instead of using lowercasing.	2017-07-28 12:32:50 +02:00
Nikita Popov	80a0601fe5	Use MPH for case maps Instead of performing a binary search, use a hashtable to store the case maps. In particular a minimal perfect hash construction is used, which does not require collision resolution (but does use an auxiliary table for the hash perturbation).	2017-07-28 12:32:50 +02:00
Nikita Popov	f56b0afe6e	Avoid some unnecessary mbfl_strlen() calculations	2017-07-28 12:32:50 +02:00
Nikita Popov	eacd70f762	Don't store titlecase if same as uppercase The totitle code already has a fallback for that case.	2017-07-28 12:32:50 +02:00
Nikita Popov	cedfc2f426	Drop implementation-specific character properties No point in keeping around non-standard character properties if we're not using them and most are not even being populated.	2017-07-28 12:32:50 +02:00
Anatol Belski	98fe82cc05	fix data types	2017-07-25 21:26:25 +02:00
Anatol Belski	13a2629005	size_t fixes	2017-07-25 19:03:33 +02:00
Nikita Popov	8ace7045e9	Handle character ranges in ucgendat generically In particular, the previous implementation did not account for Tangut Ideographs and CJK Ideograph extensions C through F.	2017-07-25 18:48:12 +02:00
Nikita Popov	0c0e35fedc	Port ucgendat to PHP Implemented such that the output is identical, including some quirks that should be fixed subsequently.	2017-07-25 18:48:12 +02:00
Nikita Popov	4bd61ec7ad	Fix handling of some special ranges in ucgendat * Han Ideographs go up to U+9FEA. * CJK Compatibility Ideographs are no longer specified as a special range in remotely recent versions of Unicode. * Surrogate properties should be assigned to U+D800-U+DFFF, not to U+10000-U+1FFFF.	2017-07-25 18:48:12 +02:00
Nikita Popov	445e13b149	Add MBFL_SUBSTR_TO_END mode to mbfl_substr This takes the substr from the offset to the end of the string. This avoids pointless searching for the end position and also saves us a length calculation in the strstr family of functions.	2017-07-23 23:17:12 +02:00
Nikita Popov	bff11c382e	Remove more obsolete length checks	2017-07-23 19:09:36 +02:00
Nikita Popov	3c6b2512cb	Change layout of case mapping table Previously the case mapping table was segregated by the type of the character (upper, lower, title) and always stored the other two variants (key, other1, other2). Now the table is segregated by the target type (key, other). As only very few characters have more than one target this only slightly increases the size of the table. The advantage of this layout is that we only need to perform a single table lookup in the case table. Previously, depending on the case that was hit, either one lookup in the property table, or two lookups in the property table and one lookup in the case table were required. This changes the layout from libunicode in the OpenLDAP project -- however, the last commit there was over 10 years ago, so I don't see value in keeping this in sync.	2017-07-23 18:33:15 +02:00
Anatol Belski	78944bdfc6	remove cast	2017-07-23 17:38:28 +02:00
Anatol Belski	6809be2090	fix warnings and datatype ident	2017-07-23 17:36:10 +02:00
Anatol Belski	7496bad2ac	adjust datatype, used for position handling	2017-07-23 16:37:31 +02:00
Anatol Belski	ea83b69883	Adjust datatypes and reorder which saves 8 bytes on 64-bit	2017-07-23 16:37:30 +02:00
Nikita Popov	fe8384fdfd	Merge branch 'PHP-7.2'	2017-07-23 16:06:25 +02:00
Nikita Popov	706f0cf8a0	Update Unicode data for Unicode 10	2017-07-23 16:05:39 +02:00
Nikita Popov	24cfbfd56f	Update ucgendat for more bidi properties Handle them the same way as others -- by classifying as Other Neutral.	2017-07-23 16:03:11 +02:00
Nikita Popov	7077c719db	Merge branch 'PHP-7.2'	2017-07-23 15:36:25 +02:00
Nikita Popov	077e61fad3	Fixed bug #69267 completely ucgendat.c was assuming that a title-case character is a character that has both lower and upper-case variants. However, there are title-case characters that only have a lower-case variant. Use the Lt general character property to determine where in the case map the character should be placed instead.	2017-07-23 15:30:17 +02:00
Nikita Popov	c0bcd301d3	Another fix for bug #69267 mb_strtoupper() was converting lowercase characters into titlecase characters, instead of uppercase characters. Luckily there are only very few characters with a distinct titlecase representation, so this mostly worked out okay...	2017-07-23 15:07:02 +02:00
Nikita Popov	0e4af9192f	Partial fix for bug #69267 This pulls in 60a25c72ba389f53b0621ca250bc99f3b295d43f from the OpenLDAP project.	2017-07-23 14:47:21 +02:00
Nikita Popov	698132d6f9	Merge branch 'PHP-7.2'	2017-07-23 12:22:09 +02:00
Nikita Popov	88f752a947	Merge branch 'PHP-7.1' into PHP-7.2	2017-07-23 12:21:51 +02:00
Nikita Popov	f116a88592	Merge branch 'PHP-7.0' into PHP-7.1	2017-07-23 12:21:16 +02:00
Christoph M. Becker	418da85f15	Fix #71606 : Segmentation fault mb_strcut with HTML-ENTITIES The HTML decoding filter uses the `opaque` member of mbfl_convert_filter as buffer, but there was no copy constructor defined, which caused double frees when the filter is copied (which happens multiple times in mb_strcut(), for instance).	2017-07-23 12:19:27 +02:00
Nikita Popov	b8ed74ce77	Merge branch 'PHP-7.2'	2017-07-23 11:55:46 +02:00
Nikita Popov	42ff1aa86c	Fix overflow checks in mbfl_memory_device Also prune out some duplicate code and use strlen() and memcpy() instead of ad-hoc reimplementations. Remove multiplications by sizeof(unsigned char), which wrongly imply that this can be anything but 1.	2017-07-23 11:55:43 +02:00
Nikita Popov	bd63c0f5b3	Fix bug #73528	2017-07-23 11:55:43 +02:00
Nikita Popov	80463579ce	Remove confusing null checks in mb_send_mail These are required parameters, they cannot be missing.	2017-07-23 11:55:43 +02:00
Nikita Popov	9af5b7f33d	Fix use after free in mb_send_mail	2017-07-23 11:55:26 +02:00
Anatol Belski	4fbd7ccba2	touch yet more places for datatypes	2017-07-23 00:47:24 +02:00
Anatol Belski	0eea41b6c4	add missing header	2017-07-23 00:23:02 +02:00

1354 Commits