archived-php-src

mirror of https://github.com/php/php-src.git synced 2026-03-31 12:42:29 +02:00

Author	SHA1	Message	Date
Alex Dowad	b1ab76f742	Minor formatting tweaks in mbfilter_euc_kr.c	2021-06-17 13:12:40 +02:00
Alex Dowad	958ef47d2b	When flushing CP5022x conversion filter, also flush next filter in chain All the mbstring encoding conversion filters do this. I missed it when adding a flush function for CP5022x.	2021-06-17 13:12:40 +02:00
Alex Dowad	caeaa662ab	Strict conversion of UHC text to Unicode Previously, mbstring would accept a lot of things which were not valid UHC text. No more. - Don't allow single-byte control characters to appear where the 2nd byte of a multi-byte character should be. - Validate that the 2nd byte of a multi-byte character is in the expected range. - Treat it as an error if a multi-byte character is truncated. Also add a test suite to confirm that UHC conversion (both to and from Unicode) works according to spec.	2021-06-17 13:12:40 +02:00
Alex Dowad	4550036d96	Minor formatting tweaks in mbfilter_uhc.c	2021-06-17 13:12:40 +02:00
Alex Dowad	9868c17368	Mark CP932 and CP51932 encoding tests as 'slow tests'	2021-06-17 13:12:40 +02:00
Alex Dowad	e2459857af	Remove duplicate implementation of CP932 from mbstring Sigh. Double sigh. After fruitlessly searching the Internet for information on this mysterious text encoding called "SJIS-open", I wrote a script to try converting every Unicode codepoint from 0-0xFFFF and compare the results from different variants of Shift-JIS, to see which one "SJIS-open" would be most similar to. The result? It's just CP932. There is no difference at all. So why do we have two implementations of CP932 in mbstring? In case somebody, somewhere is using "SJIS-open" (or its aliases "SJIS-win" or "SJIS-ms"), add these as aliases to CP932 so existing code will continue to work.	2021-06-17 13:12:40 +02:00
Alex Dowad	7502c86342	Add test suite for UTF-{7,8,16,32} Also fix a couple small problems with UTF-32 and UTF-8 support: - UTF-32 would pass very large codepoints (>= 0x80000000), which are not valid. - UTF-8 would sometimes emit two error marker characters for a single bad input byte.	2021-06-17 13:12:40 +02:00
Nikita Popov	a06d015e61	Remove unnecessary mbstring skipifs These functions are always available (if the extension is available at all).	2021-06-14 15:27:28 +02:00
Nikita Popov	6600ad6067	Add some missing EXTENSIONS sections to misc tests	2021-06-14 14:52:44 +02:00
Nikita Popov	4083600bd5	Port mbstring to use EXTENSIONS	2021-06-11 14:00:43 +02:00
Nikita Popov	39131219e8	Migrate more SKIPIF -> EXTENSIONS (#7139) This is a mix of more automated and manual migration. It should remove all applicable extension_loaded() checks outside of skipif.inc files.	2021-06-11 12:58:44 +02:00
Nikita Popov	7485978339	Migrate SKIPIF -> EXTENSIONS (#7138) This is an automated migration of most SKIPIF extension_loaded checks.	2021-06-11 11:57:42 +02:00
Ayesh Karunaratne	b8e380ab09	Update deprecation message for incompatible float to int conversion Updates the deprecation message for implicit incompatible float to int conversion from: ``` Implicit conversion from non-compatible float %.H to int in %s on line %d ``` to ``` Implicit conversion from float %.H to int loses precision in %s on line %d ``` Related: #6661	2021-06-07 14:36:11 +02:00
George Peter Banyard	b6958bb847	Implement "Deprecate implicit non-integer-compatible float to int conversions" RFC. (#6661) RFC: https://wiki.php.net/rfc/implicit-float-int-deprecate Co-authored-by: Nikita Popov <nikita.ppv@gmail.com>	2021-05-31 15:48:45 +01:00
George Peter Banyard	e7135cb817	Use zend_string_equals_* API in a couple more places Closes GH-6979	2021-05-14 13:45:17 +01:00
George Peter Banyard	aca6aefd85	Remove 'register' type qualifier (#6980) The compiler should be smart enough to optimize this on its own	2021-05-14 13:38:01 +01:00
George Peter Banyard	c40231afbf	Mark various functions with void arguments. This fixes a bunch of [-Wstrict-prototypes] warnings, because in C func() and func(void) have different semantics.	2021-05-12 14:55:53 +01:00
KsaR	01b3fc03c3	Update http->https in license (#6945) 1. Update: http://www.php.net/license/3_01.txt to https, as there is anyway server header "Location:" to https. 2. Update few license 3.0 to 3.01 as 3.0 states "php 5.1.1, 4.1.1, and earlier". 3. In some license comments is "at through the world-wide-web" while most is without "at", so deleted. 4. fixed indentation in some files before |	2021-05-06 12:16:35 +02:00
Christoph M. Becker	592cfa309e	Merge branch 'PHP-8.0' * PHP-8.0: Fix #81011: mb_convert_encoding removes references from arrays	2021-05-04 18:40:23 +02:00
Christoph M. Becker	d1c0cbdcb1	Merge branch 'PHP-7.4' into PHP-8.0 * PHP-7.4: Fix #81011: mb_convert_encoding removes references from arrays	2021-05-04 18:39:39 +02:00
Christoph M. Becker	0cafd53d18	Fix #81011: mb_convert_encoding removes references from arrays We need to dereference references. Closes GH-6938.	2021-05-04 18:37:40 +02:00
Alex Dowad	7159907d30	Fix mbstring support for ISO-2022-JP-MS encoding - Treat it as error if multi-byte string or escape sequence is truncated - Don't allow 'control' characters or escape sequences to appear in the middle of a multi-byte char As with ISO-2022-JP-KDDI, the main reference used to develop the tests was the behavior of the existing code. It would have been better to have some independent reference which we could cross-check our code against, but I couldn't find one.	2021-04-15 15:52:31 +02:00
Alex Dowad	570e89a9f3	Fix mbstring support for ISO-2022-JP-KDDI encoding - Treat it as an error if a multi-byte character or escape sequence is truncated - When converting other encodings to ISO-2022-JP-KDDI, don't swallow trailing hash characters or digits - Don't allow 'control' characters to appear in the middle of a multi-byte char Note: I was not able to find any kind of official or even semi-official specification for this legacy encoding. Therefore, the test suite for ISO-2022-JP-KDDI is based largely on the behavior of the existing code. Verifying the correctness of program code in this way is very questionable. In a sense, all you are proving is that the code "does what it does". However, the test suite will still expose any unintended _changes_ to behavior.	2021-04-15 15:52:31 +02:00
Alex Dowad	f5f3ee7aee	Add test suite for mUTF-7 (IMAP) encoding	2021-04-15 15:52:31 +02:00
Alex Dowad	78dc160e3b	Catch and handle errors in mUTF-7 (IMAP) conversion	2021-04-15 15:52:31 +02:00
Alex Dowad	cef4b94eef	Code cleanup in mbfilter_utf7imap.c	2021-04-15 15:52:31 +02:00
Alex Dowad	8abc5e6827	Catch and handle errors in UTF-7 text conversion	2021-04-15 15:52:31 +02:00
Alex Dowad	689978a63b	Code cleanup in mbfilter_utf7.c	2021-04-15 15:52:31 +02:00
Alex Dowad	ebe6500a0b	Fix error reporting bug for Unicode -> CP50220 conversion To detect errors in conversion from Unicode to another text encoding, each mbstring conversion filter object maintains a count of 'bad' characters. After a conversion operation finishes, this count is checked to see if there was any error. The problem with CP50220 was that mbstring used a chain of two conversion filter objects. The 'bad character count' would be incremented on the second object in the chain, but this didn't do anything, as only the count on the first such object is ever checked. Fix this by implementing the conversion using a single conversion filter object, rather than a chain of two. This is possible because of the recent refactoring, which pulled out the needed logic for CP50220 conversion into a helper function.	2021-04-15 15:52:31 +02:00
Alex Dowad	1f130d4e58	Refactor mbfl_filt_tl_jisx0201_jisx0208 by moving kana conversion into helper function This will enable us to simplify the code for CP50220 conversion, which also relies on this same kana conversion logic.	2021-04-15 15:52:31 +02:00
Alex Dowad	319a340843	Simplify code for working with halfwidth/fullwidth kana conversion filter There's no need to dynamically allocate a struct to hold the 'mode' parameter; just store it directly in `filt->opaque`. Some other things were also being done in an unnecessarily roundabout way. Also, the 'copy' function for CP50220 conversion filters was both broken and unnecessary. Broken, because it malloc'd memory which was never freed by anything. Unnecessary, because the point of the copy is so that various algorithms can try running bytes through a conversion filter and see how many output bytes or characters result, and then back out by restoring the filters to their previous state. But here's the thing; CP50220 conversion filters don't hold cached bytes, which is the main thing which would need to be restored to a previous state.	2021-04-15 15:52:31 +02:00
Alex Dowad	a900ec3397	Remove unneeded 'filter_ctor' member from mbfl_convert_filter struct This function pointer is only called when initializing the struct. After that nothing is done with it. Therefore, there is no need to keep it in the struct.	2021-04-15 15:52:31 +02:00
Alex Dowad	affc3076f3	Remove unused 'next_filter' member from mbfl_filt_tl_jisx0201_jisx0208_param struct	2021-04-15 15:52:31 +02:00
Alex Dowad	636251a522	Remove useless function mbfl_filt_tl_jisx0201_jisx0208_init This constructor function doesn't do anything different than the generic one. There's no need to invoke it, either, when initializing a CP50220 conversion filter.	2021-04-15 15:52:31 +02:00
George Peter Banyard	09efad615b	Use zend_string_equals_(literal_)ci() API more often Also drive-by usage of zend_ini_parse_bool() Closes GH-6844	2021-04-09 02:34:50 +01:00
George Peter Banyard	5caaf40b43	Introduce pseudo-keyword ZEND_FALLTHROUGH And use it instead of comments	2021-04-07 00:46:29 +01:00
Máté Kocsis	cad66533f0	Generate class entries from stubs for ldap, libxml, mbstring and mysqli Closes GH-6684	2021-02-16 14:46:19 +01:00
Max Semenik	b11771271e	Remove stray mentions of mbstring.func_overload This feature has been completely removed. Closes GH-6688.	2021-02-15 09:47:28 +01:00
Nikita Popov	b10416a652	Deprecate passing null to non-nullable arg of internal function This deprecates passing null to non-nullable scalar arguments of internal functions, with the eventual goal of making the behavior consistent with userland functions, where null is never accepted for non-nullable arguments. This change is expected to cause quite a lot of fallout. In most cases, calling code should be adjusted to avoid passing null. In some cases, PHP should be adjusted to make some function arguments nullable. I have already fixed a number of functions before landing this, but feel free to file a bug if you encounter a function that doesn't accept null, but probably should. (The rule of thumb for this to be applicable is that the function must have special behavior for 0 or "", which is distinct from the natural behavior of the parameter.) RFC: https://wiki.php.net/rfc/deprecate_null_to_scalar_internal_arg Closes GH-6475.	2021-02-11 21:46:13 +01:00
Alex Dowad	d8c785b894	Update 'East Asian Width' table to comply with Unicode 13.0 Instead of manually maintaining the data in eaw_table.h, it is now automatically generated by ucgendat/ucgendat.php, using the EastAsianWidth.txt file from the Unicode Consortium. Something must be said about the deleted test case. Back in 2004, someone noticed that `mb_strwidth` didn't comply with Unicode 4.0. A test case was added to expose the problem. Well, time keeps moving on, and with the changing years, new Unicodes are born and old Unicodes die. Some characters which were counted as double-width in Unicode 4.0 are no longer such in Unicode 13.0, which renders the test case obsolete. At the same time, make a couple of spelling/grammar fixes in ucgendat.php.	2021-01-19 20:38:44 +02:00
Alex Dowad	a06c20a17c	Remove useless constant MBFL_ENCTYPE_MBCS This flag indicated that an encoding was 'multi-byte'; it can use a variable number of bytes to encode each character. As it turns out, we don't actually need to check this flag anywhere, so it's better to remove it.	2021-01-15 21:55:41 +02:00
Alex Dowad	6cbeb6476e	Remove unused macros from mbfilter_cp51932.c, mbfilter_iso2022jp_mobile.c	2021-01-15 21:55:41 +02:00
Alex Dowad	34ece40872	Remove useless mbstring encoding 'JIS-ms' MicroSoft invented three encodings very similar to ISO-2022-JP/JIS7/JIS8, called CP50220, CP50221, and CP50222. All three are supported by mbstring. Since these encodings are very similar, some code can be shared. Actually, conversion of CP50220/1/2 to Unicode is exactly the same operation; it's when converting from Unicode to CP50220/1/2 that some small differences arise in how certain katakana are handled. The most important common code was a function called `mbfl_filt_wchar_jis_ms`. The `jis_ms` part doubtless refers to the fact that these encodings are modified versions of 'JIS' invented by 'MS'. mbstring also went a step further and exported 'JIS-ms' to userland as a separate encoding from CP50220/1/2. If users requested 'JIS-ms' conversion, they got something like CP50220/1/2, minus their special ways of handling half-width katakana when converting from Unicode. But... that 'encoding' is not something which actually exists in the world outside of mbstring. CP50220/1/2 do exist in MicroSoft software, but not 'JIS-ms'. For a text encoding conversion library, inventing new variant encodings and implementing them is not very productive. Our interest is in handling text encodings which real people actually use for... you know, storing actual text and things like that.	2021-01-15 21:55:41 +02:00
Alex Dowad	fcbe45de10	Remove useless mbstring encoding 'CP50220-raw' CP50220 is a variant of ISO-2022-JP invented by MicroSoft, which handles some Unicode characters which are not representable in ISO-2022-JP by converting them to similar characters which are representable. What, then, is CP50220-raw? An Internet search turns up absolutely nothing. Reference works which I consulted don't say anything about it. Other text conversion libraries don't support it. From looking at the code: It's just the same as CP50220, but it accepts unmapped JIS X 0208 characters passed through from other Japanese encodings and silently encodes them using the usual ISO-2022-JP escape sequence and representation for JIS X 0208 characters. It's hard to see how this could be useful. OK, let me come out and say it: it's _not_ useful. We can confidently jettison this (mis)feature.	2021-01-15 21:55:41 +02:00
Alex Dowad	888f5d7729	CP5022{0,1,2}: treat truncated multibyte characters as error	2021-01-15 21:55:41 +02:00
Alex Dowad	2a93a8bb8c	Add test suite for CP5022{0,1,2}	2021-01-15 21:55:41 +02:00
Nikita Popov	3e01f5afb1	Replace zend_bool uses with bool We're starting to see a mix between uses of zend_bool and bool. Replace all usages with the standard bool type everywhere. Of course, zend_bool is retained as an alias.	2021-01-15 12:33:06 +01:00
Nikita Popov	e2c8ab7c33	Print "interned" instead of fake refcount in debug_zval_dump() debug_zval_dump() currently prints refcount 1 for interned strings and arrays, which does not really reflect the truth. These values are not refcounted, so the refcount is misleading. Instead print an "interned" tag. Closes GH-6598.	2021-01-15 12:21:24 +01:00
Alex Dowad	0ec34da8e0	CP5022{0,1,2}: treat unrecognized escapes as error	2021-01-15 08:30:36 +02:00
Alex Dowad	a50607d11d	CP5022{0,1,2}: use JISX0201 for U+203E (overline) Same issue as `d497c0e96f` addressed for JIS7/JIS8, but for CP5022{0,1,2} this time.	2021-01-15 08:30:30 +02:00

2059 Commits