1
0
mirror of https://github.com/php/php-src.git synced 2026-04-17 21:11:02 +02:00
Commit Graph

17 Commits

Author SHA1 Message Date
Alex Dowad
edc6b756c1 Merge branch 'PHP-8.1'
* PHP-8.1:
  mb_convert_encoding will not auto-detect input string as UUEncode, Base64, QPrint
2021-12-20 22:47:18 +02:00
Alex Dowad
f07c193583 mb_convert_encoding will not auto-detect input string as UUEncode, Base64, QPrint
In a2bc57e0e5, mb_detect_encoding was modified to ensure it would never
return 'UUENCODE', 'QPrint', or other non-encodings as the "detected
text encoding". Before mb_detect_encoding was enhanced so that it could
detect any supported text encoding, those were never returned, and they
are not desired. Actually, we want to eventually remove them completely
from mbstring, since PHP already contains other implementations of
UUEncode, QPrint, Base64, and HTML entities.

For more clarity on why we need to suppress UUEncode, etc. from being
detected by mb_detect_encoding, the existing UUEncode implementation
in mbstring *never* treats any input as erroneous. It just accepts
everything. This means that it would *always* be treated as a valid
choice by mb_detect_encoding, and would be returned in many, many cases
where the input is obviously not UUEncoded.

It turns out that the form of mb_convert_encoding where the user passes
multiple candidate encodings (and mbstring auto-detects which one to
use) was also affected by the same issue. Apply the same fix.
2021-12-20 22:09:33 +02:00
Alex Dowad
9308974f8c Deprecate use of mbstring to convert text to Base64/QPrint/HTML entities/etc
The purpose of mbstring is for working with Unicode and legacy text
encodings; but Base64, QPrint, etc. are not text encodings and don't
really belong in mbstring. PHP already contains separate implementations
of Base64, QPrint, and HTML entities. It will be better to eventually
remove these non-encodings from mbstring.

Regarding HTML entities... there is a bit more to say. mbstring's
implementation of HTML entities is different from the other built-in
implementation (htmlspecialchars and htmlentities). Those functions
convert <, >, and & to HTML entities, but mbstring does not.

It appears that the original author of mbstring intended for something
to be done with <, >, and &. He used a table to identify which
characters should be converted to HTML entities, and </>/& all have a
special value in that table. However, nothing ever checks for that
special value, so the characters are passed through unconverted.

This seems like a very useless implementation of HTML entities. The most
important characters which need to be expressed as entities in HTML
documents are those three!
2021-11-01 11:23:21 +02:00
Nikita Popov
39131219e8 Migrate more SKIPIF -> EXTENSIONS (#7139)
This is a mix of more automated and manual migration. It should remove all applicable extension_loaded() checks outside of skipif.inc files.
2021-06-11 12:58:44 +02:00
Alex Dowad
3e7acf901d Remove mbstring identify filters
mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.
It would run over the string and set a 'flag' if it saw anything which
did not appear likely to be the encoding in question.

One problem with this scheme was that encodings which merely appeared
less likely to be the correct one were completely rejected, even if there
was no better candidate. Another problem was that the 'identify filters'
had a huge amount of code duplication with the 'conversion filters'.

Eliminate the identify filters. Instead, when auto-detecting text
encoding, use conversion filters to see whether the input string is valid
in candidate encodings or not. At the same type, watch the type of
codepoints which the string decodes to and mark it as less likely if
non-printable characters (ESC, form feed, bell, etc.) or 'private use
area' codepoints are seen.

Interestingly, one old test case in which JIS text was misidentified
as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed'
and the JIS string is now auto-detected as JIS.
2020-11-09 13:45:17 +02:00
Nikita Popov
cafceea742 Update mbstring parameter names
Closes GH-6207.
2020-09-28 09:51:58 +02:00
Alex Dowad
73dcfb6faa Fix typos in mbstring tests
Man, I can be pedantic sometimes. Tiny little things like misspelled words just
hurt me inside. So while it's not really a big deal, I couldn't leave these typos
alone...
2020-09-02 20:48:22 +02:00
George Peter Banyard
90eeca2531 Convert some unknown encoding warnings to ValueErrors in ext/mbstring
Promotes only the warnings where the encoding comes only from a string.
Functions which accept an array of encodings will be fixed at a later stage.

Closes GH-5317
2020-03-31 16:34:18 +02:00
Nikita Popov
852485d8ec Adjust tests for zpp TypeError change 2019-03-11 11:32:20 +01:00
Nikita Popov
4bd18db8cc Remove custom error handler in mbstring tests
To make it more obvious what is tested and what the error messages
are.
2019-03-05 11:41:53 +01:00
Peter Kokot
d679f02295 Sync leading and final newlines in *.phpt sections
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines in all
*.phpt sections.

According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.

C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."

Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
2018-10-15 04:33:09 +02:00
Peter Kokot
d7a3edd45d Trim trailing whitespace in *.phpt 2018-10-14 19:46:15 +02:00
Moriyoshi Koizumi
85d7751097 - Fix include_path. 2008-07-17 16:30:32 +00:00
Moriyoshi Koizumi
91bf8e5dc9 Explicitly specify mbstring.language. 2003-09-26 14:42:37 +00:00
Moriyoshi Koizumi
408e019b25 Disabled output_handler in INI section 2002-11-03 08:37:59 +00:00
Moriyoshi Koizumi
a705a8b597 Clean up. 2002-10-30 08:06:52 +00:00
Moriyoshi Koizumi
bce3d0cf7d Renamed the test cases. 2002-10-21 19:19:05 +00:00