mirror of https://github.com/php/php-src.git synced 2026-04-02 13:43:02 +02:00

Alex Dowad 97f8495e0f UCS-4 conversion does not pass BOM through to output

This is to match the way that we handle UCS-2. When a BOM is found at
the beginning of a 'UCS-2' string (NOT 'UCS-2BE' or 'UCS-2LE'), we take
note of the intended byte order and handle the string accordingly, but
do NOT emit a BOM to the output. Rather, we just use the default byte
order for the requested output encoding.

Some might argue that if the input string used a BOM, and we are
emitting output in a text encoding where both big-endian and
little-endian byte orders are possible, we should include a BOM in the
output string. To such hypothetical debaters of minutiae, I can only
offer you a shoulder shrug. No reasonable program which handles UCS-2
and UCS-4 text should require a BOM.

Really, the concept of the BOM is a poor idea and should not have been
included in Unicode. Standardizing on a single byte order would have
been much better, similar to 'network byte order' for the Internet
Protocol. But this is not the place to speak at length of such things.
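
The BOM handling described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual libmbfl filter code: the function name `ucs4_to_ucs4be_no_bom` and its helpers are hypothetical, the output encoding is assumed to be big-endian UCS-4, and input without a BOM is assumed to be big-endian. The BOM is used only to pick the input byte order and is never copied to the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these names
 * exist in libmbfl. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint32_t read_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Re-encode a generic "UCS-4" byte string as big-endian UCS-4.
 * A leading BOM (U+FEFF) selects the input byte order but is not
 * written to the output. `out` must have room for at least `in_len`
 * bytes. Trailing bytes that do not form a full 4-byte unit are
 * ignored in this sketch. Returns the number of bytes written. */
size_t ucs4_to_ucs4be_no_bom(const unsigned char *in, size_t in_len,
                             unsigned char *out)
{
    int little_endian = 0; /* assumption: big-endian when no BOM is present */
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);

    if (in_len >= 4) {
        if (read_be32(in) == 0x0000FEFFu) {
            i = 4;                 /* big-endian BOM: skip it */
        } else if (read_le32(in) == 0x0000FEFFu) {
            little_endian = 1;     /* little-endian BOM: note order, skip it */
            i = 4;
        }
    }

    for (; i + 4 <= in_len; i += 4) {
        uint32_t cp = little_endian ? read_le32(in + i) : read_be32(in + i);
        /* Always emit the default (here: big-endian) byte order. */
        out[n++] = (unsigned char)((cp >> 24) & 0xFF);
        out[n++] = (unsigned char)((cp >> 16) & 0xFF);
        out[n++] = (unsigned char)((cp >> 8) & 0xFF);
        out[n++] = (unsigned char)(cp & 0xFF);
    }
    return n;
}
```

Under these assumptions, an input of `FF FE 00 00 41 00 00 00` (little-endian BOM followed by "A") and an input of `00 00 FE FF 00 00 00 41` (big-endian BOM followed by "A") both produce the same four output bytes, `00 00 00 41`, with no BOM in either case.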

2021-08-30 16:29:58 +02:00

filters

UCS-4 conversion does not pass BOM through to output

2021-08-30 16:29:58 +02:00

mbfl

Output illegal character marker for 4-byte illegal characters > 0x7FFFFFFF

2021-08-30 16:29:58 +02:00

nls

Remove redundant includes from mbstring (and make sure correct config.h is used)

2020-08-31 23:17:58 +02:00

config.h.w32

Remove unused symbol definition

2019-05-11 19:47:54 +02:00

LICENSE

Integrate libmbfl docs to README.md and LICENSE

2019-05-11 18:29:30 +02:00

README.md

Integrate libmbfl docs to README.md and LICENSE

2019-05-11 18:29:30 +02:00

README.md

libmbfl

This is libmbfl, a streamable multibyte character code filter and converter library, written by Shigeru Kanemoto.

The original version of libmbfl is developed and distributed at https://github.com/moriyoshi/libmbfl under the LGPL 2.1 license. See the LICENSE file for licensing information.

The libmbfl library is bundled with PHP as a fork of the original repository and is not in sync with the upstream. As such, the libmbfl directory is directly modified in the php-src repository.

Changelog

October 2017

Since 2017, libmbfl has been forked and bundled in the php-src repository. For the list of changes related to PHP, see the PHP NEWS changelogs.

Version 1.3.2 August 20, 2011

Added JISX-0213:2004 based encoding : Shift_JIS-2004, EUC-JP-2004, ISO-2022-JP-2004 (rui).
Added gb18030 encoding (rui).
Added CP950 with user-defined area based on Big5 (rui).
Added mapping for user-defined character area to CP936 (rui).
Added UTF-8-Mobile to support the pictogram characters defined by mobile phone carriers in Japan (rui).

Version 1.3.1 August 5, 2011

Added check for invalid/obsolete UTF-8 encoding (rui).

Version 1.3.0 August 1, 2011

Added encoding conversion between Shift_JIS and Unicode (6.0 or PUA) for pictogram characters defined by mobile phone carriers in Japan (rui).

Detailed info
Fixed encoding conversion of cp5022x for user defined area (rui).
Added MacJapanese (SJIS-mac) for legacy encoding support (rui).
Backport from PHP 5.2 (rui).

Version 1.1.0 March 02, 2010

Added cp5022x encoding (moriyoshi)
Added ISO-2022-JP-MS (moriyoshi)
Moved to github.com from sourceforge.jp (moriyoshi)

Earlier versions

1998/11/10 sgk implementation in C++
1999/4/25 sgk Rewritten in C.
1999/4/26 sgk Implemented input filter. Filters are added while estimating the kanji code.
1999/6 Unicode support.
1999/6/22 sgk Changed license to LGPL.

Credits

Marcus Boerger helly@php.net
Hayk Chamyan hamshen@gmail.com
Wez Furlong wez@thebrainroom.com
Rui Hirokawa hirokawa@php.net
Shigeru Kanemoto sgk@happysize.co.jp
U. Kenkichi kenkichi@axes.co.jp
Moriyoshi Koizumi moriyoshi@php.net
Hironori Sato satoh@jpnnet.com
Tsukada Takuya tsukada@fminn.nagano.nagano.jp
Tateyama tateyan@amy.hi-ho.ne.jp
Den V. Tsopa tdv@edisoft.ru
Maksym Veremeyenko verem@m1stereo.tv
Haluk AKIN halukakin@gmail.com