mirror of https://github.com/php/php-src.git synced 2026-04-02 05:32:28 +02:00
archived-php-src/ext/mbstring/libmbfl
Alex Dowad d9ddeb6e85 UTF-16 text conversion handles truncated characters as illegal
This broke one old test (Zend/tests/multibyte_encoding_003.phpt), which used
a PHP script encoded as UTF-16. The problem was that to terminate the test
script, we need the text: "\n--EXPECT--". Out of that text, the terminating
newline (0x0A byte) becomes part of the resulting test script; but a bare
0x0A byte with no 0x00 is not valid UTF-16.

Since we now treat truncated UTF-16 characters as erroneous, an extra '?' is
appended to the output as an 'illegal character' marker.
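The effect described above can be illustrated with a minimal Python sketch. This is not libmbfl's C implementation: the function name `decode_utf16`, its parameters, and the exact number of markers emitted for malformed surrogate sequences are assumptions made for illustration. It only shows the general idea that a lone trailing byte cannot form a 16-bit code unit and is therefore reported with a '?' marker.

```python
def decode_utf16(data: bytes, byteorder: str = "big", marker: str = "?") -> str:
    """Decode UTF-16 code units, emitting `marker` for malformed input.

    Hypothetical helper for illustration only; not part of libmbfl or PHP.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        if n - i < 2:
            # A lone trailing byte is a truncated character: mark it as illegal.
            out.append(marker)
            break
        unit = int.from_bytes(data[i:i + 2], byteorder)
        i += 2
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: must be followed by a complete low surrogate.
            if n - i >= 2:
                low = int.from_bytes(data[i:i + 2], byteorder)
                if 0xDC00 <= low <= 0xDFFF:
                    i += 2
                    code = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                    out.append(chr(code))
                    continue
            out.append(marker)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            out.append(marker)
        else:
            out.append(chr(unit))
    return "".join(out)


# Well-formed input: "A" followed by a newline, both as 16-bit units.
print(repr(decode_utf16(b"\x00A\x00\n")))
# Truncated input: the bare 0x0A byte has no accompanying 0x00 byte.
print(repr(decode_utf16(b"\x00A\n")))
```

In this sketch the second call yields "A?" rather than "A" plus a newline, which mirrors the extra '?' that appeared in the test output.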

Really, if we are running PHP scripts which are treated as encoded in UTF-16
or some other arbitrary text encoding (not ASCII), and the script is not
actually a valid string in that encoding, inserting '?' characters into the
code which the PHP interpreter runs is a bad thing to do. In such cases, the
script shouldn't be treated as UTF-16 (or whatever) at all.

I wonder if mbstring's encoding detection is being used in 'non-strict' mode?
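For context, PHP's `mb_detect_encoding()` accepts a `strict` argument. The idea behind strict detection can be sketched as follows, assuming that "strict" means a candidate encoding is accepted only when the entire byte string is valid in it. The sketch uses Python's built-in codecs rather than mbstring, and the function name `detect_encoding_strict` and the candidate list are hypothetical.

```python
def detect_encoding_strict(
    data: bytes,
    candidates=("ascii", "utf-8", "utf-16-le", "utf-16-be"),
):
    """Return the first candidate in which `data` is fully valid, else None.

    Hypothetical helper for illustration only; not mbstring's algorithm.
    """
    for name in candidates:
        try:
            # errors="strict" rejects any invalid or truncated sequence.
            data.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue
        return name
    return None


# A script ending in a bare 0x0A byte is rejected as UTF-16LE ...
print(detect_encoding_strict(b"<\x00?\x00\n", ("utf-16-le",)))
# ... while the same text with a complete final code unit is accepted.
print(detect_encoding_strict(b"<\x00?\x00\n\x00", ("utf-16-le",)))
```

Under this assumption, a script that is not a valid UTF-16 string would never be classified as UTF-16, so no '?' markers would be inserted into the code the interpreter runs.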
2020-10-27 10:19:00 +02:00

libmbfl

This is libmbfl, a streamable multibyte character code filter and converter library, written by Shigeru Kanemoto.

The original version of libmbfl is developed and distributed at https://github.com/moriyoshi/libmbfl under the LGPL 2.1 license. See the LICENSE file for licensing information.

The libmbfl library is bundled with PHP as a fork of the original repository and is not in sync with the upstream. As such, the libmbfl directory is directly modified in the php-src repository.

Changelog

October 2017

  • Since 2017, libmbfl has been forked and bundled in the php-src repository. For the list of changes related to PHP, see the PHP NEWS change logs.

Version 1.3.2 August 20, 2011

  • Added JISX-0213:2004 based encodings: Shift_JIS-2004, EUC-JP-2004, ISO-2022-JP-2004 (rui).
  • Added gb18030 encoding (rui).
  • Added CP950 with user-defined area based on Big5 (rui).
  • Added mapping for user-defined character area to CP936 (rui).
  • Added UTF-8-Mobile to support the pictogram characters defined by mobile phone carriers in Japan (rui).

Version 1.3.1 August 5, 2011

  • Added check for invalid/obsolete UTF-8 encoding (rui).

Version 1.3.0 August 1, 2011

  • Added encoding conversion between Shift_JIS and Unicode (6.0 or PUA) for pictogram characters defined by mobile phone carriers in Japan (rui).

    Detailed info

  • Fixed encoding conversion of cp5022x for user-defined area (rui).

  • Added MacJapanese (SJIS-mac) for legacy encoding support (rui).

  • Backport from PHP 5.2 (rui).

Version 1.1.0 March 02, 2010

  • Added cp5022x encoding (moriyoshi)
  • Added ISO-2022-JP-MS (moriyoshi)
  • Moved to github.com from sourceforge.jp (moriyoshi)

Earlier versions

  • 1998/11/10 sgk Implemented in C++.
  • 1999/4/25 sgk Rewritten in C.
  • 1999/4/26 sgk Implemented input filter. Added a filter while estimating the kanji code.
  • 1999/6 Unicode support.
  • 1999/6/22 sgk Changed license to LGPL.

Credits

  • Marcus Boerger helly@php.net
  • Hayk Chamyan hamshen@gmail.com
  • Wez Furlong wez@thebrainroom.com
  • Rui Hirokawa hirokawa@php.net
  • Shigeru Kanemoto sgk@happysize.co.jp
  • U. Kenkichi kenkichi@axes.co.jp
  • Moriyoshi Koizumi moriyoshi@php.net
  • Hironori Sato satoh@jpnnet.com
  • Tsukada Takuya tsukada@fminn.nagano.nagano.jp
  • Tateyama tateyan@amy.hi-ho.ne.jp
  • Den V. Tsopa tdv@edisoft.ru
  • Maksym Veremeyenko verem@m1stereo.tv
  • Haluk AKIN halukakin@gmail.com