mirror of https://github.com/php/php-src.git synced 2026-04-22 23:48:14 +02:00
50 Commits

Author SHA1 Message Date
Nikita Popov 9d63f4dec1 Fixed bug #76319
While at it, also make sure that mbstring case conversion takes
into account the specified substitution character and substitution
mode.
2018-05-25 11:33:13 +02:00
Xinchen Hui a6519d0514 year++ 2018-01-02 12:57:58 +08:00
Anatol Belski f9c3ee9ae8 fix c89 compat 2017-07-28 22:18:51 +02:00
Nikita Popov f4a1d9c821 Fixed bug #65544 and #71298 2017-07-28 14:57:08 +02:00
Nikita Popov 582a65b06f Implement full case mapping
Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

* Only language-agnostic and unconditional full case mapping
  is implemented. The only language-agnostic conditional case
  mapping rule relates to Greek sigma in final position
  (Final_Sigma). Correctly handling this requires both arbitrary
  lookahead and lookbehind, which would require some larger
  changes to how the case mapping is implemented. This is a
  possible future extension.
* The only language-specific handling that is implemented is
  for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
  is used. This matches the previous behavior and makes sure
  that no codepoints not supported by the encoding are
  produced. A future extension would be to also handle the
  Turkish mappings specified by SpecialCasing.txt based on
  the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
  operations continue to use simple case folding. The reason is
  that full case folding of the haystack string may change the
  position at which a match occurred. This would have to be
  mapped back into the position in the original string.
* mb_convert_case() exposes both the full and the simple case
  mapping / folding, where full is the default. The constants
  are:

   * MB_CASE_LOWER (used by mb_strtolower)
   * MB_CASE_UPPER (used by mb_strtoupper)
   * MB_CASE_TITLE
   * MB_CASE_FOLD
   * MB_CASE_LOWER_SIMPLE
   * MB_CASE_UPPER_SIMPLE
   * MB_CASE_TITLE_SIMPLE
   * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
2017-07-28 12:32:50 +02:00
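The difference between simple and full mapping described in this commit can be sketched in a few lines of C. This is only an illustration under stated assumptions: `upper_codepoint` and `upper_string` are hypothetical helpers, not functions from php-src, and the only special-cased character is U+00DF (sharp s), which SpecialCasing.txt uppercases to "SS" and which has no simple uppercase mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the php-src API: uppercase one codepoint.
 * Writes up to 3 codepoints to out and returns how many were written. */
static size_t upper_codepoint(uint32_t cp, int full, uint32_t out[3])
{
    if (full && cp == 0x00DF) {     /* sharp s: full mapping expands to "SS" */
        out[0] = 'S';
        out[1] = 'S';
        return 2;
    }
    if (cp >= 'a' && cp <= 'z') {   /* ASCII only, to keep the sketch small */
        out[0] = cp - 0x20;
        return 1;
    }
    out[0] = cp;                    /* no mapping known here: pass through */
    return 1;
}

/* Uppercase a codepoint string; returns the number of codepoints written.
 * dst must have room for 3 * len codepoints. */
static size_t upper_string(const uint32_t *src, size_t len, int full,
                           uint32_t *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        n += upper_codepoint(src[i], full, dst + n);
    }
    return n;
}
```

With the input `s`, U+00DF, `x`, the full mapping yields four codepoints and the simple mapping yields three. The trailing `x` therefore moves from index 2 to index 3, which is the kind of offset shift that keeps the case-insensitive mb_* operations on simple case folding.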
Nikita Popov 9ac7c1e71d Use case-folding for case insensitive comparisons
Instead of using lowercasing.
2017-07-28 12:32:50 +02:00
Nikita Popov 80a0601fe5 Use MPH for case maps
Instead of performing a binary search, use a hashtable to store
the case maps. In particular a minimal perfect hash construction
is used, which does not require collision resolution (but does
use an auxiliary table for the hash perturbation).
2017-07-28 12:32:50 +02:00
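A minimal perfect hash with an auxiliary perturbation table can be sketched as below. This is not the code from this commit: the mixer `mix`, the brute-force seed search in `mph_build`, the table sizes, and all names are assumptions chosen for illustration. A real generator would run offline and emit static tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_KEYS 8     /* table size equals key count: that is the "minimal" part */
#define N_BUCKETS 4  /* size of the auxiliary seed table */

typedef struct { uint32_t key, value; } case_entry;

/* Illustrative integer mixer; the seed perturbs the result. */
static uint32_t mix(uint32_t seed, uint32_t x)
{
    x ^= seed * 0x9E3779B9u;
    x ^= x >> 16; x *= 0x85EBCA6Bu;
    x ^= x >> 13; x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

static uint32_t seeds[N_BUCKETS];
static case_entry table[N_KEYS];
static unsigned char used[N_KEYS];

/* Try to place every key of one bucket using the given seed.
 * Returns 1 and commits the slots on success, 0 on any collision. */
static int place_bucket(const case_entry *in, uint32_t bucket, uint32_t seed)
{
    unsigned char trial[N_KEYS];
    memcpy(trial, used, sizeof trial);
    for (int i = 0; i < N_KEYS; i++) {
        if (mix(0, in[i].key) % N_BUCKETS != bucket) continue;
        uint32_t slot = mix(seed, in[i].key) % N_KEYS;
        if (trial[slot]) return 0;
        trial[slot] = 1;
    }
    for (int i = 0; i < N_KEYS; i++) {
        if (mix(0, in[i].key) % N_BUCKETS != bucket) continue;
        table[mix(seed, in[i].key) % N_KEYS] = in[i];
    }
    memcpy(used, trial, sizeof used);
    return 1;
}

/* Search, bucket by bucket, for a seed that places its keys collision-free. */
static void mph_build(const case_entry *in)
{
    memset(used, 0, sizeof used);
    for (uint32_t b = 0; b < N_BUCKETS; b++) {
        uint32_t seed = 1;
        while (!place_bucket(in, b, seed)) {
            seed++;
            assert(seed != 0);  /* give up if the seed space is exhausted */
        }
        seeds[b] = seed;
    }
}

/* Two hash evaluations and one key comparison; no probing or chaining. */
static int mph_lookup(uint32_t key, uint32_t *value)
{
    uint32_t seed = seeds[mix(0, key) % N_BUCKETS];
    const case_entry *e = &table[mix(seed, key) % N_KEYS];
    if (e->key != key) return 0;
    *value = e->value;
    return 1;
}
```

The lookup needs no collision resolution because the seeds were chosen so that every key owns its slot. The final key comparison only rejects codepoints that are not in the map at all.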
Nikita Popov 3c6b2512cb Change layout of case mapping table
Previously the case mapping table was segregated by the type of
the character (upper, lower, title) and always stored the other
two variants (key, other1, other2). Now the table is segregated
by the target type (key, other). As only very few characters have
more than one target this only slightly increases the size of the
table.

The advantage of this layout is that we only need to perform a
single table lookup in the case table. Previously, depending on
the case that was hit, either one lookup in the property table,
or two lookups in the property table and one lookup in the case
table were required.

This changes the layout from libunicode in the OpenLDAP project
-- however, the last commit there was over 10 years ago, so I
don't see value in keeping this in sync.
2017-07-23 18:33:15 +02:00
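The layout by target type might look roughly like the following sketch. The struct, table names, and `case_lookup` are hypothetical and do not reproduce the generated tables in php-src. The entries cover only the "dz with caron" digraph (U+01C4, U+01C5, U+01C6), one of the few characters with three distinct case forms.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t key, other; } case_pair;

/* One table per target case, each sorted by key. */
static const case_pair to_upper[] = { {0x01C5, 0x01C4}, {0x01C6, 0x01C4} };
static const case_pair to_lower[] = { {0x01C4, 0x01C6}, {0x01C5, 0x01C6} };
static const case_pair to_title[] = { {0x01C4, 0x01C5}, {0x01C6, 0x01C5} };

/* A single binary search in the table for the requested target case. */
static uint32_t case_lookup(const case_pair *map, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].key == cp) return map[mid].other;
        if (map[mid].key < cp) lo = mid + 1; else hi = mid;
    }
    return cp;  /* not in this table: the codepoint maps to itself */
}
```

Because the caller already knows the target case, it picks one table and performs one search, without first classifying the input character through the property table.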
Nikita Popov 7077c719db Merge branch 'PHP-7.2' 2017-07-23 15:36:25 +02:00
Nikita Popov c0bcd301d3 Another fix for bug #69267
mb_strtoupper() was converting lowercase characters into
titlecase characters, instead of uppercase characters. Luckily
there are only very few characters with a distinct titlecase
representation, so this mostly worked out okay...
2017-07-23 15:07:02 +02:00
Nikita Popov 0e4af9192f Partial fix for bug #69267
This pulls in 60a25c72ba389f53b0621ca250bc99f3b295d43f from the
OpenLDAP project.
2017-07-23 14:47:21 +02:00
Nikita Popov b3c1d9d111 Directly use encodings instead of no_encoding in libmbfl
In particular strings now store encoding rather than the
no_encoding.

I've also pruned out libmbfl APIs that existed in two forms, one
using no_encoding and the other using encoding. We were not actually
using any of the former.
2017-07-20 21:41:52 +02:00
Nikita Popov c098304e17 Reduce number of encoding conversions in case conversion
Don't indirect through UCS4BE, instead directly work on wchars
using a custom filter.

This replaces the pipeline
  utf8 -> wchar -> ucs4be -> wchar -case-> wchar -> ucs4be -> wchar -> utf8
with
  utf8 -> wchar -case-> wchar -> utf8
2017-07-20 15:33:24 +02:00
Nikita Popov 17da862b51 Optimize php_unicode_tolower/upper for ASCII 2017-07-20 13:58:40 +02:00
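An ASCII fast path of the kind this commit title suggests could look like the sketch below. Both function names are hypothetical, and `tolower_slow` is a toy stand-in for the general table-driven lookup that handles only Greek Alpha through Rho.

```c
#include <stdint.h>

/* Hypothetical stand-in for the general table-driven lookup. */
static uint32_t tolower_slow(uint32_t cp)
{
    if (cp >= 0x0391 && cp <= 0x03A1) return cp + 0x20;  /* Alpha..Rho only */
    return cp;
}

static uint32_t tolower_fast(uint32_t cp)
{
    if (cp < 0x80) {
        /* ASCII: only A-Z change, and the mapping is a fixed offset. */
        return (cp >= 'A' && cp <= 'Z') ? cp + 0x20 : cp;
    }
    return tolower_slow(cp);
}
```

For text that is mostly ASCII, the range check avoids the table lookup for nearly every character.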
Nikita Popov 9c73be898d Directly accept encoding in php_unicode_convert_case()
As a side-effect mb_strtolower() and mb_strtoupper() now correctly
handle a NULL encoding parameter by using the internal encoding.
This is what caused the two test changes.
2017-07-19 23:59:42 +02:00
Nikita Popov 4cf22cbb2d Optimize php_unicode_is_prop()
Do not try to extract the properties from a bitmask. Instead make
the function variadic and pass all properties individually.

Also add a php_unicode_is_prop1() function to check only a single
property.
2017-07-19 23:59:42 +02:00
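The variadic shape described here can be sketched as follows. The property ids, the ASCII-only classification, and the negative terminator are assumptions made for this example; only the idea of passing properties individually and offering a single-property variant comes from the commit message.

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical property ids; not the constants used in php-src. */
enum { PROP_UPPER, PROP_LOWER, PROP_DIGIT };

/* Toy single-property test covering ASCII only. */
static int is_prop1(uint32_t cp, int prop)
{
    switch (prop) {
    case PROP_UPPER: return cp >= 'A' && cp <= 'Z';
    case PROP_LOWER: return cp >= 'a' && cp <= 'z';
    case PROP_DIGIT: return cp >= '0' && cp <= '9';
    default:         return 0;
    }
}

/* Variadic form: returns 1 if cp has any of the listed properties.
 * The list must be terminated by a negative value (assumed convention). */
static int is_prop(uint32_t cp, ...)
{
    int result = 0;
    va_list va;
    va_start(va, cp);
    for (;;) {
        int prop = va_arg(va, int);
        if (prop < 0) break;
        if (is_prop1(cp, prop)) {
            result = 1;
            break;
        }
    }
    va_end(va);
    return result;
}
```

A call such as `is_prop(cp, PROP_UPPER, PROP_LOWER, -1)` checks each listed property directly, so no bitmask has to be decoded back into individual properties.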
Nikita Popov dead4f0b1b Avoid unnecessary encoding lookups in mbstring
Extract part of php_mb_convert_encoding that does the actual work
and use it whenever we already know the encoding.
2017-07-19 23:59:42 +02:00
Sammy Kaye Powers 9e29f841ce Update copyright headers to 2017 2017-01-02 09:30:12 -06:00
Lior Kaplan ed35de784f Merge branch 'PHP-5.6' into PHP-7.0
* PHP-5.6:
  Happy new year (Update copyright to 2016)
2016-01-01 19:48:25 +02:00
Lior Kaplan 49493a2dcf Happy new year (Update copyright to 2016) 2016-01-01 19:21:47 +02:00
Xinchen Hui fc33f52d8c bump year 2015-01-15 23:27:30 +08:00
Xinchen Hui 0579e8278d bump year 2015-01-15 23:26:37 +08:00
Stanislav Malyshev b7a7b1a624 trailing whitespace removal 2015-01-10 15:07:38 -08:00
Anatol Belski bdeb220f48 first shot remove TSRMLS_* things 2014-12-13 23:06:14 +01:00
Johannes Schlüter d0cb715373 s/PHP 5/PHP 7/ 2014-09-19 18:33:14 +02:00
Xinchen Hui c081ce628f Bump year 2014-01-03 11:08:10 +08:00
Xinchen Hui a666285bc2 Happy New Year 2013-01-01 16:37:09 +08:00
Felipe Pena 8775a37559 - Year++ 2012-01-01 13:15:04 +00:00
Felipe Pena 0203cc3d44 - Year++ 2011-01-01 02:17:06 +00:00
Sebastian Bergmann 9ba1e81665 sed -i "s#1997-2009#1997-2010#g" **/*.c **/*.h **/*.php 2010-01-03 09:23:27 +00:00
Sebastian Bergmann 08659c2dcd MFH: Bump copyright year, 3 of 3. 2008-12-31 11:15:49 +00:00
Ilia Alshanetsky 24e7e62307 Fixed bug #46626 (mb_convert_case does not handle apostrophe correctly) 2008-11-24 21:23:03 +00:00
Moriyoshi Koizumi d7594edaa0 - MFH: Fixed warnings. 2008-07-24 13:46:50 +00:00
Rui Hirokawa bedb308902 fixed #43998 Two error messages returned for incorrect encoding for mb_strto[upper|lower] 2008-02-16 12:01:43 +00:00
Sebastian Bergmann d1dded8751 MFH: Bump copyright year, 2 of 2. 2007-12-31 07:17:19 +00:00
Rui Hirokawa b9424cdfaf MFH: fixed bug #29955 invalid case conversion in iso-8859-9. 2007-09-04 14:14:11 +00:00
Sebastian Bergmann 4223aa4d5e MFH: Bump year. 2007-01-01 09:36:18 +00:00
foobar 5bd93221a8 bump year and license version 2006-01-01 12:51:34 +00:00
Rui Hirokawa 4628a41c00 MFH: fixed #29955 mb_strtoupper() / lower() broken with Turkish encoding.. 2005-12-23 15:18:52 +00:00
foobar 23e671a51e - Bumber up year 2005-08-03 14:08:58 +00:00
Andi Gutmans dbeb4158d2 - A belated happy holidays and PHP 5 2004-01-08 08:18:22 +00:00
James Cox f68c7ff249 updating license information in the headers. 2003-06-10 20:04:29 +00:00
Sebastian Bergmann b506f5c8f8 Bump year. 2002-12-31 16:08:15 +00:00
Moriyoshi Koizumi de79a4e9d8 Reverted the changes because the problem was elsewhere. 2002-12-02 21:10:37 +00:00
Frank M. Kromann a7f3ad42a4 Fixing build on Win32
MBREGEX is disabled for now. 5 mbre_* functions are undefined on Win32
2002-12-02 18:19:17 +00:00
Edin Kadribasic 1eddce79dd MFB (made mbstring compile on windows again). 2002-11-13 23:11:14 +00:00
Moriyoshi Koizumi 67e6c356f6 Fixed mb_convert_case() / mb_strtolower() / mb_strtoupper() to work in
64bit systems
2002-11-11 02:39:32 +00:00
Moriyoshi Koizumi 3bc01b5d0b Modified mb_convert_case() to handle cased characters properly when MB_CASE_TITLE is specified. 2002-10-23 20:32:51 +00:00
Zeev Suraski 2c4b6fff6d Fix warnings 2002-10-01 10:16:40 +00:00
Wez Furlong 1a87c6b5bf (PHP mb_convert_case) Add function that will convert the case of a string
respecting its encoding (or the internal encoding).
2002-09-26 00:53:47 +00:00