mirror of https://github.com/php/php-src.git synced 2026-04-18 05:21:02 +02:00
archived-php-src/ext/mbstring/tests/bug69267.phpt
Nikita Popov 582a65b06f Implement full case mapping
Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

* Only language-agnostic and unconditional full case mapping
  is implemented. The only language-agnostic conditional case
  mapping rule relates to Greek sigma in final position
  (Final_Sigma). Correctly handling this requires both arbitrary
  lookahead and lookbehind, which would require some larger
  changes to how the case mapping is implemented. This is a
  possible future extension.
* The only language-specific handling that is implemented is
  for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
  is used. This matches the previous behavior and ensures that
  no codepoints unsupported by the encoding are produced.
  A future extension would be to also handle the
  Turkish mappings specified by SpecialCasing.txt based on
  the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
  operations continue to use simple case folding. The reason is
  that full case folding of the haystack string may shift the
  position at which a match occurs, which would then have to be
  mapped back to the corresponding position in the original string.
* mb_convert_case() exposes both the full and the simple case
  mapping / folding, where full is the default. The constants
  are:

   * MB_CASE_LOWER (used by mb_strtolower)
   * MB_CASE_UPPER (used by mb_strtoupper)
   * MB_CASE_TITLE
   * MB_CASE_FOLD
   * MB_CASE_LOWER_SIMPLE
   * MB_CASE_UPPER_SIMPLE
   * MB_CASE_TITLE_SIMPLE
   * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
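A minimal sketch of how the full (default) modes differ from the *_SIMPLE modes listed above. It assumes PHP 7.3 or later with ext/mbstring loaded. The helper caseVariants() is hypothetical; only the constants and mb_* functions are real. The sketch also shows why case-insensitive functions stay on simple folding. Full folding turns the 6-codepoint "Straße" into the 7-codepoint "strasse", so match offsets in the folded haystack would no longer line up with the original string.

```php
<?php
// Sketch: contrast full (default) and *_SIMPLE case mapping in mb_convert_case().
// Assumes PHP >= 7.3 with ext/mbstring; caseVariants() is a hypothetical helper.

function caseVariants(string $s): array
{
    return [
        // Full mapping (SpecialCasing.txt): one codepoint may become several.
        'upper'        => mb_convert_case($s, MB_CASE_UPPER, 'UTF-8'),
        // Simple mapping: always one codepoint to one codepoint.
        'upper_simple' => mb_convert_case($s, MB_CASE_UPPER_SIMPLE, 'UTF-8'),
        // Full folding (CaseFolding.txt statuses C + F) vs simple folding (C + S).
        'fold'         => mb_convert_case($s, MB_CASE_FOLD, 'UTF-8'),
        'fold_simple'  => mb_convert_case($s, MB_CASE_FOLD_SIMPLE, 'UTF-8'),
    ];
}

$strasse = caseVariants("Straße");
$alpha   = caseVariants("\u{1FB3}"); // GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI

print_r($strasse); // upper STRASSE, upper_simple STRAßE, fold strasse, fold_simple straße
print_r($alpha);   // upper is two codepoints (U+0391 U+0399); upper_simple is U+1FBC

// Full folding changes the length, so offsets found in the folded haystack
// would not correspond to offsets in the original string.
printf("%d -> %d codepoints\n", mb_strlen("Straße", 'UTF-8'), mb_strlen($strasse['fold'], 'UTF-8'));

// Case-insensitive search uses simple folding, so "ß" does not match "SS".
var_dump(mb_stripos("Straße", "SSE", 0, 'UTF-8')); // bool(false)
```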
2017-07-28 12:32:50 +02:00

--TEST--
Bug #69267: mb_strtolower fails on titlecase characters
--FILE--
<?php
$str_l = "ǆǉǌǳ";
$str_u = "ǄǇǊǱ";
$str_t = "ǅǈǋǲ";
var_dump(mb_strtolower($str_l));
var_dump(mb_strtolower($str_u));
var_dump(mb_strtolower($str_t));
var_dump(mb_strtoupper($str_l));
var_dump(mb_strtoupper($str_u));
var_dump(mb_strtoupper($str_t));
var_dump(mb_convert_case($str_l, MB_CASE_TITLE));
var_dump(mb_convert_case($str_u, MB_CASE_TITLE));
var_dump(mb_convert_case($str_t, MB_CASE_TITLE));
$str_l = "ᾳ";
$str_t = "ᾼ";
var_dump(mb_strtolower($str_l));
var_dump(mb_strtolower($str_t));
var_dump(mb_strtoupper($str_l));
var_dump(mb_strtoupper($str_t));
var_dump(mb_convert_case($str_l, MB_CASE_TITLE));
var_dump(mb_convert_case($str_t, MB_CASE_TITLE));
?>
--EXPECT--
string(8) "ǆǉǌǳ"
string(8) "ǆǉǌǳ"
string(8) "ǆǉǌǳ"
string(8) "ǄǇǊǱ"
string(8) "ǄǇǊǱ"
string(8) "ǄǇǊǱ"
string(8) "ǅǉǌǳ"
string(8) "ǅǉǌǳ"
string(8) "ǅǉǌǳ"
string(3) "ᾳ"
string(3) "ᾳ"
string(4) "ΑΙ"
string(4) "ΑΙ"
string(3) "ᾼ"
string(3) "ᾼ"
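The titlecase digraphs this test relies on (for example ǅ, U+01C5) render like the two-letter sequence "Dž". This is why the page flags the file as containing ambiguous Unicode characters. A quick way to confirm which codepoints the inputs actually hold is to list them. A minimal sketch, assuming PHP 7.4 or later (for mb_str_split()) with ext/mbstring; the helper codepoints() and the array keys are hypothetical.

```php
<?php
// Sketch: list the codepoints of the digraph and Greek strings used in the test.
// Assumes PHP >= 7.4 (mb_str_split) with ext/mbstring; codepoints() is hypothetical.

function codepoints(string $s): array
{
    $out = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        // mb_ord() returns the Unicode codepoint of a single character.
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return $out;
}

$inputs = [
    'lower'         => "\u{01C6}\u{01C9}\u{01CC}\u{01F3}", // dž lj nj dz as single codepoints
    'upper'         => "\u{01C4}\u{01C7}\u{01CA}\u{01F1}", // DŽ LJ NJ DZ
    'title'         => "\u{01C5}\u{01C8}\u{01CB}\u{01F2}", // Dž Lj Nj Dz
    'greek_small'   => "\u{1FB3}",                         // alpha with ypogegrammeni
    'greek_capital' => "\u{1FBC}",                         // alpha with prosgegrammeni
];

foreach ($inputs as $name => $s) {
    // strlen() counts bytes, matching the string(N) lengths in --EXPECT--.
    printf("%s: %d bytes, %s\n", $name, strlen($s), implode(' ', codepoints($s)));
}
```

The inputs are written with PHP's "\u{...}" escapes rather than literal glyphs. This keeps them unambiguous even if a copy of the file goes through compatibility normalization (NFKC), which would rewrite ǅ as the two-codepoint sequence "Dž".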