1
0
mirror of https://github.com/php/php-src.git synced 2026-04-24 00:18:23 +02:00
Commit Graph

1383 Commits

Author SHA1 Message Date
Nikita Popov d21c902841 Fix cp950 pua check
One set of parenthesis was missing, causing a legitimate compiler
warnings. In the end it doesn't actually matter, because it just
ends up doing an unnecessary check in the w > 0 case.

This fixes the logic and moves it out into a separate functions,
to be a bit more readable.
2017-11-22 23:47:18 +01:00
Colin O'Dell 201930106d Add test for negative lengths in mb_strcut() 2017-11-22 22:47:55 +01:00
Colin O'Dell 830d87b86e Add tests for mb_language() 2017-11-22 22:47:55 +01:00
Joe Watkins 21e4ab1977 Merge branch 'PHP-7.2'
* PHP-7.2:
  Fix proto documents for new global functions
2017-11-06 07:24:51 +00:00
Tyson Andre 5cdf37e603 Fix proto documents for new global functions
See NEWS and UPGRADING (or arginfo/implementation) for details.
2017-11-06 07:24:42 +00:00
Dmitry Stogov 3b2e858304 Overlad functions once in MINIT (instead of on each requestr in RINIT) 2017-11-02 14:09:06 +03:00
Dmitry Stogov ed5b4d5c99 Use Zend MM heap 2017-11-01 02:38:26 +03:00
Nikita Popov 251c1b1a44 Fix invalid read in mb_ord() 2017-10-28 16:44:32 +02:00
Peter Kokot 5c5bd30339 Remove --with-libmbfl configure option
The bundled libmbfl library is no longer API or ABI compatible with
the (currently unmaintained) upstream library. As such, building
against an external libmbfl is no longer possible.
2017-10-28 16:11:30 +02:00
Dmitry Stogov 9cf87aa196 Avoid HashTable allocations for empty arrays (using zend_empty_array). 2017-10-24 17:27:31 +03:00
Peter Kokot 3ed3bc3a0c Update README information for the libmbfl library
The libmbfl library is bundled with PHP and has its own repository for
development and bug fixes. To avoid confusion and faster development the
README has been updated to include the information of the original library and
to use the bundled library as a fork of the upstream repository instead.
2017-10-08 17:51:02 +02:00
Peter Kokot a57de26c3d Refactor mbstring READMEs 2017-10-08 17:51:02 +02:00
Dmitry Stogov 45ee78e040 mb_convert_variables() refactored to use simple recursion.
Fixed incorrect recursion protection (previous implementation kept protection flag or apply counter in non-zero state).
2017-10-06 12:08:55 +03:00
Dmitry Stogov cb9d81ef4f Refactored recursion pretection 2017-10-06 01:34:50 +03:00
Peter Kokot 39ea632f74 Join untracked files to root .gitignore 2017-10-05 12:36:47 +02:00
Dmitry Stogov 44e0b79ac6 Refactored array creation API. array_init() and array_init_size() are converted into macros calling zend_new_array(). They are not functions anymore and don't return any values. 2017-09-20 02:25:56 +03:00
Joe Watkins c898349e16 fixes PR #2722, no clue how it broke ... 2017-09-06 11:13:27 +01:00
shinemotec@gmail.com 9b77615608 fixed mbstring extension compiled broken with archlinux 2017-09-06 09:50:08 +01:00
Nikita Popov fea7957d08 Optimize mb_chr()
By avoiding an unnecessary copy between a string an zend_string.
2017-08-04 22:38:54 +02:00
Nikita Popov f24db7686e Optimize mb_ord()
Don't perform a full encoding conversion into UCS4-BE, instead only
perform an input conversion into a wchar device.
2017-08-04 22:22:58 +02:00
Nikita Popov 633a471ba0 Store input and output filters in mbfl encodings
For functions like mb_chr() and mb_ord() just looking up the
input/output filter for the encoding dominates the runtime. This
commit stores the input/output filter for an encoding in the
mbfl encoding structure, so it can be looked up directly, rather
than scanning through filter function lists.
2017-08-04 22:22:58 +02:00
Nikita Popov e20fbd43ba Separate mbfl filters into three categories
Input filters, output filters and special filters.
2017-08-04 22:22:58 +02:00
Nikita Popov 840b77c02e Merge branch 'PHP-7.2' 2017-08-04 22:20:11 +02:00
Nikita Popov 6b73b2d6eb Check for empty string in mb_ord() 2017-08-04 22:20:05 +02:00
Nikita Popov 4e4ec31e2e Merge branch 'PHP-7.2' 2017-08-04 13:02:44 +02:00
Nikita Popov 353f7bf461 Also check for invalid codepoints in mb_ord()
And return false in that case, instead of returning 0x3f...
2017-08-04 13:01:03 +02:00
Nikita Popov 5caf05f6c5 Merge branch 'PHP-7.2' 2017-08-03 22:41:15 +02:00
Nikita Popov e53162a32b Return false on invalid codepoint in mb_chr()
Instead of returning the encoding of the current substitution
character. This allows a robust check for the failure case. The
substitution character (especially the default of "?") is also
a valid output of mb_chr() for a valid input (for "?" that would be
0x3f), so it's a bad choice for an error value.
2017-08-03 22:36:42 +02:00
Nikita Popov 41e9ba6333 Always use Unicode codepoints in mb_ord() and mb_chr()
Previously mb_chr() had two different encoding-dependent behaviors:
 * For "Unicode-encodings" it took a Unicode codepoint and returned
   its encoded representation.
 * Otherwise it returned a big-endian binary encoding of the passed
   integer.

Now the input is always interpreted as a Unicode codepoint. If
a big-endian binary encoding is what you want, you don't need
mbstring to implement that.
2017-08-03 22:14:00 +02:00
Nikita Popov c98714f19e Merge branch 'PHP-7.2' 2017-08-03 21:57:35 +02:00
Nikita Popov fb9bf5b64b Revert/fix substitution character fallback
The introduced checks were not correct in two respects:
 * It was checked whether the source encoding of the string matches
   the internal encoding, while the actually relevant encoding is
   the *target* encoding.
 * Even if the correct encoding is used, the checks are still too
   conservative. Just because something is not a "Unicode-encoding"
   does not mean that it does not map any non-ASCII characters.

I've reverted the added checks and instead adjusted mbfl_convert
to first try to use the provided substitution character and if
that fails, perform the fallback to '?' at that point. This means
that any codepoint mapped in the target encoding should now be
correctly supported and anything else should fall back to '?'.
2017-08-03 21:53:59 +02:00
Nikita Popov 3d948d77d1 Merge branch 'PHP-7.2' 2017-08-03 21:17:26 +02:00
Nikita Popov a8a9e93e9a Revert/fix mb_substitute_character() codepoint checks
The introduced checks did not treat "non-Unicode" encodings correctly,
because they treated the passed integer as encoded in the internal
encoding in that case, while in actuality the substitute character
is always a Unicode codepoint.

Additionally checking the codepoint against the internal encoding
is not correct in any case, because the substitution character must
be mapped in the *target* encoding of the conversion, which does
not necessarily coincide with the internal encoding (the internal
encoding is the default *source* encoding, not *target* encoding).

This reverts the checks back to simple range checks, but in a way
that still resolves #69079: Characters outside the Basic
Multilingual Plane are now accepted and Surrogate Codepoints are
rejected. A distinction between UTF-8 and non-UTF-8 encodings is
not made for surrogate checks (as in the original patch), as
surrogates are always illegal on their own. Specifying a surrogate
as substitution character would only make sense if you could
specify a substitution string with more than one character --
however we do not support that.
2017-08-03 21:12:41 +02:00
Nikita Popov 94fe629992 Merge branch 'PHP-7.2' 2017-08-02 18:11:17 +02:00
Nikita Popov 91240073ea Merge branch 'PHP-7.1' into PHP-7.2 2017-08-02 18:11:12 +02:00
Nikita Popov 63607375f5 Merge branch 'PHP-7.0' into PHP-7.1 2017-08-02 18:09:09 +02:00
Fabien Villepinte 2cc1cbf2f4 Fix Bug #75001: Wrong reflection on mb_eregi_replace 2017-08-02 18:08:42 +02:00
Anatol Belski f9c3ee9ae8 fix c89 compat 2017-07-28 22:18:51 +02:00
Nikita Popov f4a1d9c821 Fixed bug #65544 and #71298 2017-07-28 14:57:08 +02:00
Nikita Popov 25b6e68432 Merge branch 'PHP-7.2' 2017-07-28 13:03:35 +02:00
Nikita Popov 5d777e56e2 Merge branch 'PHP-7.1' into PHP-7.2 2017-07-28 13:03:26 +02:00
Nikita Popov c48c638aeb Merge branch 'PHP-7.0' into PHP-7.1 2017-07-28 13:03:02 +02:00
Nikita Popov e3d25e78eb Fixed bug #62934 2017-07-28 13:02:25 +02:00
Nikita Popov 582a65b06f Implement full case mapping
Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

* Only language-agnostic and unconditional full case mapping
  is implemented. The only language-agnostic conditional case
  mapping rule relates to Greek sigma in final position
  (Final_Sigma). Correctly handling this requires both arbitrary
  lookahead and lookbehind, which would require some larger
  changes to how the case mapping is implemented. This is a
  possible future extension.
* The only language-specific handling that is implemented is
  for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
  is used. This matches the previous behavior and makes sure
  that no codepoints not supported by the encoding are
  produced. A future extension would be to also handle the
  Turkish mappings specified by SpecialCasing.txt based on
  the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
  operations continue to use simple case folding. The reason is
  that full case folding of the haystack string may change the
  position at which a match occurred. This would have to be
  mapped back into the position in the original string.
* mb_convert_case() exposes both the full and the simple case
  mapping / folding, where full is the default. The constants
  are:

   * MB_CASE_LOWER (used by mb_strtolower)
   * MB_CASE_UPPER (used by mb_strtolower)
   * MB_CASE_TITLE
   * MB_CASE_FOLD
   * MB_CASE_LOWER_SIMPLE
   * MB_CASE_UPPER_SIMPLE
   * MB_CASE_TITLE_SIMPLE
   * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
2017-07-28 12:32:50 +02:00
Nikita Popov 9ac7c1e71d Use case-folding for case insensitive comparisons
Instead of using lowercasing.
2017-07-28 12:32:50 +02:00
Nikita Popov 80a0601fe5 Use MPH for case maps
Instead of performing a binary search, use a hashtable to store
the case maps. In particular a minimal perfect hash construction
is used, which does not require collision resolution (but does
use an auxiliary table for the hash perturbation).
2017-07-28 12:32:50 +02:00
Nikita Popov f56b0afe6e Avoid some unnecessary mbfl_strlen() calculations 2017-07-28 12:32:50 +02:00
Nikita Popov eacd70f762 Don't store titlecase if same as uppercase
The totitle code already has a fallback for that case.
2017-07-28 12:32:50 +02:00
Nikita Popov cedfc2f426 Drop implementation-specific character properties
No point in keeping around non-standard character properties if
we're not using them and most are not even being populated.
2017-07-28 12:32:50 +02:00
Anatol Belski 98fe82cc05 fix data types 2017-07-25 21:26:25 +02:00