1
0
mirror of https://github.com/php/php-src.git synced 2026-04-11 01:53:36 +02:00
Alex Dowad 7f44559516 mb_str{i,}pos does not match illegal byte sequences against occurrences of mb_substitute_char
In GitHub issue 9613, it was reported that mb_strpos wrongly matches the
character '?' against any invalid string, even when the character '?'
clearly does not appear in the invalid string. This behavior has existed
at least since PHP 5.2.

The reason for the behavior is that mb_strpos internally converts the
haystack and needle to UTF-8 before performing a search. When converting
to UTF-8, regardless of the setting of mb_substitute_character, libmbfl
would use '?' as an error marker for invalid byte sequences. Once those
invalid input sequences were replaced with '?', then naturally, they
would match against occurrences of the actual character '?' (when it
appeared as a 'normal' character, not as an error marker). This would
happen regardless of whether the error was in the haystack and '?' was
used in the needle, or whether the error was in the needle and '?' was
used in the haystack.

Why would libmbfl use '?' rather than the mb_substitute_character set
by the user? Remember that libmbfl was originally a separate library
which was imported into the PHP codebase. mb_substitute_character is an
mbstring API function, not something built into libmbfl. When mbstring
would call into libmbfl, it would provide the error replacement
character to libmbfl as a parameter. However, when libmbfl would perform
conversion operations internally, and not because of a direct call from
mbstring, it would use its own error replacement character.

Example:

    <?php
    $questionMark = "\x00?";
    $badUTF16 = "\xDB\x00"; // half of a surrogate pair
    echo mb_strpos($questionMark, $badUTF16, 0, 'UTF-16BE'), "\n";
    echo mb_strpos($badUTF16, $questionMark, 0, 'UTF-16BE'), "\n";

Incidentally, this behavior does not occur if the text encoding is
UTF-8, because no conversion is needed in that case.

mb_stripos had a similar issue, but instead of always using '?' as an
error marker internally, it would use the selected
mb_substitute_character. So, for example, if the mb_substitute_character
was '%', then occurrences of '%' in the haystack would match invalid
bytes in the needle, and vice versa.

Example:

    <?php
    mb_substitute_character(0x25); // '%'
    $percent = "\x00%";
    $badUTF16 = "\xDB\x00"; // half of a surrogate pair
    echo mb_stripos($percent, $badUTF16, 0, 'UTF-16BE'), "\n";
    echo mb_stripos($badUTF16, $percent, 0, 'UTF-16BE'), "\n";

This behavior (of mb_stripos) still occurs even if the text encoding is
UTF-8, because case folding is still needed to make the search
case-insensitive.

It is not hard to think of scenarios where these strange and unintuitive
behaviors could cause security vulnerabilities. In the discussion on
GH issue 9613, Christoph Becker suggested that mb_str{i,}pos should
simply refuse to operate on invalid strings. However, this would almost
certainly break existing production code.

This commit mitigates the problem in a less intrusive way: it ensures
that while invalid haystacks can match invalid needles (even if the
specific invalid bytes are different), invalid bytes in the haystack
will never match '?' OR occurrences of the mb_substitute_character in
the needle, and vice versa.

This does represent a backwards compatibility break, but a small one.
Since it mitigates a potential security problem, I believe this is
appropriate.

Closes GH-9613.
2022-12-18 15:31:20 +02:00
2022-12-16 17:44:26 +01:00
2022-12-16 17:44:26 +01:00
2022-12-13 19:43:47 +01:00
2022-12-16 17:44:26 +01:00
2022-12-16 17:44:26 +01:00
2022-12-16 17:44:26 +01:00
2022-08-09 16:22:14 +02:00
2019-07-21 11:40:23 +02:00
2022-12-16 17:44:26 +01:00
2022-08-30 11:17:15 -04:00
2022-03-19 18:25:29 +01:00
2022-12-16 14:45:00 +01:00
2022-11-03 14:40:35 +01:00
2022-12-16 18:14:22 +01:00
2022-12-16 18:12:28 +01:00

The PHP Interpreter

PHP is a popular general-purpose scripting language that is especially suited to web development. Fast, flexible and pragmatic, PHP powers everything from your blog to the most popular websites in the world. PHP is distributed under the PHP License v3.01.

Push Build status Build status Build Status Fuzzing Status

Documentation

The PHP manual is available at php.net/docs.

Installation

Prebuilt packages and binaries

Prebuilt packages and binaries can be used to get up and running fast with PHP.

For Windows, the PHP binaries can be obtained from windows.php.net. After extracting the archive the *.exe files are ready to use.

For other systems, see the installation chapter.

Building PHP source code

For Windows, see Build your own PHP on Windows.

For a minimal PHP build from Git, you will need autoconf, bison, and re2c. For a default build, you will additionally need libxml2 and libsqlite3.

On Ubuntu, you can install these using:

sudo apt install -y pkg-config build-essential autoconf bison re2c \
                    libxml2-dev libsqlite3-dev

On Fedora, you can install these using:

sudo dnf install re2c bison autoconf make libtool ccache libxml2-devel sqlite-devel

Generate configure:

./buildconf

Configure your build. --enable-debug is recommended for development, see ./configure --help for a full list of options.

# For development
./configure --enable-debug
# For production
./configure

Build PHP. To speed up the build, specify the maximum number of jobs using -j:

make -j4

The number of jobs should usually match the number of available cores, which can be determined using nproc.

Testing PHP source code

PHP ships with an extensive test suite, the command make test is used after successful compilation of the sources to run this test suite.

It is possible to run tests using multiple cores by setting -jN in TEST_PHP_ARGS:

make TEST_PHP_ARGS=-j4 test

Shall run make test with a maximum of 4 concurrent jobs: Generally the maximum number of jobs should not exceed the number of cores available.

The qa.php.net site provides more detailed info about testing and quality assurance.

Installing PHP built from source

After a successful build (and test), PHP may be installed with:

make install

Depending on your permissions and prefix, make install may need super user permissions.

PHP extensions

Extensions provide additional functionality on top of PHP. PHP consists of many essential bundled extensions. Additional extensions can be found in the PHP Extension Community Library - PECL.

Contributing

The PHP source code is located in the Git repository at github.com/php/php-src. Contributions are most welcome by forking the repository and sending a pull request.

Discussions are done on GitHub, but depending on the topic can also be relayed to the official PHP developer mailing list internals@lists.php.net.

New features require an RFC and must be accepted by the developers. See Request for comments - RFC and Voting on PHP features for more information on the process.

Bug fixes don't require an RFC. If the bug has a GitHub issue, reference it in the commit message using GH-NNNNNN. Use #NNNNNN for tickets in the old bugs.php.net bug tracker.

Fix GH-7815: php_uname doesn't recognise latest Windows versions
Fix #55371: get_magic_quotes_gpc() throws deprecation warning

See Git workflow for details on how pull requests are merged.

Guidelines for contributors

See further documents in the repository for more information on how to contribute:

Credits

For the list of people who've put work into PHP, please see the PHP credits page.

Description
⚠️ ARCHIVED: Original GitHub repository no longer exists. Preserved as backup on 2026-01-22T16:25:23.756Z
Readme 1,009 MiB
Languages
C 66%
PHP 31.3%
C++ 0.8%
Shell 0.5%
M4 0.4%
Other 0.8%