mirror of https://github.com/php/php-src.git synced 2026-03-24 00:02:20 +01:00

Fix GH-10634: Lexing memory corruption (#10866)

We don't rely on re2c's bounds checking mechanism because
re2c:yyfill:check = 0; is set. Instead, we just return 0 bytes from
YYFILL if we read past the end of the input. The comment regexes used
to use the "any character" wildcard, which also matches those 0 bytes.
So if a comment regex ran past the end of the input, we could not tell,
and the 0 bytes were treated as if they were part of the token.
Since a 0 byte is already considered an end-of-file, we can just exclude
it in the regexes.

For the regexes with newlines, I had to not only include \x00 in the
denylist, but also \n and \r because otherwise it would greedily match
those and let the single-line comment run over multiple lines.
Niels Dossche
2023-03-17 17:09:14 +01:00
committed by GitHub
parent 4da0da7f2d
commit ac9964502c
2 changed files with 31 additions and 3 deletions

24
Zend/tests/gh10634.phpt Normal file

@@ -0,0 +1,24 @@
--TEST--
GH-10634 (Lexing memory corruption)
--FILE--
<?php
function test_input($input) {
    try {
        eval($input);
    } catch(Throwable $e) {
        var_dump($e->getMessage());
    }
}
test_input("y&/*");
test_input("y&/**");
test_input("y&#");
test_input("y&# ");
test_input("y&//");
?>
--EXPECT--
string(36) "Unterminated comment starting line 1"
string(36) "Unterminated comment starting line 1"
string(36) "syntax error, unexpected end of file"
string(36) "syntax error, unexpected end of file"
string(36) "syntax error, unexpected end of file"


@@ -1369,9 +1369,13 @@ TOKENS [;:,.|^&+-/*=%!~$<>?@]
ANY_CHAR [^]
NEWLINE ("\r"|"\n"|"\r\n")
OPTIONAL_WHITESPACE [ \n\r\t]*
-MULTI_LINE_COMMENT "/*"([^*]*"*"+)([^*/][^*]*"*"+)*"/"
-SINGLE_LINE_COMMENT "//".*[\n\r]
-HASH_COMMENT "#"(([^[].*[\n\r])|[\n\r])
+/* We don't use re2c with bounds checking, we just return 0 bytes if we read past the input.
+ * If we use wildcard matching for comments, we can read past the input, which crashes
+ * once we try to report a syntax error because the 0 bytes are not actually part of
+ * the token. We prevent this by not allowing 0 bytes, which already aren't valid anyway. */
+MULTI_LINE_COMMENT "/*"([^*\x00]*"*"+)([^*/\x00][^*\x00]*"*"+)*"/"
+SINGLE_LINE_COMMENT "//"[^\x00\n\r]*[\n\r]
+HASH_COMMENT "#"(([^[\x00][^\x00\n\r]*[\n\r])|[\n\r])
WHITESPACE_OR_COMMENTS ({WHITESPACE}|{MULTI_LINE_COMMENT}|{SINGLE_LINE_COMMENT}|{HASH_COMMENT})+
OPTIONAL_WHITESPACE_OR_COMMENTS ({WHITESPACE}|{MULTI_LINE_COMMENT}|{SINGLE_LINE_COMMENT}|{HASH_COMMENT})*
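
The effect of the SINGLE_LINE_COMMENT change can be illustrated outside the lexer. The sketch below is not part of the commit. It uses Python's `re` module as a stand-in for re2c and assumes that `.` in both engines matches every byte except `\n`. The buffer layout (NUL padding followed by unrelated bytes) and all names are hypothetical, chosen only to mimic a read past the end of the input.

```python
import re

# Stand-in for the old definition: "//".*[\n\r]
# "." accepts NUL bytes, so a match can run past the logical end of input.
OLD_SINGLE_LINE = re.compile(rb"//.*[\n\r]")

# Stand-in for the patched definition: "//"[^\x00\n\r]*[\n\r]
NEW_SINGLE_LINE = re.compile(rb"//[^\x00\n\r]*[\n\r]")

# What happens if only \x00 is excluded: the class now accepts \n and \r,
# so the greedy repetition swallows line breaks.
NUL_ONLY = re.compile(rb"//[^\x00]*[\n\r]")

# Hypothetical memory layout: the real input is just "//", followed by
# NUL padding and some unrelated bytes that happen to contain a newline.
source = b"//"
buffer = source + b"\x00" * 4 + b"stale\n"

old_match = OLD_SINGLE_LINE.match(buffer)
new_match = NEW_SINGLE_LINE.match(buffer)

# The old pattern matches beyond the real input; the new one refuses.
old_overruns = old_match is not None and old_match.end() > len(source)
new_rejects = new_match is None

# Two lines of code, the first ending in a single-line comment.
two_lines = b"// a\n$x = 1;\n"
nul_only_end = NUL_ONLY.match(two_lines).end()
patched_end = NEW_SINGLE_LINE.match(two_lines).end()

print(old_overruns, new_rejects, nul_only_end, patched_end)
```

With only `\x00` excluded, the comment match extends to the last line break in `two_lines`, whereas the patched pattern stops at the first one. That is why `\n` and `\r` had to be added to the denylist as well.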