DDC-3701: Questions regarding Parser::match and "identifier" EBNF #4543

Open
opened 2026-01-22 14:44:23 +01:00 by admin · 0 comments
Owner

Originally created by @doctrinebot on GitHub (Apr 19, 2015).

Originally assigned to: @guilhermeblanco on GitHub.

Jira issue originally created by user mpdude:

I am currently investigating a problem where I am suspecting the DQL parser to wrongly transform a DQL statement to SQL. While I haven't yet found the exact reason (will open a new case for it), I have come across the following in the Parser::match() method:

   if ($lookaheadType !== $token && $token !== Lexer::T*IDENTIFIER && $lookaheadType <= Lexer::T*IDENTIFIER) {
      $this->syntaxError($this->lexer->getLiteral($token));
   }

The if-condition was changed a while ago in order to allow DQL terms like WHERE to appear as identifiers.

I think that the condition is wrong as it will *only* fail (create the syntax error) when the actual token does not match the expected type, we're not expecting an "identifier" and the next token is something special like punctuation that could not serve as an identifier.

So, for example when we're match()ing a T*WITH, a T*WHERE will be accepted as well. A correct check would probably not solve my actual problem, but it would probably have been spotted much earlier due to syntax errors issued.

IMO this should actually read

if ($lookaheadType !== $token && ($token !== Lexer::T*IDENTIFIER || $lookaheadType <= Lexer::T*IDENTIFIER)) { ... syntax error ... }

That is, fail if the token does not match the expectation; and when the expectation is T_IDENTIFIER, also accept every terminal string that can also be considered an "identifier".

With this change, the tests almost pass. The only problem is when a FROM clause expects an "identifier" and now a fully qualified class name starting with a backslash is no longer accepted.

Unfortunately, I also did not manage to find an exact definition of "identifier" somewhere in the EBNF. Lexer::getType() considers everything that starts with a character or underscore an identifier, but that does not match FQCNs.

As there seems to be no special token for backslashes, I tried allowing the backslash as a starting character for identifiers as well (in the Lexer), and it seems to work (all non-skipped tests pass).

What do you think about it? Is that something we should fix and does my change make sense?

Originally created by @doctrinebot on GitHub (Apr 19, 2015). Originally assigned to: @guilhermeblanco on GitHub. Jira issue originally created by user mpdude: I am currently investigating a problem where I am suspecting the DQL parser to wrongly transform a DQL statement to SQL. While I haven't yet found the exact reason (will open a new case for it), I have come across the following in the [Parser::match()](https://github.com/doctrine/doctrine2/blob/b055d78ea19835fab563dfc234d4804a9d04966a/lib/Doctrine/ORM/Query/Parser.php#L310) method: ``` if ($lookaheadType !== $token && $token !== Lexer::T*IDENTIFIER && $lookaheadType <= Lexer::T*IDENTIFIER) { $this->syntaxError($this->lexer->getLiteral($token)); } ``` The if-condition [was changed a while ago](https://github.com/doctrine/doctrine2/commit/a45560dbd0e865607a8b897a9c06696257dcfc5f) in order to allow DQL terms like `WHERE` to appear as identifiers. I think that the condition is wrong as it will **only\* fail (create the syntax error) when the actual token does not match the expected type, we're _not_ expecting an "identifier" *and** the next token is something special like punctuation that could not serve as an identifier. So, for example when we're `match()ing` a `T*WITH`, a `T*WHERE` will be accepted as well. A correct check would probably not solve my actual problem, but it would probably have been spotted much earlier due to syntax errors issued. IMO this should actually read ``` if ($lookaheadType !== $token && ($token !== Lexer::T*IDENTIFIER || $lookaheadType <= Lexer::T*IDENTIFIER)) { ... syntax error ... } ``` That is, fail if the token does not match the expectation; and when the expectation is `T_IDENTIFIER`, also accept every terminal string that can also be considered an "identifier". With this change, the tests almost pass. The only problem is when a `FROM` clause expects an "identifier" and now a fully qualified class name starting with a backslash is no longer accepted. Unfortunately, I also did not manage to find an exact definition of "identifier" somewhere in the EBNF. `Lexer::getType()` considers everything that starts with a character or underscore an identifier, but that does not match FQCNs. As there seems to be no special token for backslashes, I tried allowing the backslash as a starting character for identifiers as well (in the `Lexer`), and it seems to work (all non-skipped tests pass). What do you think about it? Is that something we should fix and does my change make sense?
admin added the Bug label 2026-01-22 14:44:23 +01:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: doctrine/archived-orm#4543