[PR #11188] Fix different first/max result values taking up query cache space #12844

Closed
opened 2026-01-22 16:15:21 +01:00 by admin · 0 comments
Owner

Original Pull Request: https://github.com/doctrine/orm/pull/11188

State: closed
Merged: Yes


#11112 reported that passing different values to the Query::setFirstResult() and Query::setMaxResults() methods leads to the queries using different keys in the query cache that speeds up DQL -> SQL transformation. This is no surprise, since these values are explicitly included in the cache key computation:

f58984c43f/lib/Doctrine/ORM/Query.php (L819)

The two methods in question may also be used at a lower level when using the Paginator component, which may forward its limit/offset values to these methods in one way or another.

Since in web applications the limit/offset (or a "page" numer that effectively gives the offset) is often controlled through a request parameter, this may lead to different kinds of memory or storage exhaustion problems.

Reason for having plain values in SQL

The reason why these values are part of the cache key is that both values need to be put into the resulting SQL statement literally, not through parameters. This has to do with how DBAL needs to generate platform-specific SQL, and it cannot be fixed. The corresponding DBAL discussion was at https://github.com/doctrine/dbal/issues/6246.

Approach taken by this PR

Applying the limit/offset to a given SQL query is something that DBAL does on a string-processing level. Luckily, it also happens very late during SQL statement preparation. Only extra SQL expressions added for locking modes need to be added after it.

This gives us the possibility to split DQL -> SQL transformation into two stages. The first one includes building the AST, tree processing and output walking. But, it stops short of adding the limit/offset values to the query or adding locking statements.

We can then put this intermediate, "unfinalized" result into the DQL->SQL query cache, and we can take it from there also for other queries that only differ in the first/max result values. Then, we run the "finalizer" phase to append offset/limit and locking SQL to the prepared query, based on the actual query at hand.

That way, the first/max values no longer need to be part of the cache key.

Concepts and code paths added

The new \Doctrine\ORM\Query\OutputWalker interface expresses how the Parser would like to interact with the (custom) output walker. Its getFinalizer() method returns an instance of \Doctrine\ORM\Query\Exec\SqlFinalizer. This will eventually replace the getExecutor() method that is deprecated in this PR and that will be removed in 4.0.

The SqlFinalizer must be a serializable class, since it will be kept in the query cache as part of the ParserResult. Based on the current Query, the SqlFinalizer then ultimately creates the AbstractSqlExecutor. At this time, it has the opportunity to add the SQL limit/offset and locking parts.

The Query passed to the finalizer is the one that caused the cache lookup, not necessarily the same as the one that was used to compile the DQL. However, since the DQL is part of the cache key computation, both queries must be based on the same DQL, hints etc.

The Query has to decide whether to put offset/limit into the cache key computation or not. It does that by inspecting the output walker being hinted to, with the slight optimization of knowing the default value. This leaks details of the Parser implementation, but could be removed once we make OutputWalker the only choice.

BC/migration path

We cannot change the SqlWalker to return the SQL without the limit/offset parts for BC reasons.

Also, our documentation is showing users how to build their custom output walkers by subclassing this class. So, we cannot make the SqlWalker implement OutputWalker, since we could not tell whether the subclasses are aware of the necessary changes and have updated their SQL generation accordingly.

The solution is to have a new subclass SqlOutputWalker. By moving to and extending this new class, users indicate that they are aware of the necessary changes and have removed the limit/offset parts from their own SQL generation. Of course, users would be free to return their own SqlFinalizer implementations in case they need special offset/limit handling code.

Slight BC break considered irrelevant

Using the Parser::parse() method may now produce a ParserResult that no longer contains the AbstractSqlExecutor, but only the SqlFinalizer. This has been discussed internally with the conclusion that the Parser and its parse() method are not part of an official API, but "implicitly internal". They are part of the inner workings of the Query class, which is able to deal with this variation.

**Original Pull Request:** https://github.com/doctrine/orm/pull/11188 **State:** closed **Merged:** Yes --- #11112 reported that passing different values to the `Query::setFirstResult()` and `Query::setMaxResults()` methods leads to the queries using different keys in the query cache that speeds up DQL -> SQL transformation. This is no surprise, since these values are explicitly included in the cache key computation: https://github.com/doctrine/orm/blob/f58984c43f65ac9391776e26341c753895ff7c00/lib/Doctrine/ORM/Query.php#L819 The two methods in question may also be used at a lower level when using the `Paginator` component, which may forward its limit/offset values to these methods in one way or another. Since in web applications the limit/offset (or a "page" numer that effectively gives the offset) is often controlled through a request parameter, this may lead to different kinds of memory or storage exhaustion problems. #### Reason for having plain values in SQL The reason why these values are part of the cache key is that both values need to be put into the resulting SQL statement literally, not through parameters. This has to do with how DBAL needs to generate platform-specific SQL, and it cannot be fixed. The corresponding DBAL discussion was at https://github.com/doctrine/dbal/issues/6246. #### Approach taken by this PR Applying the limit/offset to a given SQL query is something that DBAL does on a string-processing level. Luckily, it also happens very late during SQL statement preparation. Only extra SQL expressions added for locking modes need to be added after it. This gives us the possibility to split DQL -> SQL transformation into two stages. The first one includes building the AST, tree processing and output walking. But, it stops short of adding the limit/offset values to the query or adding locking statements. We can then put this intermediate, "unfinalized" result into the DQL->SQL query cache, and we can take it from there also for other queries that only differ in the first/max result values. Then, we run the "finalizer" phase to append offset/limit and locking SQL to the prepared query, based on the actual query at hand. That way, the first/max values no longer need to be part of the cache key. #### Concepts and code paths added The new `\Doctrine\ORM\Query\OutputWalker` interface expresses how the `Parser` would like to interact with the (custom) output walker. Its `getFinalizer()` method returns an instance of `\Doctrine\ORM\Query\Exec\SqlFinalizer`. This will eventually replace the `getExecutor()` method that is deprecated in this PR and that will be removed in 4.0. The `SqlFinalizer` must be a serializable class, since it will be kept in the query cache as part of the `ParserResult`. Based on the current `Query`, the `SqlFinalizer` then ultimately creates the `AbstractSqlExecutor`. At this time, it has the opportunity to add the SQL limit/offset and locking parts. The `Query` passed to the finalizer is the one that caused the cache lookup, not necessarily the same as the one that was used to compile the DQL. However, since the DQL is part of the cache key computation, both queries must be based on the same DQL, hints etc. The `Query` has to decide whether to put offset/limit into the cache key computation or not. It does that by inspecting the output walker being hinted to, with the slight optimization of knowing the default value. This leaks details of the `Parser` implementation, but could be removed once we make `OutputWalker` the only choice. #### BC/migration path We cannot change the `SqlWalker` to return the SQL without the limit/offset parts for BC reasons. Also, our documentation is showing users how to build their custom output walkers by subclassing this class. So, we cannot make the `SqlWalker` implement `OutputWalker`, since we could not tell whether the subclasses are aware of the necessary changes and have updated their SQL generation accordingly. The solution is to have a new subclass `SqlOutputWalker`. By moving to and extending this new class, users indicate that they are aware of the necessary changes and have removed the limit/offset parts from their own SQL generation. Of course, users would be free to return their own `SqlFinalizer` implementations in case they need special offset/limit handling code. #### Slight BC break considered irrelevant Using the `Parser::parse()` method may now produce a `ParserResult` that no longer contains the `AbstractSqlExecutor`, but only the `SqlFinalizer`. This has been discussed internally with the conclusion that the `Parser` and its `parse()` method are not part of an official API, but "implicitly internal". They are part of the inner workings of the `Query` class, which is able to deal with this variation.
admin added the pull-request label 2026-01-22 16:15:21 +01:00
admin closed this issue 2026-01-22 16:15:22 +01:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: doctrine/archived-orm#12844