mirror of
https://github.com/doctrine/orm.git
synced 2026-04-29 17:33:15 +02:00
[RFC] Smarter paginator default #6544
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @stof on GitHub (Sep 21, 2020).
The Paginator class exposes settings to configure how queries are paginated:
fetchJoinCollections, which defaults totrueuseOutputWalker, which defaults to using it if the query does not already have a custom output walker (we can only have one output walker. Not sure things actually work at all if a custom output walker is applied as it would apply on the modified query)CountWalker(using when not using the output walker) allows to control itsDISTINCTflag through a query hint storing a boolean, defaulting totrueThe existing defaults are set to maximize compatibility with queries, as they are the one producing the correct result (and supporting the query at all) for most queries. But they are also the one producing the worst queries from a performance point of view.
Note that
useOutputWalkercontrols the implementation used for getting the results of the slice whenfetchJoinCollectionsistrueand also the implementation used for getting the count.Limitations of various configs:
fetchJoinCollections = falsewill produce the wrong results (potentially very badly broken) if the query is actually fetch-joining a collection, due to the fact that an hydrated result spans multiple SQL result in such case.useOutputWalker = falsedoes not support queries usingHAVINGuseOutputWalker = falsedoes not support entities using a composite identifieruseOutputWalker = falsedoes not support entities using a foreign key as identifieruseOutputWalker = falsedoes not support queries using a field from atoManyrelation in theORDER BYclauseuseOutputWalker = falsedoes not support queries using multiple entities inFROMWe could have smarter defaults, which would try automatically choose the best behavior (keeping the ability to configure them explicitly for cases where the default cannot be smart enough, as we must still choose correctness and compatibility by default).
For
fetchJoinCollections:falsetoOnerelations, it is also safe to set this flag tofalse. But this case might be harder to identify based on the query AST.For
useOutputWalker, the cases of composite identifiers or association identifiers in entities can be detected (the walkers already validate those) and checking forHAVINGand multipleFROMis easy. Regarding the last case, it is also possible to validate it (the code does it in the tree walker currently). So we could be smarter about using a tree walker. So it would be better to try to use the tree walker as much as possible.For the
CountWalker, settingDISTINCTtofalsemight produce a broken count if there are multiple SQL results for the same hydrated result (which might happen in case of joins)In practice, the Paginator class is often used wrapped inside a higher-level API (as it is low-level on purpose) like Pagerfanta, Zend Paginator or KnpPaginator. But it can also be used directly (in a custom higher-level API) as done in https://github.com/symfony/demo. Such higher-level APIs would have 3 choices:
Applying smarter defaults in these higher-level libraries has several drawbacks:
In practice, libraries are either exposing the same defaults or changing them to a way which does not support all queries. And in most cases, this pushes the burden to projects to discover what they should do to get a good performance.
@greg0ire commented on GitHub (Sep 22, 2020):
Apparently, adding a detection for smarter defaults would be very welcome:
https://github.com/doctrine/orm/blob/107ba93d79b3e4dc9a8b0b7903245ed738a811a2/docs/en/tutorials/pagination.rst#L42-L43
@stof commented on GitHub (Sep 22, 2020):
What makes things worse is that only part of these settings are documented, making things even harder to configure properly.
@mpdude commented on GitHub (Jan 23, 2023):
Would it be possible to figure out what settings to use when we'd have the AST from the query parser at hand?
@stof commented on GitHub (Jan 24, 2023):
As described in the issue, most of those things can be figured out as the current code validates those. But nobody has taken the time to implement them until now.
@d-ph commented on GitHub (Jul 16, 2024):
I was about to create a similar RFC github issue about the default performance of the Paginator, and I'm glad I spent 10 minutes to use the "search" function to find this ticket. I reached the exact same conclusions as stof after debugging why simple cruds in my webapp take seconds to load. What's not ideal, however, is seeing that this issue remains opened with limited response from the project managers since 2020.
The following part especially is such a cheap change that it hurts seeing it not being in place already:
This alone improved some of my queries from 3.5s to 0.1s.
It took me a while to trace how the Paginator builds its queries, and to understand my options, so in case someone reads this comment and wishes to "just get stuff done" (or wishes to compare approaches on how to improve the situation while this ticket remains open), here's what I did (in my case: with Pagerfanta):
I'm looking forward to something along the above lines being implemented in the Doctrine Paginator itself, especially (and as stof already mentioned) because:
Thank you.
@greg0ire commented on GitHub (Jul 16, 2024):
Ok, I'll be more precise then: this seems like a good idea, and I would be open to reviewing PRs implementing the changes described above, or improving the documentation to include a description of the current settings, as well as merging them of course. Just make sure to send reasonably-sized PRs, with the lowest hanging fruits / less risky stuff dealt with first please.
@d-ph commented on GitHub (Jul 19, 2024):
Thanks for clarifying, and apologies if I sounded too belligerent.
I could spend some time to write up a PR for further discussion. I would be grateful however if you could look at my proposition below and give an "initial ok" for it, so that a rough direction is initially deemed acceptable.
This entire ticket is essentially (and I suspect stof would agree) about doing the following:
Proposition:
The
$fetchJoinCollectionsargument inPaginator::__construct()becomes nullable and is defaulted to null. When the Paginator is constructed with that argument set to null, an "auto-detection" is going to be carried out by the Paginator on whether the argument should be true or false. Likewise, when$useOutputWalkeris null (and the query does not use a "custom output walker"), then an auto-detection of that argument is also going to be done. In the same vain, when the query does not have aCountWalker::HINT_DISTINCTset on it, and the$useOutputWalkeris concluded to befalse, then an auto-detection of whether the hint should be set to true or false is also going to be carried out.In the first PR, the auto-detection is going to be attempted only when a QueryBuilder (instead of a Query) is supplied to the Paginator.
2.1. When an instance of a Query is supplied, and the
$fetchJoinCollectionsis set to null, then$fetchJoinCollectionsis going to be set back totrue(which is the current behaviour).If the QueryBuilder has no joins, then all the defaults mentioned at the beginning of this comment are going to be set.
I will spend 30 minutes to see whether it's JustTM possible to get the information from the QueryBuilder whether the joins are *ToMany or not. If that's easy to do, this would also be done in the first PR.
Bottom line: very simple code change. No AST interpretation and modification. Pretty much what I did in the code snippet mentioned earlier.
Thoughts?
@greg0ire commented on GitHub (Jul 19, 2024):
On paper it sounds OK, I think we might want to make this opt-in at least at first, so that people can try it and report back, then default to
nullin a next minor or evenmajorif we want to be careful.