Avoid Using unqualifed COUNT in EntityPersister #7163

Closed
opened 2026-01-22 15:45:53 +01:00 by admin · 11 comments
Owner

Originally created by @Aweptimum on GitHub (Jun 6, 2023).

Feature Request

Q A
New Feature yes
RFC yes
BC Break no

Summary

Might be a dupe of #5960

Using Symfony. I switched from using findBy to matching in my services. I then checked the profiler for an unrelated feature I was working on and noticed Doctrine was issuing a COUNT (*) query. That's a bit concerning for larger tables, so I dug into the code and traced it to getCountSql

3827dd769e/lib/Doctrine/ORM/Persisters/Entity/BasicEntityPersister.php (L1136-L1156)

If the $criteria is a Criteria object and doesn't have any expressions, this will issue a count against the entire table. I think to help avoid this, it should check if maxResults is set like so (rough example):

$sql =  'SELECT COUNT(*) '
        . 'FROM ' . $tableName . ' ' . $tableAlias
        . (empty($conditionSql) ? '' : ' WHERE ' . $conditionSql);

if ($criteria instanceof Criteria and null !== $criteria->getMaxResults()) {
    $sql .= ' LIMIT ' . $criteria->getMaxResults();
}

return $sql;

I don't know if that check could go in one of the helper methods used in here, but it would save some io when the actual result set is bigger than the given limit.

Originally created by @Aweptimum on GitHub (Jun 6, 2023). ### Feature Request <!-- Fill in the relevant information below to help triage your issue. --> | Q | A |------------ | ------ | New Feature | yes | RFC | yes | BC Break | no #### Summary <!-- Provide a summary of the feature you would like to see implemented. --> Might be a dupe of #5960 Using Symfony. I switched from using `findBy` to `matching` in my services. I then checked the profiler for an unrelated feature I was working on and noticed Doctrine was issuing a `COUNT (*)` query. That's a bit concerning for larger tables, so I dug into the code and traced it to `getCountSql` https://github.com/doctrine/orm/blob/3827dd769edf25d59fa29a7d63efd6ba15b3a789/lib/Doctrine/ORM/Persisters/Entity/BasicEntityPersister.php#L1136-L1156 If the $criteria is a Criteria object and doesn't have any expressions, this will issue a count against the entire table. I think to help avoid this, it should check if maxResults is set like so (rough example): ```php $sql = 'SELECT COUNT(*) ' . 'FROM ' . $tableName . ' ' . $tableAlias . (empty($conditionSql) ? '' : ' WHERE ' . $conditionSql); if ($criteria instanceof Criteria and null !== $criteria->getMaxResults()) { $sql .= ' LIMIT ' . $criteria->getMaxResults(); } return $sql; ``` I don't know if that check could go in one of the helper methods used in here, but it would save some io when the actual result set is bigger than the given limit.
admin closed this issue 2026-01-22 15:45:53 +01:00
Author
Owner

@greg0ire commented on GitHub (Jun 6, 2023):

it should check if maxResults is set like so

Does it break any unit tests where you try that? Maybe that will help us understand why it's not as you describe in the first place.

@greg0ire commented on GitHub (Jun 6, 2023): > it should check if maxResults is set like so Does it break any unit tests where you try that? Maybe that will help us understand why it's not as you describe in the first place.
Author
Owner

@Aweptimum commented on GitHub (Jun 6, 2023):

I just made a fork, cloned it + ran composer install, and ran vendor/bin/phpunit on the 2.15 branch. Here's the result:
Tests: 3746, Assertions: 7031, Errors: 1452, Failures: 11, Skipped: 102, Incomplete: 2.

Then made my change:
Tests: 3746, Assertions: 7031, Errors: 1452, Failures: 11, Skipped: 102, Incomplete: 2.

It's identical, but what is causing all these errors .-. oops forgot to enable sqlite drivers in .ini

@Aweptimum commented on GitHub (Jun 6, 2023): I just made a fork, cloned it + ran composer install, and ran vendor/bin/phpunit on the 2.15 branch. Here's the result: Tests: 3746, Assertions: 7031, Errors: 1452, Failures: 11, Skipped: 102, Incomplete: 2. Then made my change: Tests: 3746, Assertions: 7031, Errors: 1452, Failures: 11, Skipped: 102, Incomplete: 2. ~~It's identical, but what is causing all these errors .-.~~ oops forgot to enable sqlite drivers in .ini
Author
Owner

@Aweptimum commented on GitHub (Jun 6, 2023):

No the change doesn't break anything
Without change:
OK, but incomplete, skipped, or risky tests!
Tests: 3735, Assertions: 13316, Skipped: 73, Incomplete: 2.

With change:
OK, but incomplete, skipped, or risky tests!
Tests: 3735, Assertions: 13316, Skipped: 73, Incomplete: 2.

@Aweptimum commented on GitHub (Jun 6, 2023): No the change doesn't break anything Without change: OK, but incomplete, skipped, or risky tests! Tests: 3735, Assertions: 13316, Skipped: 73, Incomplete: 2. With change: OK, but incomplete, skipped, or risky tests! Tests: 3735, Assertions: 13316, Skipped: 73, Incomplete: 2.
Author
Owner

@greg0ire commented on GitHub (Jun 6, 2023):

Ok, so it's not deliberately like that, so maybe we are not missing anything. Consider sending a pull requests with tests then.

@greg0ire commented on GitHub (Jun 6, 2023): Ok, so it's not deliberately like that, so maybe we are not missing anything. Consider sending a pull requests with tests then.
Author
Owner

@Aweptimum commented on GitHub (Jun 6, 2023):

Done, if you want me to make any changes let me know :)

@Aweptimum commented on GitHub (Jun 6, 2023): Done, if you want me to make any changes let me know :)
Author
Owner

@greg0ire commented on GitHub (Jun 7, 2023):

Let's backtrack a bit. What calls getCountSql and why?

@greg0ire commented on GitHub (Jun 7, 2023): Let's backtrack a bit. What calls `getCountSql` and why?
Author
Owner

@Aweptimum commented on GitHub (Jun 7, 2023):

Oh, yes, I probably should have elaborated on that more. the repository's matching method returns an instance of AbstractLazyCollection (LazyCriteriaCollection in this case). I was checking if the collection was empty, using isEmpty, and thought that it might have something to do with that. Looking through most of the methods in AbstractLazyCollection, they all check if the collection is in an initialized state first. If not, they call initialize, which calls the abstract doInitialize. In LazyCriteriaCollection, doInitialize asks its entityPersister to load the collection.

However, the overridden methods in LazyCriteriaCollection don't call initialize, they instead check if the collection is in an initialized state and will grab a cached value if so. If not, they delegate to the entity persister which retrieves the value through a query. isEmpty just delegates to count, and the problem is that it's one such overridden method and ends up calling getCountSql if it isn't in an initialized state. count calls entityPersister->count(), which calls entityPersister->getCountSql passing in the LazyCriteraCollection's $criteria property as the parameter. As mentioned, it counts all records that meet the criteria regardless of maxResults.

This is all that's needed to trigger that call stack:

$results = $repository->matching($criteria)
if ($results->isEmpty()) {
    return null
}

The weird part is that the AbstractLazyCollection's count method has the behavior i want - it just returns the count of the loaded elements in the ArrayCollection after initialization.

But now that I've gone through it more, it looks like if I can get LazyCriteriaCollection to trigger its initialize function, it'll return the count of the loaded elements - not the count query (line 52):

3827dd769e/lib/Doctrine/ORM/LazyCriteriaCollection.php (L50-L62)

I might just be able to call toArray, count that, and call it a day.

Edit: Yeah, calling toArray does not trigger a COUNT(*), so there's my workaround. I guess this issue can be closed.

@Aweptimum commented on GitHub (Jun 7, 2023): Oh, yes, I probably should have elaborated on that more. the repository's `matching` method returns an instance of AbstractLazyCollection (LazyCriteriaCollection in this case). I was checking if the collection was empty, using `isEmpty`, and thought that it might have something to do with that. Looking through most of the methods in AbstractLazyCollection, they all check if the collection is in an initialized state first. If not, they call `initialize`, which calls the abstract `doInitialize`. In LazyCriteriaCollection, `doInitialize` asks its entityPersister to load the collection. However, the overridden methods in LazyCriteriaCollection don't call `initialize`, they instead check if the collection is in an initialized state and will grab a cached value if so. If not, they delegate to the entity persister which retrieves the value through a query. `isEmpty` just delegates to `count`, and the problem is that it's one such overridden method and ends up calling `getCountSql` if it isn't in an initialized state. `count` calls `entityPersister->count()`, which calls `entityPersister->getCountSql` passing in the LazyCriteraCollection's $criteria property as the parameter. As mentioned, it counts all records that meet the criteria regardless of maxResults. This is all that's needed to trigger that call stack: ```php $results = $repository->matching($criteria) if ($results->isEmpty()) { return null } ``` The weird part is that the AbstractLazyCollection's count method has the behavior i want - it just returns the count of the loaded elements in the ArrayCollection after initialization. But now that I've gone through it more, it looks like if I can get LazyCriteriaCollection to trigger its initialize function, it'll return the count of the loaded elements - not the count query (line 52): https://github.com/doctrine/orm/blob/3827dd769edf25d59fa29a7d63efd6ba15b3a789/lib/Doctrine/ORM/LazyCriteriaCollection.php#L50-L62 I might just be able to call `toArray`, count that, and call it a day. Edit: Yeah, calling `toArray` does not trigger a COUNT(*), so there's my workaround. I guess this issue can be closed.
Author
Owner

@Aweptimum commented on GitHub (Jun 7, 2023):

I do still find it weird that the same method can return different values depending on the load state of the collection

$criteria = new Criteria(null, null, null, 5000);
$collection = $repository->matching($criteria);
echo ($collection->count()) // 2000000;
$a = $collection->toArray();
echo ($collection->count()) // 5000;
@Aweptimum commented on GitHub (Jun 7, 2023): I do still find it weird that the same method can return different values depending on the load state of the collection ```php $criteria = new Criteria(null, null, null, 5000); $collection = $repository->matching($criteria); echo ($collection->count()) // 2000000; $a = $collection->toArray(); echo ($collection->count()) // 5000; ```
Author
Owner

@greg0ire commented on GitHub (Jun 7, 2023):

That's already reported here: https://github.com/doctrine/orm/issues/9951, so let's close.

@greg0ire commented on GitHub (Jun 7, 2023): That's already reported here: https://github.com/doctrine/orm/issues/9951, so let's close.
Author
Owner

@Aweptimum commented on GitHub (Jun 7, 2023):

Fair enough, sorry if this was a massive waste of time.

@Aweptimum commented on GitHub (Jun 7, 2023): Fair enough, sorry if this was a massive waste of time.
Author
Owner

@greg0ire commented on GitHub (Jun 7, 2023):

Again, no worries, just… make sure to get all the sleep you need 😉

@greg0ire commented on GitHub (Jun 7, 2023): Again, no worries, just… make sure to get all the sleep you need :wink:
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: doctrine/archived-orm#7163