mirror of
https://github.com/doctrine/orm.git
synced 2026-04-29 09:23:20 +02:00
The hydration cost can easily & significantly be decreased in case of repeated row data + costly type conversion #6589
Originally created by @NoiseByNorthwest on GitHub (Dec 13, 2020).
The well-known performance issue with fetch joins on collection-like relations (*-to-many) is that the result set will contain repeated data. The simplest case is an entity with one of its collection relations fetched, which causes the (root) entity data to be repeated as many times as there are items in the collection.
It obviously gets exponentially worse when nested or sibling collections are also fetched. This issue could, by the way, be avoided with a feature like this one (not merged yet): https://github.com/doctrine/orm/pull/1569 .
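To make the repetition concrete, here is a minimal sketch in plain PHP that emulates the flat result set of a fetch join. The data, the column names (`a_id`, `a_payload`, `b_id`) and the DQL in the comment are hypothetical and only serve as an illustration; no Doctrine code is involved.

```php
<?php
declare(strict_types=1);

// Hypothetical data: 2 root entities (A), each with 3 collection items (B).
$roots = [
    ['a_id' => 1, 'a_payload' => '{"name":"first"}'],
    ['a_id' => 2, 'a_payload' => '{"name":"second"}'],
];
$itemsPerRoot = 3;

// Emulate the flat result set of something like "SELECT a, b FROM A a JOIN a.items b":
// one row per B record, with the A columns copied onto every row.
$rows = [];
foreach ($roots as $root) {
    for ($i = 1; $i <= $itemsPerRoot; $i++) {
        $rows[] = $root + ['b_id' => $root['a_id'] * 100 + $i];
    }
}

// Count how many distinct root entities the rows actually describe.
$distinctRoots = count(array_unique(array_column($rows, 'a_id')));

printf("%d rows, %d distinct roots\n", count($rows), $distinctRoots);
// prints "6 rows, 2 distinct roots"
```

Every `a_payload` value appears three times here, so any per-row processing of the A columns is done three times per root entity.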
That being said, I discovered another performance issue last week, which does not need thousands of rows to be noticeable. It occurs when the repeated data is mapped to a type with a costly conversion. In my case this is a `json_document` column; this type is provided by https://github.com/dunglas/doctrine-json-odm .

More concretely, my query targets table A and fetches a collection of table B records. A contains a `json_document` column, and for each A record there are ~66 B records.

The result set then contains 526 rows representing 526 distinct B records and only 8 distinct A records.

The query was too slow to meet my requirements. After profiling it (in tracing mode to get actual call counts and in sampling mode to get accurate timings), I discovered that 70% of the hydration time is taken by `JsonDocumentType::convertToPHPValue()`, which is called 526 times (instead of 8 as logically required).

This slowness obviously has more to do with the fact that `JsonDocumentType::convertToPHPValue()` is called 526 times where it could be called only 8 times than with the inherent slowness of `JsonDocumentType::convertToPHPValue()`, which is somewhat expected and probably leaves little room for optimization.

The problem lies in the fact that `AbstractHydrator::gatherRowData()`, which is, among other things, responsible for calling the proper type conversion function for each column of the given row, has no caching logic to avoid repeating this process for the same logical set of column values (representing the root or a fetched relation, AKA DQL alias).

So I've written my own hydrator, which extends `ObjectHydrator`, and after copy/pasting the `gatherRowData()` function I added this caching logic.

When it is used, I observed a 78% decrease in the `hydrateAllData()` cost. Here is a table below with more details:

- `hydrateAllData()` wall time (ms)
- `gatherRowData()` call count
- `gatherRowData()` wall time (ms)
- `JsonDocumentType::convertToPHPValue()` call count
- `JsonDocumentType::convertToPHPValue()` wall time (ms)

My questions:

- [...] `gatherRowData()` implementation (considering its simplicity and the fact that this use case may not be so uncommon)?

The custom hydrator (original `gatherRowData()` implementation):
The Doctrine bundle configuration:
When querying:
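The caching idea described in this issue can be sketched independently of Doctrine. The following is a minimal illustration in plain PHP, not the hydrator from this issue: `CountingJsonConverter` and `AliasRowCache` are hypothetical names, and the converter is only a stand-in for a costly type such as `JsonDocumentType`. It assumes that rows carrying identical raw column values for a given DQL alias can safely share one converted result.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a costly type conversion.
 * It counts its calls so the effect of caching is visible.
 */
final class CountingJsonConverter
{
    public int $calls = 0;

    public function convertToPHPValue(string $raw): array
    {
        $this->calls++;
        return json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
    }
}

/**
 * Hypothetical helper: converts the columns belonging to one DQL alias and
 * reuses the converted values when the same raw values appear again.
 */
final class AliasRowCache
{
    /** @var array<string, array<string, mixed>> */
    private array $cache = [];

    public function __construct(private CountingJsonConverter $converter) {}

    /** @param array<string, string> $rawColumns raw column values for a single alias */
    public function convert(string $alias, array $rawColumns): array
    {
        // The key combines the alias and the raw values, so identical
        // repeated data maps to the same cache entry.
        $key = $alias . ':' . md5(serialize($rawColumns));

        if (!isset($this->cache[$key])) {
            $converted = [];
            foreach ($rawColumns as $column => $raw) {
                $converted[$column] = $this->converter->convertToPHPValue($raw);
            }
            $this->cache[$key] = $converted;
        }

        return $this->cache[$key];
    }
}

$converter = new CountingJsonConverter();
$cache = new AliasRowCache($converter);

// 8 distinct root payloads spread over 526 rows, matching the figures in this issue.
for ($row = 0; $row < 526; $row++) {
    $cache->convert('a', ['document' => json_encode(['id' => $row % 8])]);
}

echo $converter->calls, "\n"; // prints 8
```

In this sketch the cache lives as long as the `AliasRowCache` instance, so a real hydrator would presumably need to reset it for each hydration run. If the conversion returns objects rather than arrays, cached instances would be shared between rows, which may or may not be acceptable depending on the type.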
@holtkamp commented on GitHub (Dec 14, 2020):
Not totally sure it is related, but concerning the costly conversion it might be nice to have a look at https://github.com/creof/doctrine2-spatial/issues/121 and https://github.com/doctrine/orm/pull/1241
@NoiseByNorthwest commented on GitHub (Dec 16, 2020):
Yes, in both cases there is a costly conversion involved, but in my case the conversion cost on its own is not noticeable. It only becomes noticeable because it is unnecessarily repeated.