mirror of
https://github.com/doctrine/orm.git
synced 2026-03-23 22:42:18 +01:00
Interest in bulk / batch insert operations when not using post-insert identifiers? #7517
Originally created by @Ocramius on GitHub (Jun 11, 2025).
Context
I'm currently working on a codebase where I identified a repeated `INSERT` operation being performed. As you may know, I've been a long-time advocate for using UUIDs (and specifically, UUID5 / UUID6) for synthetic entity identifiers.
Due to recent advancements in profiling technology (Sentry, Tideways, etc.), I've managed to find a typical N+1 query.
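As a rough illustration of the pattern such a profiler flags (a conceptual sketch in Python with sqlite3, standing in for Doctrine's PHP/DBAL stack; table and data are made up), each entity insert becomes one statement execution, and hence one DB roundtrip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

# Identifiers are known upfront (UUID-style), so no post-insert
# AUTO_INCREMENT / SERIAL value needs to be read back after each INSERT.
rows = [(f"uuid-{i}", f"user-{i}") for i in range(100)]

# The flagged pattern: one INSERT execution per entity -> N roundtrips,
# most of which is network latency rather than DB work.
for row in rows:
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", row)
conn.commit()
```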
What can be done?
When `AUTO_INCREMENT`, `SERIAL`, or post-insert identifiers in general aren't necessary, we can perform bulk / batch `INSERT` operations: this reduces the number of roundtrips to the DB, which is mostly network time.

Current behavior
I've traced the operation as follows:

- `UnitOfWork` computes insert order: 05c8c5f114/src/UnitOfWork.php (L1038-L1040)
- `computeInsertExecutionOrder` produces a sorted list of `INSERT` operations: 05c8c5f114/src/UnitOfWork.php (L1191-L1246)
- `EntityPersister#addInsert()` is called in a loop: 05c8c5f114/src/UnitOfWork.php (L1038-L1048)
- `EntityPersister#executeInserts()` is called: 05c8c5f114/src/UnitOfWork.php (L1048-L1052)
- `EntityPersister#executeInserts()` creates a prepared statement and fires it at the DB: 05c8c5f114/src/Persisters/Entity/BasicEntityPersister.php (L237-L285)

The plan
The idea is to change either `UnitOfWork#executeInserts()` or `UnitOfWork#computeInsertExecutionOrder()` to produce a multidimensional array of micro-batches, where batches are only possible when post-insert identifiers aren't involved.

I would then extract the `EntityPersister#executeInserts()` call so that it does not get performed in a loop (where applicable), and therefore internally reuses the same prepared statement.

I can then also optimize the `EntityPersister#executeInserts()` logic to generate SQL for a full batch (single statement).

The plan does NOT involve any DBAL API addition/change.
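The full-batch SQL step can be sketched as follows (again in Python/sqlite3 rather than Doctrine's PHP; the `batch_insert` helper, table, and columns are hypothetical, purely to illustrate folding a micro-batch into one multi-row `INSERT`):

```python
import sqlite3

def batch_insert(conn, table, columns, rows):
    """Emit one multi-row INSERT for a whole micro-batch (single roundtrip)."""
    placeholders = "(" + ", ".join("?" * len(columns)) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([placeholders] * len(rows))
    )
    flat = [value for row in rows for value in row]  # flatten bound parameters
    conn.execute(sql, flat)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
rows = [(f"uuid-{i}", f"user-{i}") for i in range(100)]

# 100 pending inserts -> one statement, one roundtrip.
batch_insert(conn, "users", ("id", "name"), rows)
conn.commit()
```

A real implementation would also have to respect the DB platform's bound-parameter limits, which is one reason micro-batches (rather than one giant statement) are proposed.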
The question
Is there interest in this optimization?
Specifically, can I work on it, or is the plan flawed? This is mostly probing whether there is interest in this.
Past issues
@beberlei commented on GitHub (Jun 11, 2025):
@Ocramius just re-read and thought you meant "for post-id generators", but you meant exclusively not for them.
So yes, I am very much in favor of this. See https://www.baeldung.com/jpa-hibernate-batch-insert-update for how Hibernate does this / allows configuring it. A global batch size variable that defaults to 1. We could start there.
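Hibernate's `hibernate.jdbc.batch_size` setting works roughly like the chunking sketched below (a conceptual Python sketch; `chunk_inserts` is a hypothetical name, not Doctrine or Hibernate API): a global batch size that defaults to 1 splits the pending inserts into micro-batches, and the default reproduces today's one-statement-per-entity behavior.

```python
def chunk_inserts(pending, batch_size=1):
    """Split pending inserts into micro-batches of at most batch_size.

    batch_size=1 (the proposed default) reproduces the current
    one-statement-per-entity behavior; larger values let the persister
    emit one multi-row INSERT per chunk.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]

# Default: no behavioral change, each "batch" holds one insert.
assert chunk_inserts(["a", "b", "c"]) == [["a"], ["b"], ["c"]]
# batch_size=2: two micro-batches.
assert chunk_inserts(["a", "b", "c"], batch_size=2) == [["a", "b"], ["c"]]
```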
@Ocramius commented on GitHub (Jun 11, 2025):
I'd be fine with just reusing the same prepared statement, then making a second patch with the batch size 👍
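The prepared-statement-reuse step alone (before any multi-row SQL) can be sketched with Python's `sqlite3.Connection.executemany`, which prepares the statement once and rebinds parameters per row; a conceptual stand-in for reusing one prepared statement across the insert loop in PHP:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
rows = [(f"uuid-{i}", f"user-{i}") for i in range(100)]

# One prepared statement, executed once per row: still N executions,
# but the statement is compiled only once.
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
conn.commit()
```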
@Ocramius commented on GitHub (Jun 11, 2025):
@beberlei a first implementation (without touching the persisters) is at #11978
@christian-kolb commented on GitHub (Jun 12, 2025):
This would be amazing for replay of projections where thousands of events are processed in one batch and therefore thousands of entities might be inserted in one batch. Network time is most of the execution time in one of our projects partially due to this reason!
@Ocramius commented on GitHub (Jun 13, 2025):
#11979 implements the configurable batch size for the `BasicEntityPersister`, and only where the identifier is known upfront.