mirror of
https://github.com/doctrine/orm.git
synced 2026-03-24 06:52:09 +01:00
Doctrine NOT throwing an exception when attempting to re-persist a detached entity; instead creating duplicate record. #6394
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @BusterNeece on GitHub (Jan 31, 2020).
Bug Report
Summary
After implementing the
doctrine-batch-utilsiterator to process records in one section of my application, I'm discovering that theem->clear()called during that process detaches entities across the system. If those entities are then re-persisted, they are treated as new entities and given new IDs in the database. There is no exception thrown indicating that these are detached entities, or indicating that this problem might happen. The end result is the unintentional cloning of one record x number of times, where x is the number of times this portion of code has run.The current documentation explicitly states that if you are attempting to persist an entity that is detached, that an exception will be thrown preventing this operation.
My experiences have shown that this exception is not thrown, and that instead a new record is created. The disagreement between the actual behavior and the documentation has caused me an immense deal of unnecessary pain today. The decision on Doctrine's part to deprecate entity-specific
clear()calls in favor of a system-wide clear only makes this situation that much more likely to happen in the future.Current behavior
When attempting to persist an entity that was made detached by an earlier call to
em->clear(), rather than throwing an exception upon the nextflush()call, it allows it and treats it as if it's a new record being added to the database, giving it a new auto-assigned ID but otherwise being a complete duplicate.How to reproduce
Here's an example snippet of a very easy way that I have elicited this problem in my application. It's this exact kind of logic loop that led to the problem in question:
The issue is completely resolved by reloading the
$stationvariable at the start of eachforeach(i.e.$station = $em->find(Station::class, $station->getId())), but the application currently offers no warning whatsoever that, because the loop variable has become detached, it's about to be recreated as a new entity instead.Expected behavior
As suggested in the documentation, this should instead trigger an exception which can easily be handled by the developer.
@Ocramius commented on GitHub (Jan 31, 2020):
Unlikely adapting ORM to docs here: this behaviour has been expected/stable for a long time, and detached entities are not recognizable anyway.
Docs will need adjustments.
@BusterNeece commented on GitHub (Jan 31, 2020):
@Ocramius So if you're saying that this behavior (simply creating a duplicate record, despite the existing one already having an ID assigned to it, with no exception or notice or anything else) is expected...
...is there any way to not make it the expected behavior?
Honestly, I would greatly prefer the documented behavior (an exception being thrown). If you have a situation where a routine task is re-persisting detached entities, you can quickly go from 2 to 4 to 8 to hundreds of those duplicated entities, resulting in a great deal of damage to have to clean up.
It should not be this easy to write something that's as straightforward code-wise as my example loop, and yet have it be subject to an error condition that could cause such huge (and hard to figure out) problems.
As I mentioned, because clearing the EM for specific entities is being deprecated in 3.0 in favor of clearing the entire memory only increases the likelihood that this kind of error will be run into. In fact, it's exactly because of this deprecation that I switched to the batch-utils function in the first place, which also resulted in me switching from a much more selective EM clear to clearing all EM-managed records.
The combination of the two (forcing the EM
clear()to be broader than desired, and then no option whatsoever to throw an exception when accidentally re-persisting detached entities) seems like a recipe for disaster.@Ocramius commented on GitHub (Jan 31, 2020):
Nope.
Even if there was, docs follow behaviour, not the way around: changing observed behaviour would be a major BC break
@BusterNeece commented on GitHub (Jan 31, 2020):
@Ocramius Okay...how about in later versions? i.e. 2.8.x or 3.0.x?
I appreciate brevity in many cases, but I feel like "Nope" doesn't do much to address the notion that what I'm describing is an extremely easy trap for a developer (even one who's used Doctrine for years, as I have) to fall into, that it is reasonable to hope that an ORM would at least warn about this (if not stopping it altogether), and that the likelihood of this is increased the more users are encouraged to clear the entire EntityManager rather than specific entity types, as will soon be the case.
@Ocramius commented on GitHub (Jan 31, 2020):
It is not possible to track detached entities: they are detached, unknown to the ORM.
In
UnitOfWork, we do indeed state this:e77091399a/lib/Doctrine/ORM/UnitOfWork.php (L1632-L1635)@Ocramius commented on GitHub (Jan 31, 2020):
Re-opening this: it is indeed requiring documentation adjustments
@BusterNeece commented on GitHub (Jan 31, 2020):
@Ocramius The message there says the DBAL should be throwing an exception instead...so why isn't it in my example? The record it's trying to persist already has an assigned ID so presumably it's trying to execute an insert with that ID already explicitly set, which should be failing somewhere, even if not in the ORM.
@Ocramius commented on GitHub (Jan 31, 2020):
Not sure tbh - docs may as well be unclear there. I suggest digging in the test suite, to see if this behavior has been already checked there. We can probably drop that
casecondition overall, after that.@BusterNeece commented on GitHub (Jan 31, 2020):
@Ocramius Just ran it again while having it echo out the SQL queries it's actually executing...for the INSERT query it's stripping out the ID entirely as a set value (it's basically just executing a standard
INSERT INTO table (other, fields, here)but with ID missing entirely), so the fact that it's already set on the entity is not only not caught by the ORM, but there's no way the DBAL or the DB itself could catch it either, as it doesn't even see it's in that state by that point.In other words, nothing in this stack could've protected me from this situation today.
With all due respect, I'm not sure how this is acceptable.
@Ocramius commented on GitHub (Jan 31, 2020):
It may indeed not be OK from your point of view, but this behavior is currently being relied upon:
Changing that is indeed a BC break.
@BusterNeece commented on GitHub (Jan 31, 2020):
@Ocramius I understand that it is considered a BC break, but clearly many other changes, including changes that relate to this, are intended for the EntityManager in 2.8.x and 3.0. Semver suggests that these releases (especially 3.x) can include BC breaks.
Considering that if the EntityManager itself doesn't detect this condition, its removal of auto-assigned IDs will ensure that neither the DBAL nor DB itself can alert for it...does that not suggest that the ORM is the only place where a change could be made at this point to help protect against this happening where it's really not expected?
Again, even an option to warn on this condition instead of just letting it happen silently would be greatly preferred.
@Ocramius commented on GitHub (Jan 31, 2020):
As mentioned above:
An un-managed entity that is persisted is assumed to be in
STATE_NEW, unless we keep a map of detached objects (not feasible)@BusterNeece commented on GitHub (Jan 31, 2020):
@Ocramius How is it not possible to detect that the primary key identifier for the entity is already set and has a non-null value on the object being persisted?
It appears this was discussed something like 9 years ago, and this exact same suggestion came up and was shot down for reasons that are unclear. Yes, it would require reflection, but the entire persistence mechanism already reflects on the other properties of the object. In fact, to know to strip that ID from the query it already has to know what those identifiers are to exclude them from the properties being set, so what is the difficulty in checking that value by reflection to see if it's set?
@DGCarramona commented on GitHub (Jan 31, 2020):
Hello ! Would it be possible to generate a warning if persisting entity, which id should be automatically generated but is yet assigned ? It would resolve the situation and being a warning it's not a BC break.
@Ocramius commented on GitHub (Jan 31, 2020):
An entity with an assigned PK is the default case for all entities I work with (since I also don't use generated identifiers) 🤷♀️
@DGCarramona a warning is a BC break, as discussed in https://github.com/doctrine/coding-standard/pull/150
@BusterNeece commented on GitHub (Jan 31, 2020):
@Ocramius Okay, but let's say that you do use generated identifiers. This is specified in the metadata for that primary key, that it has a GeneratedValue strategy.
If you're persisting an entity that has a GeneratedValue identifier, and that property already has a value, then you can safely assume you're playing with a detached entity that previously had that ID generated by the entity manager. At least, safely enough to warn on that exact condition.
What's more, it's specifically because these entities are using GeneratedValue identifiers that their ID was stripped out before the DBAL/DB layer (as the ORM is assuming it's going to get a value back for last inserted ID, not send one) that this is an otherwise unprotected situation.
@Ocramius commented on GitHub (Jan 31, 2020):
That's an interesting use-case, indeed I hacked together a test that verifies that generated identifiers are ignored during persistence:
Guess not using them (and will likely continue avoiding them :P ) has shadowed this obscure behavior
@beberlei commented on GitHub (Feb 13, 2020):
@SlvrEagle23 what database and Id generator are you using? For MySQL with auto increment I believe we get an exception here from flush as documented. You must be using Postgresql with a Sequence/Serial?
@BusterNeece commented on GitHub (Feb 14, 2020):
@beberlei MariaDB 10.2 with Auto-increment, but as mentioned in the above descriptions, the fact that the entity identifier is set as an auto-generated identifier seems to be making the ORM strip it from the subsequent
INSERTthat happens when flushing the detached entity, meaning there's no way for the DB to know it's a duplicate or to throw an error.@beberlei commented on GitHub (Feb 14, 2020):
@SlvrEagle23 you mention a discussion 9 years ago, can you link it or did i just not see it here?
@BusterNeece commented on GitHub (Jun 26, 2020):
@beberlei @Ocramius Hello all, following up on this issue since I notice it's still open and doesn't appear to have any resolution to date.
I'm preparing my applications to be able to migrate more easily to Doctrine 3.0, and part of that is removing
em->flush(record)andem->clear(record)calls and replacing them with the corresponding calls but for the entire managed entity graph all at once.As a result of these changes, there are now far more instances in my code where one function clears the global EntityManager and other functionality isn't aware of this happening, and doesn't know that it needs to "re-fetch" every entity that it's currently playing with. Because there is no warning or other notice when attempting to persist an entity that already has a generated ID, as in the examples above, this means that any time
em->clear()is called, a huge swath of code must be inspected and modified to react to this to avoid accidentally creating clones of records the next time things are persisted, and if I miss a single instance then I wind up with unexpected results and there's nothing I can do about it.I understand the reasoning behind removing entity-specific flushes and clears, but it appears that there was no attention paid to the impact this would have on applications when clearing the entire EntityManager becomes the only way of achieving that particular optimization, and there is no documentation anywhere that suggests a way to make this less painful or more intuitive for developers.
@BusterNeece commented on GitHub (Jun 26, 2020):
Basically, the tl;dr here is that this comment in this line of code appears to be incorrect in the case of AUTO GeneratedValues:
2a98a98cd3/lib/Doctrine/ORM/UnitOfWork.php (L1630-L1634)The line where it assumes the state is
STATE_NEWwould actually work as previously documented, and would correctly prevent persisting existing entities silently resulting in an insert, if only the$assumeparameter wasn't set.As far as I can gather as an outside developer, though, the
UnitOfWorkclass is absolutely not meant to be overridden; it is explicitly specified by name in the constructor of the EntityManager which itself can only be decorated and not extended.Basically, I could fix this on my end with a single change in a single line of code (specified above) and yet as a consequence of the library's design, I can't, so really I have no option but to continue to follow up here and ask for help.
@ikulis commented on GitHub (Dec 3, 2020):
@SlvrEagle23 as workaround EntityManager can be decorated with this class:
Adding to symfony configuration:
@mkoskl commented on GitHub (Jun 6, 2025):
I know this is old issue, but it's still open, so I thought I'll give use-case for current behavior.
We're developing software that makes use of "use-as-template", which will copy the existing entity and persist it as new.
I fetch it from database, detach it and call persist and flush. It's quite easy way to duplicate the row.