Fetching untracked entities #6607

Open
opened 2026-01-22 15:35:39 +01:00 by admin · 7 comments

Originally created by @dbu on GitHub (Jan 18, 2021).

In #7901, detaching entities was deprecated. There are use cases where having untracked "entities" would be very useful though. In our case, we have a system that reads various entities and then creates or updates an entity (of a different type). We know that the entities we read will not (or even must not) be changed, and we would like to be able to leave those objects up for garbage collection. @stof mentions a similar issue in the discussion on the deprecation merge request.

I can see that the general detach is very fragile and can lead to weird issues. But we know we don't want the entity to be tracked when we fetch it from a repository/entity manager. Could we make a query hint that hydrates the object but does not add it to the identity map? Plus a way to find() without tracking (probably a separate method, since find does not take query hints)?

@thePanz and I would have some time to work on this if we can agree on the need for this and an architecture for it.

@dbrumann commented on GitHub (Jan 22, 2021):

Just out of curiosity, do you think modifying the change tracking policy for the entities would help in your use case or would this still be problematic because these entities will likely not be picked up by garbage collection?

@dbu commented on GitHub (Jan 25, 2021):

Our main concern is indeed garbage collection, as we process a lot of data to create a new entry. I am not too familiar with the change tracking policy, but from the name I assume it would let me say that changes on some entity should not be tracked. That does not seem to match our use case, and I could probably clone the entity to get the same effect? (Except maybe for relations that are loaded only as proxies at the time of cloning.)

@stof commented on GitHub (Jan 28, 2021):

My own use case is one where I deal with my entities in a read-only way for some processing, but with a large number of such entities (using Query::iterate() of course, not getResult()).
The change tracking is not the issue in my case. The issue is the fact that the identity map retains the object.

Today, I deal with that with detach(), which is deprecated. I understand that detaching arbitrary objects from the identity map will cause issues for the change tracking of related objects (hence the deprecation). But if we can hydrate objects without putting them in the identity map (and so also untracked for changes), we won't affect the change tracking of other objects, as these objects won't ever enter the identity map before being removed from it.

I think that to be safe, we might need to reuse existing entries of the identity map if the fetching involves some objects which are already fetched (especially regarding toOne relations). But newly fetched objects should be hydrated without entering the UnitOfWork after that.
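
To make the rule above concrete, here is a minimal sketch written as a toy model in Python, not as Doctrine code. `ToyUnitOfWork`, `hydrate` and the `unmanaged` flag are hypothetical names invented for illustration; nothing like them exists in the ORM today. It only shows the intended behaviour: existing identity map entries are reused, new objects stay out of the map.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: int
    name: str


class ToyUnitOfWork:
    """Toy stand-in for an identity map; not Doctrine's UnitOfWork."""

    def __init__(self):
        self.identity_map = {}  # id -> managed Entity

    def hydrate(self, row, unmanaged=False):
        # An already managed instance is always reused, so an unmanaged
        # fetch never creates a second copy of a managed object.
        existing = self.identity_map.get(row["id"])
        if existing is not None:
            return existing
        entity = Entity(row["id"], row["name"])
        # Only a regular (managed) fetch registers the new object.
        if not unmanaged:
            self.identity_map[entity.id] = entity
        return entity


uow = ToyUnitOfWork()
managed = uow.hydrate({"id": 1, "name": "already managed"})

rows = [{"id": 1, "name": "already managed"}, {"id": 2, "name": "read only"}]
results = [uow.hydrate(row, unmanaged=True) for row in rows]

assert results[0] is managed      # existing entry was reused
assert 2 not in uow.identity_map  # new object never entered the map
```

Because the second object is referenced only by `results`, it becomes collectable as soon as the caller drops it, which is the effect detach() is used for today.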

@beberlei commented on GitHub (Feb 7, 2021):

Detach might be undeprecated

@stof commented on GitHub (Feb 10, 2021):

Undeprecating it would be fine with me. Maybe documenting it as an advanced topic.

@stof commented on GitHub (Jul 18, 2025):

I wished for this feature again when working with iterable processing, as it is painful currently to know exactly what to detach to avoid leaking memory due to the identity map during hydration:

  • we have to take care of any relation to detach them properly
  • we need to be careful not to detach object instances that we are also using in a managed way for a different reason (cases where one of the objects you use in a managed way is also referenced in a relation in one of the items of the iterable).
  • we cannot use $em->clear() at regular points during iteration, as that would also clear the managed objects we use that are not coming from the iterable.

Regarding the points listed in https://github.com/doctrine/orm/pull/7936#issuecomment-776283082, here is my proposal:

  • queries run with an "unmanaged" hint (name to be bikeshedded) perform hydration of results without adding new entries to the identity map. However, if one of the objects is already tracked in the identity map, we reuse the tracked object for the hydrated reference
  • for collections, they are hydrated using a lazy collection that would load data from the DB (similar to PersistentCollection) but without any of the change tracking of PersistentCollection (that would be an unmanaged lazy collection)
  • for proxies, they are hydrated as unmanaged objects (so that we don't need to detach proxy objects for ToOne relations in the iterated unmanaged entities). When initializing an unmanaged proxy, it would be hydrated as unmanaged as well. Of course, we would still use an existing managed proxy if it is already in the identity map (same rule as for query results).

This would allow working on big result sets in an iterable way without the need to detach things during iteration (for cases where you don't need change tracking).
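
The proxy part of the proposal can be sketched roughly as follows. This is a toy model in Python, not Doctrine code: `UnmanagedLazyRef`, `load_row` and the dictionary-based identity map are assumptions made up for the illustration, and checking the identity map at initialization time (rather than at creation time) is just one possible choice.

```python
class UnmanagedLazyRef:
    """Toy proxy: loads its target on first access without registering it."""

    def __init__(self, entity_id, identity_map, load_row):
        self._id = entity_id
        self._identity_map = identity_map
        self._load_row = load_row
        self._target = None

    def get(self):
        if self._target is None:
            # Same rule as for query results: prefer a managed instance.
            managed = self._identity_map.get(self._id)
            if managed is not None:
                self._target = managed
            else:
                # Hydrate as unmanaged: the identity map is not touched.
                self._target = dict(self._load_row(self._id))
        return self._target


database = {
    10: {"id": 10, "label": "category A"},
    11: {"id": 11, "label": "category B"},
}
identity_map = {10: {"id": 10, "label": "category A (managed)"}}

loads = []  # records which ids were actually fetched from the "database"


def load_row(entity_id):
    loads.append(entity_id)
    return database[entity_id]


managed_ref = UnmanagedLazyRef(10, identity_map, load_row)
unmanaged_ref = UnmanagedLazyRef(11, identity_map, load_row)

assert managed_ref.get() is identity_map[10]       # managed instance reused
assert unmanaged_ref.get() is unmanaged_ref.get()  # loaded only once
assert loads == [11]
assert list(identity_map) == [10]                  # nothing was added
```

An unmanaged lazy collection would follow the same pattern, loading its elements on first access and keeping no snapshot for change tracking.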

Do you think this would be doable @beberlei ?

@hlecorche commented on GitHub (Jul 18, 2025):

For iterations, I created a 'snapshot manager'. It's not very clean (the identity map is used), but it works very well (until we have something better):

https://github.com/e-commit/doctrine-orm-refetch?tab=readme-ov-file#snapshot

The process works as follows: we take a snapshot before the iteration. Then regularly, we call the cleaner (which detaches all entities that were not attached at the time of the snapshot).
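
The snapshot idea can be illustrated with a small toy model in Python. This is not the API of the linked library and not Doctrine code: `ToyEntityManager`, `Snapshot` and `clean` are hypothetical names used only to show the principle.

```python
class ToyEntityManager:
    """Toy stand-in for an entity manager with an identity map."""

    def __init__(self):
        self.identity_map = {}  # (class name, id) -> object

    def manage(self, key, entity):
        self.identity_map[key] = entity

    def detach(self, key):
        self.identity_map.pop(key, None)


class Snapshot:
    """Remembers which entries were managed when the snapshot was taken."""

    def __init__(self, em):
        self._em = em
        self._keys = set(em.identity_map)

    def clean(self):
        # Detach everything that was attached after the snapshot.
        for key in list(self._em.identity_map):
            if key not in self._keys:
                self._em.detach(key)


em = ToyEntityManager()
em.manage(("Report", 1), object())  # managed object we want to keep

snapshot = Snapshot(em)
for i in range(1, 251):
    em.manage(("Item", i), object())  # stands in for one hydrated row
    if i % 100 == 0:
        snapshot.clean()
snapshot.clean()

assert list(em.identity_map) == [("Report", 1)]
```

Unlike a plain clear(), the objects that were managed before the iteration started survive every cleaning pass.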

Reference: doctrine/archived-orm#6607