WeakReference to free up memory automatically #6391

Open
opened 2026-01-22 15:32:16 +01:00 by admin · 8 comments
Owner

Originally created by @bOmBeLq on GitHub (Jan 21, 2020).

Feature Request

Q A
New Feature yes
RFC yes
BC Break yes (or no if configurable)

Summary

So php 7.4 introduces WeakReference. AFAIK UnitOfWork keeps references to all fetched objects and that's why if you want to free up memory you have to explicitly detach them or clear the entity manager. I understand that this is to increase performance and not query for objects if they ware already fetched which totally makes sense.

I'm wondering if it would make any sense to use WeakReference to keep reference to fetched objects. As long as object is used anywhere outside of ORM it will be kept in memory and re-used from the ref. But if app context does not use entities anymore they would be automatically removed from memory.
Do you see any potential problems with such approach?

Auto-detaching will probably lead to performance drop in some specific cases - should this be considered BC break in that case? This feature could be configurable (on/off) as well.

Originally created by @bOmBeLq on GitHub (Jan 21, 2020). ### Feature Request | Q | A |------------ | ------ | New Feature | yes | RFC | yes | BC Break | yes (or no if configurable) #### Summary So php 7.4 introduces WeakReference. AFAIK UnitOfWork keeps references to all fetched objects and that's why if you want to free up memory you have to explicitly detach them or clear the entity manager. I understand that this is to increase performance and not query for objects if they ware already fetched which totally makes sense. I'm wondering if it would make any sense to use WeakReference to keep reference to fetched objects. As long as object is used anywhere outside of ORM it will be kept in memory and re-used from the ref. But if app context does not use entities anymore they would be automatically removed from memory. Do you see any potential problems with such approach? Auto-detaching will probably lead to performance drop in some specific cases - should this be considered BC break in that case? This feature could be configurable (on/off) as well.
admin added the Question label 2026-01-22 15:32:16 +01:00
Author
Owner

@jvasseur commented on GitHub (Jan 27, 2020):

I thought of that to, but freeing references a long as an object isn't used anymore would work because the unit of work needs to keep a reference of all objects to generate the changeset on flush.

For example

$entity = $em->find(...);

$entity->setFoo($newFoo);

unset($entity);

$em->flush();

If the unit of work "forgot" about $entity when we call unset on it, the flush won't save any changes.

@jvasseur commented on GitHub (Jan 27, 2020): I thought of that to, but freeing references a long as an object isn't used anymore would work because the unit of work needs to keep a reference of all objects to generate the changeset on flush. For example ```php $entity = $em->find(...); $entity->setFoo($newFoo); unset($entity); $em->flush(); ``` If the unit of work "forgot" about `$entity` when we call `unset` on it, the flush won't save any changes.
Author
Owner

@mvorisek commented on GitHub (Feb 2, 2020):

@jvasseur This can be handled by __destruct of some wrapper class. If object is dereferenced (by refcount = 0), wrapper destoy method will be called and it will check if flush is needed (i.e. needs to stay in the memory) - if yes, the wrapper object will be recreated and the reference to the wrapper instance will be saved (otherwise it will be destroyed immediatelly), if no, it will be simply freed.

Related with: https://github.com/php/php-src/pull/4882#issuecomment-561787685 probably a RFC for register_object_destroy_callback will be much better choise which will allow any managers like EntityManager to register a destroy callback on any object, allow to attach (i.e. add reference to) the object and cancel so processing of the destruct if needed.

@mvorisek commented on GitHub (Feb 2, 2020): @jvasseur This can be handled by `__destruct` of some wrapper class. If object is dereferenced (by refcount = 0), wrapper destoy method will be called and it will check if flush is needed (i.e. needs to stay in the memory) - if yes, the wrapper object will be recreated and the reference to the wrapper instance will be saved (otherwise it will be destroyed immediatelly), if no, it will be simply freed. Related with: https://github.com/php/php-src/pull/4882#issuecomment-561787685 probably a RFC for `register_object_destroy_callback` will be much better choise which will allow any managers like `EntityManager` to register a destroy callback on any object, allow to attach (i.e. add reference to) the object and cancel so processing of the destruct if needed.
Author
Owner

@jvasseur commented on GitHub (Feb 2, 2020):

@mvorisek that's a possible solution but it has some performances implication since it would means changes will be tracked in modified objects even if we never flush them.

@jvasseur commented on GitHub (Feb 2, 2020): @mvorisek that's a possible solution but it has some performances implication since it would means changes will be tracked in modified objects even if we never flush them.
Author
Owner

@bOmBeLq commented on GitHub (Feb 3, 2020):

If the unit of work "forgot" about $entity when we call unset on it, the flush won't save any changes.

I would say that this can be expected behavior. A tradeoff for auto-detach and a BC break for sure in that case.
If flush is called on objects which are not used in app context anymore then I suppouse this usually is some app design mistake.

$this->someService->updateSomeThingsButDontFlush();
$this->em->flush(); // what am I flushing again?

In above example service itself should flush the changes.
Eventually this may be turned off explictly if needed

$this->em->disableGarbageCollector();
$this->someService->updateSomeThingsButDontFlush();
$this->em->flush(); // just flushing complicated changes made in someService

That being said I'm still not sure if that is good idea. Just considering.

@bOmBeLq commented on GitHub (Feb 3, 2020): > If the unit of work "forgot" about `$entity` when we call `unset` on it, the flush won't save any changes. I would say that this can be expected behavior. A tradeoff for auto-detach and a BC break for sure in that case. If flush is called on objects which are not used in app context anymore then I suppouse this usually is some app design mistake. ``` $this->someService->updateSomeThingsButDontFlush(); $this->em->flush(); // what am I flushing again? ``` In above example service itself should flush the changes. Eventually this may be turned off explictly if needed ``` $this->em->disableGarbageCollector(); $this->someService->updateSomeThingsButDontFlush(); $this->em->flush(); // just flushing complicated changes made in someService ``` That being said I'm still not sure if that is good idea. Just considering.
Author
Owner

@mvorisek commented on GitHub (Feb 3, 2020):

@bOmBeLq I do not think that auto GC any changed entities which should / can be flushed is a good idea. Also GC not changed entities should be turned off by default as if these entities are loaded again, they can contain different values (if they are not loaded in same transaction and repeatable read or better transaction isolation).

@mvorisek commented on GitHub (Feb 3, 2020): @bOmBeLq I do not think that auto GC any changed entities which should / can be flushed is a good idea. Also GC not changed entities should be turned off by default as if these entities are loaded again, they can contain different values (if they are not loaded in same transaction and repeatable read or better transaction isolation).
Author
Owner

@jvasseur commented on GitHub (Feb 3, 2020):

$this->someService->updateSomeThingsButDontFlush();
$this->em->flush(); // what am I flushing again?

I'm using this pattern in a lot of cases, basically services are responsible for business logic but never commit (flush) the changes that I consider the responsibility of the controller that can then batch multiple change in the same flush to ensure they are done at the same time (or can decide to not flush them if needed).

That being said I'm still not sure if that is good idea. Just considering.

Same here, I'm not sure if either doing it or not is a good idea, I'm just listing problems I see in going in this direction so that everyone can make the best decision possible.

@jvasseur commented on GitHub (Feb 3, 2020): > ``` > $this->someService->updateSomeThingsButDontFlush(); > $this->em->flush(); // what am I flushing again? > ``` I'm using this pattern in a lot of cases, basically services are responsible for business logic but never commit (flush) the changes that I consider the responsibility of the controller that can then batch multiple change in the same flush to ensure they are done at the same time (or can decide to not flush them if needed). > That being said I'm still not sure if that is good idea. Just considering. Same here, I'm not sure if either doing it or not is a good idea, I'm just listing problems I see in going in this direction so that everyone can make the best decision possible.
Author
Owner

@beberlei commented on GitHub (Feb 12, 2020):

We have a started discussion internally about a similar approach using WeakMap and WeakReference for different parts of the UnitOfWork and also documented the downsides with the lost update in case a changed update goes out of scope (for example during a batch update loop).

Nothing has been decided yet, but its something we have on our radar.

@beberlei commented on GitHub (Feb 12, 2020): We have a started discussion internally about a similar approach using WeakMap and WeakReference for different parts of the UnitOfWork and also documented the downsides with the lost update in case a changed update goes out of scope (for example during a batch update loop). Nothing has been decided yet, but its something we have on our radar.
Author
Owner

@mpdude commented on GitHub (Feb 29, 2024):

Currently debugging a batch processing script with excessive memory usage, so I came across this.

Challenges: We cannot use WeakMap for the identity map directly. It may happen that an entity is updated, but no other references kept in userland (outside the ORM). In that case we must not simply drop the reference, but keep it until the flush() operation unless instructed to do otherwise. From the identity map/UoW point of view, we also cannot easily see whether an entity has been changed, and we also do not notice when it changes (to put some kind of lock in place or similar).

Next challenge: What if we have one-to-many associations? If the identity map used a WeakMap, but both the collection on the "one" side as well as all the owning side backreferences use normal references, would this suffice of make sure the entities are destructed? Maybe the GC could handle this case (weakmap reference to a cyclic "normal" reference situation), but I'd need to check.

Update: If we have cyclic references between to objects, and this group is referred to only through a WeakReference, then gc_collect_cycles() can collect it.

@mpdude commented on GitHub (Feb 29, 2024): Currently debugging a batch processing script with excessive memory usage, so I came across this. Challenges: We cannot use WeakMap for the identity map directly. It may happen that an entity is updated, but no other references kept in userland (outside the ORM). In that case we must not simply drop the reference, but keep it until the `flush()` operation unless instructed to do otherwise. From the identity map/UoW point of view, we also cannot easily see whether an entity has been changed, and we also do not notice _when_ it changes (to put some kind of lock in place or similar). Next challenge: What if we have one-to-many associations? If the identity map used a WeakMap, but both the collection on the "one" side as well as all the owning side backreferences use normal references, would this suffice of make sure the entities are destructed? Maybe the GC could handle this case (weakmap reference to a cyclic "normal" reference situation), but I'd need to check. Update: If we have cyclic references between to objects, and this group is referred to only through a `WeakReference`, then `gc_collect_cycles()` can collect it.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: doctrine/archived-orm#6391