Long running process leaks memory #6803

Open
opened 2026-01-22 15:38:59 +01:00 by admin · 7 comments
Owner

Originally created by @flaushi on GitHub (Aug 7, 2021).

Bug Report

| Q        | A       |
|----------|---------|
| BC Break | ?       |
| Version  | current |

Summary

I think there is a memory leak in long running processes.

Current behavior

My memory consumption grows continuously, although I keep no references to visited nodes and clear the EntityManager regularly.
I am traversing an object graph using iteration (not recursion). My stack only has the identifiers, not the entities.
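
The traversal described above can be sketched without Doctrine. This is only an illustration under stated assumptions: `FakeManager` is a hypothetical stand-in for an entity manager (its `find()` keeps each loaded object in an internal map and `clear()` empties that map), the graph is an in-memory array, and the batch size of 100 is arbitrary. The `$seen` set of identifiers is also an assumption, added to guard against cycles.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for an entity manager: find() tracks objects, clear() drops them. */
final class FakeManager
{
    /** @var array<int, list<int>> node id => child ids */
    private array $graph;

    /** @var array<int, stdClass> objects currently held by the manager */
    private array $managed = [];

    public int $peakManaged = 0;

    public function __construct(array $graph)
    {
        $this->graph = $graph;
    }

    public function find(int $id): stdClass
    {
        if (!isset($this->managed[$id])) {
            $node = new stdClass();
            $node->id = $id;
            $node->childIds = $this->graph[$id] ?? [];
            $this->managed[$id] = $node;
            $this->peakManaged = max($this->peakManaged, count($this->managed));
        }

        return $this->managed[$id];
    }

    public function clear(): void
    {
        $this->managed = [];
    }
}

/** Visit every node reachable from $rootId, clearing the manager every $batchSize loads. */
function traverse(FakeManager $em, int $rootId, int $batchSize): int
{
    $stack = [$rootId];        // identifiers only, never the loaded objects
    $seen = [$rootId => true]; // identifiers already queued
    $visited = 0;

    while ($stack !== []) {
        $id = array_pop($stack);
        $node = $em->find($id);

        foreach ($node->childIds as $childId) {
            if (!isset($seen[$childId])) {
                $seen[$childId] = true;
                $stack[] = $childId;
            }
        }

        unset($node); // drop our own reference before clearing

        if (++$visited % $batchSize === 0) {
            $em->clear();
        }
    }

    return $visited;
}

// Build a complete binary tree with 1023 nodes as sample data.
$graph = [];
for ($i = 1; $i <= 511; $i++) {
    $graph[$i] = [2 * $i, 2 * $i + 1];
}

$em = new FakeManager($graph);
$visited = traverse($em, 1, 100);
printf("visited %d nodes, peak managed objects: %d\n", $visited, $em->peakManaged);
```

In this sketch the number of objects held by the manager never exceeds the batch size, no matter how many nodes are visited. Only the identifier collections grow with the size of the graph.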

How to reproduce

Please see my example here: https://stackoverflow.com/questions/68686479/leaking-memory-while-traversing-an-object-graph/68686896#68686896

Expected behavior

I'd expect memory consumption to stay at no more than a few megabytes the whole time.


@beberlei commented on GitHub (Aug 7, 2021):

You can use the memory profiler to find out where this memory is held: https://github.com/arnaud-lb/php-memory-profiler


@flaushi commented on GitHub (Aug 7, 2021):

Wow, I didn't know about this tool, great!
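However, this is the situation:
[image: https://user-images.githubusercontent.com/15819451/128596433-d9434c28-5fc4-49ea-ba73-fa1deb02c526.png]

The query being executed is this: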

return $this->_em->createQuery(
    'SELECT s FROM App\Entity\DataCategory s
      WHERE s.deletedAt IS NULL
        AND MY_JSON_CONTAINS(s.tags, :tags) = true
   ORDER BY s.name'
)
    ->setParameter('tags', json_encode($tags))
    ->getResult();

This should just query the entities and add them to the UnitOfWork, which I clear regularly. How is it possible that memory is leaked then?

Edit:
This is confusing. My code actually fetches many more entities, like this:

$inputItem->dc = $this->em->find(DataCategory::class, $inputItem->dc); // not reported or visible in memprof

if ($inputItem->dc instanceof TagDataCategory)
    $children = $this->em->getRepository(DataCategory::class)
        ->getCategoriesWithTags($inputItem->dc->selectedTags); // <--- these are reported by memprof
else
    $children = $inputItem->dc->getChildren(); // these are direct ManyToOne associations

Am I right in guessing that memprof only reports allocations that have not been freed, so the DQL query is the one that leaks?

Thank you so much for your help!


@greg0ire commented on GitHub (Aug 7, 2021):

From the description in the README (emphasis mine):

> The extension tracks the allocation and release of memory blocks to report the amount of memory _leaked_ by every function, method, or file in a program.
>
>   • Reports non-freed memory at arbitrary points in the program

@flaushi commented on GitHub (Aug 7, 2021):

So, I am speechless. Does this mean that the repository method leaks?

I thought that when I load an entity through the entity manager, it is inserted into the UnitOfWork, which is cleared properly by $em->clear().

I changed the repository method to first load only the IDs of suitable entities and then find them:

use Doctrine\ORM\EntityRepository;

class DataCategoryRepository extends EntityRepository
{
    public function getCategoriesWithTags(array $tags, $prefetchMode = false): array
    {
        // Load only the matching IDs, then fetch each entity through find().
        return array_map(
            fn ($id) => $this->_em->find(DataCategory::class, $id),
            $this->getCategoryIdsWithTags($tags)
        );

        // Previous version, which hydrated the entities directly:
        //$where = 'SELECT s from App\Entity\DataCategory s WHERE s.deletedAt IS NULL AND MY_JSON_CONTAINS(s.tags, :tags) = true ORDER BY s.name';
        //return $this->_em->createQuery($where)
        //    ->setParameter('tags', json_encode($tags))
        //    ->getResult();
    }

    public function getCategoryIdsWithTags(array $tags): array
    {
        $where = 'SELECT s.id from App\Entity\DataCategory s WHERE s.deletedAt IS NULL AND MY_JSON_CONTAINS(s.tags, :tags) = true ORDER BY s.name';

        return array_column(
            $this->_em->createQuery($where)
                ->setParameter('tags', json_encode($tags))
                ->getScalarResult(),
            'id'
        );
    }
}

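Here is a new screenshot:
[image: https://user-images.githubusercontent.com/15819451/128608308-66e7ed74-9e3c-4a3c-9a3a-409eb1033a4a.png]

So it looks as if the repository method has a leak? Where?

Or could the rest of my code be leaking?

For the sake of completeness, here is the custom DQL function: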

use Doctrine\DBAL\Platforms\MySqlPlatform;
use Doctrine\DBAL\Platforms\PostgreSqlPlatform;
use Doctrine\ORM\Query\AST\Functions\FunctionNode;
use Doctrine\ORM\Query\AST\Node;
use Doctrine\ORM\Query\Lexer;
use Doctrine\ORM\Query\Parser;
use Doctrine\ORM\Query\QueryException;
use Doctrine\ORM\Query\SqlWalker;

class JsonContainsCustomDQLFunction extends FunctionNode
{
    /** @var Node */
    private $second;
    /** @var Node */
    private $first;

    public function getSql(SqlWalker $sqlWalker)
    {
        $first = $this->first->dispatch($sqlWalker);
        $second = $this->second->dispatch($sqlWalker);
        $platform = $sqlWalker->getConnection()->getDatabasePlatform();

        // Emit the platform-specific JSON containment expression.
        if ($platform instanceof PostgreSqlPlatform) {
            return "$first @> $second";
        }

        if ($platform instanceof MySqlPlatform) {
            return "JSON_CONTAINS($first, $second)";
        }

        throw new QueryException('Platform for JSON_CONTAINS not supported.');
    }

    public function parse(Parser $parser)
    {
        // Grammar: MY_JSON_CONTAINS(StringPrimary, StringPrimary)
        $parser->match(Lexer::T_IDENTIFIER);
        $parser->match(Lexer::T_OPEN_PARENTHESIS);
        $this->first = $parser->StringPrimary();
        $parser->match(Lexer::T_COMMA);
        $this->second = $parser->StringPrimary();
        $parser->match(Lexer::T_CLOSE_PARENTHESIS);
    }
}

@greg0ire commented on GitHub (Aug 7, 2021):

When you call the repository, since clear isn't called inside it, more memory is used than before, presumably because of the entity map. Although this fits the definition of a leak, it is intended, but memprof doesn't know about this.

Maybe you could try using https://github.com/BitOne/php-meminfo instead?

I think that instead of showing you what method "leaked" memory, it will show you what objects are taking up so much memory. There is even a guide on hunting down memory leaks:

https://github.com/BitOne/php-meminfo/blob/master/doc/hunting_down_memory_leaks.md

Hope this helps, I haven't had to do this myself before.
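
As a rough sketch of how such a dump could be taken from inside the long-running loop: the code below assumes the php-meminfo extension is loaded and calls its `meminfo_dump()` function on a writable stream. The helper name `dumpHeapIfAvailable` and the output path are hypothetical, and the call is guarded so the script still runs when the extension is missing.

```php
<?php
declare(strict_types=1);

/**
 * Write a php-meminfo heap dump to $path if the extension is loaded.
 * Returns true when a dump was written, false otherwise.
 */
function dumpHeapIfAvailable(string $path): bool
{
    if (!function_exists('meminfo_dump')) {
        return false; // extension not installed, nothing to do
    }

    $stream = fopen($path, 'w');
    if ($stream === false) {
        return false; // target file could not be opened
    }

    meminfo_dump($stream);
    fclose($stream);

    return true;
}

// Hypothetical location; call this at the point where memory looks too high.
$written = dumpHeapIfAvailable(sys_get_temp_dir() . '/doctrine_heap.json');
echo $written ? "dump written\n" : "meminfo extension not loaded\n";
```

The resulting file can then be inspected with the analyzer described in the guide linked above, to see which classes account for the retained objects.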


@flaushi commented on GitHub (Aug 7, 2021):

Thanks for the pointer, I will follow it up tomorrow.

Still, it makes me wonder that I am calling $em->find(DataCategory::class, $id) over and over without seeing it in memprof, while my DQL query with getResult() does show up.

To conclude this support case:

A) You are not aware of any memory leak in queries and getResult() calls, right?

B) It should also be possible to "travel" the association graph of entities over millions of jumps without leaking memory (with intermediate $em->clear() calls, of course)?

C) Both $em->find(fqcn, $id) and $em->createQuery()->getResult() are supposed to return entities that are automatically stored in the entity map before being returned to me?
Is there an option to get hydrated but unmanaged entities from the entity manager? (I guess not.)


@nuryagdym commented on GitHub (May 2, 2022):

I guess it is the same problem described here: https://stackoverflow.com/questions/26616861/memory-leak-when-executing-doctrine-query-in-loop
I am running $em->clear() periodically and still have the memory leak issue.

In my Symfony config/packages/doctrine.yaml I have this option:

doctrine:
    dbal:
        default_connection: main
        connections:
            main:
                logging: false

With logging: false, Doctrine does not write queries to the log file, but I guess it still keeps them somewhere in memory, which is why I am seeing the memory leak.

So the solution is one of the following:

  • run the Symfony command with the --no-debug option
  • disable logging programmatically: $em->getConnection()->getConfiguration()->setSQLLogger(null);
  • or set profiling: false:
    doctrine:
        dbal:
            default_connection: main
            connections:
                main:
                    profiling: false

Otherwise, the SQL logger Doctrine\DBAL\Logging\DebugStack keeps every query in memory.
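
The effect can be illustrated without Doctrine. In the sketch below, `AccumulatingLogger` is a hypothetical stand-in for a debug logger that stores every query it sees. It is not the DBAL class itself, and the SQL strings and the query count are made up for the demonstration.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for a logger that keeps every query it is given. */
final class AccumulatingLogger
{
    /** @var list<string> */
    public array $queries = [];

    public function log(string $sql): void
    {
        $this->queries[] = $sql; // never released while the logger is alive
    }
}

/** Run $count fake queries and return how many bytes stayed allocated afterwards. */
function runQueries(int $count, ?AccumulatingLogger $logger): int
{
    $before = memory_get_usage();

    for ($i = 0; $i < $count; $i++) {
        $sql = "SELECT s.id FROM data_category s WHERE s.id = $i";
        if ($logger !== null) {
            $logger->log($sql);
        }
    }

    return memory_get_usage() - $before;
}

$logger = new AccumulatingLogger();
$withLogger = runQueries(10000, $logger);
$withoutLogger = runQueries(10000, null);

printf("retained with logger: %d bytes, without: %d bytes\n", $withLogger, $withoutLogger);
```

With the logger attached, retained memory grows with the number of queries, and clearing an entity manager would not touch it. That matches the behaviour described above.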

Reference: doctrine/archived-orm#6803