
Memory leak #48

Open
maciejfikus opened this issue Jul 7, 2022 · 7 comments

Comments

@maciejfikus

Hi,

I haven't been able to pinpoint the exact cause yet, but it seems we have a serious memory leak somewhere.

When using this library together with the ReactPHP web server (via FrameworkX), every single query to the DB consumes memory that the GC never manages to free. FrameworkX is just an implementation detail here.

Steps to reproduce:

  1. Make a clean installation of FrameworkX (or anything else that uses the ReactPHP web server).
  2. For each web request, resolve the Client from the DI container and then execute the following code:

$this->client
    ->query('SELECT * FROM search_history LIMIT 1;')
    ->subscribe(new CallbackObserver(
        function ($row) {},
        fn ($e) => throw $e,
        function () {}
    ));

There is no need to even fetch and return the data.

  3. Use ab, wrk, or anything else to spam some requests, e.g. 10k of them. You can probably reproduce the same thing with a simple foreach (range(1, 10000)) loop that executes the query on each iteration (see the sketch below). The used memory will never be freed.
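
For reference, here is a minimal standalone sketch of that loop-based reproduction; the PgAsync\Client options, credentials, and table name are placeholders, and it assumes the event loop is started afterwards (e.g. by the framework or automatically on shutdown):

<?php

require __DIR__ . '/vendor/autoload.php';

use PgAsync\Client;
use Rx\Observer\CallbackObserver;

// Placeholder connection parameters; adjust to your environment.
$client = new Client([
    'host'     => '127.0.0.1',
    'port'     => 5432,
    'user'     => 'postgres',
    'password' => 'postgres',
    'database' => 'postgres',
]);

$done  = 0;
$total = 10000;

foreach (range(1, $total) as $i) {
    $client
        ->query('SELECT * FROM search_history LIMIT 1;')
        ->subscribe(new CallbackObserver(
            function ($row) {},              // ignore the row data
            fn (\Throwable $e) => throw $e,  // surface any query error
            function () use (&$done, $total) {
                if (++$done === $total) {
                    // All queries done: "used" memory drops again here,
                    // but the memory reserved by the process stays high.
                    printf(
                        "used: %.2f MB, real: %.2f MB\n",
                        memory_get_usage(false) / 1048576,
                        memory_get_usage(true) / 1048576
                    );
                }
            }
        ));
}

// Depending on the event-loop setup, the loop may auto-run on shutdown;
// otherwise start it explicitly (e.g. React\EventLoop\Loop::run()).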

Does anyone have any idea where it comes from or how to prevent it, even temporarily?

@maciejfikus
Author

I've created a simple PoC to make verification easier.

https://github.com/maciejfikus/pgasync-memory-leak-poc

Just run it with docker-compose up -d.
The application initializes the pgsql client and then loops through 30,000 SELECT 1 queries; there are no other dependencies. Checking docker stats, you will soon notice that memory climbs to somewhere around 250 MB and stays there forever. Even after all queries have been processed and real usage drops to a few MBs, the reserved memory is never freed.

Any ideas?

@clue

clue commented Aug 22, 2022

@maciejfikus Thanks for providing this test script. I've taken a look and can see some problematic behavior, but have yet to find a real memory leak here. Perhaps anybody else can also take a look and share their findings?

Here's the gist of what your example is doing with some comments added by me:

for($i = 1; $i <= 30000; $i++) {
    $client->query('SELECT 1')->subscribe(…);
}

sleep(30);

var_dump(memory());
  • The Client class will automatically create a limited number of Connections (5 by default) when sending queries, so this would end up creating 5 connections taking care of all queries.
  • Calling $client->query() will queue 30k outgoing requests, which in itself is a fairly expensive operation (setting up all the event listeners etc.). Note that this method is async, so it will not wait for the query results here; it only registers the handlers to fire at a later time when the results actually come in.
  • The sleep() function is blocking and must not be used in a truly non-blocking application. It will halt the entire program execution and as such will not execute any queries or process any data for 30 seconds. Perhaps you want to use timers or React\Async\await(React\Promise\Timer\sleep(30)) instead? (See the sketch after this list.)
  • It looks like this library keeps connections open indefinitely, so your example will keep running until the last connection times out on the server side. FWIW, there's a deprecated $client->closeNow() method that might help?
  • So far, we have yet to spot any real memory leaks. At the moment, we're merely observing that the Client has to keep outstanding queries in memory.
  • I've checked variations of this program and can confirm that it seems to clear outgoing queries from $connection->commandQueue just fine when they're completed.
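
For illustration, here is a rough sketch of that non-blocking alternative, assuming react/async and react/promise-timer are installed (this is not part of the original PoC):

<?php

require __DIR__ . '/vendor/autoload.php';

use function React\Async\async;
use function React\Async\await;
use function React\Promise\Timer\sleep as asyncSleep;

async(function () {
    // ... dispatch the queries here without blocking ...

    // Wait 30 seconds without halting the event loop, so queued queries
    // and incoming data keep being processed in the meantime.
    await(asyncSleep(30));

    var_dump(memory_get_usage(true));
})();

// With react/event-loop >= 1.2 the loop runs automatically at the end of
// the script, so no explicit run() call is needed here.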

Any other input? What else could be causing memory leaks here?

@maciejfikus
Author

@clue, I am aware that this code is as naive as it could possibly be, but it does the job. So far, your analysis is perfectly correct.

We can actually skip the Client class and use a single Connection directly; the issue persists. I agree that the sleep() function shouldn't be here, I used it purely for debugging purposes. In any case, the use of sleep() is irrelevant to the problem.

It looks like this library keeps connections open indefinitely, so your example will keep running until the last connection times out on the server side. FWIW, there's a deprecated $client->closeNow() method that might help?

It is true, and it has its flaws. Regarding closeNow(), I am pretty sure I already tried that together with other tricks like unsetting the whole $client variable. I did that in a slightly more complex example (FrameworkX + container), so maybe I missed some reference, but I am almost willing to bet it will behave the same in the current context. I will test that tomorrow anyway.
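
Roughly, the cleanup I tried in that more complex example looked like the following sketch (from memory, not yet re-verified in this PoC):

$client->closeNow();  // deprecated, but should close the open connections immediately
unset($client);       // drop the local reference to the client
var_dump(memory_get_usage(true));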

I've checked variations of this program and can confirm that it seems to clear outgoing queries from $connection->commandQueue just fine when they're completed.

Yup, commandQueue is cleared after all queries pass. Memory usage drops with each finished query, but the reserved memory stays unchanged.

Assuming I didn't make any huge mistake while testing, my only suspicion right now is that open streams somehow keep memory high. That may just be because I have no idea about the streams' inner workings, though.

Aside from the unknown cause, could you confirm that you noticed a memory leak in your variations of this program?

@mbonneau
Member

I have not yet had a chance to look at this.

I use this library pretty extensively in long-running processes and have not noticed any memory leaks.

I will run the example @maciejfikus created when I get some time and report back.

@mbonneau
Member

@maciejfikus In your example project, the queries are async. The event loop does not start until after the var_dump. So what is happening is that you are creating the 30,000 queries, which get buffered to fire off once the connections are up and running.

Also, I don't think it was ever connecting to the database - that is another issue that would need to be fixed: reporting that the database connection failed.

Here is what I used to run the queries and show the memory usage:

$client = new PgAsync\Connection([
    'host' => 'db',
    'port' => 5432,
    'user' => 'devuser',
    'password' => 'devsecret',
    'database' => 'devdb'
], \EventLoop\getLoop());

$queries = [];
for ($i = 1; $i <= 60000; $i++) {
    $queries[] = $client->query('SELECT 1');
}

\Rx\Observable::fromArray($queries)
    ->mergeAll()
    ->subscribe(
        fn($result) => 5, // no-op onNext; results are discarded
        fn(\Throwable $result) => printf("Error: %s\n", $result->getMessage()),
        fn() => printf("Done.\n")
    );

\Rx\Observable::interval(1000)
    ->do(fn ($i) => printf("%d: %s\n", $i, json_encode(memory())))
    ->take(30)
    ->subscribe();

When using the Client, it does appear that there could be an issue, as it uses a lot more memory than a single connection. This is something that needs further inspection as well.

Let me know if this answers the question about memory leaks.

Thanks for posting a great issue - makes it much easier to help.

@maciejfikus
Author

@mbonneau, thanks for taking a look. Indeed, my example wasn't the most fortunate; I intended to keep it as simple as possible and made a few mistakes while building that example gist. I've updated my example: https://github.com/maciejfikus/pgasync-memory-leak-poc/blob/master/public/index.php

Now what we get is just a single connection, a loop, a var_dump on each success to make sure the result really comes from the DB, and a periodic var_dump of memory() together with the backlog count. Try this one :-)
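
In short, the updated index.php does roughly the following (a simplified sketch of what is described above, not a copy of the actual file; the connection parameters are taken from the snippet earlier in this thread and the memory reporting is abbreviated):

<?php

require __DIR__ . '/vendor/autoload.php';

use PgAsync\Connection;

// Single connection, using the same parameters as in the snippet above.
$connection = new Connection([
    'host'     => 'db',
    'port'     => 5432,
    'user'     => 'devuser',
    'password' => 'devsecret',
    'database' => 'devdb'
], \EventLoop\getLoop());

// Fire the queries and dump each row to confirm it really comes from the DB.
for ($i = 1; $i <= 30000; $i++) {
    $connection->query('SELECT 1')->subscribe(
        fn ($row) => var_dump($row),
        fn (\Throwable $e) => printf("Error: %s\n", $e->getMessage())
    );
}

// Periodically dump memory usage; the real PoC also prints its backlog count here.
\Rx\Observable::interval(5000)
    ->do(fn () => var_dump([
        sprintf('Used Memory : %.2f MB', memory_get_usage(false) / 1048576),
        sprintf('Used Real Memory : %d MB', memory_get_usage(true) / 1048576),
        sprintf('Used Real Peak Memory : %d MB', memory_get_peak_usage(true) / 1048576),
    ]))
    ->subscribe();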

Here is my output

array(1) {
  ["?column?"]=>
  string(1) "1"
}
array(1) {
  ["?column?"]=>
  string(1) "1"
}
array(1) {
  ["?column?"]=>
  string(1) "1"
}
int(0)    // <--- backlog length
array(3) {
  [0]=>
  string(22) "Used Memory : 23.04 MB"
  [1]=>
  string(25) "Used Real Memory : 148 MB"
  [2]=>
  string(30) "Used Real Peak Memory : 148 MB"
}

After a few minutes the memory output is still the same. Now take a look at docker stats:

CONTAINER ID   NAME                CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
6eee45c448a2   memory_leak_app_1   0.00%     151.1MiB / 15.29GiB   0.96%     3.98MB / 2.41MB   0B / 0B          1

@maciejfikus
Author

@mbonneau, @clue, with the above in mind, could you confirm that this is not just my imagination :-)?
Any clue in which direction I should dig?
