multiprocessing.Pool workers hold on to response body #95

In our app, we make use of pull task queues, and we pull a large number of tasks with large bodies (e.g. ~5 MB of payloads). I noticed that it appeared we had a memory leak when running this in the python-compat runtime, with many megabytes of strings being retained. With a whole lot of hacking, I ended up tracking it down to the following:

My hack to fix it: in google/appengine/ext/vmruntime/vmstub.py, I set response._content = None right after the response protocol buffer message is parsed. This caused our process to use ~100 MB of memory instead of ~400 MB.

Hacky output from my tool that found the large strings that were being retained, showing what is holding on to them:
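To make the workaround concrete, here is a minimal sketch of the idea behind the vmstub.py hack, assuming a requests-style response object. The function and variable names are hypothetical, not the actual vmstub.py code; only the response._content = None line comes from the report itself.

```python
# Hedged sketch of the workaround described above -- not the actual
# google/appengine/ext/vmruntime/vmstub.py source. _make_api_call,
# session, and response_pb are hypothetical names for illustration.
def _make_api_call(session, url, request_bytes, response_pb):
    # Issue the RPC over HTTP; `response` is a requests.Response.
    response = session.post(url, data=request_bytes)
    # Parse the protobuf reply out of the raw body.
    response_pb.ParseFromString(response.content)
    # The hack: drop the raw body bytes once they are parsed, so any
    # idle pool worker still referencing `response` cannot pin ~5 MB.
    response._content = None
    return response_pb
```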
Comments
Thank you for your bug report! @bryanmau1, can you take a look? This seems big enough to be a beta blocker.
FWIW: Something similar to this issue may exist in the standard environment as well. This same service grows from ~70 MB of memory up to a "steady state" of ~170 MB over the few hours after we restart it. I tried running it in the flexible environment so I could trace whether there was an actual leak, and I couldn't find anything other than this issue. There is an issue report from ~2 years ago suggesting that others have observed high memory usage for big RPCs: https://code.google.com/p/googleappengine/issues/detail?id=10475
I am running into something similar (but in a slightly different setup). I believe this change should fix the underlying CPython bug: python/cpython@5084ff7. Unfortunately, the latest CPython release, 2.7.13, came out five months before this fix, and according to the release schedule, 2.7.14 won't arrive until mid-2017. "Unfortunately", GCE's bundled Python predates the fix as well.

I suppose the easiest option would be to have our Dockerfile overwrite multiprocessing/pool.py with a patched copy.
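For context, here is a simplified sketch of what that commit changes in multiprocessing/pool.py, as I understand it. This is not the verbatim patch; the names and control flow are condensed for illustration:

```python
# Simplified sketch of a pool worker loop illustrating the fix in
# python/cpython@5084ff7 (not the verbatim CPython source).
def worker(inqueue, outqueue):
    while True:
        task = inqueue.get()           # blocks here while the worker is idle
        if task is None:               # sentinel: shut the worker down
            break
        job, func, args, kwds = task
        result = func(*args, **kwds)
        outqueue.put((job, result))
        # The fix: release all references before blocking on the next get().
        # Without this, `result` (e.g. a multi-megabyte response body) stays
        # reachable from this thread's frame until another task arrives.
        task = job = func = args = kwds = result = None
```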
And if I understand things right, the effect is multiplicative. (Correct me if I'm wrong here!) Gunicorn appears to re-use workers when possible, but the API-call threadpool appears to distribute load randomly across all blocked threads. So in high-load situations, you'd quickly end up with 100*100 = 10K retained response bodies.
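To see the retention in isolation, here is a rough, hypothetical repro; the names, sizes, and pool dimensions are illustrative, not taken from this issue. On an unpatched 2.7, most of the bodies should stay pinned by idle worker threads; with the fix, they are all collected:

```python
# Hypothetical repro sketch -- illustrative names and sizes only.
import gc
import weakref
from multiprocessing.pool import ThreadPool

class Body(object):
    """Stand-in for a large RPC response body (~1 MB here)."""
    def __init__(self):
        self.data = b"x" * (1024 * 1024)

refs = []  # weak references so we can see what stays alive

def handle_task(_):
    body = Body()
    refs.append(weakref.ref(body))
    return body

pool = ThreadPool(16)
results = pool.map(handle_task, range(16))
del results
gc.collect()
# The pool is deliberately left open: its workers sit blocked waiting for
# the next task, which is exactly the state in which the old code pins
# each thread's last result.
alive = sum(1 for r in refs if r() is not None)
print("%d of %d bodies still pinned by idle threads" % (alive, len(refs)))
```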
Updating the Python runtimes to CPython 2.7.13 is on my to-do list, and we can cherry-pick particular patches from upstream. I should note that the python-compat runtime is deprecated and will stop working on October 26, 2017, once the underlying service bridge is turned down.
FWIW, this bug still manifests even outside python-compat. But just to follow up: adding in the fixed pool.py made a big difference. Previously I would inch up to 100% memory usage within 24 hours (+0.3 GB/hour), and then eventually swap slowly, crash, and restart. Now memory seems to be consistently hovering around 27%.

Glad to hear about the new CPython! Is it possible to circle back if/when you pull this fix into a pushed CPython 2.7.13 (or if there's some tracking bug for it), so that I know when to delete my hack-fix? Thanks!