Significant CPU consumption when agent is on #1177
Comments
I tried to create a repro repository mimicking what is happening in my actual code and... maybe I'm missing something, but the repro code doesn't create the same issue. I'm kinda tired and bleary-eyed already, as I've been investigating this for 3 days... Could someone from New Relic point me to other places for debugging this somehow? Some extra debug logs that could show, for example, very frequent execution of some New Relic functionality? My guess is that New Relic's green thread is ticking very frequently... Disabling New Relic on my prod for now... No idea how to solve this.
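For reference on the "extra debug logs" question: the agent's own verbose logging can be switched on in `newrelic.ini` (these are documented agent settings; the file path below is just illustrative). At `debug` level the agent logs its harvest-cycle activity, which should make any abnormally frequent ticking visible:

```ini
[newrelic]
# Write verbose agent internals to a file. The path here is an example;
# point it anywhere the worker process can write.
log_file = /tmp/newrelic-python-agent.log
log_level = debug
```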
Let me know if you'd find a repro useful and I can try to produce one, but I'm also seeing this. It is most pronounced in CPU-constrained environments: I didn't notice it locally, but once I deployed to Google Cloud Run it quickly maxed out the CPU.
Invoking Gunicorn like:
Starting New Relic in my
Hope that helps. A single request will trigger it in the case of Cloud Run. I have New Relic turned off for my service that uses the gevent worker model until this is resolved, so it would be great to get this figured out!
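The reporter's exact Gunicorn command and startup snippet were not preserved above. A hedged reconstruction of a typical gevent-worker invocation wrapped with the agent (app module, worker count, and bind address are placeholders):

```shell
# Hypothetical reconstruction -- the original command was lost in extraction.
NEW_RELIC_CONFIG_FILE=newrelic.ini newrelic-admin run-program \
    gunicorn --worker-class gevent --workers 1 --bind :8080 app:app
```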
Thanks for the information to help with the reproduction! Looking into this.
Hi there -- any luck looking into this? Would love to get back to instrumenting our gevent-model gunicorn processes again.
Hey, I'm picking this ticket up from Lalleh. That get_pid and epoll info is intriguing. I see there is this patch https://github.com/newrelic/newrelic-python-agent/blob/main/newrelic/hooks/coroutines_gevent.py#L17-L29 and I wonder if it could be causing the issue. Could you try running with this branch and see if that fixes it?
Besides that patch I mentioned, which does seem suspicious (I do see calls to sleep in the green-threads dump too), I've been trying to set up a repro of this issue and haven't been able to. Along the same lines as maybephilipp was thinking, it looks like it's spending a lot of time waiting for the CPU to become available again (i.e. for other threads). We do have a harvest thread that we create, and it runs every 5s to push certain data up to New Relic. If this harvesting is causing issues, you could limit the data being harvested and see if that helps. The following configuration settings turn off harvesting of various data, in case the harvest volume is high enough to cause a bottleneck on the threads:
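The specific settings were cut off from the comment above. A hedged guess at the kind of settings meant -- all of these are documented New Relic Python agent options that disable individual data harvests, toggled here purely as an experiment:

```ini
[newrelic]
# Disable individual harvest payloads one at a time to see which, if any,
# correlates with the CPU spin. (Illustrative subset, not the original list.)
transaction_tracer.enabled = false
error_collector.enabled = false
transaction_events.enabled = false
custom_insights_events.enabled = false
span_events.enabled = false
browser_monitoring.auto_instrument = false
thread_profiler.enabled = false
```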
What are the CPU constraints of the system where you are seeing this issue? Along those same lines: how long does each request take to complete (i.e. do you have long-running requests, >0.5s)? What kinds of things does the monitored application do? Does it talk to SQL, do lots of logging or custom event creation, talk to other servers, or do anything CPU-, memory-, or network-intensive? I wonder if this is just a case where the additional overhead of our monitoring is pushing your system over the edge, especially since you say you can only reproduce it on CPU-constrained systems. I am able to get the CPU usage fairly high on a CPU-constrained system as well, but it doesn't stay there like what you are describing, which is why I'm wondering whether something specific inside your application is contributing. It would really help to have a reproduction: if you could post a repro repo with a Docker container and the CPU constraints you are running with, I think it would go a long way toward getting to the bottom of this.
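The harvest thread described above can be sketched with the stdlib to show why a periodic background thread should not, by itself, busy-spin. This is a minimal illustration, not New Relic's actual implementation: a well-behaved loop blocks on an `Event` between cycles, so an idle process sleeps in the OS rather than polling.

```python
import threading
import time


class HarvestThread:
    """Minimal sketch of a periodic background "harvest" loop (hypothetical;
    not the agent's real code). Blocks on an Event between cycles, so it
    consumes no CPU while idle."""

    def __init__(self, interval=5.0, on_harvest=None):
        self.interval = interval
        self.on_harvest = on_harvest or (lambda: None)
        self.cycles = 0
        self._shutdown = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait(timeout) sleeps in the kernel until the timeout expires
        # or stop() sets the event -- no spinning in between.
        while not self._shutdown.wait(self.interval):
            self.cycles += 1
            self.on_harvest()

    def stop(self):
        self._shutdown.set()
        self._thread.join()
```

One caveat relevant to this issue: under gevent, native threads and monkey-patched time/socket primitives can interact with the hub's event loop, so the effective wakeup pattern in a gevent worker may differ from this plain-threading sketch.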
All attempts at reproducing this issue failed, or not enough information was available to reproduce the issue. Reading the code produces no clues as to why this behavior would occur. If more information appears later, please reopen the issue. |
The Python agent makes the Python process consume a lot of CPU in an idle state.
Description
When I turn the Python agent on, then upon server start, once the first request comes in and some queries run, the Python process starts to eat 40-100% of CPU. If I disable the Python agent and restart the server, CPU usage is 0-3%. The high CPU level with the Python agent stays the same while idle (100% sure no requests are coming in).
Stack: Python, Flask, gevent (tried both the plain pywsgi server and gunicorn; the situation is exactly the same), SocketIO (with simple-websocket), Nginx.
Debugging the app showed that most of the CPU time is spent looping green threads. strace showed that a lot of epoll_wait calls are going on when New Relic is up.
Expected Behavior
No impact or low impact.
Troubleshooting or NR Diag results
NR Diag (nrdiag/linux/nrdiag_arm64 --suites python):
Let me know how I can share the nrdiag-output.zip, as I assume there is some sensitive info in it.
py-spy
(hub.py just keeps growing over time)
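The py-spy output itself was not preserved above. For reference, dumps like the one described are typically produced with these standard py-spy commands (the pid placeholder is the gunicorn worker's process id):

```shell
py-spy top --pid <worker-pid>    # live, top-like view of where CPU time goes
py-spy dump --pid <worker-pid>   # one-shot stack trace of every thread
```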
austin-tui
main process:
any other thread:
cProfile snakeviz
The most interesting part: strace
After a lot of investigation I found a Stack Overflow question (https://stackoverflow.com/questions/58973698/uwsgi-with-gevent-takes-up-100-cpu) which led me to the idea of monitoring syscalls, and it turned out something in my Python app is just annihilating my server with epoll_wait and getpid calls:
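The strace output was not preserved. The syscall churn described above can be observed with standard strace flags against the running worker (pid placeholder again):

```shell
# Summarize syscall counts for the busy worker (Ctrl-C to stop and print):
strace -c -f -p <worker-pid>

# Or trace only the suspicious calls, with timestamps:
strace -f -tt -e trace=epoll_wait,getpid -p <worker-pid>
```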
Green threads dump:
https://pastebin.com/mVrqs4VY
Done with:
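The exact dump command was cut off here. A hedged reconstruction: one common way to produce a green-threads dump like the pastebin above, assuming gevent >= 1.3, is gevent's built-in run-info helper:

```python
# Hypothetical reconstruction -- the original command was not preserved.
# gevent.util (gevent >= 1.3) can dump every greenlet with its stack:
from gevent import util

util.print_run_info()  # prints the greenlet tree and hub state to stderr
```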
Steps to Reproduce
Will try to create one.
Your Environment
Additional context
nothing