-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
thread count keeps growing using ddtrace==2.14.2 #11006
Comments
👋 thank you for the report! Can you confirm the ddtrace version which is causing issues? Also, I can't quite see the top of the dashboard, but it looks like this is a view of Docker Threads--is that correct? Would you be willing to share whether there is also an increase in the total number of processes (PIDs) in the container? This might help us narrow what category of behaviors to look at. |
I'm also observing the same CPU utilization and thrown exception symptoms. The thrown exceptions symptom I'm seeing in my setup is identical-- it's similarly being attributed to I've tried various different Looking for any recommendation or resolution, thanks! |
@micahjz thank you for the report. I received another report about this today; investigating. Just to double-check, when you wrote
Did you mean
Generally, for overhead-related issues our recommendation has been to enable the "stack v2" feature. That said, stack v2 does not yet support exception profiles, so part of the symptom will be obscured by this feature. |
Hi @sanchda . Ah perhaps I had misread the recommendation in the ticket here #9205 to address segmentation faults. I've re-enabled with The performance degradation symptoms are: This is using python 3.12.7 in a testing environment with no traffic. At the :53 minute mark, we switch to use From image top to bottom,
|
@micahjz thank you for checking it out and the comprehensive report. I think there might be something actionable here, but I need to run some of my own experiments to see what I can confirm. Just to verify--do you know whether you're observing the issue where the total thread count is increasing when profiling is enabled? |
I don't believe I'm seeing the total thread count symptom. Thanks for the assistance here @sanchda! |
@micahjz are there any details you can share about the structure of your application? Is it using threads? Tasks? One short-lived thread per request? One short-lived ConcurrentFuture worker per request? A static worker pool? A single process? Any insight will help us investigate. To be clear, generally speaking we're pretty comfortable having a small overhead--like the kind you observe--for CPU metrics. Even an increase in involuntary context switches is something we might be able to explain and admit. I'm very confused by the 2-3ms increase in p90 latency for an idle process. I can't really explain that. I doubt you're running a configuration where you have a CPU quota tuned very carefully to the baseline of an idle process, so it's not like the extra CPU time is time-slicing (e.g., at the level of the Linux scheduler) with the request handler. The v2 stack sampler primarily runs in a separate native thread, so a lot of this overhead does not serialize with the cpython runtime (unlike the "v1" stack sampler, which needed to take the GIL in order to perform a sample). |
The application is a FastAPI / uvicorn web server within a single process. It uses |
That said, the service page on datadog for this service in our testing environment indicates a sum total of ~400 threads for the 3 running pods |
@sanchda I wanted to add this clarification just in case I wasn't clear. In the details and screenshots above, we were already running with profiling enabled. At the 53 minute mark, we change from using |
Appreciate the clarification--thank you--I misunderstood the original message to mean that the overhead was relative to baseline. No, this is very much not our expectation for the stack v2 feature. |
Hello everyone,
we are using ddtrace in production on our celery workers, and those seem to be creating extra threads
we had the issue when we initially upgraded all our unpinned dependencies, so we rolled back and I tested it one by one, the celery version is 5.3.6, and we also use celery-beats, and our worker pool should be multiprocessing.
our python version is 3.11 by the way,
let me know if i can give additional details or if you can’t reproduce,
Regards
The text was updated successfully, but these errors were encountered: