High cpu when using Gunicorn with multiprocess #568
That's a new idea, which could avoid the problems of #518. Do you want to try it out and let me know how it goes? Keep in mind that this library supports all forms of multi-process deployments, not just gunicorn.
Not exactly a new idea... I discussed using UUIDs back in #421 (comment) :)
Any updates on this issue? We still have this problem and need to reboot our instances periodically.
Seeing the same.
Seeing a similar issue.
The documentation mentions a server hook function for Gunicorn: child_exit. Did it not work?
Yes, now that you have mentioned it, this part should do it:

```python
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
```
I had the same problem: multiprocess.mark_process_dead() only cleans up the "gauge" files, not the others.
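If accumulating counter and histogram files are the bottleneck, one possible workaround is to extend the child_exit hook so it also removes the dead worker's other .db files. This is only a sketch, not part of the documented API: the glob pattern assumes the default `<type>_<pid>.db` naming, and note the trade-off that deleting counter or histogram files discards that worker's contribution to the totals, which is presumably why the library keeps them by default.

```python
# gunicorn config sketch: also remove a dead worker's non-gauge files.
# WARNING: this drops the dead worker's counter/histogram contributions.
import glob
import os

from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)  # handles the live-gauge files
    prom_dir = os.environ["PROMETHEUS_MULTIPROC_DIR"]  # or the older prometheus_multiproc_dir
    for path in glob.glob(os.path.join(prom_dir, f"*_{worker.pid}.db")):
        os.remove(path)  # counter_<pid>.db, histogram_<pid>.db, ...
```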
In order to clean up a process's files once it is stopped or destroyed, we need to identify those files with a UUID. Otherwise, those files keep being processed and collected, and they keep accumulating until we restart the server or master process to clean up the metrics directory. This change uses the psutil library, which provides a hash based on PID + process creation time, in order to differentiate two different processes that happen to share the same PID. This change fixes prometheus#568. Signed-off-by: Mario de Frutos <[email protected]>
FWIW, I solved a similar-looking problem here, where CPU and transmit bandwidth were both ramping up over time (until either the pods were recreated afresh or the deployment would fail) on a web server with multiprocess Gunicorn and with child_exit configured. The cause was that the Prometheus flask exporter had mistakenly been left at its default of labelling metrics by path, but the number of unique paths for that service was nearly unbounded, so recurring Prometheus scrapes collecting hundreds of thousands of metrics were too demanding.
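For reference, a minimal sketch of that kind of fix, assuming prometheus-flask-exporter and its group_by option (check your version's README for the exact parameter name):

```python
# Group request metrics by Flask endpoint rather than raw request path,
# so a service with unbounded distinct URLs keeps a small, fixed label set.
from flask import Flask
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
metrics = PrometheusMetrics(app, group_by="endpoint")  # assumption: group_by is supported
```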
Any update on this? This issue is blocking us from using prometheus and forcing us to consider something else.
Hi @agsha, this sounds like a very common misconfiguration of your individual project, rather than a problem with prometheus (or this python client). This is very easy to confirm or rule out: just go to the metrics endpoint of your app and perform a line count. If the line count is large (say >200 lines), especially after the app has been operating in production for a while, then this is the problem.

All of the documentation for prometheus (and the python client here) clearly states that you are responsible for tightly limiting the cardinality of the metrics and labels. Often an application is configured to export too many separate metrics: the app misuses prometheus by exporting too many distinct string values and combinations of labels, producing too many separate measurements that prometheus has to keep as separate time series. This commonly happens because of confusion between the conceptual models of metrics vs logs, where unfamiliar users inadvertently attach too fine-grained metadata to their metrics, causing a cardinality explosion.

See here for another recent example of the trivial one-line configuration fix, assuming you're using the prometheus flask exporter. (There's a more detailed analysis in the link I posted previously in this thread.) If you're not using prometheus flask exporter, but are instead using this python client directly, then the details of the fix for your project will depend on your code. (Do you have a link to share, to the code and the output from the metrics endpoint?)

To resolve the repo issue here, I propose that, on every hundredth increment of the total count of metrics, this client should emit a warning message that includes the current tally and an explanation of the potential hazard. Also, the docs could be updated to include an even more prominent admonition (expounding on this concern about intended usage) than what they currently include. Also, a downstream linked issue should be opened in prometheus flask exporter, to make coarser grouping the default behaviour.

Also, this issue could be tagged as not-a-bug, since it is the desired behaviour that this instrumentation keeps reporting every separate metric that has previously been encountered. (That's the purpose: the scraper can convert each known metric into a dense time series, so that even slow or irregular fluctuations can be monitored and resolved clearly, without gaps in the collected data. It isn't this library's fault if a downstream project tries to autogenerate arbitrarily many new label values or new metric names, causing the set of known metrics to grow unmanageably large. And even if this client were modified to remember how long ago each value last changed and to drop ones that vary infrequently, the central prometheus database would still get overloaded, because it still has to separately store each separately-labelled time series that has ever been encountered. Any cardinality explosion will eventually cause performance failures at some level of the user's infrastructure.)

OTOH, if anyone can give a reproducible example where the line count of the exported metrics endpoint remains small even though the resource usage associated with this client keeps growing, that would point at a genuine bug.
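A quick way to run the suggested check, as a sketch, assuming your app already serves its metrics over HTTP (the URL below is a placeholder):

```python
# Count the exposed time series to spot a cardinality explosion.
# http://localhost:8000/metrics is a placeholder; use your app's metrics URL.
import urllib.request

with urllib.request.urlopen("http://localhost:8000/metrics") as resp:
    lines = resp.read().decode("utf-8").splitlines()

samples = [l for l in lines if l and not l.startswith("#")]  # drop HELP/TYPE comments
print(f"{len(lines)} lines total, {len(samples)} samples")
# Hundreds or thousands of samples that keep growing over time indicate
# too many distinct label combinations.
```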
I took another look at this in more detail. TL;DR:
The code for this client seemed strange to me, as if someone had started to implement a central shared-memory structure for inter-process communication but left it unfinished. Instead, each separate process writes its state to a separate file (or to multiple files if using multiple types of metric), using a bespoke binary format. On the plus side, this makes it possible to externally interrogate the running state and alleviates most of the need for synchronisation (although synchronisation appears to be implemented already). Here I prepared a simple test of this prometheus python client (with just a counter metric), without using prometheus flask exporter:

```python
import os, random, multiprocessing

os.environ["PROMETHEUS_MULTIPROC_DIR"] = os.getcwd()

import prometheus_client.multiprocess

registry = prometheus_client.CollectorRegistry()
prometheus_client.multiprocess.MultiProcessCollector(registry)
metric = prometheus_client.Counter("My_Counter", "An example measurement with label", ["Some_Field"])
```

Then I simulated a large number of separate worker processes while still limiting the number of distinct labels to a small set:

```python
def low_cardinality_worker():
    metric.labels(str(random.randint(0, 5))).inc()  # only half a dozen possible distinct metrics

def spawn_one_worker_process():
    subprocess = multiprocessing.Process(target=low_cardinality_worker)
    subprocess.start()
    subprocess.join()  # wait for the process to exit
    return subprocess.pid

pids = [spawn_one_worker_process() for i in range(100)]  # test having many worker processes
```

These hundred separate processes have generated exactly one hundred separate `counter_<pid>.db` files, yet collecting from the registry still merges them into just the six labelled series:

```python
>>> print(prometheus_client.generate_latest(registry).decode('utf-8'))
# HELP My_Counter_total An example measurement with label
# TYPE My_Counter_total counter
My_Counter_total{Some_Field="0"} 18.0
My_Counter_total{Some_Field="3"} 14.0
My_Counter_total{Some_Field="2"} 15.0
My_Counter_total{Some_Field="1"} 19.0
My_Counter_total{Some_Field="5"} 16.0
My_Counter_total{Some_Field="4"} 18.0
```

We can also inspect the raw contents of any of these files:

```python
>>> print(list(prometheus_client.mmap_dict.MmapedDict.read_all_values_from_file(f'counter_{str(pid)}.db')))
[('["My_Counter", "My_Counter_total", {"Some_Field": "2"}, "An example measurement with label"]', 1.0, 0.0, 112)]
```

Note that even if the counter gets incremented thousands of times, the length of the `.db` file does not grow:

```python
for i in range(10000):
    metric.labels("foo").inc()  # low cardinality test
```

However, if we allow many different label strings then we have the cardinality explosion. The following example produces a 1 MB `counter_<pid>.db` file:

```python
for i in range(10000):
    metric.labels(str(random.randint(0, 1_000_000))).inc()  # cardinality explosion
```

If we run this loop ten times longer then the …

Next, note that running …

Finally, I performed some simple timing. In the first test, with over a hundred …

In other words, if the prometheus client is using too much CPU time, the problem is most likely that your application is misconfigured to track too many separately-labelled metrics. Having dozens of …
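Following on from that conclusion, here is a sketch of the usual fix on the application side: normalise unbounded inputs into a small, fixed set of label values before calling .labels(). The metric name, bucket names, and helper function are illustrative, not from this library.

```python
# Keep label cardinality bounded by mapping an unbounded input (e.g. a raw URL
# path or free-form status string) onto a small, fixed set of label values.
from prometheus_client import Counter

REQUESTS = Counter("app_requests_total", "Requests by coarse route group", ["route_group"])

KNOWN_GROUPS = {"/api/users": "users", "/api/orders": "orders", "/healthz": "health"}

def route_group(path: str) -> str:
    # Anything not explicitly known collapses into a single "other" bucket,
    # so arbitrary paths cannot create new time series.
    for prefix, group in KNOWN_GROUPS.items():
        if path.startswith(prefix):
            return group
    return "other"

def record_request(path: str) -> None:
    REQUESTS.labels(route_group(path)).inc()
```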
https://prometheus.github.io/client_python/multiprocess/
I am not sure if this is the end of the story, or whether there is a workaround to get the CPU/memory metrics.
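Regarding per-worker CPU and memory in multiprocess mode (where the client's default process collector is not available, per the linked docs), one possible workaround is to publish them through ordinary gauges in a live multiprocess mode. This is a sketch, assuming psutil is installed and that you arrange yourself for the update function to run periodically (for example from a request hook or a background timer):

```python
# Per-worker resource gauges for multiprocess mode. 'liveall' exports one sample
# per live worker (the collector adds a pid label), and mark_process_dead
# removes a dead worker's samples.
import psutil
from prometheus_client import Gauge

WORKER_CPU = Gauge("worker_cpu_percent", "Worker CPU percent", multiprocess_mode="liveall")
WORKER_RSS = Gauge("worker_resident_memory_bytes", "Worker resident memory in bytes", multiprocess_mode="liveall")

_PROC = psutil.Process()  # assumption: this module is imported after the worker forks

def update_process_metrics() -> None:
    # Call this periodically in each worker; wiring it into a request hook or
    # background timer is left to the application.
    WORKER_CPU.set(_PROC.cpu_percent(interval=None))  # first reading is 0.0
    WORKER_RSS.set(_PROC.memory_info().rss)
```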
I've had this issue for a while, since I'm trying to monitor a Django project that uses gunicorn, where the number of files in multiprocess mode keeps increasing and might cause problems. I've read most of the proposed solutions I could find and settled on the idea of using a limited set of reusable UUIDs to prevent the number of files from growing. I've tested this for my metrics, which only include Counters and Histograms, and it seems to be working, but since I'm not even remotely fluent in the prometheus_client and gunicorn source code, I wanted to check whether there might be any issues with my solution. It's naive and tailored to my project's specifics, but I'd be happy to use it instead of putting a lot of time and effort into a better but more complicated solution.

The idea is to use gunicorn hooks to maintain a pool of UUIDs, assign a UUID to each worker from that pool, use os.environ to expose the UUID to the prometheus client, and return the UUID to the pool when the worker exits. It works okay in my limited testing, and only with histograms and counters (I haven't tested gauges and summaries yet), but I'd appreciate it if anyone can point out flaws or scenarios where something might go wrong. I have hooks for this in my gunicorn config file, and a corresponding piece of code in my prometheus settings.
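For concreteness, here is a minimal sketch of the approach described above. It is not the commenter's actual code: the hook wiring, the environment variable name, and the use of prometheus_client.values.MultiProcessValue to override how the .db files are named are all assumptions, and the override only works if it runs before any metric objects are created (so not with gunicorn's preload_app).

```python
# gunicorn.conf.py: pool of reusable worker ids (hypothetical sketch).
import os
import uuid

from prometheus_client import multiprocess

workers = 4  # size the id pool to the configured worker count
_id_pool = [str(uuid.uuid4()) for _ in range(workers)]

def pre_fork(server, worker):
    # Runs in the master just before the fork: reserve an id for this worker.
    worker._prom_id = _id_pool.pop()

def post_fork(server, worker):
    # Runs in the new worker: expose the id so the app can name its files by it.
    os.environ["PROMETHEUS_WORKER_ID"] = worker._prom_id

def child_exit(server, worker):
    # Runs in the master when a worker dies: clean up live gauges and recycle the id.
    multiprocess.mark_process_dead(worker._prom_id)  # files are keyed by the id, not the pid
    _id_pool.append(worker._prom_id)
```

```python
# prometheus settings: must run before any metric objects are created (sketch).
import os

from prometheus_client import values

_worker_id = os.environ.get("PROMETHEUS_WORKER_ID", str(os.getpid()))

# Name the multiprocess .db files after the reusable worker id instead of the PID,
# so the number of files stays bounded by the worker count.
values.ValueClass = values.MultiProcessValue(lambda: _worker_id)
```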
I run a Flask app on Gunicorn 20.0.4 and Kubernetes. My Python is v3.7.5. I have a problem with CPU usage increasing over time.
After investigating stack traces using py-spy, I noticed that the issue is caused by the prometheus metrics library.
It doesn't clean up metric files from old workers, so over time the merge process takes more and more CPU.
After deleting the following files, the CPU usage dropped significantly:
The issue is related to #275
Can we avoid that somehow without periodically restarting the k8s pod? Maybe multiprocess should use PID + a UUID generated on the worker, rather than just the PID, for file names? Then the master could remove/merge files from dead workers.
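To see whether this is what is happening on a given pod, here is a small diagnostic sketch (the directory comes from your own PROMETHEUS_MULTIPROC_DIR setting, and the file-name pattern assumes the default `<type>_<pid>.db` naming) that counts the .db files per metric type and flags files left behind by processes that no longer exist:

```python
# Count multiprocess .db files by metric type and spot files from dead workers.
import collections
import os
import re

prom_dir = os.environ["PROMETHEUS_MULTIPROC_DIR"]
by_type = collections.Counter()
stale = []

for name in os.listdir(prom_dir):
    m = re.match(r"([a-z_]+)_(\d+)\.db$", name)
    if not m:
        continue
    metric_type, pid = m.group(1), int(m.group(2))
    by_type[metric_type] += 1
    if not os.path.exists(f"/proc/{pid}"):  # Linux-only liveness check
        stale.append(name)

print(dict(by_type))
print(f"{len(stale)} files belong to processes that no longer exist")
```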