`dutyCycle` is the GPU utilization during the last "sample period" of the driver, according to the NVIDIA docs: https://docs.nvidia.com/deploy/nvml-api/structnvmlUtilization__t.html#structnvmlUtilization__t
So this can be a very short period. Prometheus scrape intervals are usually 10 seconds or longer, so data is almost certainly lost:
Let's say a workload uses 100% of the GPU for one second, then sleeps one second - the GPU is 50% busy on average. We don't know exactly when Prometheus will scrape, but there's a good chance it would only see 100% or 0% every time it does, so the recorded utilization will probably be incorrect.
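To make the effect concrete, here is a small simulation of that scenario. It is only a sketch under stated assumptions: the workload shape, the 0.25 s driver sample period, and the names `busy`, `gauge_reading` and `scraped_values` are all made up for illustration and are not taken from the exporter or from NVML.

```python
def busy(t: float) -> bool:
    """Hypothetical workload: busy during even seconds, idle during odd ones."""
    return int(t) % 2 == 0


def gauge_reading(t: float, sample_period: float = 0.25, steps: int = 100) -> float:
    """Utilization in percent over the assumed driver sample period ending at t."""
    dt = sample_period / steps
    start = t - sample_period
    # Numerically integrate the busy signal over the sample window.
    busy_steps = sum(busy(start + (i + 0.5) * dt) for i in range(steps))
    return 100.0 * busy_steps / steps


def scraped_values(first_scrape: float, scrape_interval: float = 10.0, count: int = 6) -> list[float]:
    """Gauge values a scraper would record when scraping at a fixed interval."""
    return [gauge_reading(first_scrape + k * scrape_interval) for k in range(count)]


# The true average utilization is 50%, but each scrape only sees one short window.
in_busy_phase = scraped_values(first_scrape=0.5)
in_idle_phase = scraped_values(first_scrape=1.5)
print(in_busy_phase)
print(in_idle_phase)
```

In this idealized setup the scrape interval is an exact multiple of the workload period, so every scrape lands in the same phase and the series reads a constant 100% or a constant 0%, depending only on when the first scrape happened. With real-world jitter the scrapes would mix the two values, but the average would still be unreliable.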
Instead it would be better to have a `..._seconds_total` counter, like it's done for CPU utilization: https://www.robustperception.io/understanding-machine-cpu-usage
This way we wouldn't lose data due to long Prometheus sample periods, but it would probably require some more work in the exporter (poll data at a higher frequency).
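A rough sketch of what the counter approach could look like, again under assumptions: `BusySecondsCounter` and `duty_cycle` are hypothetical names, the driver reading is simulated rather than read from NVML, and the exporter is assumed to poll once per driver sample period (0.25 s here). The last step mimics what a rate over the counter would compute between two scrapes.

```python
class BusySecondsCounter:
    """Monotonic counter of GPU busy time, analogous to a ..._seconds_total metric."""

    def __init__(self) -> None:
        self.total = 0.0

    def observe(self, duty_cycle_percent: float, elapsed_seconds: float) -> None:
        # Convert "percent busy over the last window" into busy seconds.
        self.total += duty_cycle_percent / 100.0 * elapsed_seconds


def duty_cycle(poll_index: int, poll_period: float) -> float:
    """Hypothetical driver reading: busy during even seconds, idle during odd ones."""
    window_midpoint = (poll_index + 0.5) * poll_period
    return 100.0 if int(window_midpoint) % 2 == 0 else 0.0


poll_period = 0.25      # assumed exporter poll period, in seconds
scrape_interval = 10.0  # assumed Prometheus scrape interval, in seconds
polls_per_scrape = int(scrape_interval / poll_period)

counter = BusySecondsCounter()
scraped_totals = [counter.total]
for poll_index in range(3 * polls_per_scrape):
    counter.observe(duty_cycle(poll_index, poll_period), poll_period)
    if (poll_index + 1) % polls_per_scrape == 0:
        scraped_totals.append(counter.total)  # value seen by one scrape

# Increase per second between consecutive scrapes, i.e. average utilization as a fraction.
rates = [(b - a) / scrape_interval for a, b in zip(scraped_totals, scraped_totals[1:])]
print(rates)
```

Because the counter accumulates every poll, each scrape recovers the 50% average no matter when it happens. The accuracy then depends on how closely the exporter's polling matches the driver's sample period, not on the Prometheus scrape interval.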