[Usage]: Obtaining success / error rate % metrics #9346
Comments
This metric is quite important for production monitoring; I hope to see it soon.
@yqlu did you find a solution for this issue? Thanks.
No, I couldn't find any workaround or way to derive this from the existing metrics.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
@yqlu it looks like the HTTP metrics are an option here: you can watch them for non-2xx status codes. See https://docs.vllm.ai/en/latest/design/v1/metrics.html#prometheus-client-library
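A rough sketch of that suggestion, assuming the server exposes a per-status request counter on its `/metrics` endpoint. The metric name `http_requests_total` and the `handler` / `status` labels are assumptions (they are typical of FastAPI Prometheus instrumentation and may differ by vLLM version), and `http_success_fraction` is a hypothetical helper, not part of vLLM. It parses the Prometheus text format and returns the share of requests with a 2xx status:

```python
import re

# Matches a labelled sample line such as: name{a="x",b="y"} 12.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def http_success_fraction(metrics_text, metric="http_requests_total", handler=None):
    """Fraction of requests with a 2xx status, or None if nothing was counted.

    `metric` and the `handler` / `status` label names are assumptions;
    check the actual /metrics output of your deployment.
    """
    ok = total = 0.0
    for line in metrics_text.splitlines():
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue  # comments, other metrics, unlabelled samples
        labels = dict(LABEL_RE.findall(m.group("labels")))
        if handler is not None and labels.get("handler") != handler:
            continue
        value = float(m.group("value"))
        total += value
        if labels.get("status", "").startswith("2"):
            ok += value
    return None if total == 0 else ok / total


# Hand-written sample in the Prometheus text format (not real server output).
SAMPLE = """\
# TYPE http_requests_total counter
http_requests_total{handler="/v1/completions",method="POST",status="2xx"} 95.0
http_requests_total{handler="/v1/completions",method="POST",status="5xx"} 5.0
http_requests_total{handler="/metrics",method="GET",status="2xx"} 40.0
"""

fraction = http_success_fraction(SAMPLE, handler="/v1/completions")
```

Note that these counters are cumulative since process start, so this gives a lifetime ratio. For a dashboard you would more likely take the rate of each series over a window and divide those.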
Your current environment
Running vLLM v0.5.1 on GKE, but my exact setup isn't relevant to the question below.
How would you like to use vllm
I see that `vllm/engine/metrics.py` defines a success count metric, split up by success reason (anecdotally, for me, `length` and `stop`).

Is it currently possible to get a success rate % metric by dividing this by a denominator? What denominator should I use here: maybe `num_requests_running` or `time_to_first_token_seconds_count`? I tried both, but neither seemed to give the right result (the ratio could momentarily go above 100%).

If there were an error count metric, I could graph success % as the fraction success / (success + error). Is that in the works? I see a roadmap from April which states that `request_failure` is a planned addition, but I wasn't sure whether it is up to date. Thanks!
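To make the arithmetic concrete, here is a minimal sketch of the ratio I have in mind, assuming an error counter existed alongside the success counter. The `success` and `error` keys and both function names are hypothetical, not actual vLLM metric names. The point is that numerator and denominator are increases of cumulative counters over the same scrape interval, so the result cannot exceed 100%, unlike a ratio against a gauge such as `num_requests_running`.

```python
from typing import Optional


def counter_delta(prev: float, curr: float) -> float:
    """Increase of a cumulative counter between two scrapes.

    A drop in value is treated as a counter reset (e.g. a process
    restart), in which case the current value is taken as the increase.
    """
    return curr if curr < prev else curr - prev


def success_rate_percent(prev: dict, curr: dict) -> Optional[float]:
    """success / (success + error) over one scrape interval, in percent.

    `prev` and `curr` are snapshots with hypothetical keys "success"
    and "error". Returns None when no request finished in the interval.
    """
    ok = counter_delta(prev["success"], curr["success"])
    err = counter_delta(prev["error"], curr["error"])
    total = ok + err
    return None if total == 0 else 100.0 * ok / total


# Two made-up scrapes: 90 successes and 10 errors in between.
rate = success_rate_percent(
    {"success": 100, "error": 5},
    {"success": 190, "error": 15},
)
```

With those two made-up snapshots the interval has 90 successes and 10 errors, so `rate` is 90.0.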