[Usage]: Obtaining success / error rate % metrics #9346

Closed
yqlu opened this issue Oct 14, 2024 · 6 comments · May be fixed by #18765

Labels
stale (Over 90 days of inactivity) · usage (How to use vllm)

Comments


yqlu commented Oct 14, 2024

Your current environment

Running vLLM v0.5.1 on GKE, but my exact setup isn't relevant to the question below.

How would you like to use vllm

I see that in vllm/engine/metrics.py there is a success count metric, broken down by finish reason (anecdotally, length and stop in my case).

Is it currently possible to get a success rate % metric by dividing this by a denominator? What denominator should I use here -- maybe num_requests_running or time_to_first_token_seconds_count? I tried both, but neither gave the right result (the ratio could momentarily go above 100%).
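
For concreteness, here is a sketch of the kind of PromQL ratio described above, assuming the endpoint exposes the metrics under the names `vllm:request_success_total` and `vllm:time_to_first_token_seconds_count` (the exact names and the `vllm:` prefix may differ across versions; check your /metrics output):

```promql
# Success count rate divided by a stand-in denominator (the TTFT
# histogram's observation count). The two counters are incremented at
# different points in a request's lifecycle (completion vs. first
# token), which is why this ratio can momentarily exceed 100%.
sum(rate(vllm:request_success_total[5m]))
/
sum(rate(vllm:time_to_first_token_seconds_count[5m]))
```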

If there were an error count metric, I could graph success % as the fraction success / (success + error). Is that in the works? I see a roadmap from April which lists request_failure as a planned addition, but I wasn't sure whether that is still up to date. Thanks!
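
If such a counter existed, the panel would be straightforward. A sketch, using the hypothetical name `vllm:request_failure_total` for the planned metric (not something vLLM currently exposes):

```promql
# Hypothetical: vllm:request_failure_total stands in for the planned
# error counter from the roadmap; it is not an existing vLLM metric.
100 * sum(rate(vllm:request_success_total[5m]))
/
(
  sum(rate(vllm:request_success_total[5m]))
  + sum(rate(vllm:request_failure_total[5m]))
)
```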

yqlu added the usage (How to use vllm) label Oct 14, 2024

mces89 commented Nov 2, 2024

This metric is quite important for production monitoring; I hope to see it soon.


mces89 commented Nov 20, 2024

@yqlu did you find any solution for this issue? Thanks.

yqlu (Author) commented Dec 9, 2024

No, I couldn't find any workaround or way to derive this from the existing metrics.

github-actions bot commented Mar 10, 2025

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale (Over 90 days of inactivity) label Mar 10, 2025

github-actions bot commented Apr 9, 2025

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions bot closed this as not planned Apr 9, 2025
achandrasekar commented

@yqlu it looks like HTTP metrics are an option here, to look for non-2xx status codes: https://docs.vllm.ai/en/latest/design/v1/metrics.html#prometheus-client-library
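
A sketch of what that could look like in PromQL, assuming the server exposes a per-status HTTP request counter such as `http_requests_total` with a `status` label (as instrumentation libraries like prometheus-fastapi-instrumentator provide; check the linked docs and your /metrics output for the exact names vLLM uses):

```promql
# Error rate % from HTTP-level metrics: share of non-2xx responses.
# Metric and label names here are assumptions, not confirmed vLLM names.
100 * sum(rate(http_requests_total{status!~"2.."}[5m]))
/
sum(rate(http_requests_total[5m]))
```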
