[Usage]: Obtaining success / error rate % metrics #9346

yqlu · 2024-10-14T17:59:02Z

Your current environment

Running vLLM v0.5.1 on GKE, but my exact setup isn't relevant to the question below

How would you like to use vllm

I see that vllm/engine/metrics.py defines a success count metric, broken down by the reason the request finished (in my deployment I only see length and stop).

Is it currently possible to get a success rate % metric by dividing this count by a denominator? If so, which denominator should I use: num_requests_running, or time_to_first_token_seconds_count? I tried both, but neither gave a sensible result, since the ratio could momentarily go above 100%.
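One likely reason the ratio overshoots: num_requests_running is a gauge (an instantaneous value, not a count of events), and the time-to-first-token count increments when a request emits its first token, whereas the success counter increments when the request finishes. Within a single time window the two events need not belong to the same requests. The sketch below illustrates this with made-up timestamps; the request data and the windowed_ratio helper are hypothetical and not part of vLLM.

```python
from typing import List, Optional, Tuple

# Hypothetical requests: (time of first token, time of completion), in seconds.
requests: List[Tuple[float, float]] = [(1, 12), (2, 14), (3, 16), (11, 25)]


def windowed_ratio(
    reqs: List[Tuple[float, float]], start: float, end: float
) -> Optional[float]:
    """Completions divided by first tokens observed in the window [start, end)."""
    first_tokens = sum(1 for first, _ in reqs if start <= first < end)
    completions = sum(1 for _, done in reqs if start <= done < end)
    if first_tokens == 0:
        return None  # ratio is undefined when the denominator saw no events
    return completions / first_tokens


# Window 1: three requests start streaming, none finish yet.
ratio_window_1 = windowed_ratio(requests, 0, 10)
# Window 2: one new request starts, but the three earlier ones finish.
ratio_window_2 = windowed_ratio(requests, 10, 20)

print(ratio_window_1, ratio_window_2)
```

In the second window the "success rate" computed this way is 300%, even though every request succeeded. A denominator only behaves well if it counts the same population of requests at the same lifecycle point as the numerator.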

If there were an error count metric, I could graph success % as the fraction success / (success + error). Is that in the works? I see a roadmap from April which lists request_failure as a planned addition, but I wasn't sure whether it is still up to date. Thanks!
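For concreteness, the calculation I have in mind is sketched below. It assumes both inputs are increases of monotonically growing counters over the same time window. The error counter is hypothetical, since no such metric exists in the version I am running, and success_rate_percent is just an illustrative helper name.

```python
from typing import Optional


def success_rate_percent(success_delta: float, error_delta: float) -> Optional[float]:
    """Return success / (success + error) as a percentage.

    Both arguments are counter increases over the same window.
    error_delta would come from a failure counter, which is an
    assumption here and not an existing vLLM metric.
    """
    total = success_delta + error_delta
    if total <= 0:
        return None  # no finished requests in the window, rate is undefined
    return 100.0 * success_delta / total


print(success_rate_percent(95, 5))  # 95 successes and 5 errors in the window
print(success_rate_percent(0, 0))   # idle window
```

Because numerator and denominator are built from the same two counters, this ratio cannot exceed 100%, unlike the denominators I tried above.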

mces89 · 2024-11-02T01:57:13Z

This metric is quite important for production monitoring. I hope to see it soon.

mces89 · 2024-11-20T23:46:37Z

@yqlu did you find a solution for this issue? Thanks.

yqlu · 2024-12-09T17:24:14Z

No, I couldn't find any workaround or way to derive this with the existing metrics.

github-actions · 2025-03-10T01:51:39Z

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions · 2025-04-09T02:06:10Z

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

achandrasekar · 2025-05-02T18:20:56Z

@yqlu it looks like the HTTP metrics are an option here: you can look for non-2xx status codes. See https://docs.vllm.ai/en/latest/design/v1/metrics.html#prometheus-client-library
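A minimal sketch of that approach, assuming the server's /metrics endpoint exposes a counter named http_requests_total with handler, method, and status labels, and that status is grouped as 2xx, 4xx, 5xx. Those names and the sample payload below are assumptions for illustration; check the actual /metrics output of your deployment before relying on them.

```python
import re
from collections import defaultdict
from typing import Dict, Optional

# Made-up scrape output in the Prometheus text exposition format.
SAMPLE = """\
# HELP http_requests_total Total number of requests by method, status and handler.
# TYPE http_requests_total counter
http_requests_total{handler="/v1/completions",method="POST",status="2xx"} 970.0
http_requests_total{handler="/v1/completions",method="POST",status="4xx"} 20.0
http_requests_total{handler="/v1/completions",method="POST",status="5xx"} 10.0
http_requests_total{handler="/metrics",method="GET",status="2xx"} 500.0
"""

SAMPLE_LINE = re.compile(r"^http_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$")
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def non_2xx_percent(text: str, handler: str) -> Optional[float]:
    """Percentage of requests to `handler` whose status is not 2xx."""
    counts: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if match is None:
            continue  # comment line or a different metric
        labels = dict(LABEL_PAIR.findall(match.group("labels")))
        if labels.get("handler") != handler:
            continue
        counts[labels.get("status", "")] += float(match.group("value"))
    total = sum(counts.values())
    if total == 0:
        return None  # no traffic recorded for this handler
    errors = sum(v for status, v in counts.items() if not status.startswith("2"))
    return 100.0 * errors / total


error_percent = non_2xx_percent(SAMPLE, "/v1/completions")
print(error_percent)
```

These counters are cumulative since process start, so for a rate over time you would take the difference between two scrapes (or let your monitoring system compute the increase) before dividing. Note also that a streaming response sends its status line before the body, so a failure partway through generation may still be recorded as 2xx.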

yqlu added the "usage" label (How to use vllm) on Oct 14, 2024

github-actions bot added the "stale" label (Over 90 days of inactivity) on Mar 10, 2025

github-actions bot closed this as not planned (Won't fix, can't repro, duplicate, stale) on Apr 9, 2025

harche mentioned this issue May 27, 2025

[WIP] Add a metric to track request failures #18765

Draft
