[Frontend] Add max_tokens prometheus metric #9881
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
…kens requested by the user, which is generally different than the number of tokens actually generated
Signed-off-by: Tomer Asida <[email protected]>
f842335 to 9267cd0
@robertgshaw2-neuralmagic can you review this?
Signed-off-by: Tomer Asida <[email protected]> Signed-off-by: Linkun Chen <[email protected]>
Signed-off-by: Tomer Asida <[email protected]> Signed-off-by: Richard Liu <[email protected]>
Signed-off-by: Tomer Asida <[email protected]> Signed-off-by: Loc Huynh <[email protected]>
Signed-off-by: Tomer Asida <[email protected]> Signed-off-by: Sumit Dubey <[email protected]>
Signed-off-by: Tomer Asida <[email protected]> Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Tomer Asida <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]>
This PR adds reporting of `max_tokens` as a Prometheus metric (histogram): `vllm:request_params_max_tokens`. The number of generated tokens is already reported (`vllm:request_generation_tokens`), but that is the number of tokens actually generated, which generally differs from the number of tokens requested by the user (`max_tokens`).
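To illustrate why the two histograms can diverge, here is a minimal, dependency-free sketch. It is not the code in this PR and does not use the Prometheus client library: `SketchHistogram`, `BUCKETS`, and `record_finished_request` are hypothetical names, the bucket bounds are made up, and it assumes `max_tokens` may be unset (`None`) for some requests. Only the two metric names come from the description above.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds; the buckets used by the PR are not shown here.
BUCKETS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000]


class SketchHistogram:
    """Stand-in for a Prometheus-style histogram with "le" (<=) buckets."""

    def __init__(self, name, bounds=BUCKETS):
        self.name = name
        self.bounds = list(bounds)
        # One slot per bound plus a final overflow (+Inf) slot.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0
        self.n = 0

    def observe(self, value):
        # bisect_left returns the index of the first bound >= value.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value
        self.n += 1

    def cumulative(self, le):
        """Number of observations less than or equal to the bound `le`."""
        idx = self.bounds.index(le)
        return sum(self.counts[: idx + 1])


requested = SketchHistogram("vllm:request_params_max_tokens")
generated = SketchHistogram("vllm:request_generation_tokens")


def record_finished_request(max_tokens, num_generated):
    """Hypothetical hook: record both values once a request completes."""
    if max_tokens is not None:  # assumption: the parameter may be unset
        requested.observe(max_tokens)
    generated.observe(num_generated)


# A request that stops early, one that hits its limit, and one with no limit set.
record_finished_request(max_tokens=100, num_generated=17)
record_finished_request(max_tokens=1000, num_generated=1000)
record_finished_request(max_tokens=None, num_generated=42)
```

In this sketch the first request asks for up to 100 tokens but generates only 17, so it lands in different buckets of the two histograms. Comparing the two distributions shows how often requests stop before reaching their requested limit.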