pass ignore_eos parameter to all benchmark_serving calls #9349
@KuntaiDu @youkaichao Please review.
LGTM!
…t#9349) Signed-off-by: charlifu <[email protected]>
…t#9349) Signed-off-by: Vinay Damodaran <[email protected]>
…t#9349) Signed-off-by: Alvant <[email protected]>
…t#9349) Signed-off-by: Amit Garg <[email protected]>
…t#9349) Signed-off-by: qishuai <[email protected]>
…t#9349) Signed-off-by: Sumit Dubey <[email protected]>
…t#9349) Signed-off-by: Maxime Fournioux <[email protected]>
Currently, the ignore_eos parameter set by the user is passed only to the first prompt run section of benchmark_serving, not to the profiler and actual benchmark run sections. As a result, the output sequence length can differ between backends tested with benchmark_serving, because ignore_eos is not set to true for those runs. With this fix, all backends run on the same dataset should generate the same number of output tokens per test, so the results are comparable.
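
To illustrate the idea, here is a minimal sketch of forwarding the same ignore_eos value to every phase. It is not the actual benchmark_serving code: `RequestInput`, `build_phase_inputs`, and the phase names are hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class RequestInput:
    """Hypothetical stand-in for a per-request input record."""
    prompt: str
    output_len: int
    ignore_eos: bool = False


def build_phase_inputs(prompt: str, output_len: int,
                       ignore_eos: bool, profile: bool) -> dict:
    """Build one input per phase, forwarding ignore_eos to all of them."""
    # Phase names are illustrative: first prompt run, optional profiler
    # run, and the actual benchmark run.
    phases = ["initial_test"]
    if profile:
        phases.append("profiler")
    phases.append("benchmark")
    return {
        phase: RequestInput(prompt=prompt, output_len=output_len,
                            ignore_eos=ignore_eos)
        for phase in phases
    }


inputs = build_phase_inputs("Hello", 128, ignore_eos=True, profile=True)
```

Building every phase's input through one helper means the user's flag cannot be dropped from the later runs, which is the behavior this change aims for.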