[Speculative Decoding] Fixing hidden states handling in batch expansion #7508
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀 Comment /ready on the PR.
Would an E2E test be sufficient? I can add the example I mentioned in the issue where we see the error.
There was an error in a specific scenario. Consider sequences for which spec decode is disabled (e.g., when num_tokens + spec_tokens > max_model_len). For such sequences, the proposal length is set to 0 and they are handled separately from the other sequences. When the batch is contracted (in vllm/spec_decode/spec_decode_worker.py, line 650 at fc93e56), the hidden states are reshaped.
This reshaping works fine if all sequences have proposal length > 0, but it breaks if spec decode is disabled for any sequence. This PR changes batch expansion to handle this case correctly.
Thanks for the writeup.
An E2E test is good. I think the issue here is that we should validate an accuracy level, not just that it doesn't crash. For this you can build on PR #6454, which adds an assertion that the draft acceptance rate is 100%.
Makes sense, but any draft model which uses hidden states can't be identical to the target model. In that case I don't think we can get a draft acceptance rate of 100%. What do you suggest then?
You can run it for the test prompts and record the accuracy, then assert it doesn’t go below that fixed value plus some epsilon. |
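One way to sketch that regression-style check (BASELINE_ACCEPTANCE_RATE and EPSILON below are illustrative placeholders, not values from the vLLM test suite):

```python
# Recorded once from a known-good run on the fixed test prompts
# (hypothetical baseline; pick the value observed in your environment).
BASELINE_ACCEPTANCE_RATE = 0.48
EPSILON = 0.05  # tolerance for run-to-run noise

def check_acceptance_rate(accepted_tokens: int, proposed_tokens: int) -> float:
    """Assert the draft acceptance rate has not regressed below baseline."""
    rate = accepted_tokens / proposed_tokens
    assert rate >= BASELINE_ACCEPTANCE_RATE - EPSILON, (
        f"acceptance rate {rate:.3f} fell below "
        f"{BASELINE_ACCEPTANCE_RATE - EPSILON:.3f}"
    )
    return rate

# Example: 480 of 1000 proposed draft tokens accepted -> rate 0.48, passes.
check_acceptance_rate(480, 1000)
```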
Commits updated from 90bee1d to 99484ae.
@cadedaniel I've added the test along with the other MLPSpeculator tests.
Thanks @abhigoyal1997!
@cadedaniel you good with this being merged?
yep, good to merge! thanks for the great testing here
temperature=0.0,
seeded=True,
force_output_len=True,
expected_acceptance_rate=0.48)
@njhill do you know what value we should expect here?
(like, is this ballpark correct)
@cadedaniel the value looks reasonable but I’m actually not sure what’s expected, will try to find out when I get a chance.
[Speculative Decoding] Fixing hidden states handling in batch expansion (vllm-project#7508) Signed-off-by: Alvant <[email protected]>
This PR fixes the handling of hidden_states in batch expansion.
Fix #7505