[Speculative Decoding] Fixing hidden states handling in batch expansion #7508
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀 Comment /ready on the PR.
Would an E2E test be sufficient? I can add the example I mentioned in the issue where we see the error.
There was an error in a specific scenario. Consider sequences for which spec decode is disabled (e.g., when num_tokens + spec_tokens > max_model_len). For such sequences, the proposal length is set to 0 and they are handled separately from the other sequences. When the batch is contracted (in vllm/spec_decode/spec_decode_worker.py, line 650 at fc93e56), the hidden states are reshaped.
This reshaping works fine if all sequences have proposal length > 0, but it breaks if spec decode is disabled for any sequence. This PR changes batch expansion to handle this case correctly.
Thanks for the writeup.
An E2E test is good. I think the issue here is that we should validate an accuracy level, not just that it doesn't crash. For this you can build on PR #6454, which adds an assertion that the draft acceptance rate is 100%.
Makes sense, but any draft model which uses hidden states can't be identical to the target model. In that case I don't think we can get a draft acceptance rate of 100%. What do you suggest then?
You can run it for the test prompts and record the accuracy, then assert it doesn’t go below that fixed value plus some epsilon. |
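One way to sketch that regression-style check (BASELINE_ACCEPTANCE_RATE and EPSILON below are illustrative placeholders, not values from the vLLM test suite):

```python
# Recorded once from a known-good run on the fixed test prompts
# (hypothetical baseline; pick the value observed in your environment).
BASELINE_ACCEPTANCE_RATE = 0.48
EPSILON = 0.05  # tolerance for run-to-run noise

def check_acceptance_rate(accepted_tokens: int, proposed_tokens: int) -> float:
    """Assert the draft acceptance rate has not regressed below baseline."""
    rate = accepted_tokens / proposed_tokens
    assert rate >= BASELINE_ACCEPTANCE_RATE - EPSILON, (
        f"acceptance rate {rate:.3f} fell below "
        f"{BASELINE_ACCEPTANCE_RATE - EPSILON:.3f}"
    )
    return rate

# Example: 480 of 1000 proposed draft tokens accepted -> rate 0.48, passes.
check_acceptance_rate(480, 1000)
```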
Commits updated from 90bee1d to 99484ae.
@cadedaniel I've added the test along with the other MLPSpeculator tests.
Thanks @abhigoyal1997!
@cadedaniel you good with this being merged?
yep, good to merge! thanks for the great testing here
temperature=0.0,
seeded=True,
force_output_len=True,
expected_acceptance_rate=0.48)
@njhill do you know what value we should expect here?
(like, is this ballpark correct)
@cadedaniel the value looks reasonable but I’m actually not sure what’s expected, will try to find out when I get a chance.
[Speculative Decoding] Fixing hidden states handling in batch expansion (vllm-project#7508) Signed-off-by: Alvant <[email protected]>
This PR fixes the handling of hidden_states in batch expansion.
Fix #7505