[Bugfix] Bandaid fix for speculative decoding tests #9327
Merged
In #6484, the attention selection was changed to allow for attention-free models like Mamba to have a placeholder attention backend, as Mamba still needs attention metadata to manage its internal state.
The change was to call `get_attn_backend` even if `num_attn_heads` was zero. This causes a divide-by-zero during `self.model_config.get_head_size()`. This PR skips `get_attn_backend` when `num_attn_heads` is zero, unless `self.model_config.is_attention_free` is set, which is OK because `is_attention_free` is only `True` for Mamba at the moment.

This is a bandaid fix to get the build green, as it is a little hacky.
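For illustration, here is a minimal, self-contained sketch of the guard described above. The helpers `needs_attn_backend` and `maybe_select_attn_backend` are hypothetical stand-ins for the model-runner code, not the actual diff in this PR:

```python
# Sketch of the guard described in this PR; names are illustrative stand-ins,
# not the exact vLLM code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    num_attn_heads: int
    is_attention_free: bool = False


def needs_attn_backend(config: ModelConfig) -> bool:
    # Select a backend if the model has attention heads, or if it is
    # attention-free (currently only Mamba) and needs a placeholder backend
    # to manage its attention metadata.
    return config.num_attn_heads > 0 or config.is_attention_free


def maybe_select_attn_backend(config: ModelConfig) -> Optional[str]:
    if not needs_attn_backend(config):
        # Skipping here avoids the divide-by-zero that get_head_size() hits
        # when num_attn_heads is zero.
        return None
    return "attn-backend"  # placeholder for the real get_attn_backend(...) call


# A head-less model that is not attention-free skips backend selection.
assert maybe_select_attn_backend(ModelConfig(num_attn_heads=0)) is None
# A Mamba-like attention-free model still gets a placeholder backend.
assert maybe_select_attn_backend(
    ModelConfig(num_attn_heads=0, is_attention_free=True)) is not None
```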