[Bugfix] Fix weight loading for Chameleon when TP>1 #7410
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
I backported the changes in this PR to my fork based on v0.5.4, and it resolves #7388 for me. My fork isn't a significant deviation from v0.5.4, so I think this fixes it. Thanks a lot!
@DarkLight1337 Is this something new, that `_get_logits` can return a `None` value? Why wasn't this caught previously?
It appears that Chameleon is the first model that actually uses the result of calling

Edit: Actually, Phi also uses it, but it already has a
Overall LGTM! Thank you for the fix!
Yeah, I checked again, and for most models we return
Signed-off-by: Alvant <[email protected]>
This PR fixes the inability to run Chameleon (7B and 30B) with tensor parallelism.

- Added `row_parallel_weight_loader` (extracted from our code for Command-R model) to weight utils.
- Return `None` when applying TP logits processor in Chameleon model (the return type is now `Optional[torch.Tensor]` instead of `torch.Tensor`).
- Set `_logits=None` in `compute_logits` for Medusa model.
- Added `postprocess_inputs` to `HfRunner` to convert input dtypes for Chameleon, since its HF processor fails to do so; tests patch `postprocess_inputs` instead of patching the processor directly.
- Fixed `output_ids` from `VllmRunner` causing failures when using `check_outputs_equal` after the change to `CompletionOutput` (`token_ids` is now an array instead of a tuple).

FIX #7388