[Bugfix] Fix MiniCPMV and Mllama BNB bug #9917
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Now, after this fix, I can generate reasonable results using my local image.
Hmm, it seems that BNB has issues handling weights in
Force-pushed from c0264e6 to a12c16c
I have handled this issue. @mgoin, please review it again, thanks!
Looks reasonable to me, thanks. Just curious about two things:
@@ -1005,16 +1007,21 @@ def _unquantized_generator(self, hf_weights_files, use_safetensors,
    if any(target_module in weight_name for target_module in
           self.target_modules) and weight_name.endswith(".weight"):
        weight_name = weight_name.replace(".weight", ".qweight")
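The renaming step in the hunk above can be sketched as a standalone helper. This is an illustrative rewrite, not vLLM's actual code; the function name and inputs are hypothetical, but the matching logic mirrors the diff: a weight is renamed only if its name contains one of the BNB target modules and ends with ".weight".

```python
def remap_bnb_weight_names(weight_names, target_modules):
    """Hypothetical sketch: rename '.weight' to '.qweight' for weights
    belonging to modules targeted by BNB quantization, mirroring the
    condition in _unquantized_generator above."""
    remapped = []
    for weight_name in weight_names:
        # Only rename weights of targeted modules; leave others untouched.
        if any(target_module in weight_name
               for target_module in target_modules) \
                and weight_name.endswith(".weight"):
            weight_name = weight_name.replace(".weight", ".qweight")
        remapped.append(weight_name)
    return remapped
```

Note that untargeted weights (e.g. embeddings) pass through unchanged, which is why the endswith/contains guard matters.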
A general question I have about BNB: why do we use .qweight in the BNBLinearMethod when the model checkpoints actually use .weight? It seems we could avoid some logic by having the quant method directly use .weight.
Perhaps it's to maintain consistency with other quantization algorithms such as the GPTQ implementation; see:
layer.register_parameter("qweight", qweight)
Okay, let's try to remove this in the future if possible! The "consistency" is just coincidental: we usually aim for parameters to have the same name as in the checkpoint format to keep weight loading simple.
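The design point above (parameter names matching checkpoint keys makes loading trivial) can be illustrated with a minimal sketch. This is not vLLM's loader; the function and dict-based "parameters" are hypothetical, kept torch-free for brevity.

```python
def load_matching_weights(params, checkpoint):
    """Hypothetical sketch: when parameter names equal checkpoint keys,
    loading is a direct lookup with no rename or remap step."""
    loaded = []
    for name, tensor in checkpoint.items():
        if name in params:
            params[name] = tensor  # direct assignment, no '.weight' -> '.qweight' translation
            loaded.append(name)
    return loaded
```

With a rename such as .weight to .qweight in play, every lookup would instead need a translation step, which is exactly the extra logic the comment suggests removing.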
Force-pushed from efe89d8 to e410d9c
LGTM too!
FIX #9914
ping @mgoin
cc @chenqianfzh as well. Regarding multimodal models, additional BNB implementation logic may be required