[Bug]: AutoAWQ marlin methods error #7517
Comments
@MichoChan could you please share a command for triggering this error so we can reproduce? Is this some model that didn't work for you? |
@MichoChan I believe this issue is fixed on current main by #7264 |
I know. When I use AutoAWQ with zero_point=True and the GEMM version, vLLM converts the AWQ GEMM checkpoint to the AWQ Marlin version, and that works fine. But when I quantize with AutoAWQ using the Marlin version and no zero point, vLLM raises an error, because vLLM only supports AWQ Marlin with a zero point. |
Can you point me to a model checkpoint without zero point? |
Sorry, I don't have a checkpoint without a zero point that you can get from the Hub or another public site. I also noticed that when AutoAWQ quantizes with the Marlin version, it saves the model directly in Marlin format, whereas vLLM only supports the normal AWQ format and then automatically converts it to Marlin format and uses the Marlin kernel. So is it fair to say that vLLM only supports the normal AWQ format and can convert it to Marlin format at runtime? |
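For reference, the two checkpoint flavors being discussed differ roughly as follows. This is only a sketch of typical AutoAWQ `quantization_config` output, written as Python dicts; the exact field values are assumptions, not taken from a specific checkpoint.

```python
# Sketch of typical AutoAWQ `quantization_config` entries in config.json,
# written here as Python dicts; exact values are assumptions.

# GEMM version: zero point enabled. vLLM accepts this layout and can
# convert it to the Marlin layout when the model is loaded.
gemm_quantization_config = {
    "quant_method": "awq",
    "version": "gemm",
    "bits": 4,
    "group_size": 128,
    "zero_point": True,
}

# Marlin version: AutoAWQ saves the weights directly in Marlin format
# without a zero point, which vLLM's awq_marlin path rejects.
marlin_quantization_config = {
    "quant_method": "awq",
    "version": "marlin",
    "bits": 4,
    "group_size": 128,
    "zero_point": False,
}
```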
+1 here, I've been trying to get this going. First, here is my quantize.py file for AutoAWQ: model_path = '/mnt/g/stable-code-instruct-3b', quant_config = { ... }. The config is taken directly from AutoAWQ (link), so that's how I'm quantizing the model. Then, when I call vLLM from the CLI like so: "vllm serve . --port 9000 --trust-remote-code --quantization awq_marlin --cpu-offload-gb 50 --device auto", it terminates with an error. Also, in order to get this far, I had to manually change the config.json file: AutoAWQ generates it with "quant_method": "awq", yet vLLM expects "quant_method": "marlin", so in the end you have to manually change it to "awq_marlin". Can the vLLM code be updated to accept "awq" as the quant method and "marlin" as the version? This is what the config looks like from AutoAWQ: "quant_method": "awq", |
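A minimal quantize.py sketch along those lines, assuming standard AutoAWQ usage; the output path and the quant_config values are assumptions, since the snippet in the comment above is truncated.

```python
# Minimal sketch of a quantize.py using AutoAWQ's Marlin version.
# The output path and quant_config values are assumptions; the original
# snippet above is truncated.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '/mnt/g/stable-code-instruct-3b'
quant_path = '/mnt/g/stable-code-instruct-3b-awq-marlin'  # hypothetical output dir

# Marlin kernels use symmetric quantization, hence zero_point=False.
quant_config = {"zero_point": False, "q_group_size": 128,
                "w_bit": 4, "version": "Marlin"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```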
+1, I also got this error. I used SGLang to launch an awq_marlin-quantized model but got an error; details: sgl-project/sglang#1792. |
There are some models you can find by searching 'awq-marlin' on the HF Hub. Also, you can quantize any model to the awq_marlin format with AutoAWQ to reproduce this error. |
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you! |
Your current environment
vllm 0.5.4
🐛 Describe the bug
AutoAWQ's Marlin version must be quantized with no zero point, but vLLM's awq_marlin support only accepts checkpoints with a zero point, so loading such a checkpoint errors out.
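To make the mismatch concrete, here is an illustrative compatibility check in the spirit of what the loader does. This is a hypothetical sketch, not vLLM's actual code, and the supported bit-widths and group sizes listed are assumptions.

```python
# Hypothetical sketch of an AWQ -> Marlin compatibility check; NOT vLLM source.
# Supported bit-widths / group sizes below are assumptions for illustration.
AWQ_MARLIN_SUPPORTED_BITS = (4, 8)
AWQ_MARLIN_SUPPORTED_GROUP_SIZES = (-1, 32, 64, 128)


def is_awq_marlin_compatible(quant_cfg: dict) -> bool:
    """Decide whether an AWQ checkpoint can run on the Marlin kernel path."""
    quant_method = quant_cfg.get("quant_method", "").lower()
    bits = quant_cfg.get("bits")
    group_size = quant_cfg.get("group_size")
    has_zp = quant_cfg.get("zero_point")

    # The condition that bites here: a checkpoint produced by AutoAWQ's
    # Marlin version has zero_point == False and is therefore rejected.
    return (quant_method == "awq"
            and has_zp is True
            and bits in AWQ_MARLIN_SUPPORTED_BITS
            and group_size in AWQ_MARLIN_SUPPORTED_GROUP_SIZES)


# An AutoAWQ Marlin-version checkpoint (no zero point) fails the check.
print(is_awq_marlin_compatible(
    {"quant_method": "awq", "bits": 4, "group_size": 128, "zero_point": False}))  # False
```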