[Bug]: AutoAWQ marlin methods error #7517

Open
MichoChan opened this issue Aug 14, 2024 · 9 comments
Labels: bug (Something isn't working), stale

Comments


MichoChan commented Aug 14, 2024

Your current environment

vllm 0.5.4

🐛 Describe the bug

AutoAWQ's Marlin version must be used with no zero point, but vLLM's supported-types query looks like this:

def query_marlin_supported_quant_types(has_zp: bool,
                                       min_capability: Optional[int] = None):
    if min_capability is None:
        major, minor = current_platform.get_device_capability()
        min_capability = major * 10 + minor

    if min_capability < 80:
        return []

    if has_zp:
        # AWQ style, unsigned + runtime zero-point
        return [scalar_types.uint4, scalar_types.uint8]
    else:
        # GPTQ style, unsigned + symmetric bias
        # TODO: once fp8_marlin is merged into "gptq_marlin" we should be able
        #  to add `scalar_types.float8_e4m3fn` here
        return [scalar_types.uint4b8, scalar_types.uint8b128]

So an AutoAWQ Marlin checkpoint (zero_point = False, plain uint4 weights) falls into the has_zp = False branch, where uint4 is not in the returned list, and vLLM raises an error.
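
To make the mismatch concrete, here is a minimal sketch that just calls the function above directly (the import paths are my reading of the 0.5.4 source tree, not vLLM's actual call site, so adjust if they have moved):

from vllm.model_executor.layers.quantization.utils.marlin_utils import (
    query_marlin_supported_quant_types)
from vllm.scalar_type import scalar_types

# AutoAWQ's Marlin path stores plain uint4 weights with zero_point = False.
# With has_zp=False the function only returns the GPTQ-style types, so the
# checkpoint's uint4 weight type is rejected.
supported = query_marlin_supported_quant_types(has_zp=False, min_capability=80)
assert scalar_types.uint4b8 in supported and scalar_types.uint8b128 in supported
assert scalar_types.uint4 not in supported  # hence the verification error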

MichoChan added the bug label Aug 14, 2024
mgoin (Member) commented Aug 14, 2024

@MichoChan could you please share a command for triggering this error so we can reproduce? Is this some model that didn't work for you?

robertgshaw2-redhat (Collaborator) commented

@MichoChan I believe this issue is fixed on current main by #7264

MichoChan (Author) commented

> @MichoChan I believe this issue is fixed on current main by #7264

I know. When I use AutoAWQ with zero_point = True (the GEMM version), vLLM converts the AWQ GEMM checkpoint to AWQ Marlin, and that works fine. But when I quantize with AutoAWQ using the Marlin version and no zero point, vLLM raises an error, because vLLM only supports AWQ Marlin with a zero point.
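
To make the two cases concrete, these are the standard AutoAWQ quant_config dicts for the two paths (group size and bit width are just the usual defaults, shown for illustration):

# Works: GEMM checkpoint with a zero point; vLLM repacks it to awq_marlin at load time
quant_config_gemm = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}

# Fails: AutoAWQ saves this one directly in Marlin format with no zero point,
# and vLLM rejects it at load time
quant_config_marlin = {
    "zero_point": False,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "Marlin",
}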

robertgshaw2-redhat (Collaborator) commented

Can you point me to a model checkpoint without zero point?

MichoChan (Author) commented

> Can you point me to a model checkpoint without zero point?

Sorry, I don't have a checkpoint without a zero point that you can download from the Hub or another public site.

I also noticed that when AutoAWQ quantizes with the Marlin version, it saves the model directly in Marlin format, whereas vLLM only supports the normal AWQ format, which it then automatically converts to Marlin format and runs with the Marlin kernel.

So is it correct to say that vLLM only supports the normal AWQ format and converts it to Marlin format at runtime?


ColumbusAI commented Aug 15, 2024

+1 here. I've been trying to get this working. First, here is my quantize.py for AutoAWQ:

model_path = '/mnt/g/stable-code-instruct-3b'
quant_path = '/home/admin/stable_code_marlin'

quant_config = {
    "zero_point": False,  # To use Marlin, you must specify zero point as False and version as Marlin.
    "q_group_size": 128,  # group size and bit width as reported in the error below
    "w_bit": 4,
    "version": "Marlin",
}

The comment is taken directly from AutoAWQ's documentation.
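
The rest of the script is basically AutoAWQ's standard example (sketched from memory, so treat the exact calls as approximate):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Load the unquantized model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize with the Marlin config above and save the result
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)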

So that's how I'm quantizing the model. Then I call vLLM from the CLI like so: "vllm serve . --port 9000 --trust-remote-code --quantization awq_marlin --cpu-offload-gb 50 --device auto"

It terminates with this error:
"ValueError: Marlin does not support weight_bits = uint4. Only types = [ScalarType.uint4b8, ScalarType.uint8b128] are supported (for group_size = 128, device_capability = 89, zp = False)."

Also, in order to get this far, I had to manually edit the config.json file. AutoAWQ generates the config.json with "quant_method": "awq", yet vLLM is expecting "quant_method": "marlin".

In the end you have to manually change it to "awq_marlin". Can vLLM be updated to accept "awq" as the quant method with "marlin" as the version?

This is what the config looks like from AutoAWQ:

"quant_method": "awq",
"version": "marlin",

liangzelang commented

+1, I hit this error too. I used SGLang to launch an awq_marlin-quantized model and got the same failure; details are in sgl-project/sglang#1792.
From reading the code, vLLM does not support awq_marlin-quantized models with zero_point = false.


liangzelang commented Oct 26, 2024

> Can you point me to a model checkpoint without zero point?

There are some models if you search for 'awq-marlin' on the HF Hub.

Also, you can quantize any model to the awq_marlin format with AutoAWQ to reproduce this error.

github-actions bot commented

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Jan 25, 2025