[TorchAO] For the AWQ path, there’s a noticeable gap in LLaMA and Qwen accuracy between BMG and the 4070 Ti. #1933

@MingxuZh

Description

🐛 Describe the bug

For the AWQ path, there is a noticeable gap in LLaMA and Qwen accuracy between BMG and the NVIDIA 4070 Ti.

Versions

script:
https://github.com/intel-innersource/frameworks.ai.pytorch.gpu-models/blob/tongsu/awq/LLM/inference/run_generation.py

cmd:
python -u run_generation.py -m meta-llama/Llama-3.2-1B --input-tokens 1024 --max-new-tokens 1024 --num-iter 8 --num-warmup 4 --batch-size 1 --inductor --num-beams 1 --use-hf-code False --use-static-cache --sub-model-name llama3.2-3b --model-save-path /mnt/ssd1/huggingface/hub/AWQ/Llama-3.2-1B-AWQ-INT4_P.pt --accuracy-only --woq --woq-type awq --quant-dtype uint4 --group-size 128 --device cuda --token-latency
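For context on what `--woq --woq-type awq --quant-dtype uint4 --group-size 128` implies, below is a minimal pure-Python sketch of group-wise asymmetric uint4 quantization with a group size of 128. This is only an illustration of the numeric scheme; the actual TorchAO AWQ path additionally applies activation-aware per-channel scales before quantizing, and uses packed tensor kernels rather than Python loops. All function names here are hypothetical, not TorchAO APIs.

```python
import random

# Hypothetical helper: quantize one group of float weights to unsigned
# n-bit integers using an asymmetric (scale + zero-point) scheme.
def quantize_group(weights, n_bits=4):
    qmax = (1 << n_bits) - 1                 # 15 for uint4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0    # guard against a constant group
    zero_point = round(-w_min / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_group(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

# One 128-element group, matching --group-size 128.
random.seed(0)
group = [random.uniform(-1.0, 1.0) for _ in range(128)]
q, s, zp = quantize_group(group)
recon = dequantize_group(q, s, zp)

# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(f"max abs round-trip error: {max_err:.4f}, scale/2: {s / 2:.4f}")
```

Because scale and zero-point are chosen per 128-element group rather than per tensor, small device-specific differences in kernels or accumulation order can shift individual groups, which is one place an accuracy gap between backends can originate.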
