[TorchAO] For the AWQ path, there’s a noticeable gap in LLaMA and Qwen accuracy between BMG and the 4070 Ti. #1933

@MingxuZh

Description

🐛 Describe the bug

For the AWQ path, there is a noticeable gap in LLaMA and Qwen accuracy between BMG and the NVIDIA 4070 Ti.

Versions

script:
https://github.com/intel-innersource/frameworks.ai.pytorch.gpu-models/blob/tongsu/awq/LLM/inference/run_generation.py

cmd:
python -u run_generation.py -m meta-llama/Llama-3.2-1B --input-tokens 1024 --max-new-tokens 1024 --num-iter 8 --num-warmup 4 --batch-size 1 --inductor --num-beams 1 --use-hf-code False --use-static-cache --sub-model-name llama3.2-3b --model-save-path /mnt/ssd1/huggingface/hub/AWQ/Llama-3.2-1B-AWQ-INT4_P.pt --accuracy-only --woq --woq-type awq --quant-dtype uint4 --group-size 128 --device cuda --token-latency
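For context on what `--woq --woq-type awq --quant-dtype uint4 --group-size 128` implies, below is a minimal pure-Python sketch of group-wise asymmetric uint4 quantization with a group size of 128. This is only an illustration of the numeric scheme; the actual TorchAO AWQ path additionally applies activation-aware per-channel scales before quantizing, and uses packed tensor kernels rather than Python loops. All function names here are hypothetical, not TorchAO APIs.

```python
import random

# Hypothetical helper: quantize one group of float weights to unsigned
# n-bit integers using an asymmetric (scale + zero-point) scheme.
def quantize_group(weights, n_bits=4):
    qmax = (1 << n_bits) - 1                 # 15 for uint4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0    # guard against a constant group
    zero_point = round(-w_min / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_group(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

# One 128-element group, matching --group-size 128.
random.seed(0)
group = [random.uniform(-1.0, 1.0) for _ in range(128)]
q, s, zp = quantize_group(group)
recon = dequantize_group(q, s, zp)

# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(f"max abs round-trip error: {max_err:.4f}, scale/2: {s / 2:.4f}")
```

Because scale and zero-point are chosen per 128-element group rather than per tensor, small device-specific differences in kernels or accumulation order can shift individual groups, which is one place an accuracy gap between backends can originate.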
