Can you clarify whether the 8-bit GPTQ-quantized models use 8-bit or 16-bit activations? Model link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8

Replies: 1 comment
-
GPTQ is weight-only quantization. GPTQ-INT8 stores 8-bit weights plus fp16 scales and fp16 zero points, and activations stay in fp16.
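To make the dtype split concrete, here is a minimal sketch of what "8-bit weights, fp16 scales/zeros, fp16 activations" means numerically. It is not the actual GPTQ kernel or checkpoint layout; the tensor names, toy shapes, and simple per-output-channel scale/zero layout are illustrative assumptions (real GPTQ checkpoints use grouped, packed weights).

```python
import torch

def dequantize_int8(w_int8: torch.Tensor,
                    scale: torch.Tensor,
                    zero: torch.Tensor) -> torch.Tensor:
    # Recover fp16 weights: W_fp16 = (W_int8 - zero) * scale
    return (w_int8.to(torch.float16) - zero) * scale

# Toy per-output-channel parameters (16 output channels, 32 input features);
# grouping/packing used by real GPTQ checkpoints is omitted here.
w_int8 = torch.randint(-128, 128, (16, 32), dtype=torch.int8)  # stored 8-bit weights
scale = torch.full((16, 1), 0.01, dtype=torch.float16)         # fp16 scales
zero = torch.zeros(16, 1, dtype=torch.float16)                 # fp16 zero points

w_fp16 = dequantize_int8(w_int8, scale, zero)
x = torch.randn(4, 32, dtype=torch.float16)  # activations are never quantized (W8A16)

# At inference the kernel effectively runs an fp16 GEMM against the
# dequantized weights: y = x @ w_fp16.T
print(w_fp16.dtype, x.dtype)  # torch.float16 torch.float16
```

So the "INT8" in the model name refers only to how the weights are stored; the matrix multiplications themselves run against fp16 activations.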