-
I tried quantizing the same model with the same quantize config. With AutoGPTQ I can finish in 15~20 minutes, but with this library I need over 2 hours...
-
Please use and increase `batch_size`. auto-gptq has broken batch support for calibration. GPTQModel has batching support, but you need to set `batch_size` to a proper value according to your GPU capability and VRAM size.
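For reference, a minimal sketch of passing a larger calibration batch size to `quantize()`. The model id, bit width, calibration samples, and `batch_size=8` below are placeholder assumptions; pick a batch size that actually fits your VRAM.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholder calibration data; in practice use a few hundred representative samples.
calibration_dataset = [
    "gptqmodel is an LLM quantization toolkit.",
    "GPTQ quantizes weights layer by layer using calibration data.",
]

quantize_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("Qwen/QwQ-32B", quantize_config)

# batch_size controls how many calibration samples are processed at once.
# Larger values speed up quantization but use more VRAM; start small and increase.
model.quantize(calibration_dataset, batch_size=8)

model.save("QwQ-32B-gptq-4bit")
```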
-
@CHNtentes Was the speed issue resolved by increasing `batch_size`?
-
@CHNtentes In addition to the above, quantization time is a bit slower than AutoGPTQ at the moment, but we hope to fix that in our next release. More importantly, GPTQModel's ...
-
@CHNtentes Memory usage issue has been fixed in https://github.com/ModelCloud/GPTQModel/releases/tag/v1.6.0
VRAM usage is now 35% lower than before and 15% lower than AutoGPTQ. In our per-layer quantization tests with QwQ 32B, using the same calibration data, GPTQModel consistently produces lower `error_loss`, which is critical for quantization quality. Speed is about 7.5% slower than AutoGPTQ for QwQ 32B, but with lower VRAM usage and higher-quality quants, it is worth the one-off cost. We will try to improve the speed in our next release.