
Is this library much slower than AutoGPTQ? #990

Answered by Qubitium
CHNtentes asked this question in Q&A

@CHNtentes Memory usage issue has been fixed in https://github.com/ModelCloud/GPTQModel/releases/tag/v1.6.0

We now use 35% less VRAM than before and 15% less than AutoGPTQ. In our per-layer quantization tests with the same calibration data on QwQ 32B, GPTQModel consistently produces lower error_loss, which is critical for quantization quality. Speed is about 7.5% slower than AutoGPTQ for QwQ 32B, but with the lower VRAM usage and higher-quality quants, the one-off cost is worth it. We will try to improve the speed in our next release.
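To see why a lower per-layer error_loss matters, here is a toy sketch (not GPTQModel's actual metric or algorithm) that measures the mean squared error introduced by naive symmetric round-to-nearest quantization of a weight vector; real GPTQ-style quantizers exist precisely to drive this kind of error down:

```python
import random

def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization, then dequantize back to float.

    Toy illustration only: GPTQ-style methods use calibration data and
    error compensation to achieve much lower error than plain RTN.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax
    # Round each weight to the nearest representable level, clamp, dequantize.
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for one layer

w_hat = quantize_rtn(weights, bits=4)
mse = sum((w - q) ** 2 for w, q in zip(weights, w_hat)) / len(weights)
print(f"4-bit RTN quantization MSE: {mse:.6f}")
```

The error shrinks as `bits` grows (each extra bit halves the quantization step), which is the trade-off quantization quality metrics like error_loss track per layer.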

Replies: 4 comments 8 replies
Answer selected by Qubitium