Releases: ModelCloud/GPTQModel

GPTQModel v0.9.4

04 Jul 05:41
527cffb

What's Changed

Full Changelog: v0.9.3...v0.9.4

GPTQModel v0.9.3

02 Jul 18:05
26b3dc0

What's Changed

New Contributors

Full Changelog: v0.9.2...v0.9.3

GPTQModel v0.9.2

29 Jun 12:15
6b3923e

What's Changed

Added auto-padding of model in/out features for the exllama and exllama v2 kernels. Fixed quantization of the OPT and DeepSeek V2-Lite models, and fixed inference for DeepSeek V2-Lite.

New Contributors

Full Changelog: v0.9.1...v0.9.2

GPTQModel v0.9.1

27 Jun 07:30
71ed742

What's Changed

v0.9.1 is a huge release: 3 new models were added, along with new BITBLAS support from Microsoft. Batching in .quantize() has been fixed, so the process is now more than 50% faster when batching is enabled on a large calibration dataset. Also added quantized model sharding support, with optional hash security checking of weight files on model load.
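The batched-quantization and sharded-save flow above might look like the following. This is an illustrative sketch only, assuming the AutoGPTQ-style API GPTQModel exposed around v0.9.x (GPTQModel.from_pretrained, QuantizeConfig, .quantize(), .save_quantized()); exact parameter names such as batch_size are assumptions and may differ between releases.

```python
# Hedged sketch; API names below are assumptions based on GPTQModel's
# AutoGPTQ-style interface and may differ between releases.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(bits=4, group_size=128)

# Calibration data: a small set of representative text samples. Depending
# on the release, these may need to be pre-tokenized dicts, not strings.
calibration_dataset = [
    "GPTQ quantizes model weights layer by layer using calibration data.",
    "The quick brown fox jumps over the lazy dog.",
]

model = GPTQModel.from_pretrained("facebook/opt-125m", quant_config)

# Batching inside .quantize() is the v0.9.1 fix: with batching enabled,
# quantization runs >50% faster on large calibration sets.
model.quantize(calibration_dataset, batch_size=2)

# Sharding support (also v0.9.1) splits large checkpoints into multiple
# weight files; an optional hash check can verify them again on load.
model.save_quantized("opt-125m-gptq-4bit")
```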

New Contributors

Full Changelog: v0.9.0...v0.9.1

GPTQModel v0.9.0

20 Jun 17:50
6bf62cf

What's Changed (First Release since AutoGPTQ fork)

4 new models, plus sym=False asymmetric quantization and lm_head quantized inference support.
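The sym=False option is selected through the quantization config. A minimal sketch, assuming the QuantizeConfig fields GPTQModel inherited from AutoGPTQ (bits, group_size, sym); this is a config fragment, not a definitive v0.9.0 API reference.

```python
# Hedged sketch: field names are assumptions based on GPTQModel's
# AutoGPTQ-style config and may differ between releases.
from gptqmodel import QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,         # quantization bit-width
    group_size=128, # per-group quantization granularity
    sym=False,      # asymmetric quantization, new in v0.9.0
)
```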

Full Changelog: https://github.com/ModelCloud/GPTQModel/commits/v0.9.0