Releases: ModelCloud/GPTQModel
GPTQModel v0.9.4
What's Changed
- 🚀 [FEATURE] Added Transformers integration via monkeypatch by @ZX-ModelCloud in #147 (see the sketch below)
- 👾 [FIX] Typo causing Gemma 2 errors by @LRL-ModelCloud in #158
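The release notes don't show the integration's entry point, so the sketch below is illustrative only: the `integration` module path and `patch_transformers()` name are hypothetical stand-ins for whatever hook #147 actually installs. Only the `transformers` calls are standard API.

```python
# Hypothetical sketch: the gptqmodel names below are NOT from the release
# notes; they stand in for the monkeypatch hook added in #147.
from gptqmodel import integration  # hypothetical module path

integration.patch_transformers()   # hypothetical: monkeypatch HF loaders

from transformers import AutoModelForCausalLM

# After patching, a GPTQ-quantized checkpoint should load through the
# plain Transformers API instead of GPTQModel's own loader.
model = AutoModelForCausalLM.from_pretrained(
    "ModelCloud/example-gptq-4bit"  # placeholder model id
)
```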
Full Changelog: v0.9.3...v0.9.4
GPTQModel v0.9.3
What's Changed
- 🚀 [MODEL] Add Gemma 2 support by @LRL-ModelCloud in #131
- 🚀 [OTHER] Calculate PPL on GPU by @ZYC-ModelCloud in #135
- ✨ [REFACTOR] BaseQuantLinear and avoid using shared QuantLinear cls name by @PZS-ModelCloud in #116
- ✨ [KERNEL] Bitblas cache stability by @Qubitium in #129
- 👾 [FIX] Export TORCH_CUDA_ARCH_LIST in install.sh by @LeiWang1999 in #133
- 👾 [FIX] Limit Bitblas numexpr thread usage by @Qubitium in #125
- 👾 [FIX] Revert "Skip opt fc1/fc2 for quantization" (#118) due to inference regressions by @Qubitium in #149
- ✨ [REFACTOR] Remove max_memory arg by @CL-ModelCloud in #144
- 🤖 [CI] Fix skipped test by @CSY-ModelCloud in #145
- 🤖 [CI] Add GPU selector for runner by @CSY-ModelCloud in #148
New Contributors
- @LeiWang1999 made their first contribution in #133
Full Changelog: v0.9.2...v0.9.3
GPTQModel v0.9.2
What's Changed
Added auto-padding of model in/out-features for exllama and exllama v2. Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepSeek V2-Lite.
- ✨ [FEATURE/FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin by @Qubitium @LRL-ModelCloud in #98
- ✨ [REFACTOR] Remove use_cuda_fp16 argument by @ZX-ModelCloud in #97
- ✨ [REFACTOR] `model.post_init` by @PZS-ModelCloud in #103
- ✨ [BUILD] Add UV PIP usage instructions by @CL-ModelCloud in #114
- 👾 [FIX] DeepSeek-V2-Lite load by @LRL-ModelCloud in #112
- 👾 [FIX] Opt fc1/fc2 layer modules should not be quantized by @Qubitium in #118
New Contributors
- @CL-ModelCloud made their first contribution in #114
Full Changelog: v0.9.1...v0.9.2
GPTQModel v0.9.1
What's Changed
v0.9.1 is a huge release: 3 new models, plus new BITBLAS support from Microsoft. Batching in `.quantize()` has been fixed, making quantization more than 50% faster when batching is enabled on large calibration datasets. Also added quantized model sharding support, with optional hash security checking of weight files on model load.
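A minimal quantization sketch under stated assumptions: the `QuantizeConfig`, `GPTQModel.from_pretrained`, and `batch_size` names follow the project's README-style usage rather than anything shown in these notes, and the model id and calibration texts are placeholders.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholder calibration texts; real runs use hundreds of samples.
calibration_dataset = [
    "GPTQModel is a model quantization toolkit.",
    "The quick brown fox jumps over the lazy dog.",
]

quant_config = QuantizeConfig(bits=4, group_size=128)

# Model id is a placeholder.
model = GPTQModel.from_pretrained("facebook/opt-125m", quant_config)

# v0.9.1 fixes batched calibration (#70), so batch_size > 1 delivers
# the >50% speedup described above on large calibration sets.
# The batch_size kwarg is assumed from that fix.
model.quantize(calibration_dataset, batch_size=4)

model.save_quantized("opt-125m-gptq-4bit")
```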
- ✨ [FEATURE + New FORMAT] Add Bitblas Format/Kernel Support by @LeiWang1999 @ZX-ModelCloud @Qubitium in #39
- ✨ [FEATURE] Save sharded by @LaaZa @CSY-ModelCloud @PZS-ModelCloud in #40 #69
- ✨ [FEATURE/SECURITY] Add `verify_hash` to validate model weights via stored hashes by @PZS-ModelCloud in #50 (see the load/save sketch after this list)
- 🚀 [CORE/REFACTOR] Consolidate 6+ passive `use_xxx` and `disable_xxx` args into a single explicit `backend` arg by @ZX-ModelCloud in #68
- 🚀 [MODEL] DeepSeek-V2 support by @LRL-ModelCloud in #51
- 🚀 [MODEL] DeepSeek-V2-Lite support by @LRL-ModelCloud in #74
- 🚀 [MODEL] DBRX Converted support by @Qubitium @LRL-ModelCloud in #38
- 👾 [FIX] Batching of calibration data in .quantize() by @LRL-ModelCloud in #70
- 👾 [FIX] Cannot pickle 'module' object for 8 bit (fix #47) by @CSY-ModelCloud in #49
- 👾 [FIX] Format load check by @Qubitium in #53
- 👾 [FIX] `save_quantized()` using wrong model to obtain `state_dict()` by @LRL-ModelCloud in #54
- 👾 [FIX] Rename exllama_kernels class name to fix import/ext conflicts with autogptq by @CSY-ModelCloud in #71
- 🤖 [CI] Speed up unit tests by @CSY-ModelCloud in #37, #41, #46, and #55
- 🤖 [CI] Improve unit tests by @ZYC-ModelCloud in #58 and #72
- 🤖 👾 [CI] Fix Marlin format check: `desc_act` must be False by @LRL-ModelCloud in #57
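A hedged load/save sketch tying together the `backend` consolidation (#68), `verify_hash` (#50), and sharded saving (#40/#69). The `BACKEND` enum name, the hash-string format, and the `max_shard_size` kwarg are assumptions, not confirmed by these notes.

```python
from gptqmodel import GPTQModel, BACKEND  # enum name is an assumption

model = GPTQModel.from_quantized(
    "opt-125m-gptq-4bit",                  # placeholder path
    backend=BACKEND.EXLLAMA_V2,            # single explicit backend arg (#68)
    verify_hash="sha256:<stored-digest>",  # illustrative format only (#50)
)

# Sharded saving (#40/#69); kwarg assumed to mirror Transformers' API.
model.save_quantized("opt-125m-gptq-4bit-sharded", max_shard_size="4GB")
```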
New Contributors
- @LeiWang1999 made their first contribution in #39
- @LaaZa made their first contribution in #40
- @PZS-ModelCloud made their first contribution in #50
- @ZYC-ModelCloud made their first contribution in #58
Full Changelog: v0.9.0...v0.9.1
GPTQModel v0.9.0
What's Changed (First Release since AutoGPTQ fork)
4 new models, plus `sym=False` asymmetric quantization and `lm_head` quantized inference support.
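A config sketch for the two headline options. `sym=False` is the asymmetry switch named above; the `lm_head=True` flag is an assumed spelling, since these notes only state that `lm_head` quantized inference is supported.

```python
from gptqmodel import QuantizeConfig

# sym=False -> asymmetric quantization (per-group zero points).
# lm_head=True is an ASSUMED flag name for quantizing the output head;
# the release notes do not show the actual config field.
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    sym=False,
    lm_head=True,
)
```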
- ✨ [FEATURE/BUG] `sym=False` support by @qwopqwop200, @Qubitium, @fxmarty
- ✨ [FEATURE] `lm_head` quantization inference by @Qubitium
- 🚀 [MODEL] ChatGLM support by @LRL-ModelCloud @Qubitium
- 🚀 [MODEL] MiniCPM model support by @LDLINGLINGLING, @Qubitium in #18
- 🚀 [MODEL] Phi-3 model support by @davidgxue, @ZX-ModelCloud in #27
- 🚀 [MODEL] QwenMoE model support by @bozheng-hit, @LRL-ModelCloud in #24
- 🚀 [CORE] Faster quantization with better quant quality (PPL) by @Qubitium
- 👾 [BUG] Fix H100 crash by @Qubitium
- 👾 [BUG] Fix packing perf regression on high core-count systems by @Qubitium
- 🚀 [REFACTOR] Major refactor and code debloat by @Qubitium
- 🤖 [CI] Code quality by @CSY-ModelCloud in #31
- 🤖 [CI] Add Perplexity regression test by @LRL-ModelCloud in #1
- 🤖 [CI] Add Runner by @CSY-ModelCloud in #3
Full Changelog: https://github.com/ModelCloud/GPTQModel/commits/v0.9.0