Releases: ModelCloud/GPTQModel
GPTQModel v0.9.4
What's Changed
- 🚀 [FEATURE] Added Transformers integration via monkeypatch by @ZX-ModelCloud in #147 (see the sketch below)
- 👾 [FIX] Typo causing Gemma 2 errors by @LRL-ModelCloud in #158
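The release notes don't show the integration's entry point, so the sketch below is illustrative only: the `integration` module path and `patch_transformers()` name are hypothetical stand-ins for whatever hook #147 actually installs. Only the `transformers` calls are standard API.

```python
# Hypothetical sketch: the gptqmodel names below are NOT from the release
# notes; they stand in for the monkeypatch hook added in #147.
from gptqmodel import integration  # hypothetical module path

integration.patch_transformers()   # hypothetical: monkeypatch HF loaders

from transformers import AutoModelForCausalLM

# After patching, a GPTQ-quantized checkpoint should load through the
# plain Transformers API instead of GPTQModel's own loader.
model = AutoModelForCausalLM.from_pretrained(
    "ModelCloud/example-gptq-4bit"  # placeholder model id
)
```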
Full Changelog: v0.9.3...v0.9.4
GPTQModel v0.9.3
What's Changed
- 🚀 [MODEL] Add Gemma 2 support by @LRL-ModelCloud in #131
- 🚀 [OTHER] Calculate PPL on GPU by @ZYC-ModelCloud in #135
- ✨ [REFACTOR] BaseQuantLinear and avoid using shared QuantLinear cls name by @PZS-ModelCloud in #116
- ✨ [KERNEL] Bitblas cache stability by @Qubitium in #129
- 👾 [FIX] Export TORCH_CUDA_ARCH_LIST in install.sh by @LeiWang1999 in #133
- 👾 [FIX] Limit Bitblas numexpr thread usage by @Qubitium in #125
- 👾 [FIX] Revert "Skip opt fc1/fc2 for quantization" (#118) due to inference regressions by @Qubitium in #149
- ✨ [REFACTOR] Remove max_memory arg by @CL-ModelCloud in #144
- 🤖 [CI] Fix skipped test by @CSY-ModelCloud in #145
- 🤖 [CI] Add GPU selector for runner by @CSY-ModelCloud in #148
New Contributors
- @LeiWang1999 made their first contribution in #133
Full Changelog: v0.9.2...v0.9.3
GPTQModel v0.9.2
What's Changed
Added auto-padding of model in/out-features for exllama and exllama v2. Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepSeek V2-Lite.
- ✨ [FEATURE/FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin by @Qubitium @LRL-ModelCloud in #98
- ✨ [REFACTOR] Remove use_cuda_fp16 argument by @ZX-ModelCloud in #97
- ✨ [REFACTOR] `model.post_init` by @PZS-ModelCloud in #103
- ✨ [BUILD] Add UV PIP usage instructions by @CL-ModelCloud in #114
- 👾 [FIX] DeepSeek-V2-Lite load by @LRL-ModelCloud in #112
- 👾 [FIX] Opt fc1/fc2 layer modules should not be quantized by @Qubitium in #118
New Contributors
- @CL-ModelCloud made their first contribution in #114
Full Changelog: v0.9.1...v0.9.2
GPTQModel v0.9.1
What's Changed
v0.9.1 is a huge release: 3 new models, plus new BITBLAS support from Microsoft. Batching in `.quantize()` has been fixed, making quantization more than 50% faster when batching is enabled on large calibration datasets. Also added quantized model sharding support, with optional hash security checking of weight files on model load.
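A minimal quantization sketch under stated assumptions: the `QuantizeConfig`, `GPTQModel.from_pretrained`, and `batch_size` names follow the project's README-style usage rather than anything shown in these notes, and the model id and calibration texts are placeholders.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholder calibration texts; real runs use hundreds of samples.
calibration_dataset = [
    "GPTQModel is a model quantization toolkit.",
    "The quick brown fox jumps over the lazy dog.",
]

quant_config = QuantizeConfig(bits=4, group_size=128)

# Model id is a placeholder.
model = GPTQModel.from_pretrained("facebook/opt-125m", quant_config)

# v0.9.1 fixes batched calibration (#70), so batch_size > 1 delivers
# the >50% speedup described above on large calibration sets.
# The batch_size kwarg is assumed from that fix.
model.quantize(calibration_dataset, batch_size=4)

model.save_quantized("opt-125m-gptq-4bit")
```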
- ✨ [FEATURE + New FORMAT] Add Bitblas Format/Kernel Support by @LeiWang1999 @ZX-ModelCloud @Qubitium in #39
- ✨ [FEATURE] Save sharded by @LaaZa @CSY-ModelCloud @PZS-ModelCloud in #40 #69
- ✨ [FEATURE/SECURITY] Add `verify_hash` to validate model weights via stored hashes by @PZS-ModelCloud in #50 (see the load/save sketch after this list)
- 🚀 [CORE/REFACTOR] Consolidate 6+ passive `use_xxx` and `disable_xxx` args into a single explicit `backend` arg by @ZX-ModelCloud in #68
- 🚀 [MODEL] DeepSeek-V2 support by @LRL-ModelCloud in #51
- 🚀 [MODEL] DeepSeek-V2-Lite support by @LRL-ModelCloud in #74
- 🚀 [MODEL] DBRX Converted support by @Qubitium @LRL-ModelCloud in #38
- 👾 [FIX] Batching of calibration data in .quantize() by @LRL-ModelCloud in #70
- 👾 [FIX] Cannot pickle 'module' object for 8 bit (fix #47) by @CSY-ModelCloud in #49
- 👾 [FIX] Format load check by @Qubitium in #53
- 👾 [FIX] `save_quantized()` using wrong model to obtain `state_dict()` by @LRL-ModelCloud in #54
- 👾 [FIX] Rename exllama_kernels class name to fix import/ext conflicts with autogptq by @CSY-ModelCloud in #71
- 🤖 [CI] Speed up unit tests by @CSY-ModelCloud in #37, #41, #46, and #55
- 🤖 [CI] Improve unit tests by @ZYC-ModelCloud in #58 and #72
- 🤖 👾 [CI] Fix Marlin format check: `desc_act` must be False by @LRL-ModelCloud in #57
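A hedged load/save sketch tying together the `backend` consolidation (#68), `verify_hash` (#50), and sharded saving (#40/#69). The `BACKEND` enum name, the hash-string format, and the `max_shard_size` kwarg are assumptions, not confirmed by these notes.

```python
from gptqmodel import GPTQModel, BACKEND  # enum name is an assumption

model = GPTQModel.from_quantized(
    "opt-125m-gptq-4bit",                  # placeholder path
    backend=BACKEND.EXLLAMA_V2,            # single explicit backend arg (#68)
    verify_hash="sha256:<stored-digest>",  # illustrative format only (#50)
)

# Sharded saving (#40/#69); kwarg assumed to mirror Transformers' API.
model.save_quantized("opt-125m-gptq-4bit-sharded", max_shard_size="4GB")
```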
New Contributors
- @LeiWang1999 made their first contribution in #39
- @LaaZa made their first contribution in #40
- @PZS-ModelCloud made their first contribution in #50
- @ZYC-ModelCloud made their first contribution in #58
Full Changelog: v0.9.0...v0.9.1
GPTQModel v0.9.0
What's Changed (First Release since AutoGPTQ fork)
4 new models, plus `sym=False` asymmetric quantization and `lm_head` quantized inference support.
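A config sketch for the two headline options. `sym=False` is the asymmetry switch named above; the `lm_head=True` flag is an assumed spelling, since these notes only state that `lm_head` quantized inference is supported.

```python
from gptqmodel import QuantizeConfig

# sym=False -> asymmetric quantization (per-group zero points).
# lm_head=True is an ASSUMED flag name for quantizing the output head;
# the release notes do not show the actual config field.
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    sym=False,
    lm_head=True,
)
```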
- ✨ [FEATURE/BUG] `sym=False` support by @qwopqwop200, @Qubitium, @fxmarty
- ✨ [FEATURE] `lm_head` quantization inference by @Qubitium
- 🚀 [MODEL] ChatGLM support by @LRL-ModelCloud @Qubitium
- 🚀 [MODEL] MiniCPM model support by @LDLINGLINGLING, @Qubitium in #18
- 🚀 [MODEL] Phi-3 model support by @davidgxue, @ZX-ModelCloud in #27
- 🚀 [MODEL] QwenMoE model support by @bozheng-hit, @LRL-ModelCloud in #24
- 🚀 [CORE] Faster quantization with better quant quality (PPL) by @Qubitium
- 👾 [BUG] Fix H100 crash by @Qubitium
- 👾 [BUG] Fix packing perf regression on high core-count systems by @Qubitium
- 🚀 [REFACTOR] Major refactor and code debloat by @Qubitium
- 🤖 [CI] Code quality by @CSY-ModelCloud in #31
- 🤖 [CI] Add Perplexity regression test by @LRL-ModelCloud in #1
- 🤖 [CI] Add Runner by @CSY-ModelCloud in #3
Full Changelog: https://github.com/ModelCloud/GPTQModel/commits/v0.9.0