v0.3.0

@dhuangnm released this 13 Nov 05:22
93832a6

What's New in v0.3.0

Key Features and Improvements

  • GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
  • Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers mappings based on model architecture, making SmoothQuant easier to apply across various models.
  • Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
  • Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, which improves support for and integration of Hugging Face models that are not based on AutoModelForCausalLM.
  • Observer Restructure (#837): Introduced calibration and frozen steps within QuantizationModifier, moving Observers from compressed-tensors to llm-compressor.
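
As a rough illustration of the auto-inferred mappings mentioned above, the sketch below selects a set of SmoothQuant mappings from a model's class name and falls back to a default when the architecture is unknown. The registry, the function name `infer_mappings`, and the mapping contents are hypothetical and are not taken from the llm-compressor source; the `re:` prefix for regex targets is assumed here as a naming convention.

```python
# Hypothetical registry: architecture name -> SmoothQuant mappings.
# Each mapping pairs the layers to balance with the layer whose output is smoothed.
DEFAULT_MAPPINGS = [
    (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
    (["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"),
]

# Illustrative variant for a mixture-of-experts style model (assumed layout).
MOE_STYLE_MAPPINGS = [
    (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
    (["re:.*gate"], "re:.*post_attention_layernorm"),
]

MAPPINGS_BY_ARCHITECTURE = {
    "LlamaForCausalLM": DEFAULT_MAPPINGS,
    "MixtralForCausalLM": MOE_STYLE_MAPPINGS,
}


def infer_mappings(model):
    """Return mappings for the model's class name, or the default if unknown."""
    architecture = type(model).__name__
    return MAPPINGS_BY_ARCHITECTURE.get(architecture, DEFAULT_MAPPINGS)
```

With a lookup like this, a caller would only need to pass explicit mappings when a model's layer names differ from every registered layout.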

Bug Fixes

  • Fix Tied Tensors Bug (#659)
  • Observer Initialization in GPTQ Wrapper (#883)
  • Sparsity Reload Testing (#882)

Documentation

  • Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.
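
To make the notion of a mapping more concrete, here is a minimal sketch of how one mapping (a smoothing layer plus the layers it balances) could be resolved against a list of module names. The helper names `matches` and `resolve_mapping`, the `re:` regex prefix, and the module names are assumptions made for illustration; this is not the tutorial's code or the library's implementation.

```python
import re


def matches(pattern, name):
    """Match a module name against a target; 're:' marks a regular expression."""
    if pattern.startswith("re:"):
        return re.match(pattern[3:], name) is not None
    return pattern == name


def resolve_mapping(balance_patterns, smooth_pattern, module_names):
    """Pair each smoothing layer with the balance layers under the same parent."""
    resolved = []
    for name in module_names:
        if not matches(smooth_pattern, name):
            continue
        parent = name.rsplit(".", 1)[0]  # e.g. "model.layers.0"
        balance = [
            other
            for other in module_names
            if other.startswith(parent + ".")
            and any(matches(p, other) for p in balance_patterns)
        ]
        resolved.append((name, balance))
    return resolved


# Hypothetical module names in a Llama-like layout.
module_names = [
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.1.input_layernorm",
    "model.layers.1.self_attn.q_proj",
]

pairs = resolve_mapping(
    ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"],
    "re:.*input_layernorm",
    module_names,
)
```

Each entry in `pairs` names one smoothing layer together with the projections in the same decoder block that would share its smoothing scale.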

Full Changelog: 0.2.0...0.3.0