Highlights:
We have enhanced support for extremely large models with the following updates:
Multi-Card Tuning Support: Added basic support for naive multi-GPU tuning. #415
Accelerated Packing Stage: Improved packing speed by 2X-4X for the AutoGPTQ and AutoAWQ formats by leveraging CUDA. #407
Deepseek V3 GGUF Export: Introduced support for exporting models to the Deepseek V3 GGUF format. #416
What's Changed
- update format readme by @wenhuach21 in #411
- fix log bug and device "auto" bug by @n1ck-guo in #409
- speedup packing stage for autogptq and autoawq format by @wenhuach21 in #407
- support naive multi-card tuning by @wenhuach21 in #415
- support bf16 inference for autoround format by @wenhuach21 in #420
- enable backup pile dataset loading by @WeiweiZhang1 in #417
- fix evaluation device bug, related to issue 413 by @n1ck-guo in #419
- support to export deepseek v3 gguf format by @n1ck-guo in #416
- fix cuda UT torch_dtype by @WeiweiZhang1 in #423
- fix eval trust_remote_code by @n1ck-guo in #424
Full Changelog: v0.4.4...v0.4.5