0.0.19
Released by github-actions on 19 Apr 06:44 · 619 commits to master since this release
- More accurate Q4 cache using groupwise rotations
- Faster prompt ingestion when using flash-attn
- Minor fixes for issues when quantizing Llama 3
- New, more robust optimizer
- Fixed a bug in long-sequence inference for GPTQ models
Full Changelog: v0.0.18...v0.0.19