0.0.19
Released by github-actions on 19 Apr 06:44 · 619 commits to master since this release
- More accurate Q4 cache using groupwise rotations
- Faster prompt ingestion when using flash-attn
- Minor fixes for issues when quantizing Llama 3
- New, more robust optimizer
- Fixed a bug in long-sequence inference for GPTQ models
Full Changelog: v0.0.18...v0.0.19