0.0.19

github-actions released this 19 Apr 06:44
· 619 commits to master since this release
ed118b4
  • More accurate Q4 cache using groupwise rotations
  • Better prompt ingestion speed when using flash-attn
  • Minor fixes for issues encountered when quantizing Llama 3
  • New, more robust optimizer
  • Fix bug in long-sequence inference for GPTQ models

Full Changelog: v0.0.18...v0.0.19