CTranslate2 4.4.0

@minhthuc2502 minhthuc2502 released this 09 Sep 09:21
· 12 commits to master since this release
8f4d134

Removed: Flash Attention support in the Python package, because it significantly increased the package size for minimal performance gain.
Note: Flash Attention remains supported in the C++ package when built with the WITH_FLASH_ATTN option.
Flash Attention may be re-added to the Python package in the future if substantial improvements are made.
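
For C++ builds, the option would typically be set at CMake configure time. The sketch below is only an illustration, not the project's official build recipe. It assumes WITH_FLASH_ATTN is a CMake cache option that is enabled together with WITH_CUDA. The helper name `flash_attn_cmake_args` and the directory names are hypothetical.

```python
import shlex


def flash_attn_cmake_args(source_dir=".", build_dir="build"):
    """Return a CMake configure command that enables Flash Attention.

    Assumption: WITH_FLASH_ATTN is passed as a CMake cache variable and
    needs a CUDA build (WITH_CUDA=ON). Both paths are placeholders.
    """
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        "-DWITH_CUDA=ON",
        "-DWITH_FLASH_ATTN=ON",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(flash_attn_cmake_args()))
```

The helper only assembles the command line. Running it, and any further options your toolchain needs, is left to the caller.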

New features

Fixes and improvements

  • Fix pipeline (#1723 + #1747)
  • Some improvements in Flash Attention (#1732)
  • Fix crash when using return_alternatives on CUDA (#1733)
  • Quantization AWQ GEMM + GEMV (#1727)