CTranslate2 4.4.0
Removed: Flash Attention support in the Python package, due to a significant increase in package size for minimal performance gain.
Note: Flash Attention remains supported in the C++ library via the WITH_FLASH_ATTN CMake option (see the build sketch below).
Flash Attention may be re-added to the Python package in the future if it brings substantial improvements.
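A minimal build sketch for enabling that option. WITH_FLASH_ATTN is the option named above; everything else (the WITH_CUDA flag, the build workflow) is an assumption that depends on your environment:

```bash
# Sketch: build the C++ library with Flash Attention enabled.
# WITH_FLASH_ATTN is the option mentioned above; WITH_CUDA and the rest
# of this workflow are assumptions that depend on your setup.
git clone --recursive https://github.com/OpenNMT/CTranslate2.git
cd CTranslate2 && mkdir build && cd build
cmake .. -DWITH_CUDA=ON -DWITH_FLASH_ATTN=ON
make -j"$(nproc)"
```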
New features
- Support Llama3 (#1751; usage sketch after this list)
- Support Gemma2 (#1772; loads the same way)
- Add log probabilities for all tokens in the vocabulary (#1755; shown in the sketch below)
- Support grouped Conv1D (#1749, #1758; semantics sketched below)
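A hedged usage sketch for the first three items, assuming the standard converter/Generator workflow. The model name, output directory, and decoding settings are illustrative, and `return_logits_vocab` is an assumed keyword inferred from the #1755 title rather than a confirmed API; check the PR for the exact argument and result field. Gemma2 models would convert and load the same way.

```python
import ctranslate2
import transformers

# Assumed one-time conversion (run from the shell); the model name and
# output directory are illustrative:
#   ct2-transformers-converter --model meta-llama/Meta-Llama-3-8B-Instruct \
#       --output_dir llama3-ct2

generator = ctranslate2.Generator("llama3-ct2", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct"
)

start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("CTranslate2 is"))

results = generator.generate_batch(
    [start_tokens],
    max_length=64,             # illustrative decoding settings
    sampling_topk=10,
    return_scores=True,        # existing option: cumulative sequence log prob
    return_logits_vocab=True,  # assumed name for the #1755 option returning
                               # per-step log probs over the whole vocabulary
)

print(tokenizer.decode(results[0].sequences_ids[0]))
```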
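Grouped Conv1D splits the input channels into independent groups, each convolved with its own set of filters. A minimal NumPy sketch of the semantics (stride 1, no padding, no bias); this illustrates the operation, not CTranslate2's actual kernel:

```python
import numpy as np

def grouped_conv1d(x, weight, groups):
    """Grouped 1D convolution (stride 1, no padding, no bias).

    x:      (in_channels, time)
    weight: (out_channels, in_channels // groups, kernel_width)
    """
    in_channels, time = x.shape
    out_channels, group_in, k = weight.shape
    assert in_channels % groups == 0 and out_channels % groups == 0
    assert group_in == in_channels // groups

    out_per_group = out_channels // groups
    y = np.zeros((out_channels, time - k + 1))
    for g in range(groups):
        xg = x[g * group_in:(g + 1) * group_in]  # this group's input channels
        for o in range(out_per_group):
            wg = weight[g * out_per_group + o]   # one filter, sees only its group
            for t in range(y.shape[1]):
                y[g * out_per_group + o, t] = np.sum(xg[:, t:t + k] * wg)
    return y

# groups=1 reduces to a standard Conv1D; groups=in_channels is a depthwise Conv1D.
```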