CTranslate2 3.17.0

guillaumekln released this 18 Jul 10:26

New features

  • Add new computation types: bfloat16 and int8_bfloat16 (require a GPU with Compute Capability 8.0 or above)
  • Support multi-query attention for encoder-decoder models
  • Allow converters to register weights as PyTorch tensors instead of Numpy arrays
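
As a rough illustration of the Compute Capability requirement on the new computation types, the sketch below picks a compute type string from a GPU's capability. The helper `pick_compute_type` is hypothetical and not part of CTranslate2, and falling back to `float16` / `int8_float16` on older GPUs is an assumption made for this example, not a rule stated in the release.

```python
from typing import Tuple


def pick_compute_type(capability: Tuple[int, int], quantize_int8: bool = False) -> str:
    """Hypothetical helper: choose a compute type name for a CUDA device.

    capability is the (major, minor) Compute Capability of the GPU.
    bfloat16 and int8_bfloat16 need Compute Capability 8.0 or above;
    the float16 fallback for older GPUs is an assumption of this sketch.
    """
    supports_bfloat16 = tuple(capability) >= (8, 0)
    if supports_bfloat16:
        return "int8_bfloat16" if quantize_int8 else "bfloat16"
    return "int8_float16" if quantize_int8 else "float16"


# Example: Compute Capability 8.6 versus 7.5.
print(pick_compute_type((8, 6)))
print(pick_compute_type((7, 5), quantize_int8=True))

# The resulting string would then be passed as the compute_type argument
# when loading a model, for example:
#   translator = ctranslate2.Translator("model_dir", device="cuda",
#                                       compute_type=pick_compute_type((8, 6)))
# ("model_dir" is a placeholder path.)
```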

Fixes and improvements

  • Pass the flag trust_remote_code when loading the tokenizer in the Transformers converter
  • Improve performance of T5 models by reusing the same relative position bias in every layer
  • Whisper: disable the first timestamp decoding rule when a prefix is used
  • Install the CMake configuration in the correct library directory (e.g. some platforms use lib64 instead of lib)
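
The T5 improvement above can be pictured with a toy sketch: the relative position bias is computed once and the same object is handed to every layer, instead of being recomputed per layer. This is not CTranslate2's actual implementation. The names `BiasCache`, `toy_bias` and `run_layers` are hypothetical, and the bias function is a simplified stand-in for T5's bucketed bias.

```python
from typing import Callable, List, Optional, Tuple

Bias = List[List[int]]


class BiasCache:
    """Hypothetical helper: compute a value on first use, then reuse it."""

    def __init__(self, compute_fn: Callable[[], Bias]) -> None:
        self._compute_fn = compute_fn
        self._value: Optional[Bias] = None
        self.calls = 0  # how many times the bias was actually computed

    def get(self) -> Bias:
        if self._value is None:
            self._value = self._compute_fn()
            self.calls += 1
        return self._value


def toy_bias(query_len: int, key_len: int, max_distance: int = 4) -> Bias:
    # Simplified stand-in: the signed key-query distance, clipped to
    # [-max_distance, max_distance]. Real T5 maps distances to learned buckets.
    return [
        [max(-max_distance, min(max_distance, k - q)) for k in range(key_len)]
        for q in range(query_len)
    ]


def run_layers(num_layers: int, query_len: int, key_len: int) -> Tuple[List[Bias], int]:
    cache = BiasCache(lambda: toy_bias(query_len, key_len))
    per_layer = []
    for _ in range(num_layers):
        # Every layer receives the same cached bias object.
        per_layer.append(cache.get())
    return per_layer, cache.calls


biases, calls = run_layers(num_layers=6, query_len=3, key_len=3)
print(calls)  # number of times the bias was computed for 6 layers
```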