CTranslate2 3.17.0

guillaumekln released this 18 Jul 10:26

3e51874

New features

Add new computation types: bfloat16 and int8_bfloat16 (require a GPU with Compute Capability 8.0 or above)
Support multi-query attention for encoder-decoder models
Allow converters to register weights as PyTorch tensors instead of NumPy arrays
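
As an illustration of the new computation types, here is a minimal sketch of choosing a type from a GPU's Compute Capability. The helper `pick_compute_type` and the fallback to `float16` / `int8_float16` on older GPUs are assumptions made for this example and are not part of CTranslate2. The path `"model_dir"` in the usage comment is a placeholder.

```python
def pick_compute_type(capability, quantize=False):
    """Pick a compute type name for a GPU with the given
    (major, minor) Compute Capability.

    Hypothetical helper: bfloat16 variants need Compute Capability
    8.0 or above, so older GPUs fall back to the float16 variants
    (the fallback policy is an assumption of this sketch).
    """
    if tuple(capability) >= (8, 0):
        return "int8_bfloat16" if quantize else "bfloat16"
    return "int8_float16" if quantize else "float16"


# Usage with CTranslate2 (needs an installed package and a converted model):
#
#   import ctranslate2
#   translator = ctranslate2.Translator(
#       "model_dir",  # placeholder path to a converted model
#       device="cuda",
#       compute_type=pick_compute_type((8, 0)),
#   )
```

The compute type is passed as a plain string when the model is loaded, so the selection logic can stay independent of the library.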

Fixes and improvements

Pass the flag trust_remote_code when loading the tokenizer in the Transformers converter
Improve performance of T5 models by reusing the same relative position bias in every layer
Whisper: disable the first timestamp decoding rule when a prefix is used
Install the CMake configuration in the correct library directory (e.g. some platforms use lib64 instead of lib)
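
The T5 improvement above amounts to computing the relative position bias once and sharing it across layers. The following is a conceptual sketch only, not CTranslate2's implementation: `compute_position_bias` is a hypothetical stand-in that returns a toy offset matrix rather than real T5 bucketed biases.

```python
def compute_position_bias(query_len, key_len):
    """Hypothetical stand-in for the relative position bias:
    one toy value per (query, key) offset."""
    return [[float(k - q) for k in range(key_len)] for q in range(query_len)]


def run_layers(num_layers, query_len, key_len):
    """Compute the bias on first use, then reuse the same object
    in every later layer instead of recomputing it."""
    calls = 0
    bias = None
    used = []
    for _ in range(num_layers):
        if bias is None:
            bias = compute_position_bias(query_len, key_len)
            calls += 1
        used.append(bias)  # each layer would add this bias to its attention scores
    return calls, used
```

Because the bias depends only on the query and key positions, not on the layer, computing it once per forward pass avoids repeating the same work in each layer.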