CTranslate2 3.17.1
Fixes and improvements
- Fix an error when running models with the new `int8_bfloat16` computation type (see the loading sketch below)
- Fix a vocabulary error when converting Llama 2 models with the Transformers converter (see the conversion sketch below)
- Update the Transformers converter to correctly convert Llama models using GQA
- Stop the decoding when the generator returned by the method `generate_tokens` is closed (see the sketch after this list)
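
For reference, a minimal conversion sketch using the Transformers converter's Python API; the model name and output directory are placeholders, not part of this release:

```python
from ctranslate2.converters import TransformersConverter

# Convert a Llama 2 checkpoint from the Hugging Face Hub to the CTranslate2 format.
# "meta-llama/Llama-2-7b-hf" and "llama-2-7b-ct2" are example values.
converter = TransformersConverter("meta-llama/Llama-2-7b-hf")
converter.convert("llama-2-7b-ct2", quantization="int8_bfloat16")
```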
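
The computation type is selected when the converted model is loaded. A minimal loading sketch, assuming the output directory from the conversion above:

```python
import ctranslate2

# Load the converted model with the int8_bfloat16 computation type on GPU.
generator = ctranslate2.Generator(
    "llama-2-7b-ct2",
    device="cuda",
    compute_type="int8_bfloat16",
)
```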
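
A short sketch of the `generate_tokens` behavior fixed in this release: closing the returned generator now also stops the decoding. The model path and prompt tokens are illustrative only:

```python
import ctranslate2

generator = ctranslate2.Generator("llama-2-7b-ct2", device="cuda")

# generate_tokens expects an already tokenized prompt.
prompt_tokens = ["<s>", "▁Hello", ",", "▁world"]

results = generator.generate_tokens(prompt_tokens, max_length=512)
for index, step in enumerate(results):
    print(step.token, end=" ", flush=True)
    if index == 9:
        # Closing the generator stops the underlying decoding as well.
        results.close()
        break
```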