Releases: OpenNMT/CTranslate2

CTranslate2 3.15.1

09 Jun 09:56

Fixes and improvements

  • Fix an error when using the new static_prompt argument in the methods generate_tokens and generate_batch
  • Improve the performance of models using ALiBi

CTranslate2 3.15.0

06 Jun 14:13

New features

  • Add initial support for encoder-only Transformer models via a new class ctranslate2.Encoder
  • Update the Transformers converter to support the Falcon models
  • Add a generation argument static_prompt to optimize the execution for models using system prompts: the model state for this prompt is cached and reused in future calls
  • Support early stopping in greedy search when the callback function returns True
  • Make the layer norm epsilon value configurable in the model configuration file config.json
  • Add Tanh as a possible activation function
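
The early stopping behavior listed above can be pictured with a toy greedy loop. This is only a sketch of the idea, not CTranslate2's implementation: the function greedy_decode, the callback signature (step, token_id), and the toy_scores "model" are all hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional, Sequence


def greedy_decode(
    step_scores: Callable[[List[int]], Sequence[float]],
    end_id: int,
    max_length: int,
    callback: Optional[Callable[[int, int], bool]] = None,
) -> List[int]:
    """Toy greedy search: pick the highest-scoring token at every step."""
    output: List[int] = []
    for step in range(max_length):
        scores = step_scores(output)
        token_id = max(range(len(scores)), key=scores.__getitem__)
        if token_id == end_id:
            break
        output.append(token_id)
        # Assumption for this sketch: a callback returning True stops decoding.
        if callback is not None and callback(step, token_id):
            break
    return output


def toy_scores(prefix: List[int]) -> List[float]:
    """Hypothetical 'model' over 6 tokens that emits 1, 2, 3, 4, then end (5)."""
    vocab_size = 6
    best = min(len(prefix) + 1, vocab_size - 1)
    return [1.0 if i == best else 0.0 for i in range(vocab_size)]


# Without a callback the loop runs until the end token is selected.
full = greedy_decode(toy_scores, end_id=5, max_length=10)

# With a callback that returns True on token 2, decoding stops right after it.
stopped = greedy_decode(
    toy_scores, end_id=5, max_length=10,
    callback=lambda step, token_id: token_id == 2,
)
```

Here full is [1, 2, 3, 4] and stopped is [1, 2]: the callback sees each token as it is produced and can end the loop before the model emits its end token.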

Fixes and improvements

  • Fix a performance issue when running models using ALiBi on the GPU
  • Fix application of the rotary embeddings when the multi-query attention is used
  • Fix conversion of Marian models using tied-embeddings-all: false
  • Remove use_fast argument when loading Hugging Face tokenizers to use the default tokenizer for the model

CTranslate2 3.14.0

26 May 16:20

New features

  • Update the Transformers converter with new architectures:
    • CodeGen
    • GPTBigCode
    • LLaMA
    • MPT
  • Update the OpenNMT-py converter to support some recent options:
    • layer_norm="rms"
    • max_relative_positions=-1 (rotary embeddings)
    • max_relative_positions=-2 (ALiBi)
    • pos_ffn_activation_fn="silu"
  • Update the OpenNMT-tf converter to support models using different configurations for the encoder and decoder (e.g. post-norm in the encoder and pre-norm in the decoder)
  • Implement the multi-query attention (used by GPTBigCode)
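
For readers unfamiliar with the ALiBi option mentioned above (max_relative_positions=-2), the sketch below shows the usual formulation: each attention head adds a linear penalty proportional to the query-key distance, with a head-specific slope. It follows the commonly described scheme for a power-of-two number of heads and is not taken from the CTranslate2 source; the function names are hypothetical, and folding the causal mask into the bias is a simplification made for this example.

```python
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Head-specific slopes 2^(-8 * h / num_heads) for h = 1..num_heads."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(slope: float, length: int) -> List[List[float]]:
    """Bias added to attention scores: -slope * (i - j) for key j <= query i.

    Future positions (j > i) are set to -inf, which folds a causal mask
    into the same matrix for illustration.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(length)]
        for i in range(length)
    ]


slopes = alibi_slopes(8)            # 1/2, 1/4, ..., 1/256
bias = alibi_bias(slopes[0], 3)     # 3x3 bias matrix for the first head
```

Because the bias depends only on the distance i - j, no position embeddings are added to the inputs, which is why converters need a dedicated option for such models.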

Fixes and improvements

  • Support paths containing Unicode characters on Windows
  • Fix the generate_tokens method to properly raise the underlying exception instead of hanging indefinitely
  • Fix compilation error when using -DBUILD_SHARED_LIBS=OFF
  • Fix runtime errors when linking against libctranslate2.a without using the "whole archive" flags

CTranslate2 3.13.0

26 Apr 09:37

New features

  • Support conversion of GPT-NeoX models with the Transformers converter
  • Extend the end_token argument to also accept a list of tokens
  • Add option return_end_token to include the end token in the results of the methods generate_batch and translate_batch (by default the end token is removed)
  • Expose the callback argument for the methods generate_batch and translate_batch to get early results from the decoding loop
  • Fall back to a custom threading implementation when OpenMP is not used (which is currently the case for the macOS ARM64 Python wheels)
  • Define the CMake package CTranslate2::ctranslate2 to facilitate the library integration in other CMake projects
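
The combined effect of a list-valued end_token and the return_end_token option can be illustrated with a small post-processing function. This is a sketch of the semantics described above, not library code: the real methods stop the decoding loop itself, and finish_sequence is a hypothetical helper that only mimics the visible result on an already generated token list.

```python
from typing import List, Sequence, Union


def finish_sequence(
    tokens: Sequence[str],
    end_token: Union[str, Sequence[str]],
    return_end_token: bool = False,
) -> List[str]:
    """Cut a token sequence at the first end token (a single token or a list)."""
    end_tokens = {end_token} if isinstance(end_token, str) else set(end_token)
    for index, token in enumerate(tokens):
        if token in end_tokens:
            # Keep the end token only when explicitly requested.
            stop = index + 1 if return_end_token else index
            return list(tokens[:stop])
    return list(tokens)


generated = ["Hello", "world", "</s>", "ignored"]

# Default: the end token is removed from the result.
without_end = finish_sequence(generated, "</s>")

# A list of end tokens, with the matched end token kept in the result.
with_end = finish_sequence(generated, ["<eot>", "</s>"], return_end_token=True)
```

In this example without_end is ["Hello", "world"] and with_end is ["Hello", "world", "</s>"].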

Fixes and improvements

  • Fix the vocabulary loading when some tokens end with the carriage return
  • Implement a fused kernel to apply the rotary embeddings
  • Update the Ruy library to commit 363f2522

CTranslate2 3.12.0

17 Apr 18:22

New features

  • Add methods Generator.generate_tokens and Translator.generate_tokens returning a generator that yields tokens as soon as they are generated by the model (not compatible with beam search)
  • Improve the performance of rotary embeddings on CPU with an alternative implementation that is enabled by setting rotary_interleave=False in the model specification (this may require permuting the QK weights)
  • Support a variable number of input frames in method Whisper.align to improve batch support
  • Expose flag low_cpu_mem_usage in the Transformers converter to reduce the memory usage when loading large models (requires the package accelerate)
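
The note about permuting QK weights for rotary_interleave=False can be made concrete with a small numeric sketch. It assumes the interleaved layout rotates adjacent pairs (x[2k], x[2k+1]) and the non-interleaved layout rotates pairs (x[k], x[k + dim/2]), as in GPT-NeoX-style implementations; whether this matches CTranslate2's internal layout exactly is an assumption, and all function names here are hypothetical.

```python
import math
from typing import List


def _angles(position: int, dim: int, base: float = 10000.0) -> List[float]:
    """One rotation angle per pair of dimensions."""
    return [position * base ** (-2.0 * k / dim) for k in range(dim // 2)]


def rotate_interleaved(x: List[float], position: int) -> List[float]:
    """Rotate adjacent pairs (x[2k], x[2k+1])."""
    out = [0.0] * len(x)
    for k, theta in enumerate(_angles(position, len(x))):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * k], x[2 * k + 1]
        out[2 * k], out[2 * k + 1] = a * c - b * s, a * s + b * c
    return out


def rotate_half(x: List[float], position: int) -> List[float]:
    """Rotate pairs (x[k], x[k + dim/2]): the non-interleaved layout."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for k, theta in enumerate(_angles(position, len(x))):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[k], x[k + half]
        out[k], out[k + half] = a * c - b * s, a * s + b * c
    return out


def deinterleave(x: List[float]) -> List[float]:
    """Permutation moving even-indexed entries first, then odd-indexed ones."""
    return x[0::2] + x[1::2]


x = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, 0.7, -0.8]
lhs = rotate_half(deinterleave(x), position=5)
rhs = deinterleave(rotate_interleaved(x, position=5))
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))
```

Under these assumptions same is True: applying the permutation to the query and key projections once, at conversion time, lets the non-interleaved kernel produce the same values in permuted order. Since queries and keys are permuted identically, their dot products, and therefore the attention scores, are unchanged.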

Fixes and improvements

  • Fix crash in Whisper.align when num_frames // 2 <= median_filter_width
  • Raise an error if arguments end_token or suppress_sequences contain tokens that are not in the vocabulary
  • Optimize the quantization of FP16 weights during the model conversion
  • In the Transformers converter, also load the model weights in FP16 when the selected quantization is int8_float16
  • Update the Whisper timestamp decoding rules to prevent the generation of segments with zero duration

CTranslate2 3.11.0

06 Apr 16:19


  • The Python wheels for macOS ARM are now built with the Ruy backend to support INT8 computation. This will change the performance and results when loading an INT8 model and/or using the auto compute type. To keep the previous behavior, set compute_type="float32".

New features

  • Support conversion of the GPT-J architecture
  • Support conversion of models using rotary position embeddings
  • Apply the new OpenNMT-py option decoder_start_token
  • Add option revision in the Transformers converter to download a specific revision of the model from the Hugging Face Hub

CTranslate2 3.10.3

30 Mar 15:52

Fixes and improvements

  • Fix a synchronization issue when the model input is a CUDA storage

CTranslate2 3.10.2

27 Mar 15:13

Fixes and improvements

  • Select the correct device when copying a StorageView instance

CTranslate2 3.10.1

27 Mar 15:12

Fixes and improvements

  • Add missing device setter in Whisper.encode

CTranslate2 3.10.0

24 Mar 09:45

New features

  • Add Generator option include_prompt_in_result (True by default)
  • Add method Whisper.encode to only run the Whisper encoder
  • Add model properties Whisper.device and Whisper.device_index

Fixes and improvements

  • Update the methods Whisper.detect_language, Whisper.generate, and Whisper.align to accept the encoder output
  • Fix a crash when running Generator.forward on GPU and the generator object is destroyed before the forward output
  • Fix parsing of Marian YAML vocabulary files containing "complex key mappings" and escaped sequences such as "\x84"