Releases: OpenNMT/CTranslate2

CTranslate2 2.19.1

23 Jun 09:40

Fixes and improvements

  • Fix missing final bias in some MarianMT models converted from Transformers
  • Fix missing final layer normalization in OPT models converted from Transformers
  • Fix error when converting OpenNMT-tf V1 checkpoints with the new OpenNMT-tf converter
  • Reduce model conversion memory usage when the loaded weights are in FP16 and the model is converted with quantization
  • Add the missing C++ type ctranslate2::float16_t to the public headers; it is required to use some functions
  • Fix some Python typing annotations

CTranslate2 2.19.0

08 Jun 14:40

New features

  • Support conversion of decoder-only Transformer models trained with OpenNMT-tf

Fixes and improvements

  • Fix conversion error for Transformers' model facebook/bart-large-cnn
  • Fix crash when scoring empty sequences
  • Apply max_input_length after all special tokens have been added to the input
  • Clear the GPU memory cache when no new batches are immediately available for execution
  • Improve function signatures in the generated Python API documentation
  • Update oneDNN to 2.6
  • Update spdlog to 1.10.0
  • Update OpenBLAS to 0.3.20

CTranslate2 2.18.0

23 May 14:20

New features

  • Support Meta's OPT models via the Transformers converter
  • Extend the Fairseq converter to support transformer_lm models

Fixes and improvements

  • Fix conversion error for Marian's pre-norm Transformer models
  • Fix conversion error for Transformers' MarianMT models that are missing some configuration fields
  • Improve conversion speed of Marian models (optimize the generation of the sinusoidal position encodings)
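The position encodings mentioned in the last fix follow the standard Transformer formulation. A minimal sketch of generating the full table in one pass is below; the interleaved sin/cos layout here is illustrative, and actual converters may store the sin and cos halves differently:

```python
import math

def sinusoidal_position_encodings(max_len, dim):
    """Standard Transformer sinusoidal position encodings.

    Even dimensions carry sin(pos / 10000^(2i/dim)) and odd dimensions
    the matching cos. Precomputing the whole table once, rather than
    per position, is the kind of optimization the Marian conversion
    speedup refers to.
    """
    table = []
    for pos in range(max_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** ((i // 2 * 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```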

CTranslate2 2.17.0

09 May 15:11

New features

  • Add a converter for Hugging Face's Transformers. The following models are currently supported:
    • BART
    • M2M100
    • MarianMT
    • MBART
    • OpenAI GPT2
  • Revisit the OpenNMT-tf converter to better support custom models and configurations:
    • Extend the conversion script to accept the training configuration
    • Add a new converter class ctranslate2.converters.OpenNMTTFConverterV2
  • Move all documentation and guides to the website to improve navigation and clarity
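The new converter ships with a command line entry point. An invocation might look like the sketch below; the model name is just an illustrative Hugging Face identifier, and the quantization flag is optional:

```shell
# Download a supported model from the Hugging Face Hub and convert it
# to the CTranslate2 format (optionally quantizing the weights to int8).
ct2-transformers-converter --model facebook/m2m100_418M \
    --output_dir m2m100_ct2 --quantization int8
```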

Fixes and improvements

  • In text generation, include the start token in the output if it is not the BOS token

CTranslate2 2.16.0

28 Apr 11:52

New features

  • Initial support of language models:
    • Add a high-level class ctranslate2.Generator to generate text with language models
    • Add a converter for OpenAI GPT-2 models
    • Update the OpenNMT-py converter to support transformer_lm decoders
  • Build ARM64 wheels for macOS
  • Allow loading custom Fairseq extensions and architectures during conversion with the option --user_dir
  • Enable conversion of the Fairseq architectures multilingual_transformer and multilingual_transformer_iwslt_de_en
  • Implement random sampling in beam search using the Gumbel-max trick
  • Generate and publish the Python API reference to https://opennmt.net/CTranslate2
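The Gumbel-max trick mentioned above turns sampling from a categorical distribution into an argmax over noise-perturbed logits, which fits naturally into a beam search that already ranks candidates by score. A minimal, self-contained sketch, independent of the CTranslate2 internals:

```python
import math
import random

def gumbel_max_sample(logits, rng=random):
    """Draw an index i with probability softmax(logits)[i].

    Adding independent Gumbel(0, 1) noise to each logit and taking the
    argmax is equivalent to sampling from the softmax distribution --
    this is the Gumbel-max trick.
    """
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

Over many draws, the empirical frequencies converge to the softmax probabilities of the logits.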

Fixes and improvements

  • Fix model loading on a GPU with index > 0
  • Fix memory error when running random sampling on GPU with certain batch sizes
  • Fix incorrect tokens order in some converted Marian vocabularies
  • Properly count the number of layers before building the encoder/decoder instead of relying on runtime exceptions

CTranslate2 2.15.1

04 Apr 16:54

Fixes and improvements

  • Fix missing deactivation of OpenMP threading in GPU execution (regression introduced in version 2.15.0)

CTranslate2 2.15.0

04 Apr 12:04

New features

  • Expose translator option max_queued_batches to configure the maximum number of queued batches (when the queue is full, future requests will block until a free slot is available)
  • Allow converters to customize the vocabulary special tokens <unk>, <s>, and </s>
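The back-pressure behavior behind max_queued_batches can be sketched with a bounded queue: once the queue holds the configured number of batches, producers block until the worker frees a slot. The helper below is illustrative, not the CTranslate2 implementation:

```python
import queue
import threading

def run_with_bounded_queue(items, max_queued):
    """Feed items through a worker thread via a bounded queue.

    When the queue already holds max_queued items, put() blocks until
    the worker frees a slot -- the same back-pressure idea as the
    translator's max_queued_batches option.
    """
    q = queue.Queue(maxsize=max_queued)
    processed = []

    def worker():
        while True:
            item = q.get()
            if item is None:
                return
            processed.append(item)

    t = threading.Thread(target=worker)
    t.start()
    for item in items:
        q.put(item)  # blocks while the queue is full
    q.put(None)     # sentinel: tell the worker to stop
    t.join()
    return processed
```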

Fixes and improvements

  • Fix compatibility of models converted on Windows with other platforms by saving the vocabulary files with the newline character "\n" instead of "\r\n"
  • Clarify conversion error when no TensorFlow checkpoints are found in the configured model directory
  • Enable fused QKV transposition by switching the heads and time dimensions before the QKV split
  • Cache the prepared source lengths mask in the Transformer decoder state and reuse it in the next decoding steps
  • Pad the output layer to enable Tensor Cores only once instead of updating the layer on each batch
  • Vectorize copy in Concat and Split ops on GPU
  • Factorize all OpenMP parallel for loops to call the parallel_for function
  • Compile CUDA kernels for deprecated Compute Capabilities that are not yet dropped by CUDA:
    • CUDA 11: 3.5 and 5.0
    • CUDA 10: 3.0
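One of the fixes above concerns newline handling when saving vocabularies on Windows. The portable approach in Python is to disable newline translation when writing, so "\n" is written verbatim on every platform; save_vocabulary below is a hypothetical helper, not a CTranslate2 function:

```python
def save_vocabulary(path, tokens):
    # newline="\n" stops Python from translating "\n" into the platform
    # default ("\r\n" on Windows), keeping the file cross-platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for token in tokens:
            f.write(token + "\n")
```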

CTranslate2 2.14.0

16 Mar 10:29

New features

  • Include BART and MBART in the list of supported Fairseq architectures
  • Add Fairseq converter option --no_default_special_tokens to require all special tokens to be set by the user during inference, including the decoder start tokens (for example, this is required by MBART-25 to properly set the language tokens)

Fixes and improvements

  • Fix conversion of post-norm Transformers trained with OpenNMT-tf
  • Fix scoring with Fairseq models that used an incorrect decoder start token (Fairseq uses </s> as the decoder start token, not <s>)
  • Fix scoring result to include the end of sentence token
  • Ignore OpenNMT-py options --alignment_layer and --alignment_heads for models that are not trained with alignments
  • Enable batch encoding in return_alternatives translation mode (the decoding still runs sequentially)
  • Make the enumerations ctranslate2.specs.Activation and ctranslate2.specs.EmbeddingsMerge public since they can be used to configure the Transformer specification
  • Update oneDNN to 2.5.3
  • Update cpu_features to 0.7.0
  • Update cxxopts to 3.0.0
  • Update spdlog to 1.9.2

CTranslate2 2.13.1

02 Mar 16:09

Fixes and improvements

  • Fix conversion error for old OpenNMT-py models that do not have the option self_attn_type

CTranslate2 2.13.0

28 Feb 11:39

New features

  • Add converter for Marian and support the collection of OPUS-MT pretrained models
  • Support models applying a layer normalization after the embedding layer (cf. option --layernorm-embedding in Fairseq)
  • Support models using the Swish (a.k.a. SiLU) activation function
  • Support models using custom decoder start tokens, which can be passed in the target prefix

Fixes and improvements

  • Remove unexpected call to a CUDA function in CPU execution when unloading models
  • Add option groups in the translation client help output
  • Use new thrust::cuda::par_nosync execution policy when calling Thrust functions
  • Update Thrust to 1.16.0
  • Update pybind11 to 2.9.1