Releases: OpenNMT/CTranslate2
CTranslate2 2.19.1
Fixes and improvements
- Fix missing final bias in some MarianMT models converted from Transformers
- Fix missing final layer normalization in OPT models converted from Transformers
- Fix error when converting OpenNMT-tf V1 checkpoints with the new OpenNMT-tf converter
- Reduce model conversion memory usage when the loaded weights are in FP16 and the model is converted with quantization
- Add the missing C++ type `ctranslate2::float16_t` in the public headers; it is required to use some functions
- Fix some Python typing annotations
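The quantized conversion mentioned above can be pictured with a minimal sketch of per-row symmetric int8 quantization. This is a generic illustration in NumPy, not CTranslate2's actual implementation:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Per-row symmetric int8 quantization: scale each row so that its
    largest absolute value maps to 127."""
    amax = np.abs(weights).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)  # avoid division by zero
    scale = 127.0 / amax
    q = np.clip(np.rint(weights * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) / scale

# FP16 weights, as loaded from a converted checkpoint
w = np.random.randn(4, 8).astype(np.float16)
q, scale = quantize_int8(w.astype(np.float32))
w_hat = dequantize(q, scale)
print(np.abs(w_hat - w.astype(np.float32)).max())  # small quantization error
```

The memory-usage fix is about avoiding an extra full-precision copy of the weights while such a transformation runs.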
CTranslate2 2.19.0
New features
- Support conversion of decoder-only Transformer models trained with OpenNMT-tf
Fixes and improvements
- Fix conversion error for the Transformers model `facebook/bart-large-cnn`
- Fix crash when scoring empty sequences
- Apply `max_input_length` after all special tokens have been added to the input
- Clear the GPU memory cache when no new batches are immediately available for execution
- Improve function signatures in the generated Python API documentation
- Update oneDNN to 2.6
- Update spdlog to 1.10.0
- Update OpenBLAS to 0.3.20
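The `max_input_length` fix above is about ordering: truncation must happen after the special tokens are added, otherwise the appended tokens could push the sequence back over the limit. A simplified sketch of the corrected behavior (the token names and function are illustrative, not the actual implementation):

```python
def prepare_input(tokens, max_input_length, bos="<s>", eos="</s>"):
    # Add special tokens first...
    tokens = [bos] + tokens + [eos]
    # ...then truncate, so the final length never exceeds the limit
    # (a limit of 0 means no truncation).
    if max_input_length > 0:
        tokens = tokens[:max_input_length]
    return tokens

print(prepare_input(["▁Hello", "▁world"], max_input_length=3))
# → ['<s>', '▁Hello', '▁world']
```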
CTranslate2 2.18.0
New features
- Support Meta's OPT models via the Transformers converter
- Extend the Fairseq converter to support `transformer_lm` models
Fixes and improvements
- Fix conversion error for Marian's pre-norm Transformer models
- Fix conversion error for Transformers' MarianMT models that are missing some configuration fields
- Improve conversion speed of Marian models (optimize the generation of the sinusoidal position encodings)
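The sinusoidal position encodings mentioned in the Marian speedup can be generated in a fully vectorized way. This is a generic sketch of the standard Transformer formulation, not necessarily the exact layout the converter produces:

```python
import numpy as np

def sinusoidal_position_encodings(max_len: int, depth: int) -> np.ndarray:
    """Standard Transformer sinusoidal encodings, vectorized
    (no Python loop over positions)."""
    positions = np.arange(max_len)[:, np.newaxis]                 # (max_len, 1)
    inv_freq = 1.0 / (10000 ** (np.arange(0, depth, 2) / depth))  # (depth/2,)
    angles = positions * inv_freq                                 # (max_len, depth/2)
    enc = np.zeros((max_len, depth), dtype=np.float32)
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

enc = sinusoidal_position_encodings(512, 64)
print(enc.shape)  # (512, 64)
```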
CTranslate2 2.17.0
New features
- Add a converter for Hugging Face's Transformers. The following models are currently supported:
- BART
- M2M100
- MarianMT
- MBART
- OpenAI GPT2
- Revisit the OpenNMT-tf converter to better support custom models and configurations:
- Extend the conversion script to accept the training configuration
- Add a new converter class `ctranslate2.converters.OpenNMTTFConverterV2`
- Move all documentation and guides to the website to improve navigation and clarity
Fixes and improvements
- In text generation, include the start token in the output if it is not the BOS token
CTranslate2 2.16.0
New features
- Initial support of language models:
- Add a high-level class `ctranslate2.Generator` to generate text with language models
- Add a converter for OpenAI GPT-2 models
- Update the OpenNMT-py converter to support `transformer_lm` decoders
- Build ARM64 wheels for macOS
- Allow loading custom Fairseq extensions and architectures during conversion with the option `--user_dir`
- Enable conversion of the Fairseq architectures `multilingual_transformer` and `multilingual_transformer_iwslt_de_en`
- Implement random sampling in beam search using the Gumbel-max trick
- Generate and publish the Python API reference to https://opennmt.net/CTranslate2
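The Gumbel-max trick behind the new random sampling rests on a simple property: adding independent Gumbel(0, 1) noise to unnormalized log-probabilities and taking the argmax is equivalent to sampling from the softmax distribution. A minimal standalone illustration (not the CTranslate2 implementation):

```python
import numpy as np

def gumbel_max_sample(logits: np.ndarray, rng: np.random.Generator) -> int:
    """Sample an index from softmax(logits) via the Gumbel-max trick."""
    gumbel = -np.log(-np.log(rng.random(logits.shape)))  # Gumbel(0, 1) noise
    return int(np.argmax(logits + gumbel))

rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 3.0])
counts = np.bincount(
    [gumbel_max_sample(logits, rng) for _ in range(10000)], minlength=3
)
probs = np.exp(logits) / np.exp(logits).sum()
print(counts / 10000, probs)  # empirical frequencies ≈ softmax probabilities
```

In beam search this formulation is convenient because the perturbed scores can be ranked like regular beam scores.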
Fixes and improvements
- Fix model loading on a GPU with index > 0
- Fix memory error when running random sampling on GPU with certain batch sizes
- Fix incorrect tokens order in some converted Marian vocabularies
- Properly count the number of layers before building the encoder/decoder instead of relying on runtime exceptions
CTranslate2 2.15.1
Fixes and improvements
- Fix missing deactivation of OpenMP threading in GPU execution (regression introduced in version 2.15.0)
CTranslate2 2.15.0
New features
- Expose the translator option `max_queued_batches` to configure the maximum number of queued batches (when the queue is full, future requests will block until a free slot is available)
- Allow converters to customize the vocabulary special tokens `<unk>`, `<s>`, and `</s>`
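The blocking behavior of `max_queued_batches` can be pictured with a bounded queue: once it is full, producers block until the consumer frees a slot. A small standalone analogy in plain Python (this is not the CTranslate2 implementation):

```python
import queue
import threading
import time

# A bounded queue analogous to max_queued_batches=2:
# the third put() blocks until a batch has been consumed.
batches = queue.Queue(maxsize=2)

def consumer():
    while True:
        batch = batches.get()
        if batch is None:
            return
        time.sleep(0.01)  # pretend to translate the batch

worker = threading.Thread(target=consumer)
worker.start()

start = time.monotonic()
for i in range(6):
    batches.put(f"batch-{i}")  # blocks while 2 batches are already queued
batches.put(None)  # signal the consumer to stop
worker.join()
elapsed = time.monotonic() - start
print(f"submitted 6 batches in {elapsed:.2f}s")  # > 0 because put() blocked
```

This backpressure keeps memory bounded when requests arrive faster than the model can translate them.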
Fixes and improvements
- Fix compatibility of models converted on Windows with other platforms by saving the vocabulary files with the newline character "\n" instead of "\r\n"
- Clarify conversion error when no TensorFlow checkpoints are found in the configured model directory
- Enable fused QKV transposition by switching the heads and time dimensions before the QKV split
- Cache the prepared source lengths mask in the Transformer decoder state and reuse it in the next decoding steps
- Pad the output layer to enable Tensor Cores only once instead of updating the layer on each batch
- Vectorize copy in Concat and Split ops on GPU
- Factorize all OpenMP parallel for loops to call the `parallel_for` function
- Compile CUDA kernels for deprecated Compute Capabilities that are not yet dropped by CUDA:
- CUDA 11: 3.5 and 5.0
- CUDA 10: 3.0
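The fused QKV change above reorders two operations: instead of splitting the fused projection into Q, K, and V and then transposing each one, the heads and time dimensions are switched once before the split. A NumPy sketch showing that the two orderings produce the same tensors (the shapes are illustrative assumptions):

```python
import numpy as np

batch, time, heads, dim = 2, 5, 4, 8
fused = np.random.randn(batch, time, 3 * heads * dim)

# Ordering 1: split first, then transpose each of Q, K, V.
def split_then_transpose(x):
    q, k, v = np.split(x, 3, axis=-1)
    reshape = lambda t: t.reshape(batch, time, heads, dim).transpose(0, 2, 1, 3)
    return reshape(q), reshape(k), reshape(v)

# Ordering 2: one transpose of the fused tensor, then split.
def transpose_then_split(x):
    y = x.reshape(batch, time, 3, heads, dim).transpose(2, 0, 3, 1, 4)
    return y[0], y[1], y[2]  # each (batch, heads, time, dim)

for a, b in zip(split_then_transpose(fused), transpose_then_split(fused)):
    assert np.array_equal(a, b)
print("both orderings match")
```

Doing the transposition once on the fused tensor saves two of the three transpose passes.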
CTranslate2 2.14.0
New features
- Include BART and MBART in the list of supported Fairseq architectures
- Add the Fairseq converter option `--no_default_special_tokens` to require all special tokens to be set by the user during inference, including the decoder start tokens (for example, this is required by MBART-25 to properly set the language tokens)
Fixes and improvements
- Fix conversion of Post-Norm Transformers trained with OpenNMT-tf
- Fix scoring with Fairseq models that used an incorrect decoder start token (Fairseq uses `</s>` as the decoder start token, not `<s>`)
- Fix the scoring result to include the end of sentence token
- Ignore the OpenNMT-py options `--alignment_layer` and `--alignment_heads` for models that are not trained with alignments
- Enable batch encoding in the `return_alternatives` translation mode (the decoding still runs sequentially)
- Make the enumerations `ctranslate2.specs.Activation` and `ctranslate2.specs.EmbeddingsMerge` public since they can be used to configure the Transformer specification
- Update oneDNN to 2.5.3
- Update cpu_features to 0.7.0
- Update cxxopts to 3.0.0
- Update spdlog to 1.9.2
CTranslate2 2.13.1
Fixes and improvements
- Fix conversion error for old OpenNMT-py models that do not have the option `self_attn_type`
CTranslate2 2.13.0
New features
- Add converter for Marian and support the collection of OPUS-MT pretrained models
- Support models applying a layer normalization after the embedding layer (cf. the option `--layernorm-embedding` in Fairseq)
- Support models using the Swish (a.k.a. SiLU) activation function
- Support models using custom decoder start tokens, which can be passed in the target prefix
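The Swish (SiLU) activation referenced above is simply x·sigmoid(x); a one-line sketch:

```python
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    """Swish / SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(swish(x))  # ≈ [-0.238, 0.0, 1.762]
```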
Fixes and improvements
- Remove unexpected call to a CUDA function in CPU execution when unloading models
- Add option groups in the translation client help output
- Use the new `thrust::cuda::par_nosync` execution policy when calling Thrust functions
- Update Thrust to 1.16.0
- Update pybind11 to 2.9.1