
CTranslate2 2.15.0

guillaumekln released this 04 Apr 12:04

New features

  • Expose translator option max_queued_batches to configure the maximum number of queued batches (when the queue is full, further requests block until a free slot is available)
  • Allow converters to customize the vocabulary special tokens <unk>, <s>, and </s>
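
The blocking behavior described for max_queued_batches can be illustrated with a bounded queue. The sketch below does not call CTranslate2; it uses Python's standard queue.Queue as a stand-in, and the names run_demo, pending, and worker are hypothetical. It only shows the general pattern: a producer that blocks on a full queue until a consumer frees a slot.

```python
import queue
import threading
import time


def run_demo(num_batches=6, max_queued_batches=2):
    """Push batches through a bounded queue and return them in processing order."""
    # Stand-in for an internal batch queue; maxsize plays the role of
    # max_queued_batches (an assumption made for illustration only).
    pending = queue.Queue(maxsize=max_queued_batches)
    processed = []

    def worker():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: no more work
                break
            time.sleep(0.01)  # simulate the time spent on one batch
            processed.append(batch)

    thread = threading.Thread(target=worker)
    thread.start()

    for batch_id in range(num_batches):
        # put() blocks while the queue already holds max_queued_batches items,
        # and resumes as soon as the worker takes one out.
        pending.put(batch_id)

    pending.put(None)
    thread.join()
    return processed


if __name__ == "__main__":
    print(run_demo())
```

A small limit keeps memory bounded when requests arrive faster than they can be processed, at the cost of making the caller wait.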

Fixes and improvements

  • Fix compatibility of models converted on Windows with other platforms by saving the vocabulary files with the newline character "\n" instead of "\r\n"
  • Clarify conversion error when no TensorFlow checkpoints are found in the configured model directory
  • Enable fused QKV transposition by switching the heads and time dimensions before the QKV split
  • Cache the prepared source lengths mask in the Transformer decoder state and reuse it in the next decoding steps
  • Pad the output layer to enable Tensor Cores only once instead of updating the layer on each batch
  • Vectorize copy in Concat and Split ops on GPU
  • Refactor all OpenMP parallel for loops to call the parallel_for function
  • Compile CUDA kernels for deprecated Compute Capabilities that are not yet dropped by CUDA:
    • CUDA 11: 3.5 and 5.0
    • CUDA 10: 3.0
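
The newline fix listed above concerns text files written on Windows, where text mode translates "\n" to "\r\n" by default. A minimal sketch of the general technique in Python, assuming a one-token-per-line vocabulary file (save_vocabulary and demo are hypothetical helpers, not the converter's actual code):

```python
import os
import tempfile


def save_vocabulary(path, tokens):
    """Write one token per line, always terminated by "\\n"."""
    # newline="\n" disables the platform-specific newline translation,
    # so the file is byte-identical on Windows, Linux, and macOS.
    with open(path, "w", encoding="utf-8", newline="\n") as vocab_file:
        for token in tokens:
            vocab_file.write(token + "\n")


def demo():
    """Save a small vocabulary and return the raw bytes written to disk."""
    tokens = ["<unk>", "<s>", "</s>", "hello"]
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "vocabulary.txt")
        save_vocabulary(path, tokens)
        with open(path, "rb") as vocab_file:
            return vocab_file.read()


if __name__ == "__main__":
    print(demo())
```

Reading the file back in binary mode is a simple way to check that no "\r" characters were written.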