Releases: OpenNMT/CTranslate2

CTranslate2 1.15.0

06 Nov 15:15

New features

  • [Experimental] The Python package published on PyPI now includes GPU support. The binary is compiled with CUDA 10.1, but all CUDA dependencies are integrated in the package and do not need to be installed on the system. The only requirement should be a working GPU with driver version >= 418.39.
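
A minimal sketch of trying the GPU-enabled wheel, assuming a converted model directory is already available; the model path and tokens below are placeholders, and the result indexing follows the 1.x Python API (a list of hypothesis dictionaries per example).

```python
# pip install --upgrade ctranslate2   (GPU support is bundled in the wheel)
import ctranslate2

# "ende_ctranslate2/" is a placeholder for a converted model directory.
translator = ctranslate2.Translator("ende_ctranslate2/", device="cuda")

# Inputs are pre-tokenized; the tokens below are illustrative only.
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0][0]["tokens"])  # best hypothesis of the first example (1.x result format)
```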

Fixes and improvements

  • Remove the TensorRT dependency to simplify installation and reduce memory usage:
    • Reduce the size of GPU Docker images by 600MB
    • Reduce GPU and system memory usage by up to 1GB
    • Reduce initialization time during the first GPU translation
  • Improve TopK performance on GPU for K < 5
  • Improve INT8 performance on GPU
  • Accept linear layers without bias when converting models
  • Update Intel MKL to 2020.4
  • [Python] Improve compatibility with Python 3.9

CTranslate2 1.14.0

14 Oct 08:26

New features

  • Accept target prefix in file translation APIs
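
A hedged sketch of supplying a target prefix. The target_prefix argument of translate_batch is the established form; for the new file-level support, the prefix is assumed to come from a file aligned line-by-line with the source, and the target_path argument name below is an assumption for illustration.

```python
import ctranslate2

translator = ctranslate2.Translator("ende_ctranslate2/")  # placeholder model directory

# Batch API: constrain the start of each output with a per-example token prefix.
translator.translate_batch(
    [["▁Hello", "▁world", "!"]],
    target_prefix=[["▁Hallo"]],
)

# File API (this release): a prefix file aligned with the source file.
# NOTE: "target_path" is an assumed argument name for illustration.
translator.translate_file(
    "input.tok.txt",
    "output.tok.txt",
    max_batch_size=32,
    target_path="prefix.tok.txt",
)
```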

Fixes and improvements

  • Fix CUDA illegal memory access when changing the beam size in the same process
  • Fix decoding with a target prefix that sometimes stopped at the end of the prefix instead of continuing the translation
  • Fix Intel MKL search paths on macOS
  • Update Intel MKL to 2020.3
  • Clarify error message when selecting a CUDA device in CPU-only builds

CTranslate2 1.13.2

31 Aug 16:20

Fixes and improvements

  • Fix model conversion to float16 when using the Python converters: weights were duplicated and not correctly converted
  • Fix a logic error that could produce incorrect translation results in batch mode

CTranslate2 1.13.1

06 Aug 15:03

Fixes and improvements

  • Fix performance regression when decoding with a large beam size on GPU

CTranslate2 1.13.0

30 Jul 11:36

New features

  • Environment variable CT2_TRANSLATORS_CORE_OFFSET to pin parallel translators to a range of CPU cores (only effective when intra_threads is 1; see the sketch after this list)
  • [Python] Add some properties to the Translator object:
    • device
    • device_index
    • num_translators
    • num_queued_batches
    • model_is_loaded
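
A small sketch combining both features, assuming CT2_TRANSLATORS_CORE_OFFSET is read from the environment when the translators are created; the model path is a placeholder and the printed values are examples.

```python
import os
import ctranslate2

# Pin the 4 parallel translators to cores starting at index 8
# (assumes the variable is read when the Translator is constructed).
os.environ["CT2_TRANSLATORS_CORE_OFFSET"] = "8"

translator = ctranslate2.Translator(
    "ende_ctranslate2/",  # placeholder model directory
    inter_threads=4,      # 4 parallel translators
    intra_threads=1,      # core pinning only applies with intra_threads = 1
)

# New introspection properties added in this release.
print(translator.device)              # e.g. "cpu"
print(translator.device_index)        # e.g. 0
print(translator.num_translators)     # e.g. 4
print(translator.num_queued_batches)  # e.g. 0 when idle
print(translator.model_is_loaded)     # True
```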

Fixes and improvements

  • Improve batch performance when using a target prefix
  • Improve performance when the input batch contains sentences with very different lengths
  • Improve beam search performance by expanding the batch size only after the first decoding step
  • Optimize Transpose op on GPU for the permutation used in multi-head attention
  • Remove padding in returned attention vectors
  • Update Intel MKL to 2020.2

CTranslate2 1.12.1

20 Jul 14:08

Fixes and improvements

  • Fix implicit int16 to float16 model conversion on compatible GPUs

CTranslate2 1.12.0

16 Jul 13:27

Changes

  • Docker images based on Ubuntu 16.04 are no longer updated

New features

  • Support the float16 data type for model conversion (with --quantization float16) and computation (with --compute_type float16). FP16 execution can improve performance by up to 50% on NVIDIA GPUs with Compute Capability >= 7.0 (see the sketch after this list).
  • Add Docker images with newer CUDA versions, which can improve performance in some cases:
    • latest-ubuntu18-cuda10.0 (same as latest-ubuntu18-gpu)
    • latest-ubuntu18-cuda10.1
    • latest-ubuntu18-cuda10.2
    • latest-centos7-cuda10.0 (same as latest-centos7-gpu)
    • latest-centos7-cuda10.1
    • latest-centos7-cuda10.2
  • Allow setting a computation type per device (e.g. Translator(..., compute_type={"cuda": "float16", "cpu": "int8"}) with the Python API)
  • [C++] Add ModelReader interface to customize model loading
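
A minimal sketch of selecting the computation type from the Python API, assuming a model converted beforehand (optionally with --quantization float16); the model directory is a placeholder, and the per-device dictionary form is the one introduced in this release.

```python
import ctranslate2

# Run a model in FP16 on a GPU with Compute Capability >= 7.0.
gpu_translator = ctranslate2.Translator(
    "ende_ctranslate2/",   # placeholder: e.g. a model converted with --quantization float16
    device="cuda",
    compute_type="float16",
)

# Per-device computation types (new in this release): FP16 on CUDA, INT8 on CPU.
flexible_translator = ctranslate2.Translator(
    "ende_ctranslate2/",
    device="auto",
    compute_type={"cuda": "float16", "cpu": "int8"},
)
```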

Fixes and improvements

  • Optimize Transpose op on CPU for the permutation used in multi-head attention
  • Optimize GELU op on CPU with Intel MKL
  • Fix compilation when targeting an architecture and disabling ISA dispatch (e.g.: -DCMAKE_CXX_FLAGS="-march=skylake" -DENABLE_CPU_DISPATCH=OFF)
  • Inline some frequently called methods

CTranslate2 1.11.0

29 Jun 16:02

New features

  • Add tokenization and detokenization hooks for file translation APIs
  • Add alternatives to Intel MKL:
    • Integrate oneDNN for GEMM functions
    • Implement vectorized operators that automatically select the instruction set architecture (ISA); the selection can be overridden with the CT2_FORCE_CPU_ISA environment variable
  • When alternatives are available, avoid using Intel MKL on non-Intel processors (can be overridden with the CT2_USE_MKL environment variable)
  • Enable a verbose mode with the environment variable CT2_VERBOSE=1 to help debug the runtime configuration (e.g. the detected CPU, whether Intel MKL is used, etc.)
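
A short sketch of the new runtime controls, assuming the variables are read by the library when the model is loaded (they can equally be exported in the shell); the ISA name is illustrative.

```python
import os

# Set before loading a model so the native library picks them up.
os.environ["CT2_VERBOSE"] = "1"          # log the detected CPU, whether Intel MKL is used, etc.
os.environ["CT2_FORCE_CPU_ISA"] = "AVX"  # illustrative ISA name; overrides automatic selection
os.environ["CT2_USE_MKL"] = "0"          # prefer the oneDNN/vectorized path even on Intel CPUs

import ctranslate2
translator = ctranslate2.Translator("ende_ctranslate2/")  # placeholder model directory
```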

Fixes and improvements

  • Improve numerical precision of SoftMax and LogSoftMax layers on CPU
  • Parallelize INT16 quantization/dequantization and ReLU on CPU
  • Add back the translation client in CentOS 7 Docker images

CTranslate2 1.10.2

23 Jun 10:35

Fixes and improvements

  • [Python] Fix error when calling unload_model(to_cpu=True) for models with shared weights
  • [Python] Do not ignore errors when importing the compiled translator extension

CTranslate2 1.10.1

25 May 15:50

Fixes and improvements

  • Force intra_threads to 1 when running a model on GPU to prevent high CPU load
  • Improve handling of decoding length constraints when using a target prefix
  • Do not raise an error when use_vmap is set but the model has no vocabulary map
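
A brief sketch of the use_vmap behavior above, assuming the standard translate_batch option; with this release the call below is expected to proceed normally instead of raising when the model directory contains no vocabulary map.

```python
import ctranslate2

translator = ctranslate2.Translator("ende_ctranslate2/")  # placeholder model directory

# use_vmap restricts decoding to a vocabulary map bundled with the model;
# if the model has none, the option is now simply ignored.
results = translator.translate_batch(
    [["▁Hello", "▁world", "!"]],
    use_vmap=True,
)
print(results[0][0]["tokens"])
```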