CTranslate2 1.12.0

guillaumekln released this 16 Jul 13:27

Changes

  • Docker images based on Ubuntu 16.04 are no longer updated

New features

  • Support float16 data type for model conversion (with --quantization float16) and computation (with --compute_type float16). FP16 execution can improve performance by up to 50% on NVIDIA GPUs with Compute Capability >= 7.0.
  • Add Docker images with newer CUDA versions, which can improve performance in some cases:
    • latest-ubuntu18-cuda10.0 (same as latest-ubuntu18-gpu)
    • latest-ubuntu18-cuda10.1
    • latest-ubuntu18-cuda10.2
    • latest-centos7-cuda10.0 (same as latest-centos7-gpu)
    • latest-centos7-cuda10.1
    • latest-centos7-cuda10.2
  • Allow setting a computation type per device (e.g. Translator(..., compute_type={"cuda": "float16", "cpu": "int8"}) with the Python API)
  • [C++] Add ModelReader interface to customize model loading
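The per-device computation type above could be used roughly as follows. This is a minimal sketch, not the library's own code: `resolve_compute_type` and `load_translator` are hypothetical helpers introduced for illustration, and falling back to `"default"` for a device missing from the mapping is an assumption. Only the `Translator(..., compute_type=...)` call comes from the release notes.

```python
from typing import Dict, Union

# Per-device mapping in the form shown in the release notes.
COMPUTE_TYPE = {"cuda": "float16", "cpu": "int8"}


def resolve_compute_type(compute_type: Union[str, Dict[str, str]], device: str) -> str:
    """Hypothetical helper: return the compute type that applies to `device`.

    A plain string applies to every device. For a mapping, falling back to
    "default" when the device is not listed is an assumption of this sketch.
    """
    if isinstance(compute_type, str):
        return compute_type
    return compute_type.get(device, "default")


def load_translator(model_path: str, device: str):
    """Hypothetical wrapper: create a Translator with the per-device mapping.

    Requires the ctranslate2 package and a converted model directory.
    """
    import ctranslate2  # imported lazily so the helper above runs without it

    return ctranslate2.Translator(model_path, device=device, compute_type=COMPUTE_TYPE)
```

With this mapping, the same script runs FP16 on a GPU and INT8 on a CPU, so the calling code does not need to branch on the device.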

Fixes and improvements

  • Optimize Transpose op on CPU for the permutation used in multi-head attention
  • Optimize GELU op on CPU with Intel MKL
  • Fix compilation when targeting a specific architecture with ISA dispatch disabled (e.g. -DCMAKE_CXX_FLAGS="-march=skylake" -DENABLE_CPU_DISPATCH=OFF)
  • Inline some frequently called methods