Release CTranslate2 1.12.0 · OpenNMT/CTranslate2

Changes

Support float16 data type for model conversion (with --quantization float16) and computation (with --compute_type float16). FP16 execution can improve performance by up to 50% on NVIDIA GPUs with Compute Capability >= 7.0.
Add Docker images with newer CUDA versions, which can improve performance in some cases:
- latest-ubuntu18-cuda10.0 (same as latest-ubuntu18-gpu)
- latest-ubuntu18-cuda10.1
- latest-ubuntu18-cuda10.2
- latest-centos7-cuda10.0 (same as latest-centos7-gpu)
- latest-centos7-cuda10.1
- latest-centos7-cuda10.2
Allow setting a computation type per device with the Python API, e.g. Translator(..., compute_type={"cuda": "float16", "cpu": "int8"})
[C++] Add ModelReader interface to customize model loading
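
The per-device computation type described above can be sketched in Python as follows. This is a minimal illustration, not the library's internal logic: `resolve_compute_type` and `load_translator` are hypothetical helpers, the model directory is a placeholder, and the fallback to `"default"` for an unlisted device is an assumption made for the example. Only the `Translator(..., compute_type={...})` call shape comes from these notes.

```python
from typing import Dict, Union

# Mapping taken from the release notes: float16 on GPU, int8 on CPU.
PER_DEVICE_COMPUTE_TYPE = {"cuda": "float16", "cpu": "int8"}


def resolve_compute_type(compute_type: Union[str, Dict[str, str]], device: str) -> str:
    """Hypothetical helper: return the compute type that applies to `device`.

    A plain string applies to every device; a dict is looked up by device name.
    Falling back to "default" for a missing device is an assumption of this sketch.
    """
    if isinstance(compute_type, str):
        return compute_type
    return compute_type.get(device, "default")


def load_translator(model_dir: str, device: str = "cpu"):
    """Hypothetical wrapper: build a Translator with a per-device compute type."""
    # Imported lazily so the helper above can be used without ctranslate2 installed.
    import ctranslate2

    return ctranslate2.Translator(
        model_dir,
        device=device,
        compute_type=PER_DEVICE_COMPUTE_TYPE,
    )
```

With this mapping, the same script can be deployed unchanged on GPU and CPU hosts: the model runs in float16 when loaded on a CUDA device and in int8 when loaded on CPU. A model intended for FP16 execution can be converted with --quantization float16 beforehand.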

Optimize Transpose op on CPU for the permutation used in multi-head attention
Optimize GELU op on CPU with Intel MKL
Fix compilation when targeting a specific architecture with ISA dispatch disabled (e.g. -DCMAKE_CXX_FLAGS="-march=skylake" -DENABLE_CPU_DISPATCH=OFF)
Inline some frequently called methods