Environment variables

Some environment variables can be set to customize the execution. When using Python, these variables should be set before importing the `ctranslate2` module, for example:

```python
import os

# Set CTranslate2 variables before the first import of ctranslate2.
os.environ["CT2_VERBOSE"] = "1"

import ctranslate2
```

Boolean environment variables can be enabled with `"1"` or `"true"`.
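The rule above can be mirrored in a small helper, for example to check in a launcher script which flags a child process will see. This is a sketch of the documented behavior, not CTranslate2's own parser. `read_bool_env` is a hypothetical name, and treating other spellings (such as `"TRUE"` or `"yes"`) as disabled is an assumption.

```python
import os

# Values the documentation lists as enabling a boolean variable.
# Assumption: matching is exact and case-sensitive.
_TRUE_VALUES = {"1", "true"}


def read_bool_env(name: str, default: bool = False) -> bool:
    """Interpret a CT2_* boolean variable using the documented rule."""
    value = os.environ.get(name)
    if value is None:
        # Variable not set: fall back to the caller-supplied default.
        return default
    return value in _TRUE_VALUES
```

For example, after `os.environ["CT2_CUDA_ALLOW_FP16"] = "true"`, `read_bool_env("CT2_CUDA_ALLOW_FP16")` returns `True`.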

`CT2_CUDA_ALLOCATOR`

Allocating memory on the GPU with `cudaMalloc` is costly and is best avoided in high-performance code. For this reason, CTranslate2 integrates caching allocators that enable fast reuse of previously allocated buffers. The following allocators are integrated and can be selected by setting `CT2_CUDA_ALLOCATOR` to their name:

- `cuda_malloc_async` (default for CUDA >= 11.2): uses the asynchronous allocator with memory pools introduced in CUDA 11.2.
- `cub_caching` (default for CUDA < 11.2): uses the caching allocator from the CUB project.
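The documented defaults can be written as a small function, which may help when reasoning about which allocator a given build uses. `default_allocator` and `allocator_env` are hypothetical helpers, and the sketch assumes the CUDA version is available as a `(major, minor)` tuple.

```python
from typing import Dict, Tuple

# Allocator names accepted by CT2_CUDA_ALLOCATOR, per the list above.
ALLOCATORS = ("cuda_malloc_async", "cub_caching")


def default_allocator(cuda_version: Tuple[int, int]) -> str:
    """Return the documented default allocator for a (major, minor) CUDA version."""
    # Tuple comparison orders by major version first, then minor.
    return "cuda_malloc_async" if cuda_version >= (11, 2) else "cub_caching"


def allocator_env(name: str) -> Dict[str, str]:
    """Build the environment entry that selects an allocator, rejecting unknown names."""
    if name not in ALLOCATORS:
        raise ValueError(f"unknown allocator {name!r}; expected one of {ALLOCATORS}")
    return {"CT2_CUDA_ALLOCATOR": name}
```

The returned entry can be merged into a copy of `os.environ`, for example `{**os.environ, **allocator_env("cub_caching")}`, before starting a process that imports `ctranslate2`.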

`CT2_CUDA_ALLOW_BF16`

Allow using BF16 computation on GPU even if the device does not have efficient BF16 support.

`CT2_CUDA_ALLOW_FP16`

Allow using FP16 computation on GPU even if the device does not have efficient FP16 support.
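Because these variables must be in place before `ctranslate2` is imported, one option is to set them only for a worker process. The sketch below builds such an environment. `gpu_precision_env` is a hypothetical helper, and removing a flag (rather than writing `"0"`) to fall back to the default is a design choice of this sketch.

```python
import os
from typing import Dict


def gpu_precision_env(allow_fp16: bool = False, allow_bf16: bool = False) -> Dict[str, str]:
    """Return a copy of the current environment with the FP16/BF16 override flags applied."""
    env = dict(os.environ)  # copy, so the parent process is not modified
    for name, enabled in (("CT2_CUDA_ALLOW_FP16", allow_fp16),
                          ("CT2_CUDA_ALLOW_BF16", allow_bf16)):
        if enabled:
            env[name] = "1"
        else:
            # Unset the variable so CTranslate2 uses its default behavior.
            env.pop(name, None)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, so the child process starts with the flags already set.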

`CT2_CUDA_TRUE_FP16_GEMM`

Allow using true FP16 computation in GEMM operations. When disabled, the computation or accumulation may use FP32 instead.

This flag is enabled by default, but some models may automatically disable it when they are known to work better with the increased precision.
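To illustrate why the FP32 fallback can matter, the following sketch simulates a long dot-product-style accumulation using the half- and single-precision formats of Python's `struct` module. It is a toy model that rounds the running sum after every addition. It does not describe how CTranslate2's or cuBLAS's GEMM kernels actually accumulate.

```python
import struct
from typing import Callable


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value: float, count: int, rounding: Callable[[float], float]) -> float:
    """Add `value` to a running sum `count` times, rounding the sum after each step."""
    total = 0.0
    for _ in range(count):
        total = rounding(total + value)
    return total


x = to_fp16(0.1)  # an FP16 input element, as it would be stored in an FP16 GEMM
n = 10_000
print("exact:", x * n)
print("FP16 accumulator:", accumulate(x, n, to_fp16))
print("FP32 accumulator:", accumulate(x, n, to_fp32))
```

In this toy run, the FP16 running sum stops growing at 256. FP16 values in that range are 0.25 apart, so adding roughly 0.1 rounds back to the same value. The FP32 sum matches the exact result.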

`CT2_CUDA_CACHING_ALLOCATOR_CONFIG`

The `cub_caching` allocator can be configured to trade off memory usage against speed. By default, CTranslate2 uses the following values, which were selected experimentally:

- `bin_growth` = 4
- `min_bin` = 3
- `max_bin` = 12
- `max_cached_bytes` = 209715200 (200 MiB)

You can override these parameters with comma-separated values in the same order as the list above:

```bash
export CT2_CUDA_CACHING_ALLOCATOR_CONFIG=8,3,7,6291455
```

See the description of each parameter in the allocator implementation.
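Because the four values are positional, a small helper can keep them in the right order. The sketch below also lists the resulting bin sizes. It assumes CUB's scheme, in which bins are powers of `bin_growth` ranging from `bin_growth ** min_bin` to `bin_growth ** max_bin` bytes. The helper names and the sanity checks are hypothetical and are not taken from CTranslate2.

```python
from typing import List


def caching_allocator_config(bin_growth: int = 4, min_bin: int = 3, max_bin: int = 12,
                             max_cached_bytes: int = 200 * 1024 * 1024) -> str:
    """Format the parameters in the order CT2_CUDA_CACHING_ALLOCATOR_CONFIG expects."""
    # Basic sanity checks (assumed for this sketch): bins must grow and the range must be valid.
    if bin_growth < 2 or min_bin > max_bin or max_cached_bytes < 0:
        raise ValueError("invalid allocator parameters")
    return ",".join(str(v) for v in (bin_growth, min_bin, max_bin, max_cached_bytes))


def bin_sizes(bin_growth: int, min_bin: int, max_bin: int) -> List[int]:
    """Bin sizes in bytes: bin_growth ** k for k in [min_bin, max_bin]."""
    return [bin_growth ** k for k in range(min_bin, max_bin + 1)]
```

With the defaults, the bins range from 64 bytes (4^3) to 16 MiB (4^12). The example values `8,3,7,6291455` give bins from 512 bytes to 2 MiB.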

`CT2_FORCE_CPU_ISA`

Force CTranslate2 to select a specific instruction set architecture (ISA). Possible values are:

- `GENERIC`
- `AVX`
- `AVX2`
- `AVX512`

This does not impact backend libraries (such as Intel MKL) which usually have their own environment variables to configure ISA dispatching.
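One possible use is forcing `GENERIC` to check whether a difference in results or speed comes from the vectorized code paths. The sketch below validates the value against the four documented names before setting it. `force_cpu_isa` is hypothetical, and normalizing the input to uppercase is an assumption based on the spelling above.

```python
import os
from typing import MutableMapping, Optional

# ISA names listed in the documentation for CT2_FORCE_CPU_ISA.
SUPPORTED_ISAS = ("GENERIC", "AVX", "AVX2", "AVX512")


def force_cpu_isa(isa: str,
                  env: Optional[MutableMapping[str, str]] = None) -> MutableMapping[str, str]:
    """Set CT2_FORCE_CPU_ISA in `env` (default: os.environ) after checking the name."""
    isa = isa.upper()
    if isa not in SUPPORTED_ISAS:
        raise ValueError(f"unsupported ISA {isa!r}; expected one of {SUPPORTED_ISAS}")
    target = os.environ if env is None else env
    target["CT2_FORCE_CPU_ISA"] = isa
    return target
```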

`CT2_USE_EXPERIMENTAL_PACKED_GEMM`

Enable the packed GEMM API for Intel MKL, which can improve performance for single-core decoding. See Intel's article to learn more about packed GEMM.

`CT2_USE_MKL`

Force CTranslate2 to use or not use Intel MKL. By default, the runtime automatically decides whether to use Intel MKL based on the CPU vendor.
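Because the MKL settings mainly affect CPU throughput, it can be useful to benchmark them side by side in separate processes, each of which sets the variables before importing `ctranslate2`. This sketch only builds the environments. The labels and the helper name are hypothetical, and it assumes that `"0"` disables MKL, following the boolean convention used by the other variables.

```python
import os
from typing import Dict


def mkl_benchmark_envs() -> Dict[str, Dict[str, str]]:
    """Labelled copies of the environment for comparing the MKL-related settings."""
    settings = {
        "no_mkl": {"CT2_USE_MKL": "0"},
        "mkl": {"CT2_USE_MKL": "1"},
        # Packed GEMM is an Intel MKL API, so it is only paired with MKL enabled.
        "mkl_packed_gemm": {"CT2_USE_MKL": "1", "CT2_USE_EXPERIMENTAL_PACKED_GEMM": "1"},
    }
    envs = {}
    for label, overrides in settings.items():
        env = dict(os.environ)
        env.pop("CT2_USE_EXPERIMENTAL_PACKED_GEMM", None)  # start from the default
        env.update(overrides)
        envs[label] = env
    return envs
```

Each dictionary can be passed as the `env` argument of `subprocess.run` to launch the same benchmark script under each configuration.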

`CT2_VERBOSE`

Configure the default log verbosity:

- `-3`: off
- `-2`: critical
- `-1`: error
- `0`: warning (default)
- `1`: info
- `2`: debug
- `3`: trace
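An application that already configures Python's `logging` module may want `CT2_VERBOSE` to follow it before `ctranslate2` is imported. The sketch below assumes a one-to-one correspondence between the table above and the standard logging levels. `ct2_verbose_for` is a hypothetical helper, and mapping levels above `CRITICAL` to "off" and below `DEBUG` to "trace" are choices of this sketch.

```python
import logging

# CT2_VERBOSE values from the table above, keyed by the closest standard
# Python logging level (a hypothetical correspondence), most severe first.
_CT2_LEVELS = [
    (logging.CRITICAL, -2),
    (logging.ERROR, -1),
    (logging.WARNING, 0),
    (logging.INFO, 1),
    (logging.DEBUG, 2),
]


def ct2_verbose_for(py_level: int) -> str:
    """Return the CT2_VERBOSE value matching a Python logging level."""
    if py_level > logging.CRITICAL:
        return "-3"  # more severe than CRITICAL: turn CTranslate2 logs off
    for threshold, ct2_value in _CT2_LEVELS:
        if py_level >= threshold:
            return str(ct2_value)
    return "3"  # below DEBUG (e.g. a custom level 5): trace
```

For example, `os.environ["CT2_VERBOSE"] = ct2_verbose_for(logging.getLogger().getEffectiveLevel())` can run before the import.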

The log level can also be controlled by API. See for example the Python function [`ctranslate2.set_log_level`](python/ctranslate2.set_log_level.rst).
