Releases · OpenNMT/CTranslate2
CTranslate2 1.15.0
New features
- [Experimental] The Python package published on PyPI now includes GPU support. The binary is compiled with CUDA 10.1, but all CUDA dependencies are integrated in the package and do not need to be installed on the system. The only requirement should be a working GPU with driver version >= 418.39.
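As a sketch of how the new wheel might be installed and checked (the package name is from PyPI; the model directory and exact driver check are illustrative):

```shell
# Install the PyPI wheel; the CUDA 10.1 runtime libraries are bundled,
# so only the NVIDIA driver (>= 418.39) must be present on the host.
pip install ctranslate2

# Verify the installed driver version before running on GPU.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```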
Fixes and improvements
- Remove the TensorRT dependency to simplify installation and reduce memory usage:
  - Reduce GPU Docker image size by 600MB
  - Reduce memory usage on the GPU and the system by up to 1GB
  - Reduce initialization time during the first GPU translation
- Improve TopK performance on GPU for K < 5
- Improve INT8 performance on GPU
- Accept linear layers without bias when converting models
- Update Intel MKL to 2020.4
- [Python] Improve compatibility with Python 3.9
CTranslate2 1.14.0
New features
- Accept target prefix in file translation APIs
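A target prefix forces the decoder to start its output with given tokens before decoding freely. This toy sketch illustrates the idea only; it is not CTranslate2 code, and `predict_next` is a mock stand-in for a real model:

```python
def decode_with_prefix(predict_next, prefix, max_len):
    """Force the first len(prefix) output tokens, then decode freely."""
    output = []
    for step in range(max_len):
        if step < len(prefix):
            output.append(prefix[step])          # forced: copy the prefix token
        else:
            output.append(predict_next(output))  # free decoding afterwards
    return output

# A mock "model" that always predicts the token "x".
result = decode_with_prefix(lambda out: "x", ["Hallo", "Welt"], 4)
print(result)  # ['Hallo', 'Welt', 'x', 'x']
```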
Fixes and improvements
- Fix CUDA illegal memory access when changing the beam size in the same process
- Fix decoding with target prefix that sometimes did not go beyond the prefix
- Fix Intel MKL search paths on macOS
- Update Intel MKL to 2020.3
- Clarify error message when selecting a CUDA device in CPU-only builds
CTranslate2 1.13.2
Fixes and improvements
- Fix model conversion to `float16` when using the Python converters: weights were duplicated and not correctly converted
- Fix incorrect code logic that could lead to incorrect translation results in batch mode
CTranslate2 1.13.1
Fixes and improvements
- Fix performance regression when decoding with a large beam size on GPU
CTranslate2 1.13.0
New features
- Environment variable `CT2_TRANSLATORS_CORE_OFFSET` to pin parallel translators to a range of CPU cores (only for `intra_threads` = 1)
- [Python] Add some properties to the `Translator` object: `device`, `device_index`, `num_translators`, `num_queued_batches`, `model_is_loaded`
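A minimal sketch of using the new variable (the core offset value is illustrative; it only takes effect when each parallel translator runs with `intra_threads` = 1):

```shell
# Pin the parallel translators to consecutive CPU cores starting at core 8,
# then launch the translation process in the same environment.
export CT2_TRANSLATORS_CORE_OFFSET=8
```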
Fixes and improvements
- Improve batch performance of target prefix
- Improve performance when the input batch contains sentences with very different lengths
- Improve beam search performance by expanding the batch size only after the first decoding step
- Optimize Transpose op on GPU for the permutation used in multi-head attention
- Remove padding in returned attention vectors
- Update Intel MKL to 2020.2
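The beam search change above can be pictured with a toy sketch of batch shapes per decoding step (this models only the scheduling idea, not the actual implementation): the first step scores each source sentence once, and only later steps work on the tiled batch of beam candidates.

```python
def decode_shapes(batch_size, beam_size, num_steps):
    """Return the effective batch size at each decoding step."""
    shapes = []
    for step in range(num_steps):
        if step == 0:
            shapes.append(batch_size)              # no tiling on the first step
        else:
            shapes.append(batch_size * beam_size)  # expanded to all candidates
    return shapes

print(decode_shapes(8, 4, 3))  # [8, 32, 32]
```

Delaying the expansion avoids running the first (identical) forward pass `beam_size` times per sentence.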
CTranslate2 1.12.1
Fixes and improvements
- Fix implicit int16 to float16 model conversion on compatible GPUs
CTranslate2 1.12.0
Changes
- Docker images based on Ubuntu 16.04 are no longer updated
New features
- Support the `float16` data type for model conversion (with `--quantization float16`) and computation (with `--compute_type float16`). FP16 execution can improve performance by up to 50% on NVIDIA GPUs with Compute Capability >= 7.0.
- Add Docker images with newer CUDA versions, which can improve performance in some cases:
  - `latest-ubuntu18-cuda10.0` (same as `latest-ubuntu18-gpu`)
  - `latest-ubuntu18-cuda10.1`
  - `latest-ubuntu18-cuda10.2`
  - `latest-centos7-cuda10.0` (same as `latest-centos7-gpu`)
  - `latest-centos7-cuda10.1`
  - `latest-centos7-cuda10.2`
- Allow setting a computation type per device (e.g. `Translator(..., compute_type={"cuda": "float16", "cpu": "int8"})` with the Python API)
- [C++] Add a `ModelReader` interface to customize model loading
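The FP16 additions can be exercised from the command line; a sketch assuming an OpenNMT-py checkpoint named `model.pt` and the `ct2-opennmt-py-converter` entry point (the `--quantization float16` flag comes from the notes above; the output directory name is a placeholder):

```shell
# Convert a model with FP16 weights.
ct2-opennmt-py-converter --model_path model.pt --output_dir ende_ct2 \
    --quantization float16

# Pull one of the new CUDA 10.2 images.
docker pull opennmt/ctranslate2:latest-ubuntu18-cuda10.2
```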
Fixes and improvements
- Optimize Transpose op on CPU for the permutation used in multi-head attention
- Optimize GELU op on CPU with Intel MKL
- Fix compilation when targeting an architecture and disabling ISA dispatch (e.g. `-DCMAKE_CXX_FLAGS="-march=skylake" -DENABLE_CPU_DISPATCH=OFF`)
- Inline some frequently called methods
CTranslate2 1.11.0
New features
- Add tokenization and detokenization hooks for file translation APIs
- Add alternatives to Intel MKL:
  - Integrate oneDNN for GEMM functions
  - Implement vectorized operators that automatically select the instruction set architecture (ISA) (can be manually controlled with the `CT2_FORCE_CPU_ISA` environment variable)
- When alternatives are available, avoid using Intel MKL on non-Intel processors (can be manually controlled with the `CT2_USE_MKL` environment variable)
- Enable a verbose mode with the environment variable `CT2_VERBOSE=1` to help debug the run configuration (e.g. the detected CPU, whether Intel MKL is being used, etc.)
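A sketch of combining these environment variables before launching a translation process (the variable names come from the notes above; the `AVX2` value and `0` toggle are assumptions to be checked against the project documentation):

```shell
# Print the run configuration (detected CPU, chosen backend, ISA, ...).
export CT2_VERBOSE=1

# Force a specific instruction set instead of the auto-detected one.
export CT2_FORCE_CPU_ISA=AVX2

# Avoid Intel MKL even when it is available.
export CT2_USE_MKL=0
```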
Fixes and improvements
- Improve numerical precision of SoftMax and LogSoftMax layers on CPU
- Parallelize INT16 quantization/dequantization and ReLU on CPU
- Add back the translation client in CentOS 7 Docker images
CTranslate2 1.10.2
Fixes and improvements
- [Python] Fix error when calling `unload_model(to_cpu=True)` for models with shared weights
- [Python] Do not ignore errors when importing the compiled translator extension
CTranslate2 1.10.1
Fixes and improvements
- Force `intra_threads` to 1 when running a model on GPU to prevent high CPU load
- Improve handling of decoding length constraints when using a target prefix
- Do not raise an error when setting `use_vmap` but no vocabulary map exists