Expose translator option max_queued_batches to configure the maximum number of queued batches (when the queue is full, future requests will block until a free slot is available)
Allow converters to customize the vocabulary special tokens <unk>, <s>, and </s>
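The blocking behavior described for max_queued_batches can be illustrated with a bounded queue. The sketch below uses only the Python standard library and is not the library's implementation; the function name run_demo, the sleep duration, and the None sentinel are illustrative assumptions.

```python
import queue
import threading
import time


def run_demo(max_queued_batches=2, num_batches=5):
    """Push batches through a bounded queue consumed by a single worker.

    Hypothetical illustration: put() blocks while the queue already holds
    max_queued_batches items, which mirrors the behavior described above.
    """
    batches = queue.Queue(maxsize=max_queued_batches)
    processed = []

    def worker():
        while True:
            batch = batches.get()
            if batch is None:  # sentinel: no more work
                break
            time.sleep(0.01)  # simulate the time spent on one batch
            processed.append(batch)

    thread = threading.Thread(target=worker)
    thread.start()

    for i in range(num_batches):
        # Blocks here whenever the queue is full, until the worker frees a slot.
        batches.put(i)

    batches.put(None)
    thread.join()
    return processed


if __name__ == "__main__":
    print(run_demo())
```

A small limit bounds memory use and applies backpressure to producers, while a larger limit lets producers run further ahead of the workers.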
Fixes and improvements
Fix compatibility of models converted on Windows with other platforms by saving the vocabulary files with the newline character "\n" instead of "\r\n"
Clarify conversion error when no TensorFlow checkpoints are found in the configured model directory
Enable fused QKV transposition by switching the heads and time dimensions before the QKV split
Cache the prepared source lengths mask in the Transformer decoder state and reuse it in the next decoding steps
Pad the output layer only once to enable Tensor Cores, instead of updating the layer on each batch
Vectorize copy in Concat and Split ops on GPU
Refactor all OpenMP parallel for loops to call the parallel_for function
Compile CUDA kernels for deprecated Compute Capabilities that are not yet dropped by CUDA: