TensorRT-LLM Release 0.17.0 #2726
zeroepoch announced in Announcements
Hi,
We are very pleased to announce the 0.17.0 version of TensorRT-LLM. This update includes:
Model Support
- Refer to examples/multimodal/README.md.
Features
- Blackwell support for the LLM API and trtllm-bench command.
- PyTorch workflow in tensorrt_llm._torch. The following is a list of supported infrastructure, models, and features that can be used with the PyTorch workflow.
- Added FP8 context FMHA support for the W4A8 quantization workflow.
- Added ModelOpt quantized checkpoint support for the LLM API.
- Added support for min_p. Refer to https://arxiv.org/pdf/2407.01082.
- Added FP8 support for encoder-decoder models. Refer to the "FP8 Post-Training Quantization" section in examples/enc_dec/README.md.
- Added up and gate projection fusion support for LoRA modules.
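Among the features above, min_p sampling (per the linked paper) keeps only tokens whose probability is at least min_p times the probability of the most likely token, then renormalizes. The sketch below is an illustrative NumPy implementation of that filtering rule, not TensorRT-LLM's actual kernel; the function name `min_p_filter` is hypothetical.

```python
import numpy as np

def min_p_filter(logits: np.ndarray, min_p: float) -> np.ndarray:
    """Illustrative min-p sampling filter (https://arxiv.org/pdf/2407.01082).

    Tokens with probability below min_p * max_prob are zeroed out,
    and the surviving probabilities are renormalized.
    """
    # Softmax with the usual max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Dynamic threshold: scale min_p by the top token's probability.
    keep = probs >= min_p * probs.max()
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()
```

For example, with token probabilities [0.5, 0.3, 0.15, 0.05] and min_p=0.5, the threshold is 0.25, so only the first two tokens survive and are renormalized.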
API
- paged_context_fmha and fp8_context_fmha are enabled by default.
- When paged_context_fmha is enabled, tokens_per_block is set to 32 by default.
- Added --concurrency support for the throughput subcommand of trtllm-bench.
Bug fixes
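With paged KV caching, tokens_per_block determines how many tokens each KV-cache block holds, so a sequence occupies ceil(seq_len / tokens_per_block) blocks. The helper below is purely illustrative of that arithmetic under the new default of 32; `kv_cache_blocks` is not a TensorRT-LLM API.

```python
import math

def kv_cache_blocks(seq_len: int, tokens_per_block: int = 32) -> int:
    """Hypothetical helper: number of paged KV-cache blocks a sequence
    of seq_len tokens occupies, given the block size (default 32,
    matching the new default)."""
    return math.ceil(seq_len / tokens_per_block)
```

For instance, a 100-token sequence needs 4 blocks at the default block size, since the final 4 tokens still occupy a whole block.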
- Fixed cluster_key for the auto parallelism feature. ([feature request] Can we add H200 in infer_cluster_key() method? #2552)