Weekly release: 0.19.0rc0 #3588
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we have pushed a weekly release, 0.19.0rc0, and an update to the Triton backend on April 15, 2025. The 0.19.0rc0 dev release includes:

- Supported gemma-3-1b-it; see `examples/gemma/README.md` (feat: Support gemma-3-1b-it #3247)
- Registered `ENABLE_MULTI_DEVICE` and `ENABLE_UCX` as CMake options (feat: register ENABLE_MULTI_DEVICE and ENABLE_UCX as CMake options #3343)
- Ran PyExecutor's inference flow to estimate `max_num_tokens` for `kv_cache_manager` (feat: Run PyExecutor's inference flow to estimate max_num_tokens for kv_cache_manager #3092)
- Supported the `TLLM_OVERRIDE_LAYER_NUM` and `TLLM_TRACE_MODEL_FORWARD` environment variables for debugging; see the sketch after this list (feat: Support TLLM_OVERRIDE_LAYER_NUM and TLLM_TRACE_MODEL_FORWARD for debugging #3417)
- Applied the new torch-flow compatible `AutoTuner` to both the Fused MoE and NVFP4 Linear operators (feat: Apply the new torch-flow compatible AutoTuner to both Fused MoE and NVFP4 Linear operators. #3151)
- Introduced a `UserBuffers` allocator for the PyTorch flow (feat: Introduce UB allocator for pytorch flow #3257)
- Enhanced the integrated robustness of scaffolding with `__init__.py` (feat: Enhance the integrated robustness of scaffolding with __init__.… #3312)
- Added `numNodes` to `ParallelConfig` (feat: Add numNodes to ParallelConfig #3346)
- Added Qwen2 MoE to the torch flow and fixed the wrongly imported `KvCacheConfig` in `examples/gpqa_llmapi.py`; see the sketch after this list (feat: add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa… #3369)
- Fixed `max_seq_len` in `executor_config` (fix: fix max_seq_len in executor_config #3487)
- Allowed the `context_and_generation` request type in disaggregated overlap (fix: Allow context_and_generation request type in disagg overlap #3489)
- Fixed the `py_decoding_iter` update in the decoder (fix: fix the py_decoding_iter update in decoder #3297)
- Fixed the missing bias add for `FP4Linear` (fix [NVBUG 5208255] Fix missing bias add for FP4Linear. #3361)
- Fixed a runtime error in `test_deepseek_allreduce.py` (fix: runtime error in test_deepseek_allreduce.py #3226)
- Fixed torch nvsmall through `PyExecutor` and improved its TP support (Fix torch nvsmall through pyexecutor and fix its TP support #3238)

The cut-off commit for this release is 258ae9c. The code changes can be seen here: 5aeef6d...258ae9c.
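For the new debugging environment variables (#3417), here is a minimal usage sketch in Python. The value formats shown are assumptions on our part rather than documented syntax, and the model name is illustrative; the point shown is that the variables must be set before TensorRT-LLM loads the model.

```python
import os

# Assumed semantics: override the number of decoder layers that are built,
# shrinking the model for quick debugging iterations. Value format assumed.
os.environ["TLLM_OVERRIDE_LAYER_NUM"] = "2"

# Assumed semantics: enable tracing of the model's forward pass.
os.environ["TLLM_TRACE_MODEL_FORWARD"] = "1"

# Import after the environment is set so the variables take effect.
from tensorrt_llm import LLM

llm = LLM(model="google/gemma-3-1b-it")  # illustrative model choice
print(llm.generate(["Hello"])[0].outputs[0].text)
```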
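Similarly, for the `KvCacheConfig` import fix in `examples/gpqa_llmapi.py` (#3369), here is a sketch of wiring a `KvCacheConfig` into the LLM API. The `tensorrt_llm.llmapi` import path and the `free_gpu_memory_fraction` field match our understanding of the LLM API, but treat them as assumptions for your exact version.

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig  # assumed import path

# Reserve 90% of free GPU memory for the KV cache (field name assumed).
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.9)

llm = LLM(
    model="Qwen/Qwen1.5-MoE-A2.7B-Chat",  # illustrative MoE model choice
    kv_cache_config=kv_cache_config,
)
print(llm.generate(["What does GPQA evaluate?"])[0].outputs[0].text)
```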
Thanks,
The TensorRT-LLM Engineering Team