Update on the development branch #1456
kaiyux announced in Announcements
Replies: 1 comment 3 replies
-
Hi, I saw that the warning "FP8 Context Paged KV FMHA hasn't been implemented yet." in gptAttentionCommon.cpp has been removed. Does "paged_context_fmha" support FP8 in this update?
-
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on April 16th, 2024.
This update includes:
- gather_all_token_logits
- Segmentation fault with pipeline parallelism and gather_all_token_logits #1284
- applyBiasRopeUpdateKVCache kernel by avoiding re-computation

Thanks,
The TensorRT-LLM Engineering Team