Update on the development branch #1456

kaiyux · 2024-04-16T11:48:38Z

kaiyux
Apr 16, 2024
Maintainer

Hi,

The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on April 16, 2024.

This update includes:

Features
- Support pipeline parallelism for GPT
API
- [BREAKING CHANGE] Migrate enc-dec models to the unified workflow
Bug fixes
- Fix segmentation fault with pipeline parallelism and gather_all_token_logits (#1284)
- Remove the unnecessary check in XQA to fix Code Llama 70b triton crashes (#1256)
- Fix an unsupported ScalarType issue for BF16 LoRA (triton-inference-server/tensorrtllm_backend#403)
- Eliminate the load and save of prompt table in multimodal (#1436, "why is the `prompt_table` in ModelRunner.generate passed in as npy file instead of a tensor ?")
Performance
- Optimize applyBiasRopeUpdateKVCache kernel by avoiding re-computation

Thanks,
The TensorRT-LLM Engineering Team

qiaoxj07 · 2024-04-16T12:01:21Z

qiaoxj07
Apr 16, 2024
Collaborator

Hi, I noticed that the warning "FP8 Context Paged KV FMHA hasn't been implemented yet." in gptAttentionCommon.cpp has been removed. Does "paged_context_fmha" support fp8 in this update?

3 replies

PerkzZheng Apr 24, 2024
Collaborator

Yes, it is supported now (it requires the fp8 quantization workflow). See the doc here (https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/advanced/gpt-attention.md#fp8-context-fmha).

qiaoxj07 Apr 24, 2024
Collaborator

Thanks for your reply.
I tried it on an L40S. Currently paged_context_fmha requires fp8-context-fmha, and fp8-context-fmha is only supported on Hopper. Is there any plan to support fp8-context-fmha on Ada?

PerkzZheng Apr 24, 2024
Collaborator

We are working on that, but there is no concrete date for when it will be supported. I will let you know when it is scheduled.
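
For anyone scripting around this limitation, here is a minimal sketch of the hardware gate described in this thread. It is not TensorRT-LLM code: the function name `fp8_context_fmha_supported` is hypothetical, and it only encodes what was stated here as of April 2024 (Hopper only, Ada not yet), using the CUDA compute capabilities 9.0 for Hopper and 8.9 for Ada (for example, L40S).

```python
from typing import Tuple

# CUDA compute capabilities: Hopper (e.g. H100) is 9.0, Ada (e.g. L40S) is 8.9.
HOPPER_CC: Tuple[int, int] = (9, 0)
ADA_CC: Tuple[int, int] = (8, 9)


def fp8_context_fmha_supported(compute_capability: Tuple[int, int]) -> bool:
    """Hypothetical helper, not part of TensorRT-LLM.

    Returns True only for Hopper, mirroring the statement in this thread
    that fp8-context-fmha is only supported on Hopper as of April 2024.
    """
    return compute_capability == HOPPER_CC


if __name__ == "__main__":
    for name, cc in (("Hopper", HOPPER_CC), ("Ada", ADA_CC)):
        print(name, cc, fp8_context_fmha_supported(cc))
```

If PyTorch is available, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed in directly. The check would need to be revisited once Ada support lands.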
