Attention did not use MHA kernel (multi-head attention) on Orin TRT 8.6.10 #3575

Closed
Nusselder9 opened this issue Dec 28, 2023 · 16 comments
Labels: triaged (Issue has been triaged by maintainers)

@Nusselder9 commented Dec 28, 2023

My model has an attention module like this:
[screenshot: ONNX graph of the attention module]

It did not use the MHA kernel on my Orin with TensorRT 8.6.10 (OS 6.0.7.0):
[screenshot: Orin engine layer profile without a fused MHA kernel]

However, on x86 with TensorRT 8.6.1, it can use the MHA kernel:
[screenshot: x86 engine layer profile showing the fused MHA kernel]

I would like to use MHA on Orin. What can I do? Thanks!

@Nusselder9 (Author)

To rule out other factors, I exported a toy attention with seq_length=128, batch_size=1, embed_dim=256 like this:
[screenshot: toy attention ONNX graph]
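For reference, a minimal sketch of what such a toy attention export might look like (the module, tensor names, and the use of torch.onnx.export are my assumptions, not the script behind the screenshot):

```python
# Hypothetical reconstruction of the toy attention described above
# (batch_size=1, seq_length=128, embed_dim=256); not the author's exact script.
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.q = nn.Linear(embed_dim, embed_dim)
        self.k = nn.Linear(embed_dim, embed_dim)
        self.v = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):                                   # x: [B, S, H]
        B, S, H = x.shape
        q = self.q(x).view(B, S, self.num_heads, self.head_dim).transpose(1, 2)  # [B, N, S, h]
        k = self.k(x).view(B, S, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v(x).view(B, S, self.num_heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.head_dim ** 0.5, dim=-1)  # [B, N, S, S]
        out = (attn @ v).transpose(1, 2).reshape(B, S, H)                              # [B, S, H]
        return out

x = torch.randn(1, 128, 256)  # batch_size=1, seq_length=128, embed_dim=256
torch.onnx.export(ToyAttention(), x, "toy_attention.onnx", opset_version=17)
```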

However, it still cannot use the MHA kernel on Orin TRT 8.6.10:
[screenshot: Orin layer profile, still without a fused MHA kernel]

Any suggestions would be appreciated. Thanks very much!

@zerollzeng (Collaborator)

@nvpohanh I have a vague memory that this is expected (I've seen an internal bug before) and that the fused MHA kernel isn't enabled in TRT 8.6. Am I correct?

@zerollzeng zerollzeng self-assigned this Dec 30, 2023
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Dec 30, 2023
@nvpohanh (Collaborator) commented Jan 1, 2024

@Nusselder9 (Author)

Sorry, but our business scenario won't allow upgrading TRT in the near future.

Is there a way to fix or work around the bug and enable fused MHA on TRT 8.6? @zerollzeng @nvpohanh

Thanks!

@nvpohanh (Collaborator) commented Jan 2, 2024

Could you try adding a LayerNorm into the network? That will encourage TRT 8.6 to trigger the Transformer-specific fusions.
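A minimal sketch of that suggestion, reusing the ToyAttention sketch from the earlier comment (the wrapper name and residual placement are my assumptions):

```python
# Hypothetical: wrap the attention with a residual add + LayerNorm so the
# exported ONNX contains a LayerNorm node next to the MHA pattern.
import torch
import torch.nn as nn

class AttentionWithLN(nn.Module):
    def __init__(self, attn, embed_dim=256):
        super().__init__()
        self.attn = attn
        self.ln = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # residual add followed by LayerNorm, as in a standard transformer block
        return self.ln(x + self.attn(x))

x = torch.randn(1, 128, 256)
torch.onnx.export(AttentionWithLN(ToyAttention()), x, "toy_attention_ln.onnx",
                  opset_version=17)
```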

@Nusselder9 (Author)

Thanks for your reply, @nvpohanh. I have tried adding LayerNorm after the attention, but it does not work.
[screenshot: attention + LayerNorm ONNX graph]

[screenshot: resulting layer profile, still without a fused MHA kernel]

@nvpohanh (Collaborator) commented Jan 2, 2024

@Nusselder9 could you share the ONNX with the LayerNorm? TRT 8.6 has quite restrictive MHA pattern-matching code and we need to find out why it didn't trigger the fusion. TRT 9.2 has much looser checks.

I would also try to make the MHA look like:

[B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, S, h] -> MatMul -> [B, N, S, S] -> MatMul -> [B, N, S, h] -Transpose-> [B, S, N, h] -Reshape-> [B, S, H] -LayerNorm->...
[B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, h, S] ---^                           ^
[B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, S, h] --------------------------------

where B=1, S=128, H=256, N=8, h=32
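A sketch of a PyTorch module whose exported graph should follow that layout, with B=1, S=128, H=256, N=8, h=32 (module and tensor names are mine; the scale and softmax between the two MatMuls are assumed, since a real MHA needs them even though the diagram omits them):

```python
# Hypothetical module laid out to export the pattern above:
# MatMul -> Reshape -> Transpose per projection, with K transposed to [B, N, h, S],
# two batched MatMuls, Transpose/Reshape back to [B, S, H], then LayerNorm.
import torch
import torch.nn as nn

B, S, H, N, h = 1, 128, 256, 8, 32

class PatternMHA(nn.Module):
    def __init__(self):
        super().__init__()
        self.wq = nn.Linear(H, H, bias=False)   # [B, S, H] -MatMul-> [B, S, H]
        self.wk = nn.Linear(H, H, bias=False)
        self.wv = nn.Linear(H, H, bias=False)
        self.ln = nn.LayerNorm(H)

    def forward(self, x):
        q = self.wq(x).reshape(B, S, N, h).permute(0, 2, 1, 3)  # [B, N, S, h]
        k = self.wk(x).reshape(B, S, N, h).permute(0, 2, 3, 1)  # [B, N, h, S]
        v = self.wv(x).reshape(B, S, N, h).permute(0, 2, 1, 3)  # [B, N, S, h]
        scores = torch.softmax((q @ k) / h ** 0.5, dim=-1)      # [B, N, S, S]
        ctx = (scores @ v).permute(0, 2, 1, 3).reshape(B, S, H) # [B, S, H]
        return self.ln(ctx)

torch.onnx.export(PatternMHA(), torch.randn(B, S, H), "pattern_mha.onnx",
                  opset_version=17)
```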

@Nusselder9 (Author)

Thanks for your kind reply. The attachment is my attention.
attention.zip
The zip file has two types of attention, and neither of them can use fMHA.

@nvpohanh (Collaborator) commented Jan 3, 2024

@Nusselder9 Could you share the ONNX files with the LayerNorm? Thanks!

@Nusselder9 (Author)

Here is the attention with LayerNorm. @nvpohanh
attention_ln.zip

@nvpohanh (Collaborator) commented Jan 3, 2024

Filed internal tracker 4438093. Will let you know if we have any findings. Thanks

@nvpohanh (Collaborator)

Internal investigation shows that TRT 8.6.10 did not have any MHA fusion support on Orin. Could you try TRT 8.6.11?
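If it helps, here is a sketch of how to check whether the fused MHA layer shows up after upgrading, using the TensorRT Python API and the engine inspector (the ONNX file name is a placeholder, and the exact name of the fused layer varies by TRT version, so look for something like "mha" or a fused Myelin region in the output):

```python
# Hypothetical check: build an engine from the attention ONNX and dump per-layer
# information to see whether a fused MHA layer appears.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("attention_ln.onnx", "rb") as f:           # placeholder file name
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)                # fused MHA generally needs FP16/INT8
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

serialized = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(serialized)
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```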

@Nusselder9 (Author)

Thanks, I will try.

@lix19937 commented Mar 25, 2024

@nvpohanh I think "MHA kernel" in the question is not an accurate description.

> Internal investigation shows that TRT 8.6.10 did not have any MHA fusion support on Orin. Could you try TRT 8.6.11?

If an ONNX model includes a standard transformer structure (like a ViT decoder), can TRT 8.6.11 enable MHA fusion?

In my opinion, CustomQKVToContextPluginDynamic can do some fusion, but it needs to match some conditions if the user goes the plugin route.
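For anyone considering the plugin route, a small sketch that lists which QKVToContext plugin creators are registered (just a registry query; the plugin's required input layout and fields still have to follow the TensorRT OSS bertQKVToContextPlugin documentation):

```python
# Hypothetical check: list registered QKVToContext plugin creators and versions.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")
registry = trt.get_plugin_registry()
for creator in registry.plugin_creator_list:
    if "QKVToContext" in creator.name:
        print(creator.name, "version", creator.plugin_version)
```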

@XuDeshengCat

Did TRT 8.6.13 have any MHA fusion support?

@Feynman1999

Does TRT 9.2 have any MHA fusion support? And how do we make it work?
