Attention did not use MHA kernel (multi-head attention) on Orin TRT 8.6.10 #3575
Comments
@nvpohanh I have a vague memory that this is expected (I've seen an internal bug before) and that the fused MHA kernel isn't enabled in TRT 8.6. Am I correct?
Is it possible to try TRT 9.2? https://github.com/NVIDIA/TensorRT/blob/release/9.2/docker/ubuntu-22.04.Dockerfile#L92-L95
Sorry, but our business scenario won't allow us to upgrade TRT in the near future. Is there a way to fix or avoid the bug and enable fused MHA on TRT 8.6? @zerollzeng @nvpohanh Thanks!
Could you try adding a LayerNorm into the network? That will encourage TRT 8.6 to trigger the Transformer-specific fusions.
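For concreteness, a minimal sketch of what that could look like, assuming the model is defined in PyTorch; the wrapper name and hidden size 256 are illustrative, not taken from this thread:

```python
# Minimal sketch, assuming a PyTorch model; the wrapper name and hidden size
# are illustrative only. It wraps an existing attention module with a
# residual add followed by LayerNorm, which the comment above suggests may
# encourage TRT 8.6's Transformer-specific fusions.
import torch.nn as nn

class AttnWithLayerNorm(nn.Module):
    def __init__(self, attn: nn.Module, hidden: int = 256):
        super().__init__()
        self.attn = attn                  # the existing attention module
        self.norm = nn.LayerNorm(hidden)  # LayerNorm placed right after attention

    def forward(self, x):
        # Residual connection + LayerNorm after the attention output
        return self.norm(x + self.attn(x))
```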
Thanks for your reply, @nvpohanh. I have tried adding LN after attention, but it does not work.
@Nusselder9 Could you share the ONNX with the LayerNorm? TRT 8.6 has quite restricted MHA pattern-matching code and we need to find out why it didn't trigger the fusion. TRT 9.2 has much looser checking. I would also try to make the MHA look like a standard multi-head attention with B=1, S=128, H=256, N=8, h=32.
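As an illustration only (not the exact graph discussed in this thread), a PyTorch sketch of an attention block using those shapes, assuming the ONNX is produced via `torch.onnx.export`:

```python
# Minimal sketch, assuming PyTorch is used to produce the ONNX; the module and
# export settings are illustrative. Shapes follow the suggestion above:
# B=1, S=128, H=256, N=8, h=32.
import torch
import torch.nn as nn

B, S, H, N = 1, 128, 256, 8
h = H // N  # 32

class SimpleMHA(nn.Module):
    def __init__(self):
        super().__init__()
        self.qkv = nn.Linear(H, 3 * H)
        self.proj = nn.Linear(H, H)

    def forward(self, x):                       # x: [B, S, H]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape/transpose to [B, N, S, h] so the graph contains two batched
        # MatMuls with a Softmax in between, the shape TRT's MHA pattern
        # matcher looks for.
        q = q.view(B, S, N, h).transpose(1, 2)
        k = k.view(B, S, N, h).transpose(1, 2)
        v = v.view(B, S, N, h).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-1, -2) / h ** 0.5, dim=-1)  # BMM1 + Softmax
        out = (attn @ v).transpose(1, 2).reshape(B, S, H)                 # BMM2
        return self.proj(out)

torch.onnx.export(SimpleMHA().eval(), torch.randn(B, S, H), "mha.onnx", opset_version=17)
```

Whether the fusion actually triggers on TRT 8.6 still depends on the restrictions mentioned above (and fused MHA kernels typically require FP16 or INT8 precision).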
Thanks for your kind reply. The attachment is my attention module.
@Nusselder9 Could you share the ONNX files with the LayerNorm? Thanks!
Here is the attention with LN. @nvpohanh
Filed internal tracker 4438093. Will let you know if we have any findings. Thanks.
Internal investigation shows that TRT 8.6.10 did not have any MHA fusion support on Orin. Could you try TRT 8.6.11?
Thanks, I will try.
@nvpohanh I think the question is: if an ONNX model includes a standard Transformer structure (like a ViT decoder), can TRT 8.6.11 enable MHA fusion?
Did TRT 8.6.13 have any MHA fusion support?
Did TRT 9.2 have any MHA fusion support, and how can it be made to work?
My model has an attention module like this:

It did not use the MHA kernel on my Orin with TensorRT 8.6.10 (OS 6.0.7.0):

However, on x86 with TensorRT 8.6.1, it does use the MHA kernel:

I would like to use MHA on Orin. What can I do? Thanks!
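One way to check whether the fused kernel was actually picked is to inspect the built engine's layer information. Below is a minimal sketch using the TensorRT Python API; `model.onnx` is a placeholder for your model, and matching on the substring `"mha"` is only a heuristic for spotting fused attention kernels:

```python
# Minimal sketch using the TensorRT 8.6 Python bindings; "model.onnx" is a
# placeholder. Builds an engine with detailed profiling verbosity and filters
# the layer info for fused attention kernel names.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # fused MHA kernels generally need FP16/INT8
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

serialized = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(serialized)

inspector = engine.create_engine_inspector()
info = inspector.get_engine_information(trt.LayerInformationFormat.ONELINE)
# Heuristic: fused attention layers usually contain "mha"/"fmha" in their names.
print("\n".join(line for line in info.splitlines() if "mha" in line.lower()))
```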