In the GPT guide (https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md#workflow), Fig. 2 shows "fuseQKV masked attention", which looks very similar to Flash Attention. However, the accompanying text no longer mentions fuseQKV masked attention or Flash Attention, so I'm wondering whether they are the same technique.
Am I understanding this correctly?
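For context, a hedged reading of the diagram: "fuseQKV" typically refers to computing the Q, K, and V projections with a single fused GEMM (one concatenated weight matrix) instead of three separate ones, while Flash Attention is a distinct technique (tiled attention with an online softmax that avoids materializing the full score matrix). The sketch below only illustrates the fused-QKV idea and ordinary causal masking; the shapes and names are illustrative assumptions, not FasterTransformer's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d = 4, 8
x = rng.standard_normal((seq, d))

# Three separate projection matrices...
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# ...versus one fused weight: concatenate along the output dimension,
# so Q, K, V come out of a single GEMM.
W_fused = np.concatenate([Wq, Wk, Wv], axis=1)  # (d, 3d)

# Separate GEMMs
q1, k1, v1 = x @ Wq, x @ Wk, x @ Wv

# Single fused GEMM, then split — algebraically identical
qkv = x @ W_fused                               # (seq, 3d)
q2, k2, v2 = np.split(qkv, 3, axis=1)
assert np.allclose(q1, q2) and np.allclose(k1, k2) and np.allclose(v1, v2)

# The "masked" part is standard causal masking of the attention scores.
# Flash Attention applies the same mask but computes this whole step in
# tiles without ever forming the full (seq, seq) score matrix.
mask = np.triu(np.ones((seq, seq)), k=1).astype(bool)
scores = (q2 @ k2.T) / np.sqrt(d)
scores[mask] = -np.inf
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
out = attn @ v2                                 # (seq, d)
```

So the fusion shown in the figure concerns the projection GEMM, which is orthogonal to whether the attention itself is computed Flash-Attention-style.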
likejazz changed the title from "fuseQKV masked attention same with Flash Attention?" to "Are fuseQKV masked attention and Flash Attention the same?" on Feb 4, 2024.