Flash attention support softcap. #1013
base: main
Conversation
for more information, see https://pre-commit.ci
@Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low?
Signed-off-by: zhanglei335 <[email protected]>
OK
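For reference, a minimal sketch of the hard check suggested above, assuming the installed flash-attn version is resolved via importlib.metadata; the 2.6.0 minimum and the helper name _check_softcap_support are illustrative, not TransformerEngine's actual code:

from importlib.metadata import version
from packaging.version import Version

_flash_attn_version = Version(version("flash-attn"))
_flash_attn_min_softcap_version = Version("2.6.0")  # assumed minimum for softcap support

def _check_softcap_support(softcap: float) -> None:
    # Raise instead of warn when softcap is requested on a flash-attn
    # build that predates the feature.
    if softcap != 0.0 and _flash_attn_version < _flash_attn_min_softcap_version:
        raise RuntimeError(
            f"softcap requires flash-attn >= {_flash_attn_min_softcap_version}, "
            f"but found {_flash_attn_version}."
        )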
@@ -3286,6 +3293,7 @@ def forward(
    self.attention_dropout if self.training else 0.0,
    softmax_scale=self.softmax_scale,
    causal="causal" in attn_mask_type,
    softcap=self.softcap,
Shouldn't it be added to the fa_optional_forward_kwargs instead? Otherwise the previous versions of FA will complain.
You are right, that was my mistake.
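A sketch of the suggested fix, routing softcap through the optional-kwargs dict so it is only passed when the installed flash-attn is new enough; the version gate and the wrapper function are illustrative, not the PR's exact code:

from flash_attn import flash_attn_func
from packaging.version import Version

def _fa_forward(q, k, v, dropout_p, softmax_scale, causal, softcap, fa_version):
    # Older flash-attn versions do not accept a softcap keyword and would
    # raise TypeError, so only include it when the version supports it.
    fa_optional_forward_kwargs = {}
    if softcap is not None and fa_version >= Version("2.6.0"):  # assumed gate
        fa_optional_forward_kwargs["softcap"] = softcap
    return flash_attn_func(
        q, k, v, dropout_p,
        softmax_scale=softmax_scale,
        causal=causal,
        **fa_optional_forward_kwargs,
    )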
Signed-off-by: zhanglei335 <[email protected]>
Description
Flash Attention added support for softcap in commit 8f873cc6; softcap is used by Gemma 2.
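For context, softcap applies a tanh-based soft cap to the attention logits before the softmax, as popularized by Gemma 2. A minimal PyTorch sketch of the transform (not TransformerEngine code):

import torch

def softcap_logits(scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Squash attention logits into (-softcap, softcap): roughly linear
    # near zero, saturating smoothly at the cap.
    return softcap * torch.tanh(scores / softcap)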
Fixes # (issue)
Type of change
Changes
Add a softcap arg in FlashAttention, and update _flash_attn_max_version to 2.6.1.
Checklist: