
Flash attention: support softcap #1013

Open

Lzhang-hub wants to merge 11 commits into main
Conversation

Lzhang-hub

Description

Flash attention has supported softcap since commit 8f873cc6; it is used in Gemma2.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Changes

Add a softcap argument to FlashAttention, and update _flash_attn_max_version to 2.6.1.
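
For context, logit soft-capping squashes the attention scores with a tanh before the softmax (Gemma2 uses a cap of 50.0), and flash-attn releases that support it expose the cap as a softcap keyword on flash_attn_func. A minimal sketch of the intended behaviour, assuming flash-attn >= 2.6.0 is installed and a CUDA device is available:

```python
# Hypothetical illustration of the softcap argument; not TransformerEngine code.
import torch
from flash_attn import flash_attn_func  # assumes flash-attn >= 2.6.0, where softcap exists

def softcap_reference(scores: torch.Tensor, cap: float) -> torch.Tensor:
    """Reference of logit soft-capping as used by Gemma2:
    cap * tanh(scores / cap), applied to attention scores before the softmax."""
    return cap * torch.tanh(scores / cap)

# q, k, v: [batch, seqlen, num_heads, head_dim], half precision on GPU.
q = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")

# softcap=50.0 matches Gemma2's attention logit soft-capping; softcap=0.0 disables it.
out = flash_attn_func(q, k, v, causal=True, softcap=50.0)
```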

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works

ptrendx (Member) commented Jul 16, 2024

@Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low?
Also, please sign your commits (see https://github.com/NVIDIA/TransformerEngine/blob/main/CONTRIBUTING.rst#sign-your-work for details).

Signed-off-by: zhanglei335 <[email protected]>
Lzhang-hub (Author)

> @Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low? Also, please sign your commits (see https://github.com/NVIDIA/TransformerEngine/blob/main/CONTRIBUTING.rst#sign-your-work for details).

OK
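
A hypothetical sketch of the check being requested: raise an error, rather than warn, when softcap is requested but the installed flash-attn predates the feature. The helper and constant names below are illustrative rather than the actual TransformerEngine symbols, and 2.6.0 is assumed to be the first flash-attn release with softcap support:

```python
# Illustrative version gate; names and the 2.6.0 threshold are assumptions.
from importlib.metadata import version as pkg_version
from packaging.version import Version

_flash_attn_version = Version(pkg_version("flash-attn"))
_flash_attn_softcap_min_version = Version("2.6.0")

def check_softcap_support(softcap: float) -> None:
    """Error out (rather than warn) if the installed flash-attn cannot honour softcap."""
    if softcap != 0.0 and _flash_attn_version < _flash_attn_softcap_min_version:
        raise RuntimeError(
            f"softcap={softcap} requires flash-attn >= {_flash_attn_softcap_min_version}, "
            f"but version {_flash_attn_version} is installed."
        )
```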

@@ -3286,6 +3293,7 @@ def forward(
self.attention_dropout if self.training else 0.0,
softmax_scale=self.softmax_scale,
causal="causal" in attn_mask_type,
softcap=self.softcap,
Member

Shouldn't it be added to the fa_optional_forward_kwargs instead? Otherwise the previous versions of FA will complain.

Author

You are right, my mistake.
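
A hypothetical sketch of the fix agreed on here: route softcap through fa_optional_forward_kwargs so the keyword only reaches flash_attn_func when the installed version understands it. The version constants come from the earlier sketch, and query_layer/key_layer/value_layer are placeholders for the layer's q/k/v tensors; this is not the exact patch:

```python
# Only forward softcap when the installed flash-attn accepts the keyword,
# so older versions never see an unexpected argument.
fa_optional_forward_kwargs = {}
if _flash_attn_version >= _flash_attn_softcap_min_version:
    fa_optional_forward_kwargs["softcap"] = self.softcap

output = flash_attn_func(
    query_layer,
    key_layer,
    value_layer,
    self.attention_dropout if self.training else 0.0,
    softmax_scale=self.softmax_scale,
    causal="causal" in attn_mask_type,
    **fa_optional_forward_kwargs,
)
```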
