Flash attention support softcap. #1013
base: main
Conversation
for more information, see https://pre-commit.ci
@Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low?
Signed-off-by: zhanglei335 <[email protected]>
OK
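For reference, a minimal sketch of the hard check suggested above, assuming the installed flash-attn version is resolved via importlib.metadata; the 2.6.0 minimum and the helper name _check_softcap_support are illustrative, not TransformerEngine's actual code:

from importlib.metadata import version
from packaging.version import Version

_flash_attn_version = Version(version("flash-attn"))
_flash_attn_min_softcap_version = Version("2.6.0")  # assumed minimum for softcap support

def _check_softcap_support(softcap: float) -> None:
    # Raise instead of warn when softcap is requested on a flash-attn
    # build that predates the feature.
    if softcap != 0.0 and _flash_attn_version < _flash_attn_min_softcap_version:
        raise RuntimeError(
            f"softcap requires flash-attn >= {_flash_attn_min_softcap_version}, "
            f"but found {_flash_attn_version}."
        )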
@@ -3286,6 +3293,7 @@ def forward(
    self.attention_dropout if self.training else 0.0,
    softmax_scale=self.softmax_scale,
    causal="causal" in attn_mask_type,
    softcap=self.softcap,
Shouldn't it be added to the fa_optional_forward_kwargs instead? Otherwise the previous versions of FA will complain.
You are right, that was my mistake.
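A sketch of the suggested fix, routing softcap through the optional-kwargs dict so it is only passed when the installed flash-attn is new enough; the version gate and the wrapper function are illustrative, not the PR's exact code:

from flash_attn import flash_attn_func
from packaging.version import Version

def _fa_forward(q, k, v, dropout_p, softmax_scale, causal, softcap, fa_version):
    # Older flash-attn versions do not accept a softcap keyword and would
    # raise TypeError, so only include it when the version supports it.
    fa_optional_forward_kwargs = {}
    if softcap is not None and fa_version >= Version("2.6.0"):  # assumed gate
        fa_optional_forward_kwargs["softcap"] = softcap
    return flash_attn_func(
        q, k, v, dropout_p,
        softmax_scale=softmax_scale,
        causal=causal,
        **fa_optional_forward_kwargs,
    )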
Signed-off-by: zhanglei335 <[email protected]>
Description
Flash Attention added support for softcap in commit 8f873cc6; softcap is used by Gemma 2.
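For context, softcap applies a tanh-based soft cap to the attention logits before the softmax, as popularized by Gemma 2. A minimal PyTorch sketch of the transform (not TransformerEngine code):

import torch

def softcap_logits(scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Squash attention logits into (-softcap, softcap): roughly linear
    # near zero, saturating smoothly at the cap.
    return softcap * torch.tanh(scores / softcap)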
Fixes # (issue)
Type of change
Changes
Add a softcap arg in FlashAttention, and update _flash_attn_max_version to 2.6.1.
Checklist: