Flash Attention v3 #36190

Draft · wants to merge 9 commits into main

Conversation

hlky (Contributor) commented Feb 14, 2025

What does this PR do?

Replaces #33522 to avoid conflicts and to allow those using that branch to continue while we get it updated for #35235.

The initial commit of this PR adds auxiliary code so we can discuss the core FAv3 integration.

cc @ArthurZucker

  • Integrate FAv3 into _flash_attention_forward/flash_attention_forward as before, or create new functions? (See the sketch below these lists.)
  • Some models still have FlashAttention2 classes; is refactoring all models to use the new style planned? Should FAv3 be integrated as before, or should that refactor happen in this PR?

Also to check:

  • Status of dropout, softcap, etc.
  • Status of FP8
  • Packaging
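
For reference on the first question, here is a minimal sketch (not this PR's implementation) of what exposing FAv3 through the new attention interface from #35235 might look like. It assumes the FA3 build installs as `flash_attn_interface` with a `flash_attn_func` whose signature resembles FA2's, that the interface passes `(batch, num_heads, seq_len, head_dim)` tensors and expects `(attn_output, attn_weights)` back, and that `ALL_ATTENTION_FUNCTIONS` in `transformers.modeling_utils` accepts a new key; the `flash_attention_v3_forward` name and the `"flash_attention_3"` key are hypothetical.

```python
from typing import Optional

import torch
from flash_attn_interface import flash_attn_func  # assumed FA3 ("hopper") package name

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS


def flash_attention_v3_forward(
    module: torch.nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],
    scaling: Optional[float] = None,
    dropout: float = 0.0,
    **kwargs,
):
    # The new-style interface passes (batch, num_heads, seq_len, head_dim);
    # FA kernels expect (batch, seq_len, num_heads, head_dim).
    query = query.transpose(1, 2)
    key = key.transpose(1, 2)
    value = value.transpose(1, 2)

    # Dropout/softcap support is one of the open questions above, and
    # attention_mask handling (padding / varlen) is omitted in this sketch,
    # so only the scale and causality are forwarded. `module.is_causal` is
    # assumed to exist, as on current Llama-style attention modules.
    out = flash_attn_func(query, key, value, softmax_scale=scaling, causal=module.is_causal)
    # Some FA3 builds return (out, softmax_lse); keep only the attention output.
    if isinstance(out, tuple):
        out = out[0]

    # Return in (batch, seq_len, num_heads, head_dim), matching the other
    # interface implementations; no attention weights are materialized.
    return out, None


# Hypothetical registration so a model could opt in with attn_implementation="flash_attention_3".
ALL_ATTENTION_FUNCTIONS["flash_attention_3"] = flash_attention_v3_forward
```

Whether something like this lives next to the existing flash_attention_forward or replaces the remaining FlashAttention2 classes directly is exactly the open question above.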

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vasqu (Contributor) left a comment


Just a heads-up to warn/inform you about some things regarding the current status of FA3:

if torch.version.cuda:
    compute_capability = torch.cuda.get_device_capability()
    major, _ = compute_capability
    if major < 9:
A100 support has been recently added Dao-AILab/flash-attention#1481 (comment)
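
If that lands in a released FA3 build, the gate above could presumably be widened. A minimal sketch (not this PR's check), assuming only data-center Ampere (sm80) is added on top of Hopper (sm90):

```python
import torch

if torch.version.cuda:
    capability = torch.cuda.get_device_capability()
    # FA3 initially targeted Hopper (sm90); per Dao-AILab/flash-attention#1481,
    # A100 (sm80) support is being added. Consumer Ampere/Ada parts
    # (sm86/sm89) are assumed to remain unsupported.
    if capability != (8, 0) and capability[0] < 9:
        raise ValueError(
            "FlashAttention-3 requires an A100 (sm80) or Hopper (sm90) GPU, "
            f"but found compute capability {capability}."
        )
```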

vasqu (Contributor) commented Feb 14, 2025

cc @bn999 if you're interested in the progress

bn999 commented Feb 14, 2025

@vasqu Yup, I'm following. Good stuff.

hlky (Contributor, Author) commented Feb 18, 2025

Thanks for the info @vasqu
