[PyTorch] Miscellaneous fixes for FA3 attention #1174

cyanguwa · 2024-09-10T00:28:25Z

Description

This PR makes a few changes to the FA3 attention path.

Adds descale_q, descale_k and descale_v to the FA3 FP8 call, so that custom descaling factors can be used for q, k and v instead of the default value of 1. This requires FA3 PR 1210 to be included in your FA3 installation.
Restricts FA3 path to flash_attn_func for FP8 since flash_attn_varlen_func does not support FP8 yet.
Fixes the transposes in the qkv_format=sbhd case when fp8_mha=true.
Enables sliding window support for FP16/BF16 from FA3 (no FP8 support yet).
Improves the messaging when FA3 is not installed and when the installed version is missing some kwargs. In both cases, we provide installation instructions and remind users to update their installation. This PR targets FA 3.0.0b1, which is developing rapidly, with more kwargs being added to its API.
Casts FA3 output to the same type as cuDNN attention in unit tests.
Enables INFO-level messaging in CI tests to help check that the correct backends are used for each test.
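
The descale handling and the "missing kwargs" messaging described above can be sketched roughly as follows. This is a minimal illustration, not the code in transformer_engine/pytorch/attention.py: `call_with_optional_descales`, `old_fa3` and `new_fa3` are hypothetical names, and the two stubs stand in for an FA3 build without and with the descale arguments. The sketch assumes that an older build rejects the unknown keyword arguments with a `TypeError`.

```python
import logging

logger = logging.getLogger("fa3_sketch")


def call_with_optional_descales(attn_fn, q, k, v, descales, **kwargs):
    """Call attn_fn with custom q/k/v descales, falling back to its defaults.

    Hypothetical helper: attn_fn stands in for the FA3 attention entry point.
    Returns a tuple (output, used_custom_descales).
    """
    try:
        out = attn_fn(
            q,
            k,
            v,
            descale_q=descales["q"],
            descale_k=descales["k"],
            descale_v=descales["v"],
            **kwargs,
        )
        return out, True
    except TypeError:
        # Assumption: a build without the descale kwargs raises TypeError for
        # the unexpected keyword arguments. Note that this also catches a
        # TypeError raised inside attn_fn itself, so real code may want a
        # narrower check.
        logger.warning(
            "Custom q/k/v descales are not supported by this installation; "
            "falling back to the default descales. Consider updating FA3."
        )
        return attn_fn(q, k, v, **kwargs), False


# Stub for a build that predates the descale arguments (illustration only).
def old_fa3(q, k, v, causal=False):
    return ("old", q, k, v)


# Stub for a build that accepts the descale arguments (illustration only).
def new_fa3(q, k, v, causal=False, descale_q=1.0, descale_k=1.0, descale_v=1.0):
    return ("new", descale_q, descale_k, descale_v)


custom = {"q": 0.5, "k": 0.25, "v": 2.0}
print(call_with_optional_descales(new_fa3, 1, 2, 3, custom, causal=True))
print(call_with_optional_descales(old_fa3, 1, 2, 3, custom))
```

With the newer stub the custom factors are passed through; with the older stub the call is retried without them and a warning is logged.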

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactor

Changes

Please list the changes introduced in this PR:

See description above.

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

cyanguwa · 2024-09-10T00:29:58Z

/te-ci pytorch

cyanguwa · 2024-09-17T23:46:03Z

/te-ci pytorch

cyanguwa · 2024-09-17T23:46:17Z

FA3: pipeline 18489052

cyanguwa · 2024-09-19T00:35:49Z

/te-ci pytorch

cyanguwa · 2024-09-19T01:00:51Z

/te-ci pytorch

cyanguwa · 2024-09-19T01:02:39Z

FA3 pipeline 18528978

cyanguwa · 2024-10-01T20:25:29Z

/te-ci pytorch

xrennvidia · 2024-10-02T03:49:54Z

LGTM. Only one small question or comment: it seems FA3 can only support FP8 with the BSHD/SBHD formats; the THD format is not supported with FP8. Should we add an assert message for this in TE? This will eventually trigger an error in Tri Dao's code anyway, but I think it is better to tell users about it in TE as well.
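
A minimal sketch of the kind of early check suggested here, so that users get a clear message before the call reaches FA3. The function name `fa3_fp8_unsupported_reason` and its parameters are hypothetical and do not come from TE; the rules encode only what this PR states (no FP8 for the varlen/THD path, no FP8 with sliding window).

```python
def fa3_fp8_unsupported_reason(qkv_format, fp8, sliding_window=False):
    """Return None if the configuration looks usable with FA3, else a reason.

    Hypothetical helper for illustration; the rules follow the PR description.
    """
    if qkv_format not in ("bshd", "sbhd", "thd"):
        raise ValueError(f"Unknown qkv_format: {qkv_format!r}")
    if not fp8:
        # FP16/BF16: both the fixed-length and varlen paths are available.
        return None
    if qkv_format == "thd":
        # THD goes through flash_attn_varlen_func, which has no FP8 support yet.
        return "FA3 does not support FP8 with qkv_format=thd"
    if sliding_window:
        # Sliding window is enabled for FP16/BF16 only.
        return "FA3 does not support FP8 with sliding window attention"
    return None


for fmt in ("bshd", "sbhd", "thd"):
    print(fmt, fa3_fp8_unsupported_reason(fmt, fp8=True))
```

A caller could assert that the returned reason is `None`, or log the reason and select a different backend.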

cyanguwa · 2024-10-03T20:49:15Z

/te-ci pytorch

cyanguwa · 2024-10-03T20:51:09Z

FA3: 19002912

cyanguwa · 2024-10-03T21:17:28Z

@xrennvidia do you mind taking another look at the PR? I made a couple more changes after your last review. Thanks!

cyanguwa · 2024-10-08T00:08:42Z

/te-ci pytorch

cyanguwa · 2024-10-08T00:20:01Z

FA3: 19123312

cyanguwa and others added 2 commits September 9, 2024 17:23

add qkv descales to FA3

bcdc4d1

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

3ed49d0

for more information, see https://pre-commit.ci

cyanguwa marked this pull request as ready for review September 10, 2024 17:24

cyanguwa and others added 2 commits September 17, 2024 13:51

fix sbhd shapes

1db61e2

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6a86660

for more information, see https://pre-commit.ci

ptrendx reviewed Sep 17, 2024

transformer_engine/pytorch/attention.py (outdated, resolved)

Merge branch 'main' into add_descales

7da4b6c

cyanguwa added 5 commits September 18, 2024 16:37

Merge branch 'main' into add_descales

de3db0a

force the same dtype when comparing FA3 and cuDNN FP8

19e7f87

Signed-off-by: Charlene Yang <[email protected]>

Revert "force the same dtype when comparing FA3 and cuDNN FP8"

bff80b6

This reverts commit 19e7f87. Signed-off-by: Charlene Yang <[email protected]>

force the same dtype when comparing FA3 and cuDNN FP8

68b9b48

Signed-off-by: Charlene Yang <[email protected]>

add try/except for FA3 when custom qkv descales are not supported

0553a83

Signed-off-by: Charlene Yang <[email protected]>

cyanguwa changed the title from "[PyTorch] Add qkv descales to FA3" to "[PyTorch] Miscellaneous fixes for FlashAttention with FA3" on Sep 18, 2024

cyanguwa changed the title from "[PyTorch] Miscellaneous fixes for FlashAttention with FA3" to "[PyTorch] Miscellaneous fixes for FA3 FP8 attention" on Sep 18, 2024

cyanguwa and others added 2 commits September 18, 2024 17:24

replace FA3 installation warning with a debug logging message

b73760b

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

66cc6f2

for more information, see https://pre-commit.ci

cyanguwa and others added 3 commits September 18, 2024 17:37

fix lint

3269685

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

39a4e1d

for more information, see https://pre-commit.ci

Merge branch 'main' into add_descales

5bcc355

Signed-off-by: Charlene Yang <[email protected]>

ptrendx reviewed Sep 27, 2024

transformer_engine/pytorch/attention.py (resolved)

Merge branch 'main' into add_descales

3dbee25

cyanguwa requested a review from xrennvidia October 1, 2024 18:15

Merge branch 'NVIDIA:main' into add_descales

c01a5b2

xrennvidia approved these changes Oct 2, 2024

cyanguwa and others added 7 commits October 3, 2024 09:22

Merge branch 'main' into add_descales

12dc8a9

remove unused imports

336a452

Signed-off-by: Charlene Yang <[email protected]>

avoid varlen_func for FP8 and improve messaging

2e140c5

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

1138edf

for more information, see https://pre-commit.ci

add SWA support for FA3

4095be8

Signed-off-by: Charlene Yang <[email protected]>

fix lint

8e2bcc2

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

7bf4936

for more information, see https://pre-commit.ci

cyanguwa changed the title from "[PyTorch] Miscellaneous fixes for FA3 FP8 attention" to "[PyTorch] Miscellaneous fixes for FA3 attention" on Oct 3, 2024

cyanguwa requested a review from xrennvidia October 3, 2024 21:16

xrennvidia reviewed Oct 3, 2024

transformer_engine/pytorch/attention.py (outdated, resolved)

transformer_engine/pytorch/attention.py (outdated, resolved)

cyanguwa and others added 5 commits October 6, 2024 11:56

change preference reason for FP8 logic

b765f3d

Signed-off-by: Charlene Yang <[email protected]>

minor fixes

a4030e8

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

569532a

for more information, see https://pre-commit.ci

Merge branch 'main' into add_descales

e907ad7

minor fix

f006a25

Signed-off-by: Charlene Yang <[email protected]>

cyanguwa merged commit e762592 into NVIDIA:main Oct 8, 2024
14 of 15 checks passed
