
[JAX] Context Parallel Attention with All-Gather #1106

Conversation

mgoldfarb-nvidia
Collaborator

@mgoldfarb-nvidia mgoldfarb-nvidia commented Aug 14, 2024

Description

Adds support for context parallel fused attention based on an all-gather/reduce-scatter approach. This implementation exposes the collective communication between CP ranks.

This first CP implementation supports only the causal and no-mask configurations, without bias. Additional QKV formats and configurations will be added in subsequent PRs.
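As a sketch of the all-gather approach described above: each CP rank keeps only its shard of the query sequence, all-gathers K/V so it sees the full key/value sequence, and computes attention for its local queries with a causal mask expressed in global positions. The snippet below simulates this with NumPy (no real collectives); all names are illustrative, not the PR's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal_offset=0):
    # Scores over the *global* key sequence; rows are this rank's local queries.
    s = q @ k.T / np.sqrt(q.shape[-1])
    qi = np.arange(q.shape[0])[:, None] + causal_offset  # global query positions
    ki = np.arange(k.shape[0])[None, :]                  # global key positions
    s = np.where(ki <= qi, s, -np.inf)                   # causal mask
    return softmax(s) @ v

rng = np.random.default_rng(0)
seq, dim, cp = 8, 4, 2
q = rng.normal(size=(seq, dim))
k = rng.normal(size=(seq, dim))
v = rng.normal(size=(seq, dim))

full = attention(q, k, v)  # unsharded reference

# Simulated CP: each rank owns a contiguous shard of Q, while K/V are
# "all-gathered" so every rank sees the full key/value sequence.
shard = seq // cp
parts = [attention(q[r * shard:(r + 1) * shard], k, v, causal_offset=r * shard)
         for r in range(cp)]
assert np.allclose(np.concatenate(parts), full)
```

In the real kernel the gather is a collective across the CP mesh axis and the backward pass reduce-scatters the K/V gradients back to their owning ranks.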

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Adds context parallel axis resource and new primitives to fused attention.
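A "context parallel axis resource" presumably extends the existing mapping from logical parallelism kinds to mesh axis names with a new entry for the CP axis. A minimal dataclass sketch of that idea is below; the field names mirror the PR description but are illustrative, not the exact transformer_engine/jax definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MeshResource:
    """Illustrative mapping from parallelism kinds to device-mesh axis names."""
    dp_resource: Optional[str] = None   # data-parallel axis
    tp_resource: Optional[str] = None   # tensor-parallel axis
    cp_resource: Optional[str] = None   # context-(sequence-)parallel axis, new in this PR

# A mesh with data-, tensor-, and context-parallel axes:
mesh_axes = MeshResource(dp_resource="dp", tp_resource="tp", cp_resource="cp")
assert mesh_axes.cp_resource == "cp"
```

The fused-attention primitives can then look up `cp_resource` to decide which mesh axis to all-gather K/V over.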

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@mgoldfarb-nvidia mgoldfarb-nvidia changed the title [JAX] WIP: Context Parallel Attention with All-Gather [JAX] DRAFT: Context Parallel Attention with All-Gather Aug 14, 2024
@ptrendx
Member

ptrendx commented Aug 15, 2024

How is this different from #1059?

@mgoldfarb-nvidia
Collaborator Author

mgoldfarb-nvidia commented Aug 15, 2024 via email

@ptrendx
Member

ptrendx commented Aug 16, 2024

Hmmm, sure, but it still feels like a duplicate of the functionality. Could you maybe collaborate with @mingxu1067 to merge your work with his?

@mgoldfarb-nvidia
Collaborator Author

Sure thing. Ming and I have already been discussing how to merge the PRs. Likely there should be an initial CP attention PR, followed by updates to other components on the JAX side as we implement the rest of the CP features for JAX.

@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch 2 times, most recently from 4cd496a to 050bce8 Compare August 21, 2024 21:10
@mgoldfarb-nvidia mgoldfarb-nvidia changed the title [JAX] DRAFT: Context Parallel Attention with All-Gather [JAX] Context Parallel Attention with All-Gather Aug 22, 2024
@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch 2 times, most recently from 43a8d78 to 988e3f6 Compare August 22, 2024 21:30

register_primitive(FusedAttnCPWithAllGatherBwdPrimitive)


def fused_attn_fwd(
Collaborator Author


This is an important discussion point: do we keep CP hidden behind a common function, or add a separate flavor such as fused_attn_cp_allgather_fwd? The thought here was that it makes sense to keep a common interface that naturally supports CP. Other implementation strategies, e.g. ring, can be exposed via an argument.
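The "common interface with a strategy argument" design sketched in this comment might look like the following. Everything here is hypothetical, including the CPStrategy enum and the underscore-prefixed dispatch targets (stubbed out so the sketch runs); it is not the PR's actual code.

```python
from enum import Enum

class CPStrategy(Enum):
    """Hypothetical selector for a context-parallel implementation."""
    DEFAULT = 0      # let the library choose based on mesh/config
    ALL_GATHER = 1   # all-gather K/V forward, reduce-scatter gradients backward
    RING = 2         # ring-exchange K/V chunks between CP ranks

def _fused_attn_cp_all_gather_fwd(*args, **kwargs):
    return "all_gather"  # stub standing in for the all-gather primitive

def _fused_attn_cp_ring_fwd(*args, **kwargs):
    return "ring"        # stub standing in for a future ring primitive

def fused_attn_fwd(qkv, bias, mask, *, cp_strategy=CPStrategy.DEFAULT, **config):
    """Single public entry point; CP is an implementation detail chosen by argument."""
    if cp_strategy in (CPStrategy.DEFAULT, CPStrategy.ALL_GATHER):
        return _fused_attn_cp_all_gather_fwd(qkv, bias, mask, **config)
    if cp_strategy == CPStrategy.RING:
        return _fused_attn_cp_ring_fwd(qkv, bias, mask, **config)
    raise ValueError(f"unknown CP strategy: {cp_strategy}")
```

The upside of this shape is that callers keep one API as new strategies land; the downside is that the single function signature has to accommodate every strategy's configuration.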

@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch from 988e3f6 to e912b64 Compare August 23, 2024 13:27
@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch from e912b64 to 03e34c0 Compare August 27, 2024 23:03
Review threads on transformer_engine/jax/sharding.py and transformer_engine/jax/attention.py were resolved.
@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch 5 times, most recently from a83875c to df5267a Compare September 2, 2024 16:18
@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch 2 times, most recently from ba88ede to d3c9d06 Compare September 5, 2024 22:34
@zlsh80826 zlsh80826 self-requested a review September 6, 2024 03:31
Collaborator

@zlsh80826 zlsh80826 left a comment


LGTM! Thanks for the contributions, especially the _FusedAttnConfig; it greatly simplifies the argument passing.

@zlsh80826
Collaborator

Please help check the pre-commit failure. You can run pre-commit install in your working directory; the checks will then run each time you commit.

@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch from 5c6bc6c to 83a62b4 Compare September 9, 2024 21:29
@mgoldfarb-nvidia
Collaborator Author

/te-ci jax

@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch from 83a62b4 to 0691181 Compare September 9, 2024 23:20
@mgoldfarb-nvidia
Collaborator Author

/te-ci jax

1 similar comment
@ptrendx
Member

ptrendx commented Sep 10, 2024

/te-ci jax

@mgoldfarb-nvidia mgoldfarb-nvidia force-pushed the mgoldfarb-nvidia/context_parallel_attention_with_all_gather branch from 0691181 to 5194eb4 Compare September 16, 2024 15:23
@mingxu1067
Collaborator

/te-ci jax

@mgoldfarb-nvidia mgoldfarb-nvidia merged commit 9101a78 into NVIDIA:main Sep 17, 2024
14 of 15 checks passed