
feat(pytorch): Allow TransformerLayer and MultiheadAttention to accept sequence length parameters #1066

Merged
merged 8 commits into NVIDIA:main on Aug 20, 2024

Conversation

hXl3s
Contributor

@hXl3s hXl3s commented Jul 31, 2024

Description

TransformerLayer and MultiheadAttention do not allow passing arbitrary-length (packed) sequences. While this feature is supported by DotProductAttention, it cannot be controlled from the higher abstraction layers. This PR fixes that issue.

Additionally, it fixes a runtime bug when the THD layout is used.
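A minimal usage sketch of the extended interface, assuming the keyword arguments follow the PR title and DotProductAttention's existing naming; the layer hyperparameters, `attn_input_format` and mask settings below are illustrative, not values taken from this PR:

```python
import torch
import transformer_engine.pytorch as te

# Hypothetical example: two packed sequences of lengths 3 and 5 (8 tokens total)
# in THD layout, driven through TransformerLayer via the new arguments.
layer = te.TransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    num_attention_heads=16,
    attn_input_format="thd",               # packed variable-length sequences
    self_attn_mask_type="padding_causal",  # padding info comes from cu_seqlens
    params_dtype=torch.bfloat16,
).cuda()

cu_seqlens = torch.tensor([0, 3, 8], dtype=torch.int32, device="cuda")
x = torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda")

with torch.no_grad():
    y = layer(
        x,
        cu_seqlens_q=cu_seqlens,
        cu_seqlens_kv=cu_seqlens,
        max_seqlen_q=5,
        max_seqlen_kv=5,
    )
```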

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Please list the changes introduced in this PR:

  • Added parameters cu_seqlens_q, cu_seqlens_kv, max_seqlen_q and max_seqlen_kv to MultiheadAttention and TransformerLayer (a small construction sketch follows this list)
  • When the THD layout is used for the attention input, shapes are now handled correctly
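For reference, the `cu_seqlens_*` tensors have the same form `DotProductAttention` already expects. A minimal sketch of building them from per-sequence token counts (the variable names here are illustrative):

```python
import torch
import torch.nn.functional as F

# Per-sequence token counts for a packed (THD) batch.
seq_lens = torch.tensor([3, 7, 2], dtype=torch.int32, device="cuda")

# Cumulative offsets with a leading zero: [0, 3, 10, 12].
cu_seqlens = F.pad(torch.cumsum(seq_lens, dim=0), (1, 0)).to(torch.int32)

# Longest sequence in the batch; passed as max_seqlen_q / max_seqlen_kv.
max_seqlen = int(seq_lens.max())  # 7
```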

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ptrendx ptrendx requested a review from cyanguwa July 31, 2024 16:51
@ptrendx
Member

ptrendx commented Jul 31, 2024

Hi @hXl3s please sign your commits. See https://github.com/NVIDIA/TransformerEngine/blob/main/CONTRIBUTING.rst#sign-your-work for details.

@ptrendx
Member

ptrendx commented Aug 5, 2024

Could you also add an error when somebody tries to run THD in inference with KV-cache here: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/attention.py#L6732-L6735?
@sudhakarsingh27 FYI, since this will touch the pieces you are looking at for merging THD inference support.
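An illustrative sketch of such a guard (assumed names; not the exact code added to attention.py):

```python
def _check_thd_inference_support(qkv_format: str, inference_params) -> None:
    # Hypothetical guard: the packed THD layout cannot be combined with the
    # KV-cache (inference_params) until THD inference support is merged.
    if inference_params is not None and qkv_format == "thd":
        raise AssertionError(
            "qkv_format='thd' is not supported with the KV-cache during inference."
        )
```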

@ptrendx
Member

ptrendx commented Aug 5, 2024

Also @hXl3s, could you add a unit test to make sure it works (e.g. comparing THD vs. BSHD outputs of TransformerLayer)?

@hXl3s
Contributor Author

hXl3s commented Aug 12, 2024

@ptrendx Added a test case comparing the outputs of THD vs. BSHD for float16 and bfloat16.
float32 is skipped, as apparently cuDNN (or whichever implementation is used) does not support THD with float32.

Additionally, as requested, there is now an assert that checks that the THD layout is not used together with the KV-cache during inference.
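Roughly, the comparison looks like the sketch below (assumed argument names, seeded re-initialization to share weights, and full-length sequences to sidestep padding masks; the actual test in the PR may differ):

```python
import torch
import transformer_engine.pytorch as te

def _make_layer(fmt, mask_type, hidden=256, heads=4, dtype=torch.bfloat16):
    torch.manual_seed(0)  # identical weights for both layouts
    return te.TransformerLayer(
        hidden, 4 * hidden, heads,
        attn_input_format=fmt,
        self_attn_mask_type=mask_type,
        params_dtype=dtype,
    ).cuda()

b, s, hidden = 2, 16, 256
layer_bshd = _make_layer("bshd", "causal")
layer_thd = _make_layer("thd", "padding_causal")

x = torch.randn(b, s, hidden, dtype=torch.bfloat16, device="cuda")
# Full-length sequences: offsets [0, 16, 32].
cu_seqlens = torch.arange(0, (b + 1) * s, s, dtype=torch.int32, device="cuda")

with torch.no_grad():
    y_bshd = layer_bshd(x)
    y_thd = layer_thd(
        x.reshape(b * s, hidden),
        cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens,
        max_seqlen_q=s, max_seqlen_kv=s,
    )

# Loose tolerances to account for bf16 and potentially different backends.
torch.testing.assert_close(y_thd, y_bshd.reshape(b * s, hidden),
                           atol=5e-2, rtol=5e-2)
```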

@hXl3s
Contributor Author

hXl3s commented Aug 13, 2024

Resolved all comments

Member

@ptrendx ptrendx left a comment


LGTM, thanks!

@ptrendx
Member

ptrendx commented Aug 13, 2024

/te-ci pytorch

@ptrendx
Member

ptrendx commented Aug 14, 2024

@hXl3s Could you skip the THD test when the GPU SM arch is lower than 8.0 (as neither FlashAttention nor cuDNN supports it)?
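For example, a skip guard along these lines (the test name is a placeholder):

```python
import pytest
import torch

def _sm_80_or_newer() -> bool:
    # SM 8.0 corresponds to Ampere; older architectures lack THD support
    # in both FlashAttention and cuDNN fused attention.
    return torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 0)

@pytest.mark.skipif(
    not _sm_80_or_newer(),
    reason="THD layout requires SM 8.0+ (FlashAttention/cuDNN fused attention).",
)
def test_transformer_layer_thd_vs_bshd():
    ...  # THD vs. BSHD comparison as sketched above
```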

@ptrendx ptrendx added the 1.10.0 label Aug 16, 2024
@ptrendx
Member

ptrendx commented Aug 16, 2024

/te-ci pytorch

@ptrendx ptrendx merged commit 5d5fe81 into NVIDIA:main Aug 20, 2024
25 of 26 checks passed
ptrendx added a commit that referenced this pull request Aug 20, 2024
feat(pytorch): Allow TransformerLayer and MultiheadAttention to accept sequence length parameters (#1066)

* Added ability for seqlen for transformer and mha layer

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* Documentation for new parameters

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* Add tests for THD layout, assert for THD layout with KV-Cache

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* Fixed tests

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move THD logic in shape calculation, add missing optional in params

Signed-off-by: Lukasz Pierscieniewski <[email protected]>

* Skip the THD test on GPUs older than Ampere

Signed-off-by: Przemek Tredak <[email protected]>

---------

Signed-off-by: Lukasz Pierscieniewski <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Przemek Tredak <[email protected]>
BeingGod pushed a commit to BeingGod/TransformerEngine that referenced this pull request Aug 30, 2024
ptrendx added a commit that referenced this pull request Aug 31, 2024