
[PyTorch] Normalization ops #1033

Merged
merged 39 commits into NVIDIA:main on Nov 5, 2024

Conversation

timmoon10
Collaborator

Description

This PR extends the operation-based API (see #707) with LayerNorm, RMSNorm, and FP8 cast operations.

Compare with the existing module-based API:

# Module-based API
module1 = te.LayerNormLinear(...)

# Operation-based API
module2 = te.ops.Sequential(
    te.ops.LayerNorm(...),
    te.ops.Linear(...),
)
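
For reference, here is a minimal end-to-end sketch of how such a container might be used. The hidden sizes, device, and constructor arguments are illustrative assumptions, not taken from this PR:

import torch
import transformer_engine.pytorch as te

hidden_size = 1024  # assumed size for illustration
model = te.ops.Sequential(
    te.ops.LayerNorm(hidden_size),
    te.ops.Linear(hidden_size, hidden_size),
)
x = torch.randn(32, hidden_size, device="cuda")
y = model(x)  # roughly equivalent to te.LayerNormLinear for this configuration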

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Please list the changes introduced in this PR:

  • LayerNorm operation
  • FP8 cast operation
  • RMSNorm operation (a short composition sketch of the new ops follows this list)
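
As a rough composition sketch (not taken from this PR's tests; sizes are assumed, and the Quantize name reflects the rename later in this PR), the new ops slot into the same Sequential container as existing ops:

import transformer_engine.pytorch as te

norm_mlp = te.ops.Sequential(
    te.ops.RMSNorm(1024),       # new RMSNorm operation
    te.ops.Linear(1024, 4096),
    te.ops.GeLU(),
    te.ops.Quantize(),          # new FP8 cast operation
)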

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

timmoon10 added the enhancement (New feature or request) label on Jul 22, 2024
@timmoon10
Collaborator Author

/te-ci pytorch

Signed-off-by: Tim Moon <[email protected]>
@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

/te-ci pytorch

from .._common import is_float8_tensor


class CastFloat8(BasicOperation):
Member

I believe you said that it is mostly a utility op for tests, right? We should probably mention that in the documentation.

Member

Also, maybe we should consider generalizing it a bit with a name that is not specific to FP8 (like just Quantize)?

Collaborator Author

It could be a helpful op for users as well. For example, if a user wants to have discrete layers for design reasons but still wants to fuse some operations with FP8 casts:

act = te.ops.Sequential(te.ops.GeLU(), te.ops.Quantize())
linear = te.ops.Sequential(te.ops.Linear())
y = act(x)
z = linear(y)
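
A fleshed-out version of that idea might look like the following sketch. The fp8_autocast usage, recipe choice, and tensor sizes are assumptions for illustration; the point is that the separate linear layer can consume the already-quantized activation:

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling

act = te.ops.Sequential(te.ops.GeLU(), te.ops.Quantize())
linear = te.ops.Sequential(te.ops.Linear(1024, 1024))

x = torch.randn(32, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=DelayedScaling()):
    y = act(x)     # GeLU output cast to FP8 by the Quantize op
    z = linear(y)  # linear layer consumes the FP8 tensor directly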

timmoon10 and others added 5 commits September 17, 2024 16:58
Co-authored-by: Przemyslaw Tredak <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Rename "CastFloat8" op to "Quantize". Add more fine-grained control for SM margin. Add docs for legacy sequence_parallel kwarg.

Signed-off-by: Tim Moon <[email protected]>
@timmoon10
Collaborator Author

/te-ci pytorch

Signed-off-by: Tim Moon <[email protected]>
@timmoon10
Collaborator Author

timmoon10 commented Sep 24, 2024

/te-ci pytorch

Edit: the te-ci/docs failure disappears when the job is rerun.

@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

Merging with approval from @ptrendx and @ksivaman.

timmoon10 merged commit 77c37d4 into NVIDIA:main on Nov 5, 2024
26 checks passed
phu0ngng pushed a commit to phu0ngng/TransformerEngine that referenced this pull request Nov 5, 2024
* Add layer norm op

Signed-off-by: Tim Moon <[email protected]>

* Add FP8 cast op

Signed-off-by: Tim Moon <[email protected]>

* Add tests for linear and layernorm with FP8 output

Signed-off-by: Tim Moon <[email protected]>

* RMSNorm op

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix linter warnings

Signed-off-by: Tim Moon <[email protected]>

* Replace LayerNorm module with LayerNorm op

Signed-off-by: Tim Moon <[email protected]>

* Replace RMSNorm module with RMSNorm op

Signed-off-by: Tim Moon <[email protected]>

* Add AMP support

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Do not save autograd context if grad mode is disabled

Debugging ONNX export tests.

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Forward args in pre_forward func to base op class

Signed-off-by: Tim Moon <[email protected]>

* Update to use QuantizedTensor class

Signed-off-by: Tim Moon <[email protected]>

* Apply suggestions from code review

Co-authored-by: Przemyslaw Tredak <[email protected]>
Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Review suggestions from @ptrendx

Rename "CastFloat8" op to "Quantize". Add more fine-grained control for SM margin. Add docs for legacy sequence_parallel kwarg.

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix linter warnings

Signed-off-by: Tim Moon <[email protected]>

* Use weight dtype as default compute dtype

Signed-off-by: Tim Moon <[email protected]>

* Fix linter warnings

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <[email protected]>