[PyTorch] Normalization ops #1033
Conversation
self.reset_parameters(defer_init=(device == "meta"))
# Handle deprecated options
This seems a little backwards. If `dtype` is supposed to be the new argument name, then why is it in `kwargs`? Both `params_dtype` and `dtype` should be regular parameters; there should be a deprecation warning when somebody uses `params_dtype`, plus the check for duplicate assignment like the one you have here.
Also, similar treatment should be done for `hidden_size` and `sequence_parallel` (especially the last one, which seems to be gone completely, so there should be some explanation that it was unused before or something).
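For concreteness, a minimal sketch of the signature this comment is asking for (class name, defaults, and warning text are assumptions, not the PR's actual code):
import warnings
import torch
class LayerNorm(torch.nn.Module):
    # Sketch only: `dtype` is the new name, `params_dtype` survives as a
    # deprecated alias, and passing both is rejected explicitly.
    def __init__(self, hidden_size, eps=1e-5, dtype=None, params_dtype=None):
        super().__init__()
        if params_dtype is not None:
            if dtype is not None:
                raise RuntimeError("Cannot pass both `dtype` and `params_dtype` (deprecated)")
            warnings.warn("`params_dtype` is deprecated, use `dtype` instead", DeprecationWarning)
            dtype = params_dtype
        self.eps = eps
        self.dtype = dtype if dtype is not None else torch.get_default_dtype()
        self.weight = torch.nn.Parameter(torch.ones(hidden_size, dtype=self.dtype))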
My thinking is that we should forward kwargs directly to `te.ops.LayerNorm` as much as possible, so that we only have to change the API in one place if we ever make changes in the future. We include the deprecated options as explicit kwargs since they are specific to the module.
This function signature also maintains backward compatibility for users who pass in the options as positional args, e.g.:
TransformerEngine/tests/pytorch/test_onnx_export.py, lines 676 to 678 in 0ee5ccd:
te.LayerNorm(
    inp_shape[1], eps, params_dtype=dtype, zero_centered_gamma=zero_centered_gamma
)
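A rough sketch of the forwarding approach described above (only `params_dtype` is handled explicitly here; the other argument names are assumptions about the `te.ops.LayerNorm` signature, not the PR's actual code):
import transformer_engine.pytorch as te
def make_layernorm(hidden_size, eps=1e-5, params_dtype=None, **kwargs):
    # Translate the deprecated, module-specific option locally...
    if params_dtype is not None:
        kwargs.setdefault("dtype", params_dtype)
    # ...and forward everything else, so the op's API only needs to change in one place.
    return te.ops.LayerNorm(hidden_size, eps=eps, **kwargs)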
super().reset_parameters()

# Set flag for sequence parallelism (deprecated)
if getattr(self, "sequence_parallel", None) is not None:
So why is the `sequence_parallel` option deprecated then? I believe Megatron uses it to guide its logic for the optimizer. I know it is not great, but we should not break them.
Maybe "legacy" is better. We should treat this as a weird, Megatron-specific integration.
self.reset_parameters(defer_init=(device == "meta"))
# Handle deprecated options
Same issue as in LN.
from .._common import is_float8_tensor

class CastFloat8(BasicOperation):
I believe you said that it is mostly a utility op for tests, right? We should probably mention that in this documentation.
Also, maybe we should consider generalizing it a bit with a name that is not specific to FP8 (like just `Quantize`)?
It could be a helpful op for users as well. For example, if a user wants to have discrete layers for design reasons but still wants to fuse some operations with FP8 casts:
act = te.ops.Sequential(te.ops.GeLU(), te.ops.Quantize())
linear = te.ops.Sequential(te.ops.Linear())
y = act(x)
z = linear(y)
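For context, a rough sketch of how such a split might be run end to end (the sizes, device, and `fp8_autocast` usage here are assumptions, not part of the PR):
import torch
import transformer_engine.pytorch as te
act = te.ops.Sequential(te.ops.GeLU(), te.ops.Quantize())
linear = te.ops.Sequential(te.ops.Linear(1024, 1024))
x = torch.randn(32, 1024, device="cuda")
with te.fp8_autocast(enabled=True):
    y = act(x)     # GeLU output is cast to FP8 once, up front
    z = linear(y)  # the linear op consumes the FP8 tensor directly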
Rename "CastFloat8" op to "Quantize". Add more fine-grained control for SM margin. Add docs for legacy sequence_parallel kwarg. Signed-off-by: Tim Moon <[email protected]>
Description
This PR extends the operation-based API (see #707) with LayerNorm, RMSNorm, and FP8 cast operations.
Compare with the existing module-based API:
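For illustration (this comparison is reconstructed rather than copied from the PR; the layer size is an assumption):
import torch
import transformer_engine.pytorch as te
x = torch.randn(32, 1024, device="cuda")
# Existing module-based API
ln_module = te.LayerNorm(1024, eps=1e-5)
y_module = ln_module(x)
# Operation-based API extended by this PR, composable with other ops
ln_ops = te.ops.Sequential(te.ops.LayerNorm(1024, eps=1e-5))
y_ops = ln_ops(x)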