
[PyTorch] Propagate fp8 scale-inverse modification to GroupedLinear #1128

Merged
merged 13 commits into NVIDIA:main on Sep 9, 2024

Conversation

@yaox12 (Collaborator) commented Aug 22, 2024

Description

This PR

  • Propagates the fp8 scale-inverse modification to GroupedLinear.
  • Fixes a bug where the wrong scale_inv was used for weights in fp8_grouped_gemm.
  • Adds a new grouped GEMM interface that takes a separate scale_inv for weights and a single output tensor (for the forward output and backward dgrad); see the sketch below.
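
For orientation, here is a minimal plain-PyTorch sketch of the semantics the new interface targets: a separate scale_inv for the weights and a single pre-allocated output shared by all groups. The function name, argument names, and shapes are hypothetical and do not reflect the actual fp8_grouped_gemm signature; real FP8 kernels operate on quantized payloads rather than the float stand-ins used here.

```python
# Illustrative sketch only (not the Transformer Engine API): emulates, in plain
# PyTorch, what "separate scale_inv for weights and a single output" means for a
# grouped GEMM. Float tensors and per-group scale factors stand in for FP8 data.
import torch

def grouped_gemm_reference(a_list, a_scale_inv, b_list, b_scale_inv, out, m_splits):
    """For each group i, compute (a_i * a_scale_inv[i]) @ (b_i * b_scale_inv[i]).T
    and write the result into the single pre-allocated `out` tensor."""
    offset = 0
    for i, m in enumerate(m_splits):
        a_deq = a_list[i] * a_scale_inv[i]          # dequantize activations
        b_deq = b_list[i] * b_scale_inv[i]          # dequantize weights (separate scale_inv)
        out[offset:offset + m] = a_deq @ b_deq.t()  # one slice of the shared output
        offset += m
    return out

# Toy usage: 2 groups, hidden size 4, output size 8.
m_splits = [3, 5]
a = [torch.randn(m, 4) for m in m_splits]
b = [torch.randn(8, 4) for _ in m_splits]
a_scale_inv = torch.tensor([0.5, 0.25])   # per-group activation scale_inv
b_scale_inv = torch.tensor([2.0, 1.0])    # per-group weight scale_inv, kept separate
out = torch.empty(sum(m_splits), 8)
grouped_gemm_reference(a, a_scale_inv, b, b_scale_inv, out, m_splits)
```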

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@yaox12 marked this pull request as ready for review on August 22, 2024 at 09:18
Comment on lines -156 to -157
fp8_meta["scaling_fwd"].scale_inv,
_GEMM_WEIGHT,
@yaox12 (Collaborator, Author) commented Aug 22, 2024

Hi @timmoon10, I want to confirm that the previous way of getting scale_inv from fp8_meta["scaling_fwd"] in the forward pass still works but is not encouraged, right?

I'm pretty sure this was a bug, and maybe I should create an issue to make users aware of it.

Collaborator

What is the bug you are noticing? You are changing the API for fp8_grouped_gemm, but other than that it looks like using fp8_meta would be correct (but not recommended).

Collaborator Author

When casting weights to FP8 in get_fp8_workspace(), the scale_invs of the weight tensors are written to the private ones inside the Float8Tensors, while the scale_invs in fp8_meta are only updated after the first forward step completes. So for the first micro batch, the two don't match.
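
A toy timeline of that mismatch, with hypothetical names (this is not Transformer Engine's internal code): the workspace tensor records its own scale_inv at cast time, while the stand-in for fp8_meta's scale_inv still holds its pre-forward value.

```python
# Purely illustrative; names below are hypothetical and do not correspond to
# Transformer Engine internals.
import torch

fp8_meta_scale_inv = torch.tensor(1.0)   # placeholder value before any forward step

def cast_weight_to_fp8_workspace(weight):
    # At cast time, the workspace tensor records its own scale_inv privately.
    amax = weight.abs().max()
    scale = 448.0 / amax                  # e.g. map amax to the FP8 E4M3 max
    quantized = weight * scale            # stand-in for the FP8 payload
    private_scale_inv = 1.0 / scale       # stored inside the Float8Tensor-like object
    return quantized, private_scale_inv

weight = torch.randn(8, 4)
w_fp8, w_scale_inv = cast_weight_to_fp8_workspace(weight)

# First micro batch: the GEMM should use w_scale_inv (recorded at cast time);
# fp8_meta_scale_inv is only brought in line after the forward step completes.
print(w_scale_inv, fp8_meta_scale_inv)   # typically differ before the first update
```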

@yaox12 (Collaborator, Author) commented Aug 26, 2024

@timmoon10 Can I have your review?

@ksivaman (Member)

Can we not add this functionality to the existing grouped_gemm API via arguments instead of introducing another one? @yaox12

@ksivaman self-requested a review on August 26, 2024 at 14:15
@yaox12 (Collaborator, Author) commented Aug 27, 2024

Can we not add this functionality to the existing grouped_gemm API via arguments instead of introducing another one? @yaox12

Done.
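
For illustration, one way the argument-based approach can look (a hypothetical skeleton, not the actual fp8_grouped_gemm signature): the existing entry point gains optional keyword arguments whose defaults preserve the old behavior, so no second function is introduced.

```python
# Hypothetical skeleton of the argument-based approach; names and defaults are
# assumptions for illustration, not the real fp8_grouped_gemm signature.
from typing import List, Optional
import torch

def fp8_grouped_gemm_sketch(
    a: List[torch.Tensor],
    b: List[torch.Tensor],
    out: List[torch.Tensor],
    *,
    b_scale_inv: Optional[torch.Tensor] = None,  # separate weight scale_inv, if provided
    single_output: bool = False,                  # write all groups into one shared output
    m_splits: Optional[List[int]] = None,         # row counts per group, needed for single_output
) -> None:
    """Grouped GEMM entry point extended via optional keyword arguments."""
    if single_output:
        assert m_splits is not None, "single_output requires m_splits"
    # ... dispatch to the grouped GEMM kernel, falling back to the old behavior
    # when the new keyword arguments are left unset ...
```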

@yaox12 (Collaborator, Author) commented Sep 2, 2024

@ksivaman Can you trigger the CI?

@timmoon10 self-requested a review on September 3, 2024 at 18:59
@timmoon10 (Collaborator) left a comment

Overall looks reasonable to me. I just have a suggestion to make the internal API more consistent.

transformer_engine/pytorch/cpp_extensions/gemm.py (outdated, resolved)

@timmoon10 (Collaborator)

/te-ci pytorch

@yaox12 (Collaborator, Author) commented Sep 5, 2024

/te-ci pytorch

@yaox12 (Collaborator, Author) commented Sep 6, 2024

/te-ci pytorch

@yaox12 (Collaborator, Author) commented Sep 9, 2024

@timmoon10 @ksivaman Can you take another look at this PR? Thanks.

@ksivaman (Member) left a comment

LGTM

@ksivaman merged commit 047a507 into NVIDIA:main on Sep 9, 2024
26 checks passed