Support using fp16 master weights and fp16/fp8 optimizer states in FusedAdam #1078
Conversation
Signed-off-by: kunlunl <[email protected]>
@timmoon10 Hello, I noticed that no one has commented on this PR for a long time. Could you please take a look, or help find someone to review it?
Overall this looks good. It would be more general if we disentangled the state dtypes from the state scaling (e.g. why not allow scaled FP32 states or unscaled BF16 states?), but this does cover the specific cases in the MS-AMP paper.
For future reference, this PR adapts logic from NVIDIA/apex#1771. It is a proof-of-concept with several opportunities for future improvement (an illustrative sketch of the scale/unscale idea follows the list):
- TE kernel for computing absmax and scale
- Fusing scale/unscale within Adam kernel
- Reduce memory usage in optimizer step, perhaps by processing params in chunks
- Reduce memory usage in checkpointing, perhaps by storing checkpoint buffers in CPU
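As context for the scaled-state and "fusing scale/unscale" points above, here is a minimal plain-PyTorch sketch of how a low-precision optimizer state can be stored with an absmax-based scale and recovered before the Adam update. The function names and the scaling rule are illustrative assumptions for clarity; this is not the fused kernel path this PR (or apex#1771) implements.

```python
# Illustrative sketch only: absmax-scaled storage of an optimizer state in FP16.
# Names and scaling rule are assumptions, not the TE/Apex implementation.
import torch


def store_scaled(state_fp32: torch.Tensor, dtype: torch.dtype = torch.float16):
    """Scale an FP32 optimizer state into the target dtype's range, then cast."""
    absmax = state_fp32.abs().max().clamp(min=1e-12)
    scale = torch.finfo(dtype).max / absmax  # map absmax to the dtype's largest finite value
    return (state_fp32 * scale).to(dtype), scale


def load_unscaled(state_lowp: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP32 view of the state before it is used in the Adam update."""
    return state_lowp.to(torch.float32) / scale


# Round trip: the stored copy is half the size, at the cost of quantization error.
exp_avg = torch.randn(1024) * 3e-3
packed, scale = store_scaled(exp_avg)
restored = load_unscaled(packed, scale)
print((exp_avg - restored).abs().max())
```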
@@ -112,9 +149,6 @@ def __init__(
        self.set_grad_none = set_grad_none

        self.capturable = capturable

        if master_weights is not None:
            assert isinstance(master_weights, list), "master_weights must be a list if provided"
            self.master_weights = master_weights
Yes, I know about this problem. I talked with @Wong4j offline and invited him to review this PR.
His MR in MCore (fusing dtype casting) has not been merged yet, so I put the dtype-casting fusion into a new MCore MR, together with this precision-aware optimizer.
/te-ci pytorch
/te-ci pytorch
Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Kunlun Li <[email protected]>
/te-ci pytorch
LGTM, pending CI and confirmation from @Wong4j that this won't break Mcore integration.
LGTM.
Description
Add options to set the dtypes of the master weights, exp_avg, and exp_avg_sq states in FusedAdam.
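A hedged usage sketch of the options described above. The import path and the keyword names (`master_weight_dtype`, `exp_avg_dtype`, `exp_avg_sq_dtype`) are assumptions inferred from this description, not confirmed by this excerpt; check the merged FusedAdam signature in Transformer Engine before relying on them.

```python
# Assumed keyword names and import path; verify against the actual FusedAdam API.
import torch
from transformer_engine.pytorch.optimizers import FusedAdam

model = torch.nn.Linear(1024, 1024).cuda().to(torch.bfloat16)

optimizer = FusedAdam(
    model.parameters(),
    lr=1e-4,
    # Store master weights and both Adam moments in FP16 instead of the
    # default FP32, trading some precision for lower optimizer-state memory.
    master_weight_dtype=torch.float16,
    exp_avg_dtype=torch.float16,
    exp_avg_sq_dtype=torch.float16,
)
```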
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: