
Conversation

sayakpaul
Member

What does this PR do?

Fixes #10252.

Cc: @leisuzz could you give this a try?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@leisuzz
Contributor

leisuzz commented Dec 30, 2024

@sayakpaul Please take a look at the comments, especially the third comment, which has to be changed!

@sayakpaul
Member Author

Please take a look at the comments, especially the third comment, which has to be changed!

No idea what this means.

@leisuzz
Contributor

leisuzz commented Dec 30, 2024

@sayakpaul On line 1852, I left a comment about the check if args.checkpoints_total_limit is not None:
It should be if accelerator.is_main_process and args.checkpoints_total_limit is not None:. Otherwise errors will occur.

@sayakpaul
Member Author

Can you comment on the changes directly instead? That will be helpful and easier.

 for model in models:
-    if isinstance(model, type(unwrap_model(transformer))):
+    if isinstance(unwrap_model(model), type(unwrap_model(transformer))):
         model = unwrap_model(model)
Contributor

We can modify this to:

transformer_model = unwrap_model(model)
if args.upcast_before_saving:
    transformer_model = transformer_model.to(torch.float32)
else:
    transformer_model = transformer_model.to(weight_dtype)
transformer_lora_layers_to_save = get_peft_model_state_dict(transformer_model)


Member Author

The else should not be needed, as the model would already be type-cast.
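
For reference, a minimal sketch of the block without the else branch (names follow the suggestion above; illustrative only, not necessarily the merged code):

# Sketch: upcast only when requested; otherwise the unwrapped model is
# assumed to already be in weight_dtype, so no extra cast is needed.
transformer_model = unwrap_model(model)
if args.upcast_before_saving:
    transformer_model = transformer_model.to(torch.float32)
transformer_lora_layers_to_save = get_peft_model_state_dict(transformer_model)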

Member Author

Addressed

Collaborator

why is the change from isinstance(model, ...) to isinstance(unwrap_model(model), ...) needed in the if statement?

Member Author

That is a DeepSpeed-specific change: with DeepSpeed, the model gets wrapped into a Module, so the isinstance check only matches after unwrapping.
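
To illustrate the point, a sketch of the hook's loop using the same names as in the diff above (not the exact merged code):

# With DeepSpeed, each entry in `models` is a wrapper module, so a direct
# isinstance(model, ...) check against the transformer's class fails.
# Unwrapping first recovers the underlying module, and the check then works
# both with and without DeepSpeed.
for model in models:
    if isinstance(unwrap_model(model), type(unwrap_model(transformer))):
        model = unwrap_model(model)
        transformer_lora_layers_to_save = get_peft_model_state_dict(model)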

         model = unwrap_model(model)
         transformer_lora_layers_to_save = get_peft_model_state_dict(model)
-    elif isinstance(model, type(unwrap_model(text_encoder_one))): # or text_encoder_two
+    elif isinstance(unwrap_model(model), type(unwrap_model(text_encoder_one))): # or text_encoder_two
Contributor

I think we should add and args.train_text_encoder, because if train_text_encoder is False, text_encoder_one will be None.
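
A sketch of what the suggested guard could look like (the variable name text_encoder_one_lora_layers_to_save is assumed here for illustration, not copied from the diff):

# Per the point above: when --train_text_encoder is not set,
# text_encoder_one may be None, so only take this branch when the text
# encoder is actually being trained.
elif args.train_text_encoder and isinstance(
    unwrap_model(model), type(unwrap_model(text_encoder_one))
):
    model = unwrap_model(model)
    text_encoder_one_lora_layers_to_save = get_peft_model_state_dict(model)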

Member Author

Addressed

Collaborator

isn't it already reflected in the models sent to save_model_hook?

Collaborator

nvm addressed

if accelerator.is_main_process or accelerator.distributed_type == DistributedType.DEEPSPEED:
    if global_step % args.checkpointing_steps == 0:
        # _before_ saving state, check if this save would set us over the `checkpoints_total_limit`
        if args.checkpoints_total_limit is not None:
Contributor

I think it should be if args.checkpoints_total_limit is not None and accelerator.is_main_process:

Contributor

This is a must! It has to be changed!!! And I think the correct format should be if accelerator.is_main_process and args.checkpoints_total_limit is not None:

Member Author

Calm down sir :)

The line is already under:

if accelerator.is_main_process or accelerator.distributed_type == DistributedType.DEEPSPEED:

So, this should already take care of what you're suggesting.

@leisuzz
Contributor

leisuzz commented Dec 30, 2024

@sayakpaul Sorry about that, I didn't notice the comment status was pending; I've just submitted it.

Collaborator

@linoytsaban linoytsaban left a comment

thanks a lot @leisuzz! LGTM, just a couple of small comments


@sayakpaul sayakpaul merged commit 5f72473 into main Dec 30, 2024
12 checks passed
@sayakpaul sayakpaul deleted the ds-support-sd3-lora branch December 30, 2024 14:01
hulefei pushed a commit to hulefei/diffusers that referenced this pull request Dec 31, 2024
* add ds support to lora sd3.

Co-authored-by: leisuzz <[email protected]>

* style.

---------

Co-authored-by: leisuzz <[email protected]>
Co-authored-by: Linoy Tsaban <[email protected]>
stevhliu pushed a commit that referenced this pull request Dec 31, 2024
* Update pix2pix.md

fix hyperlink error

* fix md link typos

* fix md typo - remove ".md" at the end of links

* [Fix] Broken links in hunyuan docs (#10402)

* fix-hunyuan-broken-links

* [Fix] docs broken links hunyuan

* [training] add ds support to lora sd3. (#10378)

* add ds support to lora sd3.

Co-authored-by: leisuzz <[email protected]>

* style.

---------

Co-authored-by: leisuzz <[email protected]>
Co-authored-by: Linoy Tsaban <[email protected]>

* fix md typo - remove ".md" at the end of links

* fix md link typos

* fix md typo - remove ".md" at the end of links

---------

Co-authored-by: SahilCarterr <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: leisuzz <[email protected]>
Co-authored-by: Linoy Tsaban <[email protected]>