
[bugfix] Fix bug in LoRA checkpoint saving step #10252

Closed
wants to merge 14 commits

Conversation

@leisuzz (Contributor) commented Dec 17, 2024

What does this PR do?

Fix two bugs in the checkpoint-saving step:

  1. `UnboundLocalError` (free variable referenced before assignment) in the original code when only the `elif isinstance(model, type(unwrap_model(text_encoder)))` branch is taken
  2. `IndexError: pop from empty list` when popping from the `weights` list
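
The two failures can be sketched with a minimal save-hook. This is an illustration, not the actual script code: `Transformer`, `TextEncoder`, and `extract_lora` are placeholder names standing in for the real diffusers/PEFT objects (e.g. `get_peft_model_state_dict`).

```python
class Transformer:
    pass

class TextEncoder:
    pass

def extract_lora(model):
    # Stand-in for get_peft_model_state_dict(model).
    return {"layers": type(model).__name__}

def save_model_hook(models, weights, output_dir):
    # Fix 1: bind both names up front so they exist even when only the
    # text-encoder branch runs (avoids the unbound-variable error).
    transformer_lora_layers = None
    text_encoder_lora_layers = None

    for model in models:
        if isinstance(model, Transformer):
            transformer_lora_layers = extract_lora(model)
        elif isinstance(model, TextEncoder):
            text_encoder_lora_layers = extract_lora(model)
        # Fix 2: the `weights` list can arrive empty (as reported under
        # DeepSpeed), so an unconditional weights.pop() raises
        # "IndexError: pop from empty list"; guard it.
        if weights:
            weights.pop()

    return transformer_lora_layers, text_encoder_lora_layers

# Reproduce the reported scenario: only a text encoder, empty weights list.
result = save_model_hook([TextEncoder()], [], "checkpoint-100")
```

Initializing both variables up front and guarding the `pop` covers both the case where `models` contains a single component and the case where the hook receives an empty `weights` list.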

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@leisuzz (Contributor, Author) commented Dec 17, 2024

@sayakpaul Please take a look at this PR; I was having trouble with DreamBooth LoRA checkpoint saving. After these changes, the issue is resolved.

@sayakpaul (Member) commented

Just so I understand, this is applicable only when we're using DeepSpeed?

@leisuzz (Contributor, Author) commented Dec 17, 2024

@sayakpaul I'm using DeepSpeed to train the model; I'm not sure whether the issue occurs without it.

@sayakpaul (Member) commented

Same comments from your other PR that you closed apply here. We're missing:

  • Incorporating LoRA/model loading support when using DeepSpeed.
  • Other necessary checks when using DeepSpeed.

As mentioned in the other PR of yours, if you go here and search for DistributedType.DEEPSPEED, the outline of the changes should become evident and they will help you understand what we're missing in this PR. Without those changes, it'd be hard to merge this PR.
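
For context, the guard pattern being referenced can be sketched as follows. This is a simplified illustration, with stand-ins for `accelerate.utils.DistributedType` and the real model class, not the actual training-script code:

```python
from enum import Enum

class DistributedType(Enum):  # stand-in for accelerate.utils.DistributedType
    NO = "NO"
    DEEPSPEED = "DEEPSPEED"

class Transformer:
    @classmethod
    def from_config(cls):
        # Stand-in for re-instantiating the model from its saved config
        # (the real code would use from_pretrained / from_config).
        return cls()

def load_model_hook(models, input_dir, distributed_type):
    transformer_ = None
    if distributed_type != DistributedType.DEEPSPEED:
        # Outside DeepSpeed, Accelerate hands the hook the live models to pop.
        while len(models) > 0:
            model = models.pop()
            if isinstance(model, Transformer):
                transformer_ = model
    else:
        # Under DeepSpeed, `models` arrives empty, so the model must be
        # re-instantiated before the LoRA state dict is loaded into it.
        transformer_ = Transformer.from_config()
    return transformer_
```

Branching on the distributed type in both the save and load hooks is the outline of the missing changes described above.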

@leisuzz (Contributor, Author) commented Dec 17, 2024

@sayakpaul I don't have any issue with loading under DeepSpeed; the issue only occurs in the saving step.

@sayakpaul (Member) commented

We have extensively tested the changes I mentioned in my comment. Hence, I kindly ask you to propagate them.

@leisuzz (Contributor, Author) commented Dec 17, 2024

Thanks, let me think about how to modify them.

@leisuzz (Contributor, Author) commented Dec 19, 2024

@sayakpaul I've updated the saving logic to the correct version; please take a look. This should work properly for the loading function as well.

@sayakpaul (Member) left a review comment

Let's keep the changes limited to the Flux script, perhaps?

@leisuzz (Contributor, Author) commented Dec 19, 2024

@sayakpaul The issue actually happens in both the SD3 and Flux scripts. I've also fixed the distributed-training issue in the `checkpoints_total_limit is not None` branch.
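
A hedged sketch of the kind of fix being described for `checkpoints_total_limit`: in a multi-process run, only one process should prune old `checkpoint-<step>` directories, or the ranks race to delete the same folders. `rotate_checkpoints` and the `is_main_process` parameter are illustrative names (mirroring Accelerate's main-process flag), not the script's actual ones.

```python
import os
import shutil

def rotate_checkpoints(output_dir, checkpoints_total_limit, is_main_process):
    # Only the main process should prune; other ranks must skip entirely.
    if checkpoints_total_limit is None or not is_main_process:
        return
    checkpoints = [d for d in os.listdir(output_dir) if d.startswith("checkpoint")]
    checkpoints = sorted(checkpoints, key=lambda x: int(x.split("-")[1]))
    # Remove enough old checkpoints to leave room for the one about to be saved.
    if len(checkpoints) >= checkpoints_total_limit:
        num_to_remove = len(checkpoints) - checkpoints_total_limit + 1
        for folder in checkpoints[:num_to_remove]:
            shutil.rmtree(os.path.join(output_dir, folder))
```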

@leisuzz (Contributor, Author) commented Dec 23, 2024

@sayakpaul I've fixed both the loading and saving steps; please take a look.

@sayakpaul (Member) commented

We'd prefer to tackle this one script at a time, so please keep the changes to a single script only.

@leisuzz (Contributor, Author) commented Dec 23, 2024

@sayakpaul I've edited it so the changes are now limited to train_dreambooth_lora_sd3.py (the SD3 and Flux scripts are similar, and I tested both); please take a look. Let me know if it's okay for me to edit the Flux script as well.

@sayakpaul (Member) left a review comment

Just a few more changes.

@leisuzz (Contributor, Author) commented Dec 23, 2024

@sayakpaul Please take a look at this newest version, thanks

@leisuzz leisuzz requested a review from sayakpaul December 24, 2024 07:33
@sayakpaul (Member) commented

@leisuzz were you not using a non-LoRA version of the script? I think I got confused because of the repeated reviews.

Maybe we could close this PR and I could instead open a PR with you as a co-author?

@leisuzz (Contributor, Author) commented Dec 25, 2024

@sayakpaul Sorry about this; I will close this PR, then.

@leisuzz leisuzz closed this Dec 25, 2024
@sayakpaul (Member) commented

@leisuzz thanks for your understanding. Could you please provide your GitHub commit email address so that I can add you as a co-author?

@leisuzz (Contributor, Author) commented Dec 25, 2024

@sayakpaul My email address is: [email protected]. Thanks for your help!
