[bugfix]Fix bug in Lora checkpoint saving step #10252
@sayakpaul Please take a look at this PR. I was having trouble saving DreamBooth LoRA checkpoints; with these changes, the issue is resolved.
Just so I understand, this is applicable only when we're using DeepSpeed?
@sayakpaul I'm using DeepSpeed to train the model, so I'm not sure whether it works without DeepSpeed.
Same comments from your other PR that you closed apply here. We're missing:
As mentioned in the other PR of yours, if you go here and search for
@sayakpaul I don't have any issue with loading under DeepSpeed; the issue only occurs in the saving steps.
We have extensively tested the changes I mentioned in my comment. Hence, I kindly ask you to propagate them.
Thanks, let me think about how to modify them.
@sayakpaul I've updated the saving logic to the correct version, please take a look. This should work properly with the loading function as well.
Let's keep the changes limited to the Flux script, perhaps?
@sayakpaul The issue actually happened in both SD3 and Flux. I've also fixed the issue in distributed training when checkpoints_total_limit is not None.
@sayakpaul I've fixed both the loading and saving steps, please take a look.
We'd prefer to tackle this one script at a time. So, please keep the changes to a single script only.
@sayakpaul I've edited the PR, and the changes are now limited to train_dreambooth_lora_sd3.py (SD3 and Flux are similar, and I tested both of them). Please take a look. Let me know if it's OK for me to edit the Flux script as well.
Just a few more changes.
@sayakpaul Please take a look at this newest version, thanks.
@leisuzz Were you not using a non-LoRA version of the script? I think I got confused because of the repeated reviews. Maybe we could close this PR and I could instead open a PR with you as a co-author?
@sayakpaul Sorry about this, I will close this PR then.
@sayakpaul My email address is: [email protected]. Thanks for your help!
What does this PR do?
Fix bug in checkpoint saving steps:
elif isinstance(model, type(unwrap_model(text_encoder)))
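As a rough illustration of why a type check like the one above can matter during saving, here is a minimal sketch that does not depend on any training library. It assumes (this is not stated in the thread) that the save hook may receive models still wrapped by a distributed engine, in which case comparing the wrapped object against the unwrapped type fails. `EngineWrapper`, `Transformer`, `TextEncoder`, `classify_models`, and this `unwrap_model` are hypothetical stand-ins, not the script's real classes or helpers.

```python
class Transformer:
    """Hypothetical stand-in for the denoiser model."""


class TextEncoder:
    """Hypothetical stand-in for a text encoder."""


class EngineWrapper:
    """Hypothetical stand-in for a distributed wrapper (e.g. an engine object).

    Assumption: the wrapped model is exposed as `.module`.
    """

    def __init__(self, module):
        self.module = module


def unwrap_model(model):
    # Peel off wrappers until the underlying model is reached.
    while hasattr(model, "module"):
        model = model.module
    return model


def classify_models(models, transformer, text_encoder):
    """Label each model passed to a save hook by its unwrapped type."""
    labels = []
    for model in models:
        # Unwrap both sides so the comparison is between plain model types.
        model = unwrap_model(model)
        if isinstance(model, type(unwrap_model(transformer))):
            labels.append("transformer")
        elif isinstance(model, type(unwrap_model(text_encoder))):
            labels.append("text_encoder")
        else:
            raise ValueError(f"unexpected save model: {model.__class__}")
    return labels


transformer = EngineWrapper(Transformer())
text_encoder = EngineWrapper(TextEncoder())

# Without unwrapping, the wrapped object does not match the inner type.
wrapped_matches = isinstance(transformer, type(unwrap_model(transformer)))

labels = classify_models([transformer, text_encoder], transformer, text_encoder)
```

In this sketch the wrapped object fails the direct `isinstance` check, while the unwrapped comparison labels both models correctly. Whether the actual scripts hit this exact path depends on how the training framework hands models to the hook.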