
Possible fix of wrong scale in weight decomposition #16151

Merged · 1 commit from dora-fix into dev on Jul 6, 2024
Conversation

@KohakuBlueleaf (Collaborator)

Description

Should resolve this: comfyanonymous/ComfyUI#3922


@ntc-ai commented Jul 5, 2024

Hi, this patch is wrong.

The alpha variable is used by model authors to normalize the trained LoRA/DoRA to the -1 to 1 range. With this patch, changing alpha breaks the DoRA instead of normalizing it.

More: KohakuBlueleaf/LyCORIS#196

@KohakuBlueleaf (Collaborator, Author) commented Jul 5, 2024

It breaks because that is not how alpha works.
I'm maintaining the correct math formula, not the behaviour you want.
You can create an extension to achieve your goal.

@ntc-ai commented Jul 5, 2024

What is this useful for?
Are there references to the correct math formula?
There is no alpha in the DoRA whitepaper.

@KohakuBlueleaf (Collaborator, Author) commented Jul 5, 2024

> What is this useful for? Are there references to the correct math formula? There is no alpha in the DoRA whitepaper.

Wrong formula: W + scale * alpha/dim * (wd(W + BA) - W)
Correct formula: W + scale * (wd(W + alpha/dim * BA) - W)

I think it is quite intuitive that alpha/dim should not affect the "- W" part.
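For illustration, a minimal sketch of the two merge orders in PyTorch. The names (wd, dora_scale, merge_old, merge_new) and the norm axis are assumptions for this sketch, not the actual webui or LyCORIS code:

```python
import torch

def wd(w, dora_scale):
    # Weight decomposition: keep w's direction, rescale by the learned
    # magnitude vector. The norm axis is assumed here; implementations differ.
    return dora_scale * (w / w.norm(dim=1, keepdim=True))

def merge_old(w, up, down, dora_scale, alpha, dim, strength=1.0):
    # Previous behaviour: alpha/dim scales the whole decomposed delta,
    # including the "- W" part.
    return w + strength * (alpha / dim) * (wd(w + up @ down, dora_scale) - w)

def merge_new(w, up, down, dora_scale, alpha, dim, strength=1.0):
    # This PR: alpha/dim scales only the BA term before decomposition,
    # so the "- W" part is left untouched.
    return w + strength * (wd(w + (alpha / dim) * (up @ down), dora_scale) - w)
```

With alpha = 0, merge_old reduces to W (the whole delta is multiplied by zero), whereas merge_new reduces to W + strength * (wd(W) - W), which is generally nonzero; that difference is what the rest of the thread argues about.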

@KohakuBlueleaf (Collaborator, Author)

Also, that's how DoRA training works in my repo. I'm just reproducing the same formula here.

@AUTOMATIC1111 merged commit 372a8e0 into dev on Jul 6, 2024 (6 checks passed)
@AUTOMATIC1111 deleted the dora-fix branch on Jul 6, 2024 at 06:48
@ntc-ai commented Jul 6, 2024

Hi, this PR should be reverted. Here are the reasons why:

  • With this PR, changing alpha breaks the model, even setting it to zero. This is different from how LoRAs work, where alpha=0 turns the LoRA off.
  • There is no alpha in the whitepaper.
  • This PR breaks existing trained DoRAs whose alpha was modified based on the original code.
  • alpha is not trainable, and (alpha/dim) is 1.0 in most cases.
  • I brought attention to an issue in the referenced implementation, not A1111. This code was correct before the changes.

@KohakuBlueleaf (Collaborator, Author) commented Jul 6, 2024

> Hi, this PR should be reverted. Here are the reasons why:
>   • With this PR, changing alpha breaks the model, even setting it to zero. This is different from how LoRAs work, where alpha=0 turns the LoRA off.
>   • There is no alpha in the whitepaper.
>   • This PR breaks existing trained DoRAs whose alpha was modified based on the original code.
>   • alpha is not trainable, and (alpha/dim) is 1.0 in most cases.
>   • I brought attention to an issue in the referenced implementation, not A1111. This code was correct before the changes.
  1. That's not how alpha works. DoRA never guarantees that the weights are unchanged when the BA part is all zeros.
  2. You should never modify alpha after training, for any model. If you want to do any special weight scaling, do it with your own custom code and equation.
  3. This code was NOT correct before the changes. I already gave you the formula.

@ntc-ai commented Jul 7, 2024

  1. This code has been live for 4 months. The change you're implementing breaks functionality for anyone who has been using it during this time.

  2. You are making a breaking change without anyone requesting it - in fact I'm requesting the opposite!

  3. There's no external reference or evidence supporting your 'correct' formula. It seems to be based solely on your own intuition.

  4. Several important points, and evidence that the prior solution is correct, remain unaddressed.

  5. In this formulation, alpha has no use. The best case I can see is gradient scaling, but there are better ways to do that.

I hope you can understand the impact of these changes on users.

If you have the 'special weight scaling' code, feel free to share it. I don't think it's possible in this formulation.
