
AdamD implementation (or option to skip bias-correction in Adam-derived optimizers)? #385

Open
jstjohn opened this issue Oct 25, 2021 · 1 comment


jstjohn commented Oct 25, 2021

I recently put out a proposal to add an argument to Adam-derived optimizers that skips the bias-correction term on the w update, applying it only to v. See the figure attached in the issue pytorch/pytorch#67105 and the write-up I put together for the theoretical justification, AdamD: Improved bias-correction in Adam. Since the PyTorch maintainers feel it is still too early in the idea's existence to add this to their repo, your repo seems like a reasonable home for it. I am happy to send you a PR, but I would first like to hear which of these you would prefer:

  1. Two new optimizers, AdamD and AdamDW (mirroring Adam/AdamW, but with the bias-correction excluded from the w update step).
  2. An otherwise vanilla fork of Adam/AdamW with a boolean flag that lets the user turn the bias-correction on or off, plus the same option added to the relevant optimizers already included in this repo. I have not read through the codebase carefully, but this would likely include Lamb (there it would be an option to enable bias-correction on v only, since it is currently excluded altogether), AdamP, and possibly others.
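To make the difference concrete, here is a minimal scalar sketch of option 2 (the flag name `bias_correct_m` is illustrative, not from any released API). Standard Adam divides the first moment m by (1 - beta1^t); the proposed AdamD variant skips that division and keeps the correction only on the second moment v, which damps the early steps instead of inflating them:

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, bias_correct_m=True):
    """One Adam update on a scalar weight w given gradient g at step t.

    bias_correct_m=True  -> standard Adam.
    bias_correct_m=False -> the proposed AdamD variant: bias-correction
    is applied only to the second moment v, not to m.
    """
    m = beta1 * m + (1 - beta1) * g          # first moment (EMA of g)
    v = beta2 * v + (1 - beta2) * g * g      # second moment (EMA of g^2)
    m_hat = m / (1 - beta1 ** t) if bias_correct_m else m
    v_hat = v / (1 - beta2 ** t)             # always bias-corrected
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First-step comparison from the same state: Adam's corrected step has
# magnitude close to lr, while the AdamD-style step is scaled down by
# roughly (1 - beta1) early in training.
w_adam, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
w_adamd, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, t=1, bias_correct_m=False)
```

Either option would reduce to this same toggle internally; option 1 just exposes it as distinct optimizer classes rather than a constructor argument.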

Let me know how you would like to proceed, or if you want any further clarification!

jettify (Owner) commented Oct 26, 2021

I would be happy to accept a PR. I prefer option 1, since it seems like a clearer API. Internally, the implementations should share code where possible.
