OrthoGrad & PSGD improvements

@ClashLuke ClashLuke released this 18 Jan 07:57
512ffd0
  • General
    • precond_schedule matches its docs (@francois-rozet, #31)
    • unified warmup_steps API (@francois-rozet, #32)
    • add eps arg to scale_by_adam (#33)
    • allow external management of LR (for foreach=True optimizers)
  • OrthoGrad, a "grokking-first" optimizer that works
  • PSGD
    • no more OOM in torch.linalg.solve
    • skip the cache when it would not give a speedup
    • add newton-PSGD ("hvp-PSGD") using finite-difference approximation
    • apply caution to the momentum rather than the update (improves convergence; closer to the paper's intention)
  • Benchmarks
    • grokking benchmark, using modular addition and wide MLPs
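
To make two of the items above more concrete, here is a minimal, framework-free sketch. It is not this library's code. The function names (`finite_difference_hvp`, `orthogonalize_grad`), the `eps` defaults, and the toy quadratic are all assumptions made for illustration. The first function shows the general finite-difference idea behind a Hessian-vector product, `H v ≈ (∇f(x + εv) − ∇f(x)) / ε`. The second shows one common reading of OrthoGrad-style updates: drop the gradient component parallel to the weights, then rescale to the original gradient norm.

```python
import math

def finite_difference_hvp(grad_fn, params, vec, eps=1e-4):
    """Approximate H @ vec as (grad(params + eps * vec) - grad(params)) / eps."""
    shifted = [p + eps * v for p, v in zip(params, vec)]
    g0 = grad_fn(params)
    g1 = grad_fn(shifted)
    return [(b - a) / eps for a, b in zip(g0, g1)]

def orthogonalize_grad(weights, grad, eps=1e-30):
    """Remove the part of grad parallel to weights, then restore grad's norm.

    Hypothetical helper; the eps guard and rescaling step are assumptions.
    """
    w_dot_g = sum(w * g for w, g in zip(weights, grad))
    w_dot_w = sum(w * w for w in weights)
    coeff = w_dot_g / (w_dot_w + eps)
    g_orth = [g - coeff * w for w, g in zip(weights, grad)]
    g_norm = math.sqrt(sum(g * g for g in grad))
    o_norm = math.sqrt(sum(g * g for g in g_orth))
    scale = g_norm / (o_norm + eps)
    return [scale * g for g in g_orth]

# Toy quadratic f(x) = 0.5 * x^T A x, whose gradient is A x and Hessian is A.
A = [[2.0, 1.0], [1.0, 3.0]]

def quad_grad(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# For a quadratic, the finite difference matches A @ v up to rounding.
hvp = finite_difference_hvp(quad_grad, [0.5, -1.0], [1.0, 2.0])

# The projected gradient is orthogonal to the weights and keeps the norm of g.
g_orth = orthogonalize_grad([1.0, 0.0], [3.0, 4.0])
```

The finite-difference form needs only two gradient evaluations and no second-order autograd pass, at the cost of an approximation error that depends on `eps`.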