
Releases: ClashLuke/HeavyBall

OrthoGrad & PSGD improvements

18 Jan 07:57
512ffd0
  • General
    • precond_schedule matches its docs (@francois-rozet, #31)
    • unified warmup_steps API (@francois-rozet, #32)
    • add eps arg to scale_by_adam (#33)
    • allow external management of LR (for foreach=True optimizers)
  • OrthoGrad, a "grokking-first" optimizer that works by projecting out the gradient component parallel to the weights (see the sketch after this list)
  • PSGD
    • no more OOM in torch.linalg.solve
    • speed up caching by skipping the cache when it would not yield a speedup
    • add Newton-PSGD ("hvp-PSGD"), approximating Hessian-vector products via finite differences: Hv ≈ (∇f(θ + εv) − ∇f(θ)) / ε
    • apply caution to the momentum rather than the update (improves convergence and is closer to the paper's intent)
  • Benchmarks
    • grokking benchmark, using modular addition and wide MLPs
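
Below is a minimal sketch of the OrthoGrad projection mentioned above, written in plain PyTorch rather than heavyball's own (foreach/compiled) implementation; details such as scaling and dtype handling may differ from the library's version.

```python
import torch

def orthograd_(param: torch.Tensor, eps: float = 1e-30) -> None:
    """Remove the gradient component parallel to the weights, in place.
    Illustrative sketch of the OrthoGrad idea; not heavyball's exact code."""
    w = param.reshape(-1)
    g = param.grad.reshape(-1)
    # Project out the component of g along w.
    g_orth = g - (torch.dot(w, g) / (torch.dot(w, w) + eps)) * w
    # Rescale so the step keeps the original gradient norm.
    g_orth *= g.norm() / (g_orth.norm() + eps)
    param.grad.copy_(g_orth.view_as(param.grad))

# Call for every weight tensor after backward() and before optimizer.step().
```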

Fix PSGD, spring cleaning

01 Jan 15:50
0519edb
  • Previously, only the first parameter passed to PSGD was trained; this is fixed now
  • All PSGD variants effectively behaved like PurePSGD; momentum_into_precond_update and exp_avg_input now have their expected effect again
  • preliminary support for external changes of group['lr'] (see the sketch below)
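
A sketch of what external LR management looks like in practice: heavyball optimizers follow the standard torch.optim interface, so a vanilla AdamW stands in here.

```python
import math
import torch
from torch import nn

model = nn.Linear(16, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)  # any heavyball optimizer works the same way

base_lr, total_steps = 1e-3, 1000
for step in range(total_steps):
    # External LR management: mutate group['lr'] directly each step,
    # here with a cosine decay schedule.
    for group in opt.param_groups:
        group["lr"] = base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))
    loss = model(torch.randn(8, 16)).square().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
```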

v1.3.0

18 Dec 17:54
9a20be2
  • fixes: in 1.2.x (not 1.1.x), all optimizers silently ran as plain SGD; AdamW now runs AdamW again
  • heavyball.utils.disable_caution_scaling implements the behavior documented here (see the sketch after this list)
  • SOAP converges well again
    [convergence plot]
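
For context, "caution" refers to the Cautious-Optimizers-style sign mask. A sketch of the mechanism follows, assuming the scaling that disable_caution_scaling turns off is the usual renormalization by the surviving-mask fraction; that is an assumption about semantics, not a statement of heavyball's exact code.

```python
import torch

def cautious(update: torch.Tensor, grad: torch.Tensor, scale: bool = True) -> torch.Tensor:
    """Zero update entries whose sign disagrees with the gradient.
    scale=True renormalizes by the fraction of surviving entries;
    disable_caution_scaling plausibly corresponds to scale=False."""
    mask = (update * grad > 0).to(update.dtype)
    if scale:
        mask *= mask.numel() / mask.sum().clamp(min=1)
    return update * mask
```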

faster, less memory, minor fixes

15 Dec 19:01
afd848f
  • LaProp/Adam/... are now compilable
  • fused_hook and hook_optimizer_into_model, which reduce memory usage by fusing the backward pass with the optimizer step (see the sketch after this list)
  • fewer in-place ops, giving better compilation and cleaner code
  • scaling modes ("graft", "scale", "none") for Muon, allowing Adam#Muon grafting (Adam's step magnitude with Muon's direction) at minimal cost
  • storage_dtype argument is implemented again
  • LaProp is now correctly implemented, and ADOPT is more stable
  • via @ethansmith2000: cleaner, more maintainable defaults, reducing the surface for potential errors
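
The memory saving comes from stepping each parameter as soon as its gradient is ready and freeing that gradient immediately. Here is a sketch of the pattern using PyTorch's register_post_accumulate_grad_hook (PyTorch >= 2.1); fused_hook and hook_optimizer_into_model plausibly wrap something along these lines, though the exact API is the library's own.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

# One single-parameter optimizer per tensor so each can step independently.
opts = {p: torch.optim.AdamW([p], lr=1e-3) for p in model.parameters()}

def step_and_free(p: torch.Tensor) -> None:
    opts[p].step()
    opts[p].zero_grad(set_to_none=True)  # drop the grad right away -> lower peak memory

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_and_free)

loss = model(torch.randn(8, 16)).square().mean()
loss.backward()  # parameters are updated during backward; no separate opt.step()
```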

Stability, Muon and Fixes

08 Dec 22:54
  • utils
    • bugfixes impacting SFAdamW and RMSProp
    • breaking: zeroth_power_method no longer supports eigh and no longer allows specifying the number of Newton-Schulz iterations
    • faster newtonschulz5 (via @tysam-code)
    • PSGD preconditioner dampening (via @evanatyourservice)
  • chainable
    • implementation of nesterov_momentum, heavyball_momentum and orthogonalize_update
  • core
    • heavyball.Muon (by chaining nesterov_momentum and orthogonalize_update); Muon supports gradient and update clipping out of the box
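
For reference, a sketch of the orthogonalize_update step: the quintic Newton-Schulz iteration commonly used for Muon. The coefficients follow the widely circulated newtonschulz5 reference implementation; heavyball's version (and its clipping hooks) may differ.

```python
import torch

def newton_schulz5(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D update via a quintic Newton-Schulz
    iteration, as in Muon. Sketch only; not heavyball's exact implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + eps)            # normalize so the iteration converges
    transposed = g.size(0) > g.size(1)
    if transposed:                      # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x
```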

v1.0.0

07 Dec 19:36

functional (optax-style) API and backend
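
To make the idea concrete, here is a sketch of what an optax-style functional layout looks like; the names below are illustrative and are not heavyball.chainable's actual signatures.

```python
from typing import Callable, NamedTuple
import torch

class Transform(NamedTuple):
    # param -> state
    init: Callable[[torch.Tensor], dict]
    # (update, state, param) -> (new_update, state)
    update: Callable[[torch.Tensor, dict, torch.Tensor], tuple]

def scale_by_momentum(beta: float = 0.9) -> Transform:
    def init(param):
        return {"buf": torch.zeros_like(param)}
    def update(u, state, param):
        state["buf"].mul_(beta).add_(u)
        return state["buf"], state
    return Transform(init, update)

def chain(*ts: Transform) -> Transform:
    """Compose transforms left to right, optax.chain-style."""
    def init(param):
        return [t.init(param) for t in ts]
    def update(u, states, param):
        for t, s in zip(ts, states):
            u, _ = t.update(u, s, param)
        return u, states
    return Transform(init, update)
```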