
Releases: ClashLuke/HeavyBall

OrthoGrad & PSGD improvements

18 Jan 07:57
512ffd0
  • General
    • precond_schedule matches its docs (@francois-rozet, #31)
    • unified warmup_steps API (@francois-rozet, #32)
    • add eps arg to scale_by_adam (#33)
    • allow external management of LR (for foreach=True optimizers)
  • OrthoGrad, a "grokking-first" optimizer that works by projecting out the gradient component parallel to the weights (see the sketch after this list)
  • PSGD
    • no more OOM in torch.linalg.solve
    • speed up caching by skipping the cache when it would not yield a speedup
    • add Newton-PSGD ("hvp-PSGD"), approximating Hessian-vector products via finite differences: Hv ≈ (∇f(θ + εv) − ∇f(θ)) / ε
    • apply caution to the momentum rather than the update (improves convergence and is closer to the paper's intent)
  • Benchmarks
    • grokking benchmark, using modular addition and wide MLPs
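
Below is a minimal sketch of the OrthoGrad projection mentioned above, written in plain PyTorch rather than heavyball's own (foreach/compiled) implementation; details such as scaling and dtype handling may differ from the library's version.

```python
import torch

def orthograd_(param: torch.Tensor, eps: float = 1e-30) -> None:
    """Remove the gradient component parallel to the weights, in place.
    Illustrative sketch of the OrthoGrad idea; not heavyball's exact code."""
    w = param.reshape(-1)
    g = param.grad.reshape(-1)
    # Project out the component of g along w.
    g_orth = g - (torch.dot(w, g) / (torch.dot(w, w) + eps)) * w
    # Rescale so the step keeps the original gradient norm.
    g_orth *= g.norm() / (g_orth.norm() + eps)
    param.grad.copy_(g_orth.view_as(param.grad))

# Call for every weight tensor after backward() and before optimizer.step().
```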

Fix PSGD, spring cleaning

01 Jan 15:50
0519edb
  • Previously, only the first parameter passed to PSGD was trained; this is fixed now
  • All PSGD variants effectively behaved like PurePSGD; momentum_into_precond_update and exp_avg_input now have their expected effect again
  • preliminary support for external changes of group['lr'] (see the sketch below)
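
A sketch of what external LR management looks like in practice: heavyball optimizers follow the standard torch.optim interface, so a vanilla AdamW stands in here.

```python
import math
import torch
from torch import nn

model = nn.Linear(16, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)  # any heavyball optimizer works the same way

base_lr, total_steps = 1e-3, 1000
for step in range(total_steps):
    # External LR management: mutate group['lr'] directly each step,
    # here with a cosine decay schedule.
    for group in opt.param_groups:
        group["lr"] = base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))
    loss = model(torch.randn(8, 16)).square().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
```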

v1.3.0

18 Dec 17:54
9a20be2
  • fixes: in 1.2.x (not 1.1.x), all optimizers silently ran as plain SGD; AdamW now runs AdamW again
  • heavyball.utils.disable_caution_scaling implements the behavior documented here (see the sketch after this list)
  • SOAP converges well again
    [convergence plot]
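
For context, "caution" refers to the Cautious-Optimizers-style sign mask. A sketch of the mechanism follows, assuming the scaling that disable_caution_scaling turns off is the usual renormalization by the surviving-mask fraction; that is an assumption about semantics, not a statement of heavyball's exact code.

```python
import torch

def cautious(update: torch.Tensor, grad: torch.Tensor, scale: bool = True) -> torch.Tensor:
    """Zero update entries whose sign disagrees with the gradient.
    scale=True renormalizes by the fraction of surviving entries;
    disable_caution_scaling plausibly corresponds to scale=False."""
    mask = (update * grad > 0).to(update.dtype)
    if scale:
        mask *= mask.numel() / mask.sum().clamp(min=1)
    return update * mask
```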

faster, less memory, minor fixes

15 Dec 19:01
afd848f
  • LaProp/Adam/... are now compilable
  • fused_hook and hook_optimizer_into_model, which reduce memory usage by fusing the backward pass with the optimizer step (see the sketch after this list)
  • fewer in-place ops, giving better compilation and cleaner code
  • scaling modes ("graft", "scale", "none") for Muon, allowing Adam#Muon grafting (Adam's step magnitude with Muon's direction) at minimal cost
  • storage_dtype argument is implemented again
  • LaProp is now correctly implemented, and ADOPT is more stable
  • via @ethansmith2000: cleaner, more maintainable defaults, reducing the surface for potential errors
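
The memory saving comes from stepping each parameter as soon as its gradient is ready and freeing that gradient immediately. Here is a sketch of the pattern using PyTorch's register_post_accumulate_grad_hook (PyTorch >= 2.1); fused_hook and hook_optimizer_into_model plausibly wrap something along these lines, though the exact API is the library's own.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

# One single-parameter optimizer per tensor so each can step independently.
opts = {p: torch.optim.AdamW([p], lr=1e-3) for p in model.parameters()}

def step_and_free(p: torch.Tensor) -> None:
    opts[p].step()
    opts[p].zero_grad(set_to_none=True)  # drop the grad right away -> lower peak memory

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_and_free)

loss = model(torch.randn(8, 16)).square().mean()
loss.backward()  # parameters are updated during backward; no separate opt.step()
```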

Stability, Muon and Fixes

08 Dec 22:54
  • utils
    • bugfixes impacting SFAdamW and RMSProp
    • breaking: zeroth_power_method no longer supports eigh and no longer allows specifying the number of Newton-Schulz iterations
    • faster newtonschulz5 (via @tysam-code)
    • PSGD preconditioner dampening (via @evanatyourservice)
  • chainable
    • implementation of nesterov_momentum, heavyball_momentum and orthogonalize_update
  • core
    • heavyball.Muon (by chaining nesterov_momentum and orthogonalize_update); Muon supports gradient and update clipping out of the box
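
For reference, a sketch of the orthogonalize_update step: the quintic Newton-Schulz iteration commonly used for Muon. The coefficients follow the widely circulated newtonschulz5 reference implementation; heavyball's version (and its clipping hooks) may differ.

```python
import torch

def newton_schulz5(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D update via a quintic Newton-Schulz
    iteration, as in Muon. Sketch only; not heavyball's exact implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + eps)            # normalize so the iteration converges
    transposed = g.size(0) > g.size(1)
    if transposed:                      # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x
```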

v1.0.0

07 Dec 19:36

functional (optax-style) API and backend
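
To make the idea concrete, here is a sketch of what an optax-style functional layout looks like; the names below are illustrative and are not heavyball.chainable's actual signatures.

```python
from typing import Callable, NamedTuple
import torch

class Transform(NamedTuple):
    # param -> state
    init: Callable[[torch.Tensor], dict]
    # (update, state, param) -> (new_update, state)
    update: Callable[[torch.Tensor, dict, torch.Tensor], tuple]

def scale_by_momentum(beta: float = 0.9) -> Transform:
    def init(param):
        return {"buf": torch.zeros_like(param)}
    def update(u, state, param):
        state["buf"].mul_(beta).add_(u)
        return state["buf"], state
    return Transform(init, update)

def chain(*ts: Transform) -> Transform:
    """Compose transforms left to right, optax.chain-style."""
    def init(param):
        return [t.init(param) for t in ts]
    def update(u, states, param):
        for t, s in zip(ts, states):
            u, _ = t.update(u, s, param)
        return u, states
    return Transform(init, update)
```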