Releases: ClashLuke/HeavyBall
OrthoGrad & PSGD improvements
- General
  - `precond_schedule` matches its docs (@francois-rozet, #31)
  - unified `warmup_steps` API (@francois-rozet, #32)
  - add `eps` arg to `scale_by_adam` (#33)
  - allow external management of LR (for `foreach=True` optimizers)
- OrthoGrad, a "grokking-first" optimizer that works
- PSGD
  - no more OOM in `torch.linalg.solve`
  - speed up cache by skipping it when it wouldn't give speedups
  - add newton-PSGD ("hvp-PSGD") using finite-difference approximation
  - caution momentum, not update (-> improved convergence; closer to paper's intention)
- Benchmarks
  - `grokking` benchmark, using modular addition and wide MLPs
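
The newton-PSGD entry above relies on a finite-difference approximation of Hessian-vector products. A minimal sketch of that idea, assuming a plain forward difference of gradients; the function name, the step size `eps`, and the toy objective are illustrative and not taken from HeavyBall:

```python
def finite_difference_hvp(grad_fn, x, v, eps=1e-4):
    """Approximate H(x) @ v as (grad(x + eps * v) - grad(x)) / eps."""
    shifted = [xi + eps * vi for xi, vi in zip(x, v)]
    g0 = grad_fn(x)        # gradient at the current point
    g1 = grad_fn(shifted)  # gradient after a small step along v
    return [(b - a) / eps for a, b in zip(g0, g1)]


def grad_f(p):
    """Gradient of the toy objective f(x, y) = x**2 * y + y**3."""
    x, y = p
    return [2 * x * y, x * x + 3 * y * y]


# The exact Hessian at (1, 2) is [[4, 2], [2, 12]], so H @ (1, 0) = (4, 2).
hvp = finite_difference_hvp(grad_f, [1.0, 2.0], [1.0, 0.0])
```

This costs one extra gradient evaluation per product and never materializes the Hessian; the price is a truncation error that grows with `eps`.
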
Fix PSGD, spring cleaning
- Previously, only the first parameter of PSGD was trained; this is now fixed
- All PSGDs were `PurePSGD`; now `momentum_into_precond_update` and `exp_avg_input` have their expected effect again
- preliminary support for external changes of `group['lr']`
v1.3.0
faster, less memory, minor fixes
- LaProp/Adam/... are now compilable
- `fused_hook` and `hook_optimizer_into_model`, reducing memory usage by fusing backward pass with optimizer step
- fewer inplace ops, giving better compilations and cleaner code
- scaling ("graft", "scale", "none") for Muon, allowing Adam#Muon at minimal cost
- `storage_dtype` argument is implemented again
- LaProp is correctly implemented, ADOPT is more stable
- via @ethansmith2000: cleaner, more maintainable `defaults`, reducing the surface for potential errors
Stability, Muon and Fixes
- `utils`
  - bugfixes impacting SFAdamW and RMSProp
  - breaking: `zeroth_power_method` no longer supports `eigh` and doesn't allow specification of the number of newtonschulz iterations
  - faster newtonschulz5 (via @tysam-code)
  - PSGD preconditioner dampening (via @evanatyourservice)
- `chainable`
  - implementation of `nesterov_momentum`, `heavyball_momentum` and `orthogonalize_update`
- core
  - heavyball.Muon (by chaining `nesterov_momentum` and `orthogonalize_update`); Muon supports gradient and update clipping out of the box
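
The chaining described above can be illustrated with a small NumPy sketch. It is not HeavyBall's implementation: the function names only mirror the release notes, the signatures are hypothetical, the momentum rule is one common Nesterov formulation, and the orthogonalization uses the classic cubic Newton-Schulz iteration rather than the tuned quintic `newtonschulz5`.

```python
import numpy as np


def nesterov_momentum(grad, buf, beta=0.9):
    """One common Nesterov form: update the buffer, then look ahead along it."""
    buf = beta * buf + grad
    return grad + beta * buf, buf


def orthogonalize_update(update, steps=30):
    """Cubic Newton-Schulz: drives every singular value of `update` towards 1."""
    # Frobenius scaling keeps all singular values at or below 1.
    x = update / (np.linalg.norm(update) + 1e-12)
    tall = x.shape[0] > x.shape[1]
    if tall:
        x = x.T  # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if tall else x


def muon_like_step(param, grad, buf, lr=0.02, beta=0.9):
    """Chain the two transforms, then apply a plain descent step."""
    update, buf = nesterov_momentum(grad, buf, beta)
    return param - lr * orthogonalize_update(update), buf


param = np.zeros((3, 2))
grad = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])
new_param, buf = muon_like_step(param, grad, np.zeros_like(grad))
q = orthogonalize_update(grad)  # columns of q are (numerically) orthonormal
```

Because each transform takes an update and returns an update, further steps such as gradient or update clipping could be slotted into the same chain without touching the others.
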