Gradient Accumulation Fix

@danielhanchen released this 15 Oct 16:48
· 42 commits to main since this release
38663b0

We fixed a gradient accumulation bug which was first reported back in 2021 and recently rediscovered. Read more in our blog post: https://unsloth.ai/blog/gradient

We have a Colab Notebook for Llama 3.2 using the fixed trainer, as well as a Kaggle Notebook.

In theory, training with a batch size of bsz and ga gradient accumulation steps should be equivalent to full-batch training with batch size bsz * ga and no gradient accumulation, but strangely the training losses do not match up.
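The mismatch comes from how the loss is normalized across micro-batches. As a toy illustration (not Unsloth's actual implementation), assume cross-entropy is computed per token and micro-batches contain different numbers of real (non-padded) tokens; averaging each micro-batch separately and then averaging across accumulation steps gives a different result than averaging over all tokens at once:

import torch

# Two micro-batches with different numbers of real tokens (illustrative values)
token_losses = [torch.tensor([1.0, 2.0, 3.0]),  # 3 tokens
                torch.tensor([4.0])]            # 1 token

# Full batch: mean over ALL tokens in the effective batch
full_batch = torch.cat(token_losses).mean()                        # (1+2+3+4)/4 = 2.5

# Naive accumulation: mean per micro-batch, then mean over steps
naive_ga = torch.stack([t.mean() for t in token_losses]).mean()    # (2.0+4.0)/2 = 3.0

# Correct accumulation: weight each micro-batch by its token count
total_tokens = sum(t.numel() for t in token_losses)
fixed_ga = sum(t.sum() for t in token_losses) / total_tokens       # 2.5, matches full batch

print(full_batch.item(), naive_ga.item(), fixed_ga.item())

With equal-length sequences the two agree, but with variable-length (padded) sequences the naive version silently weights short micro-batches too heavily, which is why the loss curves diverge.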

We fixed it in Unsloth!

To use Unsloth's fixed trainer with gradient accumulation, use:

from unsloth import unsloth_train
# trainer_stats = trainer.train() << Buggy if using gradient accumulation
trainer_stats = unsloth_train(trainer) # << Fixed gradient accumulation
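For context, here is a sketch of how the call fits into a typical fine-tuning setup. The model name, dataset, and hyperparameters are illustrative placeholders, and the exact SFTTrainer arguments depend on your trl version; only the final unsloth_train(trainer) line is the fix itself:

from unsloth import FastLanguageModel, unsloth_train
from trl import SFTTrainer
from transformers import TrainingArguments

# Illustrative model; swap in the model you are fine-tuning
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # your prepared dataset
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # effective batch size = 16
        max_steps=60,
        output_dir="outputs",
    ),
)

# Use unsloth_train instead of trainer.train() so the accumulated loss
# is normalized over the whole effective batch
trainer_stats = unsloth_train(trainer)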

Please update Unsloth on local machines (no need for Colab / Kaggle) via:

pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Read our blog post: https://unsloth.ai/blog/gradient for more details!

What's Changed

New Contributors

Full Changelog: September-2024...October-2024