Bind max steps and lr iterations #67

Open
Rilwan-Adewoyin wants to merge 8 commits into develop

Conversation

@Rilwan-Adewoyin (Member) commented Sep 30, 2024

Context & Current Setup:

  • The user defines training.lr.iterations, which affects how many steps the model learns for
  • However, this value often does not align with training.max_epochs, which is typically larger
  • GPU hours are potentially wasted when max_epochs does not match training.lr.iterations
  • Alternatively, training can produce unexpected results because the training length does not align with the iteration schedule

Changes: these aim to tie training.lr.iterations to the training length

  1. Changed the default method of controlling run length from training.max_epochs to training.max_steps
  2. Changed the default config so that training.lr.iterations defaults to training.max_steps (a sketch follows this list)
  3. The user now defines training.max_steps to set the length of training
  4. The user retains the option to set training.max_epochs as well; training stops at the first limit reached
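
As a rough illustration of change 2, here is a minimal sketch assuming the config is resolved with OmegaConf/Hydra-style interpolation; the key names mirror this description, but the repository's actual defaults and mechanism may differ:

```python
# Minimal sketch, assuming OmegaConf-style interpolation; key names mirror the
# PR description, not necessarily the repository's actual config files.
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    """
    training:
      max_epochs: null
      max_steps: 150000
      lr:
        iterations: ${training.max_steps}
    """
)

# Resolving the interpolation ties the LR schedule length to max_steps.
resolved = OmegaConf.to_container(cfg, resolve=True)
print(resolved["training"]["lr"]["iterations"])  # -> 150000
```

With an interpolated default like this, changing training.max_steps automatically moves the end of the LR schedule, which is the alignment this PR is after.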


FussyDuck commented Sep 30, 2024

CLA assistant check
All committers have signed the CLA.

CHANGELOG.md (review thread outdated, resolved)
mchantry previously approved these changes Oct 9, 2024

@mchantry (Member) left a comment


Looks good to me. I will run it by the ATS. Don't merge yet.

```yaml
# Set max_epochs or max_steps. Training stops at the first limit reached.
max_epochs: null
max_steps: 150000

lr:
```
A Contributor left a comment:

I think having this functionality is great, so thanks @Rilwan-Adewoyin for implementing it!
Just a quick question: what happens if the user passes both max_steps and max_epochs? Will the code then run until max_epochs is reached while the scheduler uses max_steps?
My two cents: it could be nice to add some logger info to indicate this!

@Rilwan-Adewoyin (Member, Author) replied:

Hey, thanks!
The default PyTorch Lightning behaviour is that training runs until the first of max_steps or max_epochs is reached, but the scheduler is aligned to max_steps.

Yep, that sounds like a good idea: add a logger.info message when the user sets both.
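
For reference, a minimal sketch of that behaviour and of the suggested log message, assuming stock PyTorch Lightning Trainer arguments; the helper name and message wording are hypothetical, not part of this PR:

```python
# Minimal sketch, assuming stock PyTorch Lightning semantics; the helper
# `warn_if_both_limits_set` and its message are hypothetical, not from this PR.
import logging
from typing import Optional

import pytorch_lightning as pl

log = logging.getLogger(__name__)


def warn_if_both_limits_set(max_steps: Optional[int], max_epochs: Optional[int]) -> None:
    # Lightning stops at whichever limit is reached first, while the LR
    # schedule would be keyed to max_steps only, so flag the mismatch.
    if max_steps is not None and max_epochs is not None:
        log.info(
            "Both max_steps (%s) and max_epochs (%s) are set: training stops at "
            "the first limit reached; the LR scheduler follows max_steps.",
            max_steps,
            max_epochs,
        )


max_steps, max_epochs = 150_000, None
warn_if_both_limits_set(max_steps, max_epochs)

# Training stops at the first limit reached (Lightning's default behaviour).
# Inside the LightningModule, the scheduler's length would be tied to
# max_steps, e.g. CosineAnnealingLR(optimizer, T_max=max_steps).
trainer = pl.Trainer(max_steps=max_steps, max_epochs=max_epochs)
```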

Labels: none yet
Projects: none yet
4 participants