feat(training,rollout)!: Rollout Schedulers #46
Conversation
- Allow for complex incrementing setup
- Calculation based not step based
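The "calculation based not step based" commit is the key design point: the rollout value is derived as a pure function of the epoch rather than accumulated by repeated increments. A minimal sketch of the contrast, with hypothetical class names (not the PR's actual interface):

```python
# Illustrative contrast: a step-based scheduler mutates state on every
# call, while a calculation-based one derives the rollout directly from
# the epoch, which is restart-safe. All names here are hypothetical.

class StepBasedRollout:
    def __init__(self) -> None:
        self.rollout = 1

    def step(self) -> int:
        # Stateful: after a restart, every increment must be replayed
        # to recover the correct rollout value.
        self.rollout += 1
        return self.rollout


class CalculationBasedRollout:
    def __init__(self, start: int = 1, every: int = 2) -> None:
        self.start = start
        self.every = every

    def at_epoch(self, epoch: int) -> int:
        # Pure function of the epoch: resuming at epoch N needs no replay.
        return self.start + epoch // self.every
```

The calculation-based form is what makes "tested with restart, and with change in config after restart" (below) tractable: the scheduler can be re-evaluated at the resumed epoch under the new config.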
for more information, see https://pre-commit.ci
Started to go through the PR and left some comments! I still need to understand some of the functionality better, so I hope the questions make sense. Thanks for this, Harrison!
- Add epoch record
- Remove sync
- Generalise base class
* Revert back to expanded workflow

Co-authored-by: Mario Santa Cruz <[email protected]>
Nice, please see comments.
They are not; the indices are the same for each epoch. I don't believe it is as simple as setting
With persistent_workers=False we have an initial seed and then just draw random numbers from the generator, so there should not be any repetition. What we could do: previously we did not have the epoch information in the dataset. That should now be the case, so you could simply add the epoch to the random seed.
Something like: base_seed = get_base_seed() + self.epoch * 100, though we should probably rename base_seed here to dataset_seed, otherwise it might be confusing -> we already refer to the base seed elsewhere in the code.
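A minimal sketch of the epoch-mixed seeding suggested above. Note that `get_base_seed()`, the `Dataset` class, and its methods are stand-ins for illustration, not the project's actual API; only the `get_base_seed() + self.epoch * 100` expression comes from the comment itself.

```python
import random


def get_base_seed() -> int:
    # Hypothetical stand-in for the project's base-seed helper.
    return 1234


class Dataset:
    def __init__(self) -> None:
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def make_rng(self) -> random.Random:
        # Mixing the epoch into the seed (renamed dataset_seed, as
        # suggested above) gives a different shuffle each epoch while
        # remaining reproducible after a restart.
        dataset_seed = get_base_seed() + self.epoch * 100
        return random.Random(dataset_seed)
```

With this scheme, resuming at a given epoch reproduces exactly the indices that epoch would have drawn in an uninterrupted run.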
…cmwf/anemoi-core into 14-training-rollout-scheduling
- Fix issue with RandomIncrease
Closes #14
Rollout Schedulers
Expand the ways to describe rollout, and provide an interface to schedule updates.
New default rollout config
Can step by epoch or by batch step, and control the increment based on either the step or the epoch.
Additionally, formally add random steppers.
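To make the description above concrete, here is a hedged sketch of what an epoch-stepped scheduler and a random stepper might look like. The class names, parameters, and `rollout_at` method are all hypothetical; the PR's real interface may differ.

```python
import random
from typing import List, Optional


class EpochRolloutScheduler:
    """Increase rollout by `increment` every `every_n_epochs` epochs (hypothetical API)."""

    def __init__(self, start: int = 1, increment: int = 1,
                 every_n_epochs: int = 1, maximum: Optional[int] = None):
        self.start = start
        self.increment = increment
        self.every_n_epochs = every_n_epochs
        self.maximum = maximum

    def rollout_at(self, epoch: int) -> int:
        # Calculation-based: rollout is a pure function of the epoch,
        # so restarts need no replay of past increments.
        value = self.start + (epoch // self.every_n_epochs) * self.increment
        return value if self.maximum is None else min(value, self.maximum)


class RandomRolloutScheduler:
    """Sample a rollout from a fixed set each step (a 'random stepper', hypothetical API)."""

    def __init__(self, choices: List[int], seed: int = 0):
        self.choices = choices
        self.rng = random.Random(seed)

    def rollout_at(self, step: int) -> int:
        return self.rng.choice(self.choices)
```

For example, `EpochRolloutScheduler(start=1, increment=1, every_n_epochs=2, maximum=4)` would yield rollout 1 for epochs 0-1, 2 for epochs 2-3, and cap at 4 thereafter.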
Todo
Tested
Tested with restart, and with a change of config after restart.
📚 Documentation preview 📚: https://anemoi-training--46.org.readthedocs.build/en/46/
📚 Documentation preview 📚: https://anemoi-graphs--46.org.readthedocs.build/en/46/
📚 Documentation preview 📚: https://anemoi-models--46.org.readthedocs.build/en/46/