Update learning rate schedule in training/default.yaml #58

Open. Wants to merge 1 commit into develop.
Conversation

jswijnands commented

This updates the default values for the learning rate schedule. Specifically, the local learning rate is increased to 8x the original learning rate. In addition, the number of iterations is reduced from 300k to 150k. This has led to similar accuracy in stretched grid experiments using the graph transformer architecture, while requiring only 50% of the GPU compute. Optimised default values could save GPU resources across the Pilot Project. An illustrative config sketch follows the review questions below.

Suggested review questions:

  • Are these settings also an enhancement for global / LAM models? (tested only for stretched grid)
  • Are these settings also an enhancement for GNN / transformer models? (tested only for graph transformer architecture)
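
To make the described change concrete, the sketch below shows what such a schedule update could look like in YAML. This is a minimal illustration, not the actual diff: the key names (`lr.rate`, `lr.iterations`) and the base learning rate value are assumptions, since the contents of training/default.yaml are not quoted in this conversation; only the 8x increase and the 300k-to-150k reduction come from the PR description.

```yaml
# Illustrative sketch only. Key names and base values are assumptions;
# the PR itself only specifies an 8x higher local learning rate and a
# schedule shortened from 300k to 150k iterations.
lr:
  # Local (per-device) learning rate, raised to 8x the previous default,
  # e.g. 1.0e-4 -> 8.0e-4 if the previous default were 1.0e-4 (hypothetical).
  rate: 8.0e-4
  # Total scheduler length, halved from 300000 to 150000 iterations.
  iterations: 150000
```

Intuitively, a higher peak learning rate can compensate for a shorter schedule, which would be consistent with the reported similar accuracy at half the GPU compute, though whether this holds beyond the stretched grid setup is exactly what the review questions above ask.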


FussyDuck commented Sep 17, 2024

CLA assistant check
All committers have signed the CLA.

JesperDramsch (Member) commented

Hi @jswijnands, thank you for the contribution.

We'll look into benchmarking this on the non-stretched grid models and evaluate whether we should make this a special config, as I believe the stretched grid may generally benefit from different training defaults.
