# Roadmap
Joel Oskarsson edited this page Nov 18, 2024 · 19 revisions
Aim: Training based on general zarr datasets, with the ability to experiment with the graph architecture used.
- What: Replace the npy-file based training datasets of v0.1.0 neural-lam, and the dataset-specific settings stored in `neural_lam/constants.py`, with zarr-based datasets that are self-contained. Should we completely replace the existing `WeatherDataset`? I would prefer making a separate class, say `NumpyWeatherDataset`.
- Why: We need to support more than just the MEPS dataset, so the code needs to be more flexible about what can be read into neural-lam.
- Who: sadamov and leifdenby?
- Issues:
- Relevant code:
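As a rough illustration of the "self-contained" idea, here is a minimal sketch of what a zarr-backed dataset class might look like. The class name, constructor arguments and sample layout are all hypothetical, not the actual neural-lam API; numpy is used so the example is runnable, but a `zarr.Array` supports the same numpy-style slicing, and a real implementation would subclass `torch.utils.data.Dataset` and return tensors.

```python
import numpy as np


class ZarrWeatherDataset:
    """Hypothetical sketch of a self-contained zarr-backed dataset.

    `state` can be any array with numpy-style indexing of shape
    (num_times, num_grid_nodes, num_features). A `zarr.Array` slices
    the same way as the in-memory numpy array used here, so no
    dataset-specific constants module is needed.
    """

    def __init__(self, state, ar_steps=1):
        self.state = state
        self.ar_steps = ar_steps  # number of autoregressive target steps

    def __len__(self):
        # each sample needs 2 initial states + ar_steps target states
        return self.state.shape[0] - 2 - self.ar_steps + 1

    def __getitem__(self, idx):
        init_states = np.asarray(self.state[idx : idx + 2])
        targets = np.asarray(self.state[idx + 2 : idx + 2 + self.ar_steps])
        return init_states, targets
```

The point is that everything the sample layout depends on (times, grid nodes, features) is read from the array itself rather than from a hard-coded constants file.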
- What: Make it easier to experiment with graph construction that works on multiple input datasets.
- Why: The current graph construction code in neural-lam can create two types of graph architecture ("hierarchical" and GraphCast-LAM), but experimenting with different approaches for creating the g2m, m2m and m2g connections requires this code to be modularised. This is what `weather-model-graphs` sets out to achieve (by refactoring the current graph-generation code in neural-lam).
- Who: leifdenby, joeloskarsson
- Issues:
- Relevant code: https://github.com/mllam/weather-model-graphs, an external tool for creating, visualising and writing graphs for data-driven weather models
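To make the modularisation concrete, one simple g2m connection rule is to attach each grid node to its nearest mesh node; a modular package could offer this alongside radius-based or multi-neighbour rules. The function below is an illustration of that single rule, not the `weather-model-graphs` API; the names and the brute-force distance computation are assumptions for the sketch.

```python
import numpy as np


def grid_to_mesh_edges(grid_pos, mesh_pos):
    """Connect each grid node to its nearest mesh node (one simple
    g2m rule among several a modular graph-construction package
    could offer). `grid_pos` and `mesh_pos` are (N, 2) coordinate
    arrays. Returns an (num_grid, 2) array of (grid, mesh) edges.
    """
    # pairwise distances between all grid and mesh nodes (brute force;
    # a real implementation would use a spatial index such as a KD-tree)
    diff = grid_pos[:, None, :] - mesh_pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    nearest = dist.argmin(axis=1)
    return np.stack([np.arange(len(grid_pos)), nearest], axis=1)
```

Swapping this function for a different rule (e.g. connect to all mesh nodes within a radius) is exactly the kind of experiment the modularised code should make cheap.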
- What: Set up a more robust and structured documentation solution
- Why: As Neural-LAM grows and we make changes to it, there is an increasing need for documentation. Even now the README is quite long, and some things are not that easy to find in it.
- Who: @joeloskarsson, help wanted
- Reviewers: joeloskarsson, sadamov
- Issue: https://github.com/mllam/neural-lam/issues/61
- Relevant code:
- What: "Hello-world" example of complete data pre-processing, model training and evaluation
- Why: There are quite a few steps to get started with training a model. Once the documentation is in place it would be really useful to write one complete example covering data pre-processing, graph construction, model training and model evaluation.
- Who: sadamov
- Issues:
- What: Hard-constrain outputs for variables not taking values in all of $\mathbb{R}$.
- Why: Variables like relative humidity take values in $[0,1]$, and total precipitation cannot be negative. The neural network can currently output any value; these constraints should be enforced.
- Who: simonkamuk
- Reviewers: joeloskarsson, sadamov
- Issues:
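One standard way to enforce such range constraints is an output transform: a rescaled sigmoid for doubly bounded variables and a softplus for lower-bounded ones. The sketch below illustrates this technique only; the function name and interface are assumptions, not the planned neural-lam implementation (which could equally use clamping or constrained parameterisations).

```python
import numpy as np


def constrain_output(raw, low=None, high=None):
    """Map an unconstrained network output into a valid range.

    Rescaled sigmoid for doubly bounded variables (e.g. relative
    humidity in [0, 1]); softplus for lower-bounded ones (e.g.
    non-negative precipitation). Illustrative sketch only.
    """
    if low is not None and high is not None:
        # sigmoid rescaled from (0, 1) to (low, high)
        return low + (high - low) / (1.0 + np.exp(-raw))
    if low is not None:
        # shifted softplus, strictly greater than `low`
        return low + np.log1p(np.exp(raw))
    return raw  # unconstrained variable
```

Both transforms are smooth, so gradients flow through them during training, which is why they are usually preferred over hard clipping.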
- What: Separate interior state and boundary forcing to only predict state
- Why: Establish a clear separation between the state in the interior region and the boundary forcing coming from outside the limited area. Do not waste computation on producing predictions for the boundary that are never used.
- Who: joeloskarsson
- Issues:
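A minimal first step towards this separation is masking the loss so boundary nodes never contribute, before predictions for them are removed entirely. The sketch below shows that masking idea; the function name, shapes and the choice of MSE are assumptions for illustration, not the neural-lam loss implementation.

```python
import numpy as np


def interior_mse(pred, target, interior_mask):
    """Mean squared error over interior grid nodes only.

    pred/target: (num_grid_nodes, num_features) arrays;
    interior_mask: boolean (num_grid_nodes,), False on the boundary.
    In the full separation, predictions would only be produced for
    interior nodes in the first place.
    """
    err = (pred[interior_mask] - target[interior_mask]) ** 2
    return err.mean()
```

Once the model itself only outputs interior nodes, the mask disappears and the boundary enters purely as forcing input.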
- What: Data loading from a separate dataset for the boundary
- Why: We want to have different options for where to get the boundary forcing from. This boundary forcing data should also use the datastore format.
- Who: sadamov, leifdenby
- Issues:
  - TBA
- What: Make sure that Neural-LAM is ready for multi-node training.
- Why: The training of high-res 4D weather models takes a long time and requires a lot of virtual memory. Using DDP and PyTorch Lightning, this functionality can easily be provided for various HPC systems. Model training and statistics calculations are the obvious tasks that benefit from running on many GPUs in parallel. The choice to use SLURM seemed obvious, but maybe the issues below should also cover other schedulers?
- Who: sadamov
- Issues:
- Related, but separate issues:
- What: Perform standardization of data on GPU rather than CPU
- Why: Performing the standardization requires entry-wise sums and multiplications of the exact type that GPUs are suitable for. Doing this is slow on CPU. It would be good to move these computations to GPU to reduce the overhead of the data loading.
- Who: sadamov
- Issues:
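The standardization itself is just an entry-wise broadcasted subtraction and division, which is why it maps so naturally onto a GPU. The sketch below shows the arithmetic with numpy for runnability; with torch the identical expression runs on GPU when `batch`, `mean` and `std` are CUDA tensors, e.g. inside Lightning's `on_after_batch_transfer` hook (that hook choice is a suggestion, not the decided design).

```python
import numpy as np


def standardize(batch, mean, std):
    """Entry-wise standardization (x - mean) / std.

    Broadcasting: batch has shape (batch, num_nodes, num_features),
    mean/std have shape (num_features,), so the per-feature statistics
    apply across all nodes and samples. This is the operation to move
    from the CPU data-loading path onto the GPU.
    """
    return (batch - mean) / std
```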
- What: Include a latent variable model capable of ensemble forecasting. The best way to do this is to restructure the model class hierarchy into deterministic and probabilistic models.
- Why: Ensemble forecasting is valuable, need uncertainty in predictions.
- Who: joeloskarsson
- Issue: https://github.com/mllam/neural-lam/issues/62
- Relevant code: https://github.com/mllam/neural-lam/tree/prob_model_lam
- What: Include features that enable running global forecast models using neural-LAM: using mesh graphs from the GraphCast code in neural-LAM models, and area-weighted loss functions and metrics.
- Why: Global forecasting can be viewed as just a special case of the neural-LAM setup (without any boundary forcing). To be able to experiment with models both in a LAM and global setting it would be nice to allow also for global forecasting using the same codebase.
- Who: joeloskarsson
- Issue: https://github.com/mllam/neural-lam/issues/63
- Relevant code: https://github.com/mllam/neural-lam/tree/prob_model_global
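For the area-weighting part, the standard approach on a regular lat/lon grid is to weight each node's loss contribution by the cosine of its latitude, a proxy for grid-cell area. The sketch below illustrates that technique; the function name, shapes and normalisation convention are assumptions, not the neural-lam API.

```python
import numpy as np


def area_weighted_mse(pred, target, lat_deg):
    """MSE weighted by cos(latitude) as a grid-cell-area proxy.

    pred/target: (num_nodes, num_features) arrays;
    lat_deg: (num_nodes,) latitude of each node in degrees.
    """
    weights = np.cos(np.deg2rad(lat_deg))
    weights = weights / weights.mean()  # normalise to average weight 1
    err = ((pred - target) ** 2).mean(axis=1)  # per-node MSE
    return (weights * err).mean()
```

With uniform latitudes this reduces to the plain MSE, so the same loss code can serve both the LAM and global settings.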
- What: Make it easier to generate a NeuralLAM forecast for a specific
datetime in the test dataset and plot it + animate it.
- Why: Often in atmospheric research we are interested in a specific event over a specific region. Using the newly introduced `batch_times` and the feature requests listed below, it would become straightforward to plot such forecasts efficiently.
- Who: ?
- Relevant code:
- What: Refactor the model class hierarchy to be more understandable and easier to extend with new models.
- Why: Currently the model class hierarchy is confusing and not very modular. Most things are handled by `ARModel`, rather than responsibilities being split up into logical components.
- Who: joeloskarsson
- Issue: https://github.com/mllam/neural-lam/issues/49