Roadmap

Joel Oskarsson edited this page Nov 18, 2024 · 19 revisions

Neural-lam Roadmap

v0.3.0

Aim: Training based on general zarr-datasets, with the ability to experiment with the graph architecture used.

Reading Training Data from Zarr-files

Generalise Graph-construction to Allow Creation of Novel Graphs on Arbitrary LAM Domains

  • What: Make it easier to experiment with graph construction that works on multiple input datasets.
    • Why: The current graph construction code in neural-lam can create two types of graph architecture ("hierarchical" and GraphCast-LAM). To experiment with different approaches for creating the g2m (grid-to-mesh), m2m (mesh-to-mesh) and m2g (mesh-to-grid) connections, this code needs modularising. This is what weather-model-graphs sets out to achieve, by refactoring the current graph-generation code in neural-lam.
    • Who: leifdenby, joeloskarsson
    • Issues:
    • Relevant code: https://github.com/mllam/weather-model-graphs (external tool for creating, visualising and writing graphs for data-driven weather models)
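
To make the three connection types concrete, here is a minimal NumPy sketch of building g2m, m2m and m2g edge lists for a toy domain. It is only an illustration under stated assumptions: the function names (`make_nodes`, `connect_within_radius`, `connect_nearest`), the unit-square domain and the radius and neighbour-count choices are all hypothetical, and this is not the weather-model-graphs API or the neural-lam implementation.

```python
import numpy as np


def make_nodes(n_x, n_y):
    """Return an (n_x * n_y, 2) array of node coordinates on the unit square."""
    xs = np.linspace(0.0, 1.0, n_x)
    ys = np.linspace(0.0, 1.0, n_y)
    xx, yy = np.meshgrid(xs, ys, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=1)


def connect_within_radius(src, dst, radius):
    """Edge index (2, n_edges): one edge per (src, dst) pair closer than radius."""
    dists = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    src_idx, dst_idx = np.nonzero(dists <= radius)
    return np.stack([src_idx, dst_idx], axis=0)


def connect_nearest(src, dst, k):
    """Edge index (2, n_dst * k): each dst node receives from its k nearest src nodes."""
    dists = np.linalg.norm(dst[:, None, :] - src[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    dst_idx = np.repeat(np.arange(dst.shape[0]), k)
    return np.stack([nearest.ravel(), dst_idx], axis=0)


# Toy LAM domain: a fine 8x8 grid and a coarse 3x3 mesh (assumed sizes)
grid = make_nodes(8, 8)
mesh = make_nodes(3, 3)

# g2m: every grid node sends to all mesh nodes within a radius
g2m = connect_within_radius(grid, mesh, radius=0.3)

# m2m: mesh nodes connect to their direct neighbours, without self-loops
m2m = connect_within_radius(mesh, mesh, radius=0.55)
m2m = m2m[:, m2m[0] != m2m[1]]

# m2g: every grid node receives from its 4 nearest mesh nodes
m2g = connect_nearest(mesh, grid, k=4)
```

Keeping each connection rule in its own function is what makes it possible to swap, for example, the radius-based g2m rule for a nearest-neighbour one without touching the rest of the graph construction.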

v0.4.0

Structured documentation

  • What: Set up a more robust and structured documentation solution

  • What: "Hello-world" example of complete data pre-processing, model training and evaluation

    • Why: There are quite a few steps to get started with training a model. Once the documentation is in place it would be really useful to write one complete example covering data pre-processing, graph construction, model training and model evaluation.
    • Who: sadamov
    • Issues:
  • What: Hard-constrain outputs for variables not taking values in all of $\mathbb{R}$.

    • Why: Variables like relative humidity take values in $[0,1]$, and total precipitation cannot be negative. Right now nothing prevents the neural network from outputting values outside these ranges. The constraints should be enforced on the model output.
    • Who: simonkamuk
    • Reviewers: joeloskarsson, sadamov
    • Issues:
  • What: Separate interior state and boundary forcing to only predict state

    • Why: Establish a clear separation between the state in the interior region and the boundary forcing coming from outside the limited area. Do not waste computation on producing predictions for the boundary that are never used.
    • Who: joeloskarsson
    • Issues:
  • What: Data loading from separate dataset for boundary

    • Why: We want to have different options for where to get the boundary forcing from. The boundary forcing data should also use the datastore format.
    • Who: sadamov, leifdenby
    • Issues:
      • TBA
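
As an illustration of the output-constraint item above (variables not taking values in all of $\mathbb{R}$), one common way to hard-constrain outputs is to pass the raw network output through a bounded or non-negative function per variable. The sketch below uses a sigmoid for $[0,1]$ variables and a softplus for non-negative ones. This is an assumed approach for illustration, not necessarily the one planned for neural-lam; the function `constrain_outputs` and the variable layout are hypothetical. It also assumes the constraints are applied in physical units, so if the model predicts standardized values the bounds would have to be transformed accordingly.

```python
import numpy as np


def sigmoid(x):
    """Numerically stable logistic function; outputs lie in [0, 1]."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def softplus(x):
    """log(1 + exp(x)), computed stably; outputs are non-negative."""
    return np.logaddexp(0.0, x)


def constrain_outputs(raw, unit_interval_idx, nonnegative_idx):
    """Apply per-variable constraints along the last (variable) axis.

    raw: array of shape (..., n_vars) with unconstrained network outputs.
    unit_interval_idx: indices of variables that must lie in [0, 1].
    nonnegative_idx: indices of variables that must be >= 0.
    Variables listed in neither index list are returned unchanged.
    """
    out = np.array(raw, dtype=float)  # copy, so the input is not modified
    out[..., unit_interval_idx] = sigmoid(out[..., unit_interval_idx])
    out[..., nonnegative_idx] = softplus(out[..., nonnegative_idx])
    return out


# Hypothetical variable layout:
# column 0 = relative humidity, column 1 = total precipitation,
# column 2 = an unconstrained variable
raw = np.array([[-50.0, 3.0, -2.0], [50.0, -100.0, 0.5]])
constrained = constrain_outputs(raw, unit_interval_idx=[0], nonnegative_idx=[1])
```

Because the transforms are smooth, gradients still flow through them during training. One trade-off is that the exact boundary values (for example, precipitation of exactly zero) are only approached, not reached, in exact arithmetic.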

Improved Support for Multi GPU Training

v0.5.0

  • What: Perform standardization of data on GPU rather than CPU
    • Why: Standardization consists of entry-wise additions and multiplications, exactly the kind of operation GPUs are well suited for, and it is slow on CPU. Moving these computations to the GPU would reduce the overhead of the data loading.
    • Who: sadamov
    • Issues:

Other Issues

Probabilistic Model

Global Forecasting

  • What: Include features that enable running global forecast models using neural-lam. This includes using mesh graphs from the GraphCast code in neural-lam models, and adding area-weighted loss functions and metrics.
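
As a sketch of what an area-weighted loss could look like: on a regular latitude-longitude grid, cell area is proportional to the cosine of the latitude, so errors near the poles should count less than errors near the equator. The example below assumes such a regular grid and normalizes the weights to have mean 1. The function names and the normalization are illustrative assumptions, not an existing neural-lam interface.

```python
import numpy as np


def area_weights(lat_deg):
    """Weights proportional to cos(latitude), normalized to have mean 1."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()


def area_weighted_mse(pred, target, lat_deg):
    """Area-weighted mean squared error for fields of shape (n_lat, n_lon)."""
    w = area_weights(lat_deg)[:, None]  # broadcast the weights over longitude
    return float(np.mean(w * (pred - target) ** 2))


# Toy grid: 3 latitudes x 4 longitudes, with a unit error only on the equator row
lat_deg = np.array([-60.0, 0.0, 60.0])
target = np.zeros((3, 4))
pred = np.zeros((3, 4))
pred[1, :] = 1.0

loss_equator = area_weighted_mse(pred, target, lat_deg)

# The same error placed on a high-latitude row contributes less to the loss
pred_polar = np.zeros((3, 4))
pred_polar[0, :] = 1.0
loss_polar = area_weighted_mse(pred_polar, target, lat_deg)
```

With these inputs the normalized weights are 0.75, 1.5 and 0.75, so the equator-row error gives a loss of 0.5 and the high-latitude error a loss of 0.25, whereas an unweighted MSE would give 1/3 in both cases.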

Simplified Plotting for Atmospheric Case Studies

Model class hierarchy refactoring

  • What: Refactor the model class hierarchy to be more understandable and easier to extend with new models.
    • Why: Currently the model class hierarchy is confusing and not very modular. Most things are handled by ARModel, rather than responsibilities being split up into logical components.
    • Who: joeloskarsson
    • Issue: https://github.com/mllam/neural-lam/issues/49