Multi-objective by variable #63

OpheliaMiralles · 2025-01-07T14:10:05Z

Is your feature request related to a problem? Please describe.

The idea would be useful for nowcasting, but also for any data assimilation approach. The loss would be computed per variable. For precipitation, for example, it could be a weighted sum like the one below, where $L_1$ and $L_2$ could be losses related to spatial patterns and $L_3$ a variant of RMSE for point precision. The weights $w_1$, $w_2$ and $w_3$ could vary with the rollout step or target lead time (e.g. shorter lead times put more weight on observations, longer ones put more weight on NWP).

Of course, some of the sources (e.g. station and radar) have missing values; these could be handled by an imputer or by some other approach. This is a raw idea for data assimilation, and this issue is an invitation to brainstorm.

$$\mathrm{Loss}_{\mathrm{tp}} = w_1 L_1\left(y_{\mathrm{radar}}, \hat{y}\right) + w_2 L_2\left(y_{\mathrm{nwp}}, \hat{y}\right) + w_3 L_3\left(y_{\mathrm{station}}, \hat{y}\right)$$
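A minimal pure-Python sketch of the precipitation loss above, assuming each field is a flat list with one value per grid point and `None` marking a missing observation. The spatial-pattern losses $L_1$ and $L_2$ are not specified in this issue, so masked MSE stands in for both as a placeholder; `masked_mse`, `masked_rmse` and `loss_tp` are hypothetical names, not anemoi APIs.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

# One value per grid point; None marks a missing observation.
Field = Sequence[Optional[float]]
SourceLoss = Callable[[Field, Sequence[float]], float]


def masked_mse(target: Field, pred: Sequence[float]) -> float:
    """MSE over the points where the target is present (0.0 if none are)."""
    pairs = [(t, p) for t, p in zip(target, pred) if t is not None]
    if not pairs:
        return 0.0
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def masked_rmse(target: Field, pred: Sequence[float]) -> float:
    """RMSE over the points where the target is present."""
    return math.sqrt(masked_mse(target, pred))


def loss_tp(
    pred: Sequence[float],
    y_radar: Field,
    y_nwp: Field,
    y_station: Field,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    l1: SourceLoss = masked_mse,   # placeholder for a spatial-pattern loss
    l2: SourceLoss = masked_mse,   # placeholder for a spatial-pattern loss
    l3: SourceLoss = masked_rmse,  # point-wise RMSE variant
) -> float:
    """w1 * L1(radar) + w2 * L2(nwp) + w3 * L3(station) for precipitation."""
    w1, w2, w3 = weights
    return w1 * l1(y_radar, pred) + w2 * l2(y_nwp, pred) + w3 * l3(y_station, pred)


pred = [1.0, 2.0, 3.0]
y_radar = [1.0, None, 5.0]     # radar gap at the second point
y_nwp = [1.0, 2.0, 3.0]        # NWP is complete
y_station = [None, 4.0, None]  # a single station
print(loss_tp(pred, y_radar, y_nwp, y_station))  # 2.0 + 0.0 + 2.0 = 4.0
```

Returning 0.0 when a source has no valid points makes that source drop out of the sum instead of turning the whole loss into NaN; an imputer, as suggested above, would be an alternative to masking.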

Describe the solution you'd like

A combined loss defined per variable. This means one output is linked to several inputs/targets, and the loss associated with each source has to be stored somewhere. This needs to be expressed in the config, e.g. like the current combined loss but as a dictionary keyed by variable. The default loss mask for each variable would then be 0, explicitly set to 1 for the variables of the relevant source.
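One possible shape for such a config, sketched under assumptions: the keys `target`, `loss` and `weight`, the source names such as `tp_radar`, and the helper `build_loss_masks` are all hypothetical and not part of the anemoi config schema. The helper builds one 0/1 mask over the target variables per (predicted variable, source) pair, defaulting to 0 and setting only the configured source to 1.

```python
from typing import Dict, List, Tuple

# Hypothetical per-variable loss config: each predicted variable lists the
# target sources it is compared with, the loss to use and its weight.
LOSS_CONFIG: Dict[str, List[dict]] = {
    "tp": [
        {"target": "tp_radar", "loss": "spatial", "weight": 0.5},
        {"target": "tp_nwp", "loss": "spatial", "weight": 0.3},
        {"target": "tp_station", "loss": "rmse", "weight": 0.2},
    ],
}


def build_loss_masks(
    config: Dict[str, List[dict]], target_names: List[str]
) -> Dict[Tuple[str, str], List[float]]:
    """Return one 0/1 mask over the target variables per (variable, source) pair."""
    masks: Dict[Tuple[str, str], List[float]] = {}
    for predicted, entries in config.items():
        for entry in entries:
            mask = [0.0] * len(target_names)  # default: variable not in this loss
            # list.index raises ValueError if the configured source is unknown.
            mask[target_names.index(entry["target"])] = 1.0
            masks[(predicted, entry["target"])] = mask
    return masks


target_names = ["t2m", "tp_radar", "tp_nwp", "tp_station"]
masks = build_loss_masks(LOSS_CONFIG, target_names)
print(masks[("tp", "tp_station")])  # [0.0, 0.0, 0.0, 1.0]
```

Failing loudly on an unknown source name is deliberate in this sketch: a silently all-zero mask would make a misconfigured loss contribute nothing without any warning.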

The current implementation of `tensor.py` in `anemoi.models.data_indices` is not flexible enough to add target variables that are used as forcings and in the loss but are not output by the model, so a partial rewrite is necessary. Also, it seems we go through `_only` and `_removed` mainly to recompute information we already have in the `ModelIndex` class, which I do not understand. WIP on my side.
A change to `CombinedLoss` had to be made for it to work, even though the loss is documented online. We need to be careful with testing.
For the weights to depend on lead time, batch context would have to be passed to the loss somehow, which is currently not the case.
Filtering by variable has to be implemented in the loss. This is WIP on my side.
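To illustrate how batch context could drive lead-time-dependent weights, here is a minimal sketch. It assumes the batch context is a mapping with a hypothetical `lead_time_h` key, and it uses an exponential decay of the observation weight with an arbitrary e-folding time `tau_h`; both the schedule and the names are illustrative choices, not something proposed in this issue.

```python
import math
from typing import Dict, Mapping


def lead_time_weights(lead_time_h: float, tau_h: float = 6.0) -> Dict[str, float]:
    """Observation weight decays as exp(-t / tau_h); NWP receives the rest.

    Assumption: the observation share is split equally between radar and
    station, so the three weights always sum to 1.
    """
    if lead_time_h < 0 or tau_h <= 0:
        raise ValueError("need lead_time_h >= 0 and tau_h > 0")
    w_obs = math.exp(-lead_time_h / tau_h)
    return {"radar": w_obs / 2, "station": w_obs / 2, "nwp": 1.0 - w_obs}


def weighted_loss(
    per_source_losses: Mapping[str, float], batch_context: Mapping[str, float]
) -> float:
    """Combine per-source losses using weights derived from the batch context."""
    weights = lead_time_weights(batch_context["lead_time_h"])
    return sum(weights[src] * value for src, value in per_source_losses.items())


losses = {"radar": 2.0, "nwp": 4.0, "station": 1.0}
print(weighted_loss(losses, {"lead_time_h": 0.0}))  # observations only: 1.5
```

Keeping the weights summing to 1 means the overall loss scale stays comparable across lead times, so only the balance between observations and NWP shifts.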

Describe alternatives you've considered

No response

Additional context

No response

Organisation

MeteoSwiss

HCookie · 2025-01-14T16:37:11Z

One way to weight each variable differently in the different losses would be to allow `loss_weights` in `CombinedLoss` to be a tensor with one entry per variable. This would allow different weights per variable per loss.
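A pure-Python sketch of this suggestion, assuming each loss returns one value per variable and `loss_weights` holds one weight row per loss. To keep it short, `pred` and `target` here are single vectors over variables; in a PyTorch implementation each weight row would instead broadcast over the variable dimension. `combined_loss`, `abs_error` and `squared_error` are hypothetical names, not the existing `CombinedLoss` API.

```python
from typing import Callable, List, Sequence

# A per-variable loss returns one value per variable.
PerVariableLoss = Callable[[Sequence[float], Sequence[float]], List[float]]


def abs_error(pred: Sequence[float], target: Sequence[float]) -> List[float]:
    return [abs(p - t) for p, t in zip(pred, target)]


def squared_error(pred: Sequence[float], target: Sequence[float]) -> List[float]:
    return [(p - t) ** 2 for p, t in zip(pred, target)]


def combined_loss(
    pred: Sequence[float],
    target: Sequence[float],
    losses: Sequence[PerVariableLoss],
    loss_weights: Sequence[Sequence[float]],
) -> float:
    """Sum over losses k and variables v of loss_weights[k][v] * loss_k[v]."""
    if len(losses) != len(loss_weights):
        raise ValueError("need one weight vector per loss")
    total = 0.0
    for loss_fn, weights in zip(losses, loss_weights):
        values = loss_fn(pred, target)
        if len(weights) != len(values):
            raise ValueError("need one weight per variable")
        total += sum(w * v for w, v in zip(weights, values))
    return total


# Two variables: absolute error only on the first, half-weighted squared error only on the second.
pred, target = [1.0, 3.0], [0.0, 1.0]
print(combined_loss(pred, target, [abs_error, squared_error], [[1.0, 0.0], [0.0, 0.5]]))  # 3.0
```

A zero weight switches a loss off for a variable, so each variable can effectively pick its own subset of losses.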

OpheliaMiralles · 2025-01-24T10:43:52Z

Not really: each loss links one predicted variable to several target sources for that variable, so I need to write a `FilteringLossWrapper` that filters the batch down to the relevant variables before calling each loss function of the `CombinedLoss`. I have a working version, which involves adding a new kind of index to the index collection. I am waiting for the `CombinedLoss` discussion to be resolved before opening a PR on the topic. The loss weights could, however, include batch context providing the predicted lead time, if we wanted to model the decreasing importance of observations as lead time increases.
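The working version of `FilteringLossWrapper` is not shown in this thread, so the following is only an illustration of the idea, not the author's implementation. It assumes predictions and targets are lists of per-variable fields, and that plain integer positions stand in for the new indices in the index collection; `mse` and `combined` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Field = List[float]  # one value per grid point
LossFn = Callable[[Field, Field], float]


def mse(pred: Field, target: Field) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


@dataclass
class FilteringLossWrapper:
    """Hypothetical sketch: select one predicted variable and one target source,
    then delegate to the wrapped loss."""

    loss: LossFn
    pred_index: int    # position of the predicted variable in the model output
    target_index: int  # position of the matching source among the targets

    def __call__(self, pred: Sequence[Field], target: Sequence[Field]) -> float:
        return self.loss(pred[self.pred_index], target[self.target_index])


def combined(
    wrapped: Sequence[FilteringLossWrapper],
    weights: Sequence[float],
    pred: Sequence[Field],
    target: Sequence[Field],
) -> float:
    """Weighted sum of the filtered losses."""
    return sum(w * f(pred, target) for w, f in zip(weights, wrapped))


pred = [[1.0, 2.0]]  # model output: tp only
target = [[1.0, 2.0], [2.0, 2.0], [1.0, 4.0]]  # tp_radar, tp_nwp, tp_station
wrapped = [FilteringLossWrapper(mse, 0, j) for j in range(3)]
print(combined(wrapped, [1.0, 1.0, 1.0], pred, target))  # 0.0 + 0.5 + 2.0 = 2.5
```

The same predicted field (`pred_index=0`) is compared against three different target sources, which is the one-output-to-many-targets link that per-variable weights alone cannot express.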

OpheliaMiralles added the enhancement (New feature or request) and training labels Jan 7, 2025

OpheliaMiralles self-assigned this Jan 13, 2025

anaprietonem added this to Anemoi-dev Feb 11, 2025
