Is your feature request related to a problem? Please describe.
The idea would be useful for nowcasting, but also for any data assimilation approach. The loss would be computed per variable; for precipitation, for example, it could be a weighted sum such as

$$L_{\text{precip}} = w_1 L_1 + w_2 L_2 + w_3 L_3,$$

where $L_1$ and $L_2$ could be losses related to spatial patterns and $L_3$ a variant of RMSE for point precision. The weights $w_1$, $w_2$ and $w_3$ could vary with the rollout step or target lead time (e.g. a shorter lead time puts more weight on observations, a longer one more weight on NWP).
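A minimal sketch of such a lead-time-dependent weighted sum $w_1 L_1 + w_2 L_2 + w_3 L_3$, in plain Python for illustration. It assumes, purely as an example, that $L_1$ compares against radar, $L_2$ against NWP and $L_3$ against stations, and it uses a hypothetical exponential decay with time scale `tau_h` for the observation share. Neither the assignment of sources nor the schedule comes from anemoi.

```python
import math


def lead_time_weights(lead_time_h: float, tau_h: float = 3.0) -> tuple[float, float, float]:
    """Return (w1, w2, w3) for a lead time in hours.

    Assumption (illustrative only): L1 and L3 are observation-based terms
    (e.g. radar and station) and L2 is NWP-based. The observation share
    decays exponentially with time scale tau_h; the weights sum to 1.
    """
    if lead_time_h < 0 or tau_h <= 0:
        raise ValueError("lead_time_h must be >= 0 and tau_h must be > 0")
    obs_share = math.exp(-lead_time_h / tau_h)
    nwp_share = 1.0 - obs_share
    # Split the observation share evenly between the two observation terms.
    return obs_share / 2, nwp_share, obs_share / 2


def precip_loss(l1: float, l2: float, l3: float, lead_time_h: float) -> float:
    """Weighted sum w1*L1 + w2*L2 + w3*L3 with lead-time-dependent weights."""
    w1, w2, w3 = lead_time_weights(lead_time_h)
    return w1 * l1 + w2 * l2 + w3 * l3
```

At lead time 0 the weights are (0.5, 0.0, 0.5); after a few multiples of `tau_h` almost all of the weight sits on the NWP term. Any monotone schedule would work; the exponential is just one convenient choice.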
Of course, some of the sources (e.g. station and radar) have missing values, which could be handled by an imputer or by other means. This is a raw idea for data assimilation, and this issue is an invitation to brainstorm.
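As an alternative to an imputer, missing observations can be masked out of a point loss. The sketch below uses a hypothetical helper, `masked_rmse`, assumes missing values are encoded as NaN, and computes the RMSE only over points where the target is available.

```python
import math


def masked_rmse(pred: list[float], target: list[float]) -> float:
    """RMSE over the points where the target is available (not NaN)."""
    sq_errors = [(p - t) ** 2 for p, t in zip(pred, target) if not math.isnan(t)]
    if not sq_errors:
        # No valid observation in this sample: contribute nothing.
        # (A design choice; the term could also be skipped entirely.)
        return 0.0
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Masking keeps the loss honest about where observations exist, at the cost of giving sparse sources fewer effective samples. An imputer instead fills the gaps, but it injects its own errors into the target.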
Describe the solution you'd like
A kind of combined loss per variable. One output would be linked to several inputs/targets, and the loss associated with each source would need to be stored somewhere. This needs to be expressed in the config, e.g. like the current combined loss but as a dictionary keyed by variable. The default loss mask per variable would then be 0, and it would be explicitly set to 1 for the variables of the relevant source.
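A minimal sketch of how a per-variable config could be turned into the 0/1 loss masks described above. The config layout, the variable names (tp, 2t, 10u) and the loss names are hypothetical and do not reflect an existing anemoi schema.

```python
# Hypothetical per-variable loss config: variable -> {target source -> loss name}.
# Variable, source and loss names are illustrative, not an anemoi schema.
LOSS_CONFIG = {
    "tp": {"radar": "spatial_loss", "nwp": "spatial_loss", "station": "rmse"},
    "2t": {"station": "rmse"},
}


def build_source_masks(
    variables: list[str], config: dict[str, dict[str, str]]
) -> dict[str, list[int]]:
    """Return, for each source, a 0/1 mask over the model's output variables.

    Every entry defaults to 0 and is set to 1 only where the config lists
    that source for that variable.
    """
    sources = sorted({src for per_var in config.values() for src in per_var})
    masks = {src: [0] * len(variables) for src in sources}
    for i, var in enumerate(variables):
        for src in config.get(var, {}):
            masks[src][i] = 1
    return masks
```

With outputs ["tp", "2t", "10u"], the station mask covers tp and 2t, while the radar and NWP masks cover only tp. 10u, which is absent from the config, stays at 0 everywhere.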
The current implementation of tensor.py in anemoi.models.data_indices is not flexible enough to add target variables that are used as forcings, i.e. variables that enter the loss but are not output by the model. A partial rewrite is necessary. Also, we seem to go through _only and _removed mainly to recompute information already available in the ModelIndex class, and I do not understand why. WIP on my side.
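To illustrate the bookkeeping involved, here is a hypothetical sketch (not the anemoi.models.data_indices API) of index lists in which a target-only variable, possibly also used as a forcing, is read by the loss but never predicted. The loss receives (position in model output, data column) pairs, so one output can be paired with several target columns.

```python
def build_indices(
    names: list[str],
    forcing: set[str],
    diagnostic: set[str],
    target_only: dict[str, str],
) -> tuple[list[int], list[int], list[tuple[int, int]]]:
    """Hypothetical index bookkeeping.

    names:       all dataset variables, in column order
    forcing:     input-only variables (never predicted)
    diagnostic:  output-only variables (never inputs)
    target_only: target variable -> model output it supervises; such a
                 variable is never predicted, and is an input only if it
                 is also listed in `forcing`
    Returns (input columns, output columns, loss pairs), where each loss
    pair is (position in model output, data column of the target).
    """
    col = {n: i for i, n in enumerate(names)}
    inputs = [
        col[n]
        for n in names
        if n not in diagnostic and (n not in target_only or n in forcing)
    ]
    outputs = [n for n in names if n not in forcing and n not in target_only]
    out_pos = {n: i for i, n in enumerate(outputs)}
    # Every output is compared with its own data column...
    pairs = [(out_pos[n], col[n]) for n in outputs]
    # ...and additionally with each target-only column that supervises it.
    pairs += [(out_pos[pred], col[t]) for t, pred in target_only.items()]
    return inputs, [col[n] for n in outputs], pairs
```

In this layout a radar precipitation field can be both a forcing (an input column) and a target for the predicted tp, while a station field is only ever a target.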
A change to CombinedLoss had to be implemented for this to work, although the current behaviour of the loss is documented online. We need to be careful with testing.
For the weights to depend on lead time, the loss needs access to batch context (e.g. the lead time of the target), which is currently not provided.
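One way to provide batch context is to pass it explicitly to every term of a combined loss. The sketch below assumes a hypothetical (pred, target, context) signature and a per-term weight function of the context. It is not the current CombinedLoss interface.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical sub-loss signature: (pred, target, context) -> float, where
# `context` carries batch information such as the lead time in hours.
SubLoss = Callable[[Sequence[float], Sequence[float], Mapping[str, float]], float]
WeightFn = Callable[[Mapping[str, float]], float]


class ContextAwareCombinedLoss:
    """Sketch of a combined loss that forwards batch context to its terms."""

    def __init__(self, losses: Sequence[SubLoss], weight_fns: Sequence[WeightFn]):
        if len(losses) != len(weight_fns):
            raise ValueError("need one weight function per loss")
        self.losses = list(losses)
        self.weight_fns = list(weight_fns)

    def __call__(self, pred, target, context):
        # Each weight is evaluated from the context, e.g. the lead time.
        return sum(
            w(context) * loss(pred, target, context)
            for loss, w in zip(self.losses, self.weight_fns)
        )


def mae(pred, target, context):
    """Mean absolute error; ignores the context."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target, context):
    """Mean squared error; ignores the context."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

For example, with `ContextAwareCombinedLoss([mae, mse], [lambda c: 1.0, lambda c: c["lead_time_h"] / 10])`, predictions [1.0, 2.0] against targets [0.0, 0.0] at a lead time of 5 h give 1.0 * 1.5 + 0.5 * 2.5 = 2.75.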
Filtering by variable has to be implemented in the loss. This is WIP on my side.
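A minimal sketch of filtering by variable inside the loss, in the spirit of the FilteringLossWrapper mentioned later in the thread. The implementation, the nested-list data layout (x[variable][grid_point]) and the index arguments are assumptions for illustration. They are not the working version referred to there.

```python
from typing import Callable, Sequence

Field = Sequence[Sequence[float]]  # x[variable][grid_point]


def mse(pred: Field, target: Field) -> float:
    """Mean squared error over all selected variables and points."""
    errs = [(p - t) ** 2 for pv, tv in zip(pred, target) for p, t in zip(pv, tv)]
    return sum(errs) / len(errs)


class FilteringLossWrapper:
    """Sketch: select variables from prediction and target before calling
    the wrapped loss. pred_indices[k] is compared with target_indices[k],
    so one predicted variable can be matched to a different target source."""

    def __init__(
        self,
        loss_fn: Callable[[Field, Field], float],
        pred_indices: Sequence[int],
        target_indices: Sequence[int],
    ):
        if len(pred_indices) != len(target_indices):
            raise ValueError("pred_indices and target_indices must have equal length")
        self.loss_fn = loss_fn
        self.pred_indices = list(pred_indices)
        self.target_indices = list(target_indices)

    def __call__(self, pred: Field, target: Field) -> float:
        p = [pred[i] for i in self.pred_indices]
        t = [target[j] for j in self.target_indices]
        return self.loss_fn(p, t)
```

For example, with predicted variables [tp, 2t] and target columns [nwp_tp, 2t, radar_tp], `FilteringLossWrapper(mse, [0], [2])` compares the predicted tp with radar_tp, while `FilteringLossWrapper(mse, [0], [0])` compares it with nwp_tp. The two wrappers can then be combined with separate weights.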
Describe alternatives you've considered
No response
Additional context
No response
Organisation
MeteoSwiss
One way to weight the different losses per variable would be to allow loss_weights in CombinedLoss to be a tensor with one entry per variable. This would allow a different weighting per variable for each loss.
Not really... Each loss links one predicted variable to several target sources for that variable, so I need to write a FilteringLossWrapper that filters the batch to the relevant variables before calling each loss function of the CombinedLoss. I have a working version, which involves adding a new kind of index to the index collection. I am waiting for the CombinedLoss discussion to be resolved before opening a PR on the topic. The loss weights could, however, use batch context to provide the predicted lead time if we wanted to model the decreasing importance of observations as the lead time increases.
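For comparison, the per-variable loss_weights suggestion amounts to a weight matrix with one row per loss and one column per variable. The sketch below is a hypothetical illustration in plain Python (it would be a tensor in practice). As the reply notes, it reweights variables but cannot, on its own, pair one predicted variable with several target sources.

```python
from typing import Sequence


def weighted_combined_loss(
    values: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]
) -> float:
    """values[k][v]: loss k evaluated on variable v.
    weights[k][v]: weight of loss k for variable v (one row per loss,
    one column per variable). Returns sum over k, v of weights * values."""
    if len(values) != len(weights):
        raise ValueError("one weight row per loss is required")
    total = 0.0
    for k, (vrow, wrow) in enumerate(zip(values, weights)):
        if len(vrow) != len(wrow):
            raise ValueError(f"loss {k}: one weight per variable is required")
        total += sum(w * x for w, x in zip(wrow, vrow))
    return total
```

A zero entry switches a loss off for a variable, so the 0/1 per-variable mask is a special case of this weight matrix.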