Feature/bitmask missing values support. #22

CarlosPenaDePedro · 2024-11-21T11:38:56Z

This PR stems from the DE IFS-NEMO experiments. In NEMO, the only missing points are land points, which are masked. Our profiling showed that the dynamic treatment of missing points, which includes copying the complete sparse weights matrix, was the most time-consuming part of the interpolation action. Since these missing points are known beforehand, the optimisation passes a boolean vector of the static missing points to mir as a parameter and applies it to the matrix at creation time, so that the corrected matrix can be cached.
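As an illustration of the idea only (this is not the PR's C++ code), here is a minimal Python sketch of applying a static mask to a sparse weights matrix once, at creation time. It assumes the correction drops the weights of masked input points and renormalises what is left in each row, and that an output point with no unmasked contributors is reported as missing. The name `apply_static_mask` and the row-of-dicts layout are hypothetical.

```python
def apply_static_mask(rows, masked_inputs):
    """Return (corrected_rows, all_masked_rows).

    rows          -- sparse matrix as a list of {input_index: weight} dicts,
                     one dict per output point (hypothetical layout)
    masked_inputs -- list of bools, True where the input point is masked (land)
    """
    corrected = []
    all_masked_rows = []
    for i, row in enumerate(rows):
        # Keep only the contributions from unmasked input points.
        kept = {j: w for j, w in row.items() if not masked_inputs[j]}
        total = sum(kept.values())
        if not kept or total == 0.0:
            # Every contributor is masked: the output point is missing too.
            corrected.append({})
            all_masked_rows.append(i)
        else:
            # Assumed correction: renormalise so the row sums to 1 again.
            corrected.append({j: w / total for j, w in kept.items()})
    return corrected, all_masked_rows


# Three output points interpolated from three input points; input 2 is land.
weights = [{0: 0.5, 1: 0.5}, {1: 0.25, 2: 0.75}, {2: 1.0}]
land = [False, False, True]
corrected, all_masked_rows = apply_static_mask(weights, land)
```

Because `land` does not change between fields, `corrected` only needs to be computed once and could then be reused for every field on the same grid.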


FussyDuck · 2024-11-21T11:39:01Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Carlos Pena De Pedro seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

pmaciel · 2024-11-21T11:58:02Z

Hi @CarlosPenaDePedro, thanks for this PR. Matrices are copied, rather than modified directly, in the presence of missing/masked values because they are intended to be cached (you'll see that a copy is followed by modification, then destruction). The cacheable files are meant to represent a geometric relation between input and output.

Naming the missing-value corrections non-linear is deliberate: once you apply them, the resulting multiplication makes the interpolation operator non-linear with respect to the input/output geometries. So this PR forces the cached files to depend on user-provided options, which is an indirect way of saying they depend on the field (its missing/masked values), and the cache will be fundamentally broken.

If a field has missing or masked values, this is not part of its geometry. You can solve this problem differently, by instead doing two multiplications: 1) as we already do, disregarding the presence of missing/masked values, and 2) locally correcting the first (we don't have this functionality yet). This solution is significantly more complicated and I wanted to discuss it with you beforehand (and with Razvan, if he's still available), but there was no timeline for this. Is there one?
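One possible reading of the two-multiplication idea, sketched in Python under stated assumptions: the cached weights are left untouched, the first product is taken with missing inputs set to zero, and a second product of the same weights with a validity vector (1 for valid, 0 for missing) gives the per-row coverage used for the local correction. This is an interpretation for illustration, not functionality that exists in mir; `matvec` and `interpolate_two_pass` are hypothetical names.

```python
def matvec(rows, x):
    """Multiply a sparse matrix (list of {input_index: weight}) by a vector."""
    return [sum(w * x[j] for j, w in row.items()) for row in rows]


def interpolate_two_pass(rows, field, missing):
    """Interpolate without modifying the (cacheable) weights.

    Pass 1 multiplies the weights by the field with missing inputs zeroed.
    Pass 2 multiplies the same weights by the validity vector, giving the
    sum of weights that landed on valid inputs for each output point.
    Dividing pass 1 by pass 2 renormalises each output locally.
    """
    zeroed = [0.0 if m else v for v, m in zip(field, missing)]
    valid = [0.0 if m else 1.0 for m in missing]
    raw = matvec(rows, zeroed)
    coverage = matvec(rows, valid)
    # None stands in for "missing" where no valid input contributes.
    return [r / c if c > 0.0 else None for r, c in zip(raw, coverage)]


weights = [{0: 0.5, 1: 0.5}, {1: 0.25, 2: 0.75}, {2: 1.0}]
field = [2.0, 4.0, 9999.0]       # 9999.0 is a placeholder for a masked value
missing = [False, False, True]
result = interpolate_two_pass(weights, field, missing)
```

In this sketch the geometry-only matrix stays valid for any field, and when the mask is static the coverage vector could itself be computed once and reused.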

Who is the contact at ECMWF for these discussions? Let me know.

CarlosPenaDePedro · 2024-11-21T15:22:42Z

> If a field has missing or masked values, this is not part of its geometry. You can solve this problem differently, by instead doing two multiplications: 1) as we already do, disregarding the presence of missing/masked values, and 2) locally correcting the first (we don't have this functionality yet).

The point of the PR is that in NEMO, the masked points (land points) are a static, predetermined feature of the grid geometry. Each eORCA grid and its vertical levels have a fixed set of grid points corresponding to land, regardless of the field or data. This makes the mask inherently part of the input/output geometry, not a dynamic property of the data. Therefore, incorporating the mask into the cached weight matrices aligns with their geometric purpose and does not introduce field dependency. Razvan should now be available for discussion, and we now have a timeline. Correct me if I'm wrong, @raguridan, but the goal would be to have this ready before the next D-Suite release in six months?

> Who is the contact at ECMWF for these discussions? Let me know.

I was not in contact with anyone at ECMWF regarding these discussions; my only contact has been through the emails we exchanged.

raguridan-bsc · 2024-11-22T12:08:54Z

Hi Pedro,

Yes, we would like to have a solution for this before the next Destination Earth deliverable at the end of April 2025.

In the case of NEMO (especially the sea-ice-related fields, which might have missing values), we found that the missing value used is 0, which is also used to indicate that there is no ice at that grid point.

In practice, this prevents us from distinguishing between a missing value and a point with no ice, meaning that the only points we can reliably identify as missing are the land points, which are captured in the land-sea mask.
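A small Python illustration of the ambiguity, using made-up values: detecting missing points by comparing against the missing value 0 also flags ice-free ocean points, whereas the land-sea mask flags only land.

```python
# Hypothetical sea-ice concentration at three points:
# a land point, an ice-free ocean point, and an ocean point with ice.
field = [0.0, 0.0, 0.4]
is_land = [True, False, False]   # taken from the land-sea mask

MISSING_VALUE = 0.0

# Value-based detection cannot tell "no ice" from "missing".
missing_by_value = [v == MISSING_VALUE for v in field]

# Mask-based detection relies only on the static land-sea mask.
missing_by_mask = list(is_land)
```

Here `missing_by_value` wrongly marks the ice-free ocean point as missing, while `missing_by_mask` does not.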

Since the land-sea mask does not change during a run, we can update the matrices once and cache them afterwards, improving the runtime of the eORCA12 interpolation.
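If mask-corrected matrices are cached, entries built with different masks must not collide. A minimal Python sketch of one way to do that, assuming the cache key is a string built from the grids and the method: a digest of the static mask is appended to the key. This does not describe mir's actual cache naming; `mask_digest`, `cache_key`, the method name and the output grid name are hypothetical placeholders.

```python
import hashlib


def mask_digest(mask):
    """Return a SHA-256 hex digest of a boolean mask, packed 8 bits per byte."""
    packed = bytearray((len(mask) + 7) // 8)
    for i, masked in enumerate(mask):
        if masked:
            packed[i // 8] |= 1 << (i % 8)
    h = hashlib.sha256()
    # Include the length so masks that differ only in trailing
    # False entries do not share a digest.
    h.update(len(mask).to_bytes(8, "little"))
    h.update(bytes(packed))
    return h.hexdigest()


def cache_key(input_grid, output_grid, method, mask=None):
    """Build a cache key; the mask digest is added only when a mask is given."""
    parts = [input_grid, output_grid, method]
    if mask is not None:
        parts.append("mask-" + mask_digest(mask))
    return "/".join(parts)


# Placeholder grid and method names, for illustration only.
plain_key = cache_key("eORCA12", "regular-0.25", "linear")
key_a = cache_key("eORCA12", "regular-0.25", "linear", mask=[True, False, False])
key_b = cache_key("eORCA12", "regular-0.25", "linear", mask=[False, True, False])
```

With this scheme the geometry-only matrix keeps its existing key, and each distinct static mask gets its own entry.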

CarlosPenaDePedro added 6 commits November 4, 2024 19:27

Added bitmask option

a040c31

Completed and fixed this implementation

dcb0430

Sparse Matrix access improved

9ff1663

Added IMM mask default to false

1ea9c33

Tidy up and some fixes

79b06ba

Merge remote-tracking branch 'origin/develop' into feature/bitmask-missing_values-support_simple

0cb38a1

github-actions bot added the contributor label Nov 21, 2024
