Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/bitmask missing values support. #22

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

CarlosPenaDePedro
Copy link

This PR stems from DE IFS-NEMO experiments. In NEMO, the only missing points correspond to land points, which are masked. Our profiling revealed that the dynamic treatment of missing points, which includes copying the complete sparse matrix weights, was the main time-consuming aspect of the interpolation action. Since these missing points are known beforehand, the optimisation involves passing a boolean vector containing the static missing points as a parameter to mir and applying it to the matrix at creation time, enabling it to be cached.

@FussyDuck
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Carlos Pena De Pedro seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@pmaciel
Copy link
Member

pmaciel commented Nov 21, 2024

Hi @CarlosPenaDePedro , thanks for this PR. The reason matrices are copied and not directly modified on the presence of missing/masked values is because they are intended to be cached (and you'll see that a copy is followed by modification then destruction). The cacheable files are meant to represent a geometry relation between input and output.

The reason the missing values corrections are named non-linear is deliberate: once you apply them, the resulting mulltiplication makes the interpolation operator non-linear in respect to the input/output geometries. So, this PR forces the cached files to depend on user-provided options, which is an indirect way to say field (missing/masked values) and the cache will be fundamentally broken.

If a field has missing or masked values, this is not part of its geometry. You can solve this problem differently, but instead doing two multiplications, 1) as we already do disregarding the presence of missing/masked values, and 2) locally correcting the first (we don't have this functionality, yet). This solution is significantly more complicated and I was wishing to discuss beforehand with you (and Razvan if he's still available) but there was no timeline for this -- is there?

Who is the ECMWF contact in ECMWF for these discussions? Let me know.

@CarlosPenaDePedro
Copy link
Author

CarlosPenaDePedro commented Nov 21, 2024

If a field has missing or masked values, this is not part of its geometry. You can solve this problem differently, but instead doing two multiplications, 1) as we already do disregarding the presence of missing/masked values, and 2) locally correcting the first (we don't have this functionality, yet). Razvan should now be available for discussion, and we now have a timeline. Correct me if I'm wrong, @raguridan, but the goal would be to have this ready before the next D-Suite release in six months?

The point of the PR is that in NEMO, the masked points (land points) are a static and predetermined feature of the grid geometry. Each eORCA grid and its vertical levels have a fixed set of grid points corresponding to land, regardless of the field or data. This makes the mask inherently part of the input/output geometry, not a dynamic property of the data. Therefore, incorporating the mask into the cached weight matrices aligns with their geometric purpose and does not introduce field dependency. Razvan is should now be available for discussion yes, and we have now a timeline correct me if I am wrong @raguridan but having this before the next D-suite in 6 month will be the goal?

Who is the ECMWF contact in ECMWF for these discussions? Let me know.

I was not in contact with anyone at ECMWF regarding these discussions; my only contact has been through the emails we exchanged.

@raguridan-bsc
Copy link

Hi Pedro,

Yes, we would like to have a solution for this before the next Destination Earth deliverable at the end of April/25.

In the case of nemo (especially the sea ice related fields, which might have missing values), we found out that the used missing value is 0, which is also used to denote the fact that there is no ice at that grid point.

Practically, this prevents us from distinguishing between a missing value and a point with no ice, meaning that the only missing points we can reliably determine as missing are the land points, which are captured in the land sea mask.

Since the land sea mask does not change during a run, this is the reason why we can update the matrices and cache them afterwards, improving the runtime of the eORCA12 interpolation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants