-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/bitmask missing values support. #22
base: develop
Are you sure you want to change the base?
Feature/bitmask missing values support. #22
Conversation
Carlos Pena De Pedro seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Hi @CarlosPenaDePedro , thanks for this PR. The reason matrices are copied and not directly modified on the presence of missing/masked values is because they are intended to be cached (and you'll see that a copy is followed by modification then destruction). The cacheable files are meant to represent a geometry relation between input and output. The reason the missing values corrections are named non-linear is deliberate: once you apply them, the resulting mulltiplication makes the interpolation operator non-linear in respect to the input/output geometries. So, this PR forces the cached files to depend on user-provided options, which is an indirect way to say field (missing/masked values) and the cache will be fundamentally broken. If a field has missing or masked values, this is not part of its geometry. You can solve this problem differently, but instead doing two multiplications, 1) as we already do disregarding the presence of missing/masked values, and 2) locally correcting the first (we don't have this functionality, yet). This solution is significantly more complicated and I was wishing to discuss beforehand with you (and Razvan if he's still available) but there was no timeline for this -- is there? Who is the ECMWF contact in ECMWF for these discussions? Let me know. |
The point of the PR is that in NEMO, the masked points (land points) are a static and predetermined feature of the grid geometry. Each eORCA grid and its vertical levels have a fixed set of grid points corresponding to land, regardless of the field or data. This makes the mask inherently part of the input/output geometry, not a dynamic property of the data. Therefore, incorporating the mask into the cached weight matrices aligns with their geometric purpose and does not introduce field dependency. Razvan is should now be available for discussion yes, and we have now a timeline correct me if I am wrong @raguridan but having this before the next D-suite in 6 month will be the goal?
I was not in contact with anyone at ECMWF regarding these discussions; my only contact has been through the emails we exchanged. |
Hi Pedro, Yes, we would like to have a solution for this before the next Destination Earth deliverable at the end of April/25. In the case of nemo (especially the sea ice related fields, which might have missing values), we found out that the used missing value is 0, which is also used to denote the fact that there is no ice at that grid point. Practically, this prevents us from distinguishing between a missing value and a point with no ice, meaning that the only missing points we can reliably determine as missing are the land points, which are captured in the land sea mask. Since the land sea mask does not change during a run, this is the reason why we can update the matrices and cache them afterwards, improving the runtime of the eORCA12 interpolation. |
This PR stems from DE IFS-NEMO experiments. In NEMO, the only missing points correspond to land points, which are masked. Our profiling revealed that the dynamic treatment of missing points, which includes copying the complete sparse matrix weights, was the main time-consuming aspect of the interpolation action. Since these missing points are known beforehand, the optimisation involves passing a boolean vector containing the static missing points as a parameter to mir and applying it to the matrix at creation time, enabling it to be cached.