Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply function to points within circular neighborhood #941

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

ahijevyc
Copy link
Collaborator

@ahijevyc ahijevyc commented Sep 9, 2024

Apply a neighborhood filter within a circular radius r to a UxDataset or UxDataArray.

Issue #930

Overview

This is kind of like uxarray.UxDataArray.inverse_distance_weighted_remap , but the neighborhood is defined by distance, not a number of nearest neighbors. This is ideally suited for a variable resolution mesh, in which a constant of neighbors doesn't have a constant sized neighborhood. Another difference is that this neighborhood filter does not weight data by inverse distance.

Just like uxarray.UxDataArray.subset.bounding_circle this function uses ball_tree.query_radius to select grid elements in a circular neighborhood, but this function finds the neighborhood for all elements in grid, not just one center_coordinate.

The filter function func may be a user-defined function, but uses np.mean by default. It could be min, max, np.median. It can even use functions that require additional arguments, like np.percentile if you supply the argument(s) with functools.partial (see below)

Expected Usage

from functools import partial
import numpy as np
import uxarray

grid_path = "/glade/campaign/mmm/wmr/weiwang/cps/irma3/2020/tk707_conus/init.nc"
data_path = "/glade/campaign/mmm/wmr/weiwang/cps/irma3/mp6/tk707/diag.2017-09-07_09.00.00.nc"
uxds = uxarray.open_mfdataset(
    grid_path,
    data_path
)

# Trim domain
lon_bounds = (-74, -64)
lat_bounds = (18, 24)
uxda = uxds["refl10cm_max"].isel(Time=0).subset.bounding_box(lon_bounds, lat_bounds)

# this is how you use this function to smooth with 0.25-deg filter.
uxda_mean = uxda.neighborhood_filter(func=np.mean, r=0.25)


# this is another way to use this function with np.percentile
uxda_max = uxda.neighborhood_filter(func=partial(np.percentile, q=90), r=0.25)

(uxda.plot.rasterize() + uxda_mean.plot.rasterize() + uxda_max.plot.rasterize()).cols(1)

PR Checklist

General

  • An issue is linked created and linked
  • Add appropriate labels
  • Filled out Overview and Expected Usage (if applicable) sections

Testing

  • Adequate tests are created if there is new functionality
  • Tests cover all possible logical paths in your function
  • Tests are not too basic (such as simply calling a function and nothing else)

Documentation

  • Docstrings have been added to all new functions
  • Docstrings have updated with any function changes
  • Internal functions have a preceding underscore (_) and have been added to docs/internal_api/index.rst
  • User functions have been added to docs/user_api/index.rst

Examples

  • Any new notebook examples added to docs/examples/ folder
  • Clear the output of all cells before committing
  • New notebook files added to docs/examples.rst toctree
  • New notebook files added to new entry in docs/gallery.yml with appropriate thumbnail photo in docs/_static/thumbnails/

@ahijevyc ahijevyc added the new feature New feature or request label Sep 9, 2024
@ahijevyc ahijevyc self-assigned this Sep 9, 2024
@ahijevyc ahijevyc mentioned this pull request Sep 9, 2024
14 tasks
Comment on lines +1138 to +1139
# face centers, edge centers or nodes.
tree = self.uxgrid.get_ball_tree(coordinates=data_mapping, reconstruct=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronzedwick

We should probably fix this logic in get_ball_tree(), since we shouldn't need to manually set reconstruct=False

        if self._ball_tree is None or reconstruct:
            self._ball_tree = BallTree(
                self,
                coordinates=coordinates,
                distance_metric=distance_metric,
                coordinate_system=coordinate_system,
                reconstruct=reconstruct,
            )
        else:
            if coordinates != self._ball_tree._coordinates:
                self._ball_tree.coordinates = coordinates

The coordinates != self._ball_tree._coordinates check should be included in the first if

Copy link
Collaborator Author

@ahijevyc ahijevyc Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. So, move the coordinates check to the if-clause like this?

                if (
                    self._ball_tree is None
                    or coordinates != self._ball_tree._coordinates
                    or reconstruct
                ):

                    self._ball_tree = BallTree(
                        self,
                        coordinates=coordinates,
                        distance_metric=distance_metric,
                        coordinate_system=coordinate_system,
                        reconstruct=reconstruct,
                    )

What if the coordinate_system is different? Would that also require a newly constructed tree?

Copy link
Collaborator Author

@ahijevyc ahijevyc Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever logic is fixed in Grid.get_ball_tree should also be applied to Grid.get_kdtree.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking coordinate system also (coordinate_system is not a hidden variable of _ball_tree; it has no underscore):

                if (
                    self._ball_tree is None
                    or coordinates != self._ball_tree._coordinates
                    or coordinate_system != self._ball_tree.coordinate_system
                    or reconstruct
                ):

                    self._ball_tree = BallTree(
                        self,
                        coordinates=coordinates,
                        distance_metric=distance_metric,
                        coordinate_system=coordinate_system,
                        reconstruct=reconstruct,
                    )

Copy link
Member

@philipc2 philipc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments:

@@ -1104,3 +1103,116 @@ def _slice_from_grid(self, sliced_grid):
dims=self.dims,
attrs=self.attrs,
)

def neighborhood_filter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation looks great! May we move the bulk of the logic into the uxarray.grid.neighbors module and call that helper from here?

We can keep the data-mapping checks here, and anything related to constructing and returining the final data array but the bulk of the computations would go inside a helper in the module mentioned above.

Copy link
Collaborator Author

@ahijevyc ahijevyc Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to think about how to do that, but I am happy to defer to you.

Comment on lines +1065 to +1115

coordinate_system = tree.coordinate_system

if coordinate_system == "spherical":
if data_mapping == "nodes":
lon, lat = (
self.uxgrid.node_lon.values,
self.uxgrid.node_lat.values,
)
elif data_mapping == "face centers":
lon, lat = (
self.uxgrid.face_lon.values,
self.uxgrid.face_lat.values,
)
elif data_mapping == "edge centers":
lon, lat = (
self.uxgrid.edge_lon.values,
self.uxgrid.edge_lat.values,
)
else:
raise ValueError(
f"Invalid data_mapping. Expected 'nodes', 'edge centers', or 'face centers', "
f"but received: {data_mapping}"
)

dest_coords = np.vstack((lon, lat)).T

elif coordinate_system == "cartesian":
if data_mapping == "nodes":
x, y, z = (
self.uxgrid.node_x.values,
self.uxgrid.node_y.values,
self.uxgrid.node_z.values,
)
elif data_mapping == "face centers":
x, y, z = (
self.uxgrid.face_x.values,
self.uxgrid.face_y.values,
self.uxgrid.face_z.values,
)
elif data_mapping == "edge centers":
x, y, z = (
self.uxgrid.edge_x.values,
self.uxgrid.edge_y.values,
self.uxgrid.edge_z.values,
)
else:
raise ValueError(
f"Invalid data_mapping. Expected 'nodes', 'edge centers', or 'face centers', "
f"but received: {data_mapping}"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use #974 's new remap.utils._remap_grid_parse instead of this code block.

Kept neighborhood and dual additions
@philipc2
Copy link
Member

philipc2 commented Mar 7, 2025

HI @ahijevyc

Apologies for not getting to this PR earlier.

Looking at the implementation here, it looks great. It does however bring to light a possible need for us to consider a better, more streamlined, approach to handling these types of groupings and then applying some function on the result.

I mention this because of our Topological Aggregations. For this family of functions, we have distinct methods (i.e. topological_mean()), which looking back at doesn't seem like the preferred approach, especially if we plan to implement groupings like the neighborhood one and perhaps other spatial ones.

Very generally speaking, these functions essentially:

  1. Group unstructured grid elements based on some condition/algorithm. Here we use the KD/BallTree to determine the candidate elements, while in the topological aggregations we use the connectivity information
  2. Apply some function to the grouping (i.e. mean())
  3. Store the results back on the unstructured grid element (node, edge, or face)

I wonder if this would be a good opportunity to extend the inherited .groupby() method from Xarray to support these spatial groupings.

I'm not sure of calling these approaches "kernels" is appropriate, but for the sake of this example, we could provide spatial kernels the user could pass into groupby() and then perform aggregations directly on the result. This feels much more in line with Xarray's design philosophy.

# radial neighborhood of r=0.25
uxds['t2m'].groupby(kernel=ux.BoundingCircle(r=0.25)).mean()

# 2 deg by 2 deg bounding box 
uxds['t2m'].groupby(kernel=ux.BoundingBox(dlon=2, dlat=2))

# group the nodes that surround each face and find the maximum
uxds['node_centered_var'].groupby(kernel=ux.FaceNode()).max()

# this is equivalent to the following in the current release
uxds['node_centered_var'].topological_max(destination='face')

I'll ping @aaronzedwick and @erogluorhan for their thoughts on this. I personally really like the design above and think that it aligns well with the overall design.

@aaronzedwick
Copy link
Member

aaronzedwick commented Mar 10, 2025

HI @ahijevyc

Apologies for not getting to this PR earlier.

Looking at the implementation here, it looks great. It does however bring to light a possible need for us to consider a better, more streamlined, approach to handling these types of groupings and then applying some function on the result.

I mention this because of our Topological Aggregations. For this family of functions, we have distinct methods (i.e. topological_mean()), which looking back at doesn't seem like the preferred approach, especially if we plan to implement groupings like the neighborhood one and perhaps other spatial ones.

Very generally speaking, these functions essentially:

  1. Group unstructured grid elements based on some condition/algorithm. Here we use the KD/BallTree to determine the candidate elements, while in the topological aggregations we use the connectivity information
  2. Apply some function to the grouping (i.e. mean())
  3. Store the results back on the unstructured grid element (node, edge, or face)

I wonder if this would be a good opportunity to extend the inherited .groupby() method from Xarray to support these spatial groupings.

I'm not sure of calling these approaches "kernels" is appropriate, but for the sake of this example, we could provide spatial kernels the user could pass into groupby() and then perform aggregations directly on the result. This feels much more in line with Xarray's design philosophy.

# radial neighborhood of r=0.25
uxds['t2m'].groupby(kernel=ux.BoundingCircle(r=0.25)).mean()

# 2 deg by 2 deg bounding box 
uxds['t2m'].groupby(kernel=ux.BoundingBox(dlon=2, dlat=2))

# group the nodes that surround each face and find the maximum
uxds['node_centered_var'].groupby(kernel=ux.FaceNode()).max()

# this is equivalent to the following in the current release
uxds['node_centered_var'].topological_max(destination='face')

I'll ping @aaronzedwick and @erogluorhan for their thoughts on this. I personally really like the design above and think that it aligns well with the overall design.

That is interesting. You suggesting changing the way we do aggregations entirely? Then this would affect the reduction PR I am working on then. Perhaps this PR could implement that change if you wish. I am fine with this, if you want to, it sounds like it would be intuitive.

@philipc2
Copy link
Member

You suggesting changing the way we do aggregations entirely?

We should be changing the API for it, since the current one is not sustainable if we plan to implement many times of spatial grouping methods.

The underlying functionality would remain mostly unchanged.

Perhaps this PR could implement that change if you wish.

If we do decide to do this, I'll open up a separate PR starting with the topological aggregations.

@aaronzedwick
Copy link
Member

You suggesting changing the way we do aggregations entirely?

We should be changing the API for it, since the current one is not sustainable if we plan to implement many times of spatial grouping methods.

The underlying functionality would remain mostly unchanged.

Perhaps this PR could implement that change if you wish.

If we do decide to do this, I'll open up a separate PR starting with the topological aggregations.

So would the reductions PR be obsolete?

@philipc2
Copy link
Member

You suggesting changing the way we do aggregations entirely?

We should be changing the API for it, since the current one is not sustainable if we plan to implement many times of spatial grouping methods.
The underlying functionality would remain mostly unchanged.

Perhaps this PR could implement that change if you wish.

If we do decide to do this, I'll open up a separate PR starting with the topological aggregations.

So would the reductions PR be obsolete?

No. The underlying implementation would remain the same, since we would still need those implemented.

This would just provide a different interface for it, with a more "Xarray-like" interface.

@aaronzedwick
Copy link
Member

You suggesting changing the way we do aggregations entirely?

We should be changing the API for it, since the current one is not sustainable if we plan to implement many times of spatial grouping methods.
The underlying functionality would remain mostly unchanged.

Perhaps this PR could implement that change if you wish.

If we do decide to do this, I'll open up a separate PR starting with the topological aggregations.

So would the reductions PR be obsolete?

No. The underlying implementation would remain the same, since we would still need those implemented.

This would just provide a different interface for it, with a more "Xarray-like" interface.

Ah, okay, I see. That makes sense, thanks for the clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apply a neighborhood filter with radius r to all elements of UxDataArray
3 participants