Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add apply_neigborhood #215

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MartinSchobben
Copy link

@MartinSchobben MartinSchobben commented Jan 9, 2024

This is an initial attempt to implement apply_neighborhood while adhering to the openeo specifications. We have some difficulty understanding the proper implementation of the parameters size and overlap. More specifically we would like to understand better how those parameters translate to the stride of the rolling window/kernel. In our current understanding the overlap argument modifies the size argument, to obtain a modified kernel/window size.

In pseudo code:

$$ kernelsize = size + overlap * 2 $$

This definition implies that in order to obtain a rolling window/kernel with e.g., stride 1 and size 2 requires a 0-size and 1-overlap input for the openEO process. Or, for an instance with stride 1 and size 3, one needs negative size values, specifically -1 and 2 for the size and overlap parameters, respectively. This definition seems therefore impractical for stride 1 rolling windows/kernels. The unit tests of our PR include examples of these two instances in openEO with an input cube of size 4 by 4 filled with values 1 using the process sum.

By contrast, in xarray a stride 1-rolling kernel can be defined as follows:

import numpy as np
import xarray as xr

data = np.full((5, 5), 1)
input_data = xr.DataArray(data, dims=("x", "y"))
print(input_data)
<xarray.DataArray (x: 5, y: 5)>
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])
Dimensions without coordinates: x, y
binned_data = input_data.rolling(x=3, y=3, center=True). \
  construct({"x":"dx", "y":"dy"}, stride=1, fill_value=0)
output_cube = binned_data.reduce(np.sum, dim=("dx", "dy"))
print(output_cube)
<xarray.DataArray (x: 5, y: 5)>
array([[4, 6, 6, 6, 4],
       [6, 9, 9, 9, 6],
       [6, 9, 9, 9, 6],
       [6, 9, 9, 9, 6],
       [4, 6, 6, 6, 4]])
Dimensions without coordinates: x, y

It feels like we are missing some important aspect but we can't tell what.

See also this animation of an 3 by 3 kernel with stride 1 from an article on medium.com authored by Paul-Louis Pröve.

Paul-Louis Pröve, medium.com

Co-authored-by: @SwamyDev

Co-authored-by: SwamyDev
@clausmichele
Copy link
Member

clausmichele commented Jan 9, 2024

Tagging the VITO team @jdries @soxofaan @JeroenVerstraelen @VincentVerelst here, since they have it implemented and using it frequently.

As you noticed, I guess that the case with a kernel of size=2 and stride=1 is not covered by this definition, but I might be wrong.

For the scenario where your process takes as input a 3x3 patch, and computes one value out of it, you should set size=1 and overlap=1.

Anyway, I wait for some clarification from the others, since I also have some doubts on how the window/neighborhood "moves", so how is the stride being set.

Official docs are here: https://processes.openeo.org/#apply_neighborhood

@MartinSchobben
Copy link
Author

For the scenario where your process takes as input a 3x3 patch, and computes one value out of it, you should set size=1 and overlap=1.

As I understand it now this would yield a stride of 2.

@soxofaan
Copy link
Member

soxofaan commented Jan 9, 2024

I think the source of the confusion is that you should not aim for a "stride" in the apply_neighborhood process itself.

apply_neighborhood is just process in the apply_* family of openeo processes that allows processing on spatial chunks. For practical reasons there is no spatial equivalent for apply_dimension, it just wouldn't scale to let a user do something like apply_spatial and let them e.g. run a UDF on a complete spatial extent.
Instead, apply_neighborhood allows you to run on reasonable spatial chunks, with sizes in the order of (but not limited to) 128x128 or 512x512.

Within such a chunk you as a user are still allowed to implement some kind of operation that involves strides, but that should not be coupled to the apply_neighborhood size/overlap parameters as you try to do above. At most, the overlap should probably be large enough to cover your target stride, to ensure completeness of your results.

I hope this clarifies some things already

@MartinSchobben
Copy link
Author

Then I might have to rephrase a little. Given the xarray example above, which is a common spatial/image analysis operation, what would be the openEO equivalent for a rolling kernel?

@soxofaan
Copy link
Member

soxofaan commented Jan 9, 2024

Given the xarray example above, which is a common spatial/image analysis operation, what would be the openEO equivalent for a rolling kernel?

For that particular example: taking sum in 3x3 window and stepping 1px at a time, I would use apply_kernel.
But if you deviate from the simplicity of that convolution example (e.g. taking larger strides, or computing a non-linear aggregation instead of sum), then that doesn't work anymore.

@MartinSchobben
Copy link
Author

What further adds to the confusion is that the openEO cookbook says: a) that the function is a moving window, and b) that it allows supplying pre-defined processes like median, max , and sd to spatial, temporal, and spatiotemporal "neighborhoods" (which can be interpreted as window in the context). This reads like apply_neighborhood behaves similar as e.g. xarray.DataArray.rolling().

@soxofaan
Copy link
Member

a) that the function is a moving window, and b) that it allows supplying pre-defined processes

that b part comes from the following I guess, but I'm not sure where that a part comes from

https://github.com/Open-EO/openeo.org/blame/10b68cee63b1dac17d8a4902a02f325563be6743/documentation/1.0/cookbook/README.md#L1198

This cookbook snippet is indeed misleading.
Practically speaking, I think only run_udf is the only useful pre-defined process you can directly pass to apply_neighborhood. Theoretically you could also pass processes like apply_kernel, atmospheric_correction, ard_..., ndvi, mask, ... but that makes no sense as it will be more efficient to apply them directly instead of going through apply_neighborhood

@MartinSchobben
Copy link
Author

MartinSchobben commented Feb 16, 2024

I just stumbled across this example in the introduction of openEO datacubes.

https://openeo.org/documentation/1.0/datacubes.html

Also here it is suggested that apply_neighborhood represents a rolling window. Furthermore, the example also shows a temporal smoothing on a datacube (as opposed to the example here). This would contradict that the process is intended for spatial chunks only, as stated above:

apply_neighborhood is just process in the apply_* family of openeo processes that allows processing on spatial chunks.

@soxofaan
Copy link
Member

You can indeed use apply_neighborhood for doing a rolling window, indeed also along temporal dimension. But I think that the explanation at https://openeo.org/documentation/1.0/datacubes.html is misleading, e.g. it uses "window size=3" while the apply_neighborhood arguments corresponding with the example would be size=1 and overlap=1.

This would contradict that the process is intended for spatial chunks only, as stated above:

apply_neighborhood is just process in the apply_* family of openeo processes that allows processing on spatial chunks.

I didn't intend to say that it is for spatial chunking only. Historically, spatial chunking was the main goal, but temporal chunking was added as a logical generalization.

Going back to the beginning of this thread. If you look at apply_neighborhood with a "rolling window" mindset, I think that the size parameter corresponds to the stride of the rolling window. The overlap parameters adds additional padding data to that.

I hope this clarifies some things already

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants