add apply_neigborhood #215

MartinSchobben · 2024-01-09T11:02:31Z

This is an initial attempt to implement apply_neighborhood while adhering to the openeo specifications. We have some difficulty understanding the proper implementation of the parameters size and overlap. More specifically we would like to understand better how those parameters translate to the stride of the rolling window/kernel. In our current understanding the overlap argument modifies the size argument, to obtain a modified kernel/window size.

In pseudo code:

$$ kernelsize = size + overlap * 2 $$

This definition implies that in order to obtain a rolling window/kernel with e.g., stride 1 and size 2 requires a 0-size and 1-overlap input for the openEO process. Or, for an instance with stride 1 and size 3, one needs negative size values, specifically -1 and 2 for the size and overlap parameters, respectively. This definition seems therefore impractical for stride 1 rolling windows/kernels. The unit tests of our PR include examples of these two instances in openEO with an input cube of size 4 by 4 filled with values 1 using the process sum.

By contrast, in xarray a stride 1-rolling kernel can be defined as follows:

import numpy as np
import xarray as xr

data = np.full((5, 5), 1)
input_data = xr.DataArray(data, dims=("x", "y"))
print(input_data)

<xarray.DataArray (x: 5, y: 5)>
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])
Dimensions without coordinates: x, y

binned_data = input_data.rolling(x=3, y=3, center=True). \
  construct({"x":"dx", "y":"dy"}, stride=1, fill_value=0)
output_cube = binned_data.reduce(np.sum, dim=("dx", "dy"))
print(output_cube)

<xarray.DataArray (x: 5, y: 5)>
array([[4, 6, 6, 6, 4],
       [6, 9, 9, 9, 6],
       [6, 9, 9, 9, 6],
       [6, 9, 9, 9, 6],
       [4, 6, 6, 6, 4]])
Dimensions without coordinates: x, y

It feels like we are missing some important aspect but we can't tell what.

See also this animation of an 3 by 3 kernel with stride 1 from an article on medium.com authored by Paul-Louis Pröve.

Co-authored-by: @SwamyDev

Co-authored-by: SwamyDev

clausmichele · 2024-01-09T13:35:38Z

Tagging the VITO team @jdries @soxofaan @JeroenVerstraelen @VincentVerelst here, since they have it implemented and using it frequently.

As you noticed, I guess that the case with a kernel of size=2 and stride=1 is not covered by this definition, but I might be wrong.

For the scenario where your process takes as input a 3x3 patch, and computes one value out of it, you should set size=1 and overlap=1.

Anyway, I wait for some clarification from the others, since I also have some doubts on how the window/neighborhood "moves", so how is the stride being set.

Official docs are here: https://processes.openeo.org/#apply_neighborhood

MartinSchobben · 2024-01-09T13:46:42Z

For the scenario where your process takes as input a 3x3 patch, and computes one value out of it, you should set size=1 and overlap=1.

As I understand it now this would yield a stride of 2.

soxofaan · 2024-01-09T14:48:21Z

I think the source of the confusion is that you should not aim for a "stride" in the apply_neighborhood process itself.

apply_neighborhood is just process in the apply_* family of openeo processes that allows processing on spatial chunks. For practical reasons there is no spatial equivalent for apply_dimension, it just wouldn't scale to let a user do something like apply_spatial and let them e.g. run a UDF on a complete spatial extent.
Instead, apply_neighborhood allows you to run on reasonable spatial chunks, with sizes in the order of (but not limited to) 128x128 or 512x512.

Within such a chunk you as a user are still allowed to implement some kind of operation that involves strides, but that should not be coupled to the apply_neighborhood size/overlap parameters as you try to do above. At most, the overlap should probably be large enough to cover your target stride, to ensure completeness of your results.

I hope this clarifies some things already

MartinSchobben · 2024-01-09T14:55:32Z

Then I might have to rephrase a little. Given the xarray example above, which is a common spatial/image analysis operation, what would be the openEO equivalent for a rolling kernel?

soxofaan · 2024-01-09T15:34:17Z

Given the xarray example above, which is a common spatial/image analysis operation, what would be the openEO equivalent for a rolling kernel?

For that particular example: taking sum in 3x3 window and stepping 1px at a time, I would use apply_kernel.
But if you deviate from the simplicity of that convolution example (e.g. taking larger strides, or computing a non-linear aggregation instead of sum), then that doesn't work anymore.

MartinSchobben · 2024-01-09T19:41:55Z

What further adds to the confusion is that the openEO cookbook says: a) that the function is a moving window, and b) that it allows supplying pre-defined processes like median, max , and sd to spatial, temporal, and spatiotemporal "neighborhoods" (which can be interpreted as window in the context). This reads like apply_neighborhood behaves similar as e.g. xarray.DataArray.rolling().

soxofaan · 2024-01-10T08:25:57Z

a) that the function is a moving window, and b) that it allows supplying pre-defined processes

that b part comes from the following I guess, but I'm not sure where that a part comes from

https://github.com/Open-EO/openeo.org/blame/10b68cee63b1dac17d8a4902a02f325563be6743/documentation/1.0/cookbook/README.md#L1198

This cookbook snippet is indeed misleading.
Practically speaking, I think only run_udf is the only useful pre-defined process you can directly pass to apply_neighborhood. Theoretically you could also pass processes like apply_kernel, atmospheric_correction, ard_..., ndvi, mask, ... but that makes no sense as it will be more efficient to apply them directly instead of going through apply_neighborhood

MartinSchobben · 2024-02-16T13:30:50Z

I just stumbled across this example in the introduction of openEO datacubes.

https://openeo.org/documentation/1.0/datacubes.html

Also here it is suggested that apply_neighborhood represents a rolling window. Furthermore, the example also shows a temporal smoothing on a datacube (as opposed to the example here). This would contradict that the process is intended for spatial chunks only, as stated above:

apply_neighborhood is just process in the apply_* family of openeo processes that allows processing on spatial chunks.

soxofaan · 2024-02-19T14:41:53Z

You can indeed use apply_neighborhood for doing a rolling window, indeed also along temporal dimension. But I think that the explanation at https://openeo.org/documentation/1.0/datacubes.html is misleading, e.g. it uses "window size=3" while the apply_neighborhood arguments corresponding with the example would be size=1 and overlap=1.

This would contradict that the process is intended for spatial chunks only, as stated above:

apply_neighborhood is just process in the apply_* family of openeo processes that allows processing on spatial chunks.

I didn't intend to say that it is for spatial chunking only. Historically, spatial chunking was the main goal, but temporal chunking was added as a logical generalization.

Going back to the beginning of this thread. If you look at apply_neighborhood with a "rolling window" mindset, I think that the size parameter corresponds to the stride of the rolling window. The overlap parameters adds additional padding data to that.

I hope this clarifies some things already

add apply_neigborhood

79fcb30

Co-authored-by: SwamyDev

MartinSchobben mentioned this pull request Jan 9, 2024

WIP: masking outliers + conflicting distributions + incidence angles + high uncertainty classification + majority filter interTwin-eu/openeo-flood-mapper-local#6

Merged

soxofaan mentioned this pull request Jan 10, 2024

cookbook: misleading info on apply_neighborhood Open-EO/openeo.org#84

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add apply_neigborhood #215

add apply_neigborhood #215

MartinSchobben commented Jan 9, 2024 •

edited

Loading

clausmichele commented Jan 9, 2024 •

edited

Loading

MartinSchobben commented Jan 9, 2024

soxofaan commented Jan 9, 2024

MartinSchobben commented Jan 9, 2024

soxofaan commented Jan 9, 2024

MartinSchobben commented Jan 9, 2024

soxofaan commented Jan 10, 2024

MartinSchobben commented Feb 16, 2024 •

edited

Loading

soxofaan commented Feb 19, 2024

add apply_neigborhood #215

Are you sure you want to change the base?

add apply_neigborhood #215

Conversation

MartinSchobben commented Jan 9, 2024 • edited Loading

clausmichele commented Jan 9, 2024 • edited Loading

MartinSchobben commented Jan 9, 2024

soxofaan commented Jan 9, 2024

MartinSchobben commented Jan 9, 2024

soxofaan commented Jan 9, 2024

MartinSchobben commented Jan 9, 2024

soxofaan commented Jan 10, 2024

MartinSchobben commented Feb 16, 2024 • edited Loading

soxofaan commented Feb 19, 2024

MartinSchobben commented Jan 9, 2024 •

edited

Loading

clausmichele commented Jan 9, 2024 •

edited

Loading

MartinSchobben commented Feb 16, 2024 •

edited

Loading