aggregate_spatial: use 'id' property of input geojson to name output features #74

Closed
jdries opened this issue Sep 7, 2022 · 7 comments

@jdries
Contributor

jdries commented Sep 7, 2022

When writing netCDF output for aggregate_spatial, we currently name the features ['feature_0', 'feature_1', ...].
Our users would like us to name these features after the feature id of the input GeoJSON instead, so that a time series can be linked back to its input more easily.
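
The requested naming could be sketched as follows. This is a minimal illustration, not the driver's implementation: the helper name `feature_names` is hypothetical, and falling back to the current positional `feature_<index>` name when a feature has no id is an assumption.

```python
from typing import Any, Dict, List


def feature_names(feature_collection: Dict[str, Any]) -> List[str]:
    """Name each feature by its top-level GeoJSON "id", else "feature_<index>"."""
    names = []
    for index, feature in enumerate(feature_collection.get("features", [])):
        feature_id = feature.get("id")
        # Assumption: keep the current positional name when no id is present.
        if feature_id is not None:
            names.append(str(feature_id))
        else:
            names.append(f"feature_{index}")
    return names


collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "apples", "properties": {}, "geometry": None},
        {"type": "Feature", "properties": {}, "geometry": None},
    ],
}
print(feature_names(collection))  # ['apples', 'feature_1']
```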

This relates to support for vector cubes/feature collections, which also requires us to preserve feature properties throughout the processing.

@jdries
Contributor Author

jdries commented Sep 13, 2022

Loading the geometry through a specific process should trigger use of the new-style vector cube, which retains these feature ids.

@soxofaan
Member

FYI: see EP-3981 and Open-EO/openeo-python-driver#114 for my initial implementation of VectorCube support (geopandas based)

Basic test that illustrates the workflow: https://github.com/Open-EO/openeo-python-driver/blob/master/tests/test_views_execute.py#L754-L810

  • load new "VectorCube" from geojson with (experimental) load_uploaded_files process
  • use vector cube as geometry in aggregate_spatial
  • export result as geojson again (note preservation of original geojson properties and addition of aggregation values as new properties)
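
The last step of the workflow above (keeping the original properties and adding the aggregation values as new properties) could look roughly like this in plain Python. It is a sketch only: `attach_aggregates`, the `agg_` prefix and the `B04_mean` key are hypothetical names, and the actual driver works on a geopandas-based VectorCube rather than raw dicts.

```python
import copy
from typing import Any, Dict, List


def attach_aggregates(
    feature_collection: Dict[str, Any],
    values: List[Dict[str, float]],
    prefix: str = "agg_",
) -> Dict[str, Any]:
    """Return a copy of the collection with per-feature values added as properties."""
    result = copy.deepcopy(feature_collection)  # leave the input untouched
    features = result["features"]
    if len(features) != len(values):
        raise ValueError("expected exactly one set of values per feature")
    for feature, feature_values in zip(features, values):
        properties = feature.get("properties") or {}
        for name, value in feature_values.items():
            # Existing properties are preserved; aggregates get a prefixed key.
            properties[f"{prefix}{name}"] = value
        feature["properties"] = properties
    return result


source = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "apples", "properties": {"owner": "a"}, "geometry": None},
    ],
}
enriched = attach_aggregates(source, [{"B04_mean": 0.25}])
print(enriched["features"][0]["properties"])
```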

@bossie
Collaborator

bossie commented Sep 16, 2022

Test case:

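from shapely.geometry import Polygon, mapping

# `conn` is assumed to be an authenticated openeo connection (setup not shown).
polygon_1 = Polygon([(10.4566, 51.3747), (10.4335, 51.3732), (10.4527, 51.3615), (10.4566, 51.3747)])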
polygon_2 = Polygon([(10.4566, 51.3747), (10.4334, 51.3732), (10.4528, 51.3614), (10.4566, 51.3747)])

def as_feature(geometry, id) -> dict:
    return {
        'type': 'Feature',
        'id': id,
        'properties': {},
        'geometry': mapping(geometry)
    }

feature_collection = {
    'type': 'FeatureCollection',
    'properties': {},
    'features': [
        as_feature(polygon_1, id="apples"),
        as_feature(polygon_2, id="oranges")
    ]
}

im = (conn
      .load_collection("SENTINEL2_L2A",
                       bands=["B04", "B03", "B02"],
                       spatial_extent={"west": 10.4005, "south": 51.3371, "east": 10.5152, "north": 51.3856},
                       temporal_extent=["2021-07-08T00:00:00Z", "2021-07-08T00:00:00Z"])
      .aggregate_spatial(feature_collection, "mean"))

im.download("/tmp/test_aggregate_spatial_feature_ids.nc")

bossie@rastapopoulos:~$ ncdump /tmp/test_aggregate_spatial_feature_ids.nc | grep 'feature_names ='
 feature_names = "feature_0", "feature_1" ;
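
To turn the ncdump check above into an automated assertion, the grepped line can be parsed with the standard library. This is a sketch under the assumption that the line has the shape shown above; `parse_feature_names` is a hypothetical helper. Once this issue is resolved, the same check would be expected to yield "apples" and "oranges" for this test case.

```python
import re
from typing import List


def parse_feature_names(cdl_line: str) -> List[str]:
    """Extract the quoted names from a 'feature_names = "a", "b" ;' line."""
    name, separator, values = cdl_line.partition("=")
    if not separator or name.strip() != "feature_names":
        raise ValueError(f"not a feature_names line: {cdl_line!r}")
    # Each name is a double-quoted string in the comma-separated value list.
    return re.findall(r'"([^"]*)"', values)


current = parse_feature_names(' feature_names = "feature_0", "feature_1" ;')
print(current)  # ['feature_0', 'feature_1']
```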

@bossie
Collaborator

bossie commented Sep 20, 2022

Example client code:

feature_collection = conn.datacube_from_process("load_uploaded_files",
                                                paths=["/data/users/Public/vdboschj/FeatureCollection.geojson"],
                                                format="GeoJSON")

im = (conn
      .load_collection("SENTINEL2_L2A",
                       bands=["B04", "B03", "B02"],
                       spatial_extent={"west": 10.4005, "south": 51.3371, "east": 10.5152, "north": 51.3856},
                       temporal_extent=["2021-07-08T00:00:00Z", "2021-07-08T00:00:00Z"])
      .aggregate_spatial(feature_collection, "mean"))

im.download("means.nc")

@soxofaan: is there a more elegant way to write this?

@bossie
Collaborator

bossie commented Sep 20, 2022

Alternative:

from openeo.processes import load_uploaded_files

feature_collection = load_uploaded_files(paths=["/data/users/Public/vdboschj/FeatureCollection.geojson"],
                                         format="GeoJSON")

# ...

bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 23, 2022
@bossie
Collaborator

bossie commented Sep 23, 2022

The input GeoJSON file has to be accessible from the openEO back-end, and its Features should carry an "id", either:

  • as a child of the Feature (as in the example below); or
  • as part of its "properties", e.g. "properties": {"id": "apples"}

{
  "type": "FeatureCollection",
  "properties": {},
  "features": [
    {
      "type": "Feature",
      "id": "apples",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              10.4566,
              51.3747
            ],
            [
              10.4335,
              51.3732
            ],
            [
              10.4527,
              51.3615
            ],
            [
              10.4566,
              51.3747
            ]
          ]
        ]
      }
    },
    {
      "type": "Feature",
      "id": "oranges",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              10.4566,
              51.3747
            ],
            [
              10.4334,
              51.3732
            ],
            [
              10.4528,
              51.3614
            ],
            [
              10.4566,
              51.3747
            ]
          ]
        ]
      }
    }
  ]
}
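
Before uploading such a file, it may help to check that every Feature carries an id in one of the two accepted places. The sketch below is not part of openEO: `resolve_feature_id` and `check_feature_ids` are hypothetical helpers, preferring the top-level "id" over "properties" is an assumption, and so is treating duplicate ids as a problem.

```python
from typing import Any, Dict, List, Optional


def resolve_feature_id(feature: Dict[str, Any]) -> Optional[str]:
    """Look for an id on the Feature itself, then in its properties."""
    if feature.get("id") is not None:
        return str(feature["id"])
    properties = feature.get("properties") or {}
    if properties.get("id") is not None:
        return str(properties["id"])
    return None


def check_feature_ids(feature_collection: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means all ids look fine."""
    problems = []
    first_seen: Dict[str, int] = {}
    for index, feature in enumerate(feature_collection.get("features", [])):
        feature_id = resolve_feature_id(feature)
        if feature_id is None:
            problems.append(f"feature {index}: no id")
        elif feature_id in first_seen:
            problems.append(
                f"feature {index}: duplicate id {feature_id!r} "
                f"(first used by feature {first_seen[feature_id]})"
            )
        else:
            first_seen[feature_id] = index
    return problems


sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "apples", "properties": {}, "geometry": None},
        {"type": "Feature", "properties": {"id": "oranges"}, "geometry": None},
        {"type": "Feature", "properties": {}, "geometry": None},
    ],
}
print(check_feature_ids(sample))  # ['feature 2: no id']
```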

bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Sep 23, 2022
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 23, 2022
@bossie
Collaborator

bossie commented Sep 26, 2022

The @lru_cache-decorated load_collection can now cope with a DriverVectorCube being passed as an argument and used as part of the cache key. Added some tests to make sure that this caching actually works (I don't think it did before).
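
For context, `functools.lru_cache` builds its cache key from the call arguments, so every argument must be hashable, and two calls only share a cache entry if their arguments compare equal. The sketch below illustrates that requirement with a hypothetical stand-in class; it does not show how DriverVectorCube actually implements hashing, and `load_collection_cached` is an illustrative function, not the driver's.

```python
from functools import lru_cache
from typing import Iterable


class HashableGeometries:
    """Hypothetical stand-in for a vector cube that can serve as a cache key."""

    def __init__(self, feature_ids: Iterable[str]):
        self.feature_ids = tuple(feature_ids)  # immutable, hence hashable

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, HashableGeometries)
            and self.feature_ids == other.feature_ids
        )

    def __hash__(self) -> int:
        return hash(self.feature_ids)


@lru_cache(maxsize=16)
def load_collection_cached(collection_id: str, geometries: HashableGeometries) -> str:
    # Placeholder for an expensive load; returns a string for illustration.
    return f"{collection_id}:{','.join(geometries.feature_ids)}"


# Two distinct but equal objects should hit the same cache entry.
first = load_collection_cached("SENTINEL2_L2A", HashableGeometries(["apples", "oranges"]))
second = load_collection_cached("SENTINEL2_L2A", HashableGeometries(["apples", "oranges"]))
info = load_collection_cached.cache_info()
print(info.hits, info.misses)  # 1 1
```

Asserting on `cache_info()` like this is one way to test that caching actually takes effect, rather than only checking that the call succeeds.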


3 participants