SRD-07 data retrieval from openEO: raw bands, based on samples #63
I've been playing around with this and trying to make it work in openEO via the VITO backend. My starting point was a geodataframe of 10 parcels (to start simple) which are relatively close together:

[map of the parcels omitted]

This is the code used to get to the final datacube:

```python
import json
import xarray

# `connection` is an authenticated openeo connection, `gdf` the parcel GeoDataFrame

# set extent from the geodataframe
minx, miny, maxx, maxy = gdf.unary_union.envelope.bounds
spat_ext = dict(west=minx, east=maxx, north=maxy, south=miny, crs=gdf.crs.to_epsg())
temp_ext = ["2019-01-01", "2019-12-31"]

# link to collection, specify bands
s2 = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent=spat_ext,
    temporal_extent=temp_ext,
    bands=["B04", "B08", "SCL"]
)

# filter to specific geometries
s2 = s2.filter_spatial(geometries=json.loads(gdf.geometry.to_json()))

# SCL cleanup/filtering
s2 = s2.process("mask_scl_dilation", data=s2, scl_band_name="SCL")

# calculate NDVI
s2 = s2.ndvi(red="B04", nir="B08", target_band='NDVI')

# download the data and load it
job = s2.execute_batch("cube.nc", out_format="netCDF")
ds = xarray.load_dataset('./cube.nc')
```

Observations:
The steps from this point (having a `.nc` dataset on disk) start to differ with respect to the approach we would be taking:

**Pixel-based approach**

The goal here would be to obtain a pandas dataframe where each row represents a timeseries for a single pixel, and where only the pixels from the area of interest are taken. One should still be able to subset the dataset to a specific geometry, so spatial context should stay available. I'm not sure what the proper way of doing this is; my initial approach took me down the [link elided] route. Another way would be to rasterize the geometry mask into the xarray and then convert it, but I guess you would need to add a timeless feature to the xarray; I haven't tried that yet.

**Parcel-based approach**

This one is a bit more manageable, because you can just add the spatial aggregation function to the datacube pipeline. However, this requires re-downloading the data, with the spatial aggregation done on the fly. It would make much more sense to just run the aggregation on the pixel-level dataframe if that is already available. If the pixel-level dataset is not available, then this is fine and can be achieved via:

```python
from openeo.rest.conversions import timeseries_json_to_pandas

aggregated_data_json = s2.aggregate_spatial(geometries=json.loads(gdf.geometry.to_json()), reducer='mean').execute()
timeseries_json_to_pandas(aggregated_data_json)
```

This returns an output in the following format:

[screenshot of the resulting dataframe omitted]

This means we would need to reshape the dataframe a bit to get the row-based format, which is easier to work with.

Thoughts/comments/suggestions?
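For the pixel-based route, a minimal sketch of one possible conversion (not from the thread; it assumes the `cube.nc` produced above, with dimensions named `t`/`x`/`y` and an `NDVI` variable, as in typical openEO netCDF output):

```python
import xarray

ds = xarray.load_dataset("cube.nc")

# flatten the cube: one row per (t, x, y) sample
df = ds.to_dataframe().reset_index()

# drop pixels masked out by filter_spatial (NaN outside the parcels)
df = df.dropna(subset=["NDVI"])

# reshape so each row holds the full NDVI timeseries of one pixel
pixel_ts = df.pivot_table(index=["x", "y"], columns="t", values="NDVI")
```

Keeping `x`/`y` in the index preserves the spatial context needed to subset by geometry later.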
The key thing here is the `sample_by_feature` setting: this will give you one netCDF per parcel, containing all pixels, masked to the geometry. So from there you can indeed compute aggregates easily. Features are sent as GeoJSON; the official spec only supports WGS84.
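A sketch of how that setting could be passed, assuming the VITO backend accepts `sample_by_feature` as a netCDF format option on `execute_batch`:

```python
# with sample_by_feature, the batch job writes one netCDF
# per geometry passed to filter_spatial
job = s2.execute_batch(out_format="netCDF", sample_by_feature=True)
```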
Excellent! Thanks for the tips. I tried the `sample_by_feature` option with the same `execute_batch("cube.nc", ...)` call as above, but it failed with an error.
One more question: what would be the most sensible way to burn a raster of parcel IDs from the geometries into the xarray dataset? My initial thoughts were: [options elided].

Or do you suggest some other way?
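Not from the thread, but one way this could look, as a sketch: rasterio's `features.rasterize` can burn each parcel's index into a grid aligned with the cube, attached as a timeless coordinate (the `parcel_id` name is illustrative, and `gdf` is assumed to already be in the cube's CRS):

```python
import xarray
from rasterio import features
from rasterio.transform import from_bounds

ds = xarray.load_dataset("cube.nc")
height, width = ds.sizes["y"], ds.sizes["x"]

# affine transform matching the cube's grid (west, south, east, north);
# note: uses pixel-centre extremes as approximate bounds
transform = from_bounds(
    float(ds.x.min()), float(ds.y.min()),
    float(ds.x.max()), float(ds.y.max()),
    width, height,
)

# burn each parcel's 1-based index into a 2D array (0 = background)
shapes = ((geom, i + 1) for i, geom in enumerate(gdf.geometry))
parcel_ids = features.rasterize(
    shapes, out_shape=(height, width), transform=transform, fill=0
)

# attach as a timeless (y, x) coordinate on the dataset
ds = ds.assign_coords(parcel_id=(("y", "x"), parcel_ids))
```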
The error occurs because this call indeed results in multiple files. Try removing the file name, and rather assign the result of `execute_batch` to a `job` variable. You can then retrieve results from this variable and invoke `download_files`. Or you can use the openEO Web Editor to inspect the results of that failed call. We're actually working on storing the id of the input features in the output as well: [link elided]
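In client code that suggestion would look roughly like this (a sketch; the target directory name is illustrative):

```python
# no output filename: execute_batch returns a handle to the batch job
job = s2.execute_batch(out_format="netCDF", sample_by_feature=True)

# fetch all result assets, i.e. one netCDF per parcel
job.get_results().download_files("parcel_cubes/")
```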
Thanks, this worked.
That sounds perfect. If I have the id of the input feature, then I can also merge this info into the dataframe at the end.

I guess it can be stored as an attribute; this just means that I'll have to add it to the dataframe manually. But since this seems like a thing specific to my use case anyway, it's fine by me if it works the way you describe.
After downloading, I loaded the per-parcel files into a single dataframe [code elided], which then allowed me to group by the sampling feature and plot all pixels corresponding to it, and also to calculate and plot the spatial average for each timestamp.

[plots omitted]

Seems like a lot of clouds / invalid data still gets through. @jdries I noticed the Sentinel Hub cloud masks / cloud probabilities; more info at https://docs.sentinel-hub.com/api/latest/user-guides/cloud-masks/#cloud-masks-and-cloud-probabilities
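The loading/grouping step might have looked something like this sketch (not the thread's actual code; the file pattern and `feature` column name are assumptions):

```python
import glob
import pandas as pd
import xarray

frames = []
for path in sorted(glob.glob("parcel_cubes/*.nc")):
    ds = xarray.load_dataset(path)
    df = ds.to_dataframe().reset_index().dropna(subset=["NDVI"])
    df["feature"] = path  # stand-in id until the input feature id is stored in the output
    frames.append(df)

pixels = pd.concat(frames, ignore_index=True)

# spatial average per sampling feature and timestamp
means = pixels.groupby(["feature", "t"])["NDVI"].mean()
```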