feat: add load_stac #127

Merged
merged 14 commits into from
Jul 13, 2023


clausmichele
Member

@clausmichele clausmichele commented Jun 27, 2023

Closes #120

First version of load_stac using stackstac to load items generated by queries using pystac-client. Another possibility would be using odc-stac, but it currently has some limitations which I explained here: opendatacube/odc-stac#54 (comment)
Anyway, we could even consider supporting both, depending on user requirements.

Currently, it supports only STAC Collections provided by catalogs with the /query endpoint for filtering.
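
To illustrate the kind of translation this involves, here is a minimal sketch (not the code in this PR) that turns openEO-style extents into keyword arguments for a pystac-client item search. The helper name `build_search_kwargs` and the example cloud-cover filter are assumptions made for illustration only; the extent values are the ones used in the sample script later in this thread.

```python
from typing import Optional


def build_search_kwargs(
    collection_id: str,
    spatial_extent: dict,
    temporal_extent: list,
    properties: Optional[dict] = None,
) -> dict:
    """Translate openEO-style extents into keyword arguments for a STAC item search.

    Hypothetical helper, written for illustration only.
    """
    # STAC expects the bbox as [west, south, east, north].
    bbox = [spatial_extent[key] for key in ("west", "south", "east", "north")]
    # Open-ended intervals are written with ".." in STAC datetime strings.
    start, end = (t if t is not None else ".." for t in temporal_extent)
    kwargs = {
        "collections": [collection_id],
        "bbox": bbox,
        "datetime": f"{start}/{end}",
    }
    if properties:
        # Property filters only work if the catalog supports querying.
        kwargs["query"] = properties
    return kwargs


kwargs = build_search_kwargs(
    "landsat-8-c2-l2",
    {"west": 11.259613, "east": 11.406212, "south": 46.461019, "north": 46.522237},
    ["2021-01-01T00:00:00Z", None],
    properties={"eo:cloud_cover": {"lt": 20}},  # assumed example filter
)
print(kwargs)
```

With a catalog opened through pystac-client, the resulting dict could be passed on as `catalog.search(**kwargs)` and the returned items handed to `stackstac.stack`.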

@clausmichele clausmichele marked this pull request as ready for review June 28, 2023 15:55
@codecov

codecov bot commented Jun 28, 2023

Codecov Report

Merging #127 (12d822f) into main (5f794bf) will decrease coverage by 0.93%.
The diff coverage is 47.47%.

@@            Coverage Diff             @@
##             main     #127      +/-   ##
==========================================
- Coverage   76.63%   75.70%   -0.93%     
==========================================
  Files          25       26       +1     
  Lines        1070     1169      +99     
==========================================
+ Hits          820      885      +65     
- Misses        250      284      +34     
Impacted Files Coverage Δ
...ocesses_dask/process_implementations/cubes/load.py 44.68% <44.68%> (ø)
...ses_dask/process_implementations/cubes/__init__.py 100.00% <100.00%> (ø)
...ocesses_dask/process_implementations/exceptions.py 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes


@clausmichele
Member Author

@soxofaan maybe you could also have a look at this PR, since it can be related to how we handle STAC in the python client too.

Member

@soxofaan soxofaan left a comment


This is a bit of unknown territory for me, so just some superficial notes.

@clausmichele
Member Author

Thanks @soxofaan, your feedback is always appreciated. I will fix the points you raised and then wait for @LukeWeidenwalker, who is back next week.

Contributor

@LukeWeidenwalker LukeWeidenwalker left a comment


@clausmichele Thanks a lot for this - bunch of comments!

On the question of stackstac vs odc-stac:

  • I'm generally rather indifferent to which library we end up using, but would be good to be deliberate about the choice!
  • You mention metadata not being parsed in "clarification on difference between this library and stackstac?" (opendatacube/odc-stac#54 (comment)) - just for my understanding, which specific metadata are you referring to here?
  • In anticipation of UC8, we should consider how we'll limit this query to certain geometries. odc-stac can do this with the geopolygon parameter here, are you aware of any equivalent functionality in stackstac?

I'll do some more testing with EODC collections specifically today, so there might be more feedback coming up!

@clausmichele
Member Author

> @clausmichele Thanks a lot for this - bunch of comments!
>
> On the question of stackstac vs odc-stac:
>
> * I'm generally rather indifferent to which library we end up using, but would be good to be deliberate about the choice!
> * You mention metadata not being parsed in [clarification on difference between this library and stackstac? opendatacube/odc-stac#54 (comment)](https://github.com/opendatacube/odc-stac/issues/54#issuecomment-1602704281) - just for my understanding, which specific metadata are you referring to here?

You can test the difference using this sample script:

import odc.stac
import planetary_computer as pc
import pystac_client
import stackstac

# Open the Planetary Computer STAC API; sign_inplace signs the assets so they can be read
URL = "https://planetarycomputer.microsoft.com/api/stac/v1"
catalog = pystac_client.Client.open(URL, modifier=pc.sign_inplace)

# STAC expects the bbox as [west, south, east, north]
spatial_extent = {"west": 11.259613, "east": 11.406212, "south": 46.461019, "north": 46.522237}
bbox = [spatial_extent["west"], spatial_extent["south"], spatial_extent["east"], spatial_extent["north"]]
items = catalog.search(
    bbox=bbox,
    collections=["landsat-8-c2-l2"],
    datetime=["2021-01-01T00:00:00.000Z", "2023-12-01T00:00:00.000Z"],
).get_all_items()
print(len(items))

print("+" * 80)
print("OUTPUT OF ODC-STAC")
print("+" * 80)
# chunks={} makes odc-stac return a lazy, Dask-backed dataset
odc_data = odc.stac.load(items, chunks={})
print(odc_data.SR_B1)

print("+" * 80)
print("OUTPUT OF STACKSTAC")
print("+" * 80)
# stackstac returns a single DataArray with a band dimension
stackstac_data = stackstac.stack(items)
print(stackstac_data.loc[dict(band="SR_B1")])

Output:

79
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
OUTPUT OF ODC-STAC
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
<xarray.DataArray 'SR_B1' (time: 79, y: 13132, x: 11832)>
dask.array<SR_B1, shape=(79, 13132, 11832), dtype=float32, chunksize=(1, 13132, 11832), chunktype=numpy.ndarray>
Coordinates:
  * y            (y) float64 5.374e+06 5.374e+06 5.374e+06 ... 4.98e+06 4.98e+06
  * x            (x) float64 4.98e+05 4.98e+05 4.98e+05 ... 8.529e+05 8.529e+05
    spatial_ref  int32 32632
  * time         (time) datetime64[ns] 2021-01-04T10:04:24.413826 ... 2022-03...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
OUTPUT OF STACKSTAC
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
C:\Users\mclaus\Documents\GitHub\stackstac\stackstac\prepare.py:369: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  times = pd.to_datetime(
<xarray.DataArray 'stackstac-4a7f4a6696aa69b99ddd17edd9f4fb28' (time: 79,
                                                                y: 13132,
                                                                x: 11832)>
dask.array<getitem, shape=(79, 13132, 11832), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/27)
  * time                         (time) datetime64[ns] 2021-01-04T10:04:24.41...
    id                           (time) <U31 'LC08_L2SP_193027_20210104_02_T1...
    band                         <U13 'SR_B1'
  * x                            (x) float64 4.98e+05 4.98e+05 ... 8.529e+05
  * y                            (y) float64 5.374e+06 5.374e+06 ... 4.98e+06
    instruments                  object {'tirs', 'oli'}
    ...                           ...
    title                        <U46 'Coastal/Aerosol Band (B1)'
    gsd                          float64 30.0
    common_name                  object 'coastal'
    center_wavelength            object 0.44
    full_width_half_max          object 0.02
    epsg                         int32 32632
Attributes:
    spec:        RasterSpec(epsg=32632, bounds=(497970.0, 4980270.0, 852930.0...
    crs:         epsg:32632
    transform:   | 30.00, 0.00, 497970.00|\n| 0.00,-30.00, 5374230.00|\n| 0.0...
    resolution:  30.0

Anyway, we can also easily support both libraries! But at the moment I need to release this version based on stackstac; I will open a new PR later on showing how to use odc-stac as well.

> * In anticipation of UC8, we should consider how we'll limit this query to certain geometries. `odc-stac` can do this with the `geopolygon` parameter [here](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html#odc-stac-load), are you aware of any equivalent functionality in stackstac?
>
> I'll do some more testing with EODC collections specifically today, so there might be more feedback coming up!

From what I see in the odc-stac docs, it just uses the bbox of the polygon, so it should not be particularly difficult to implement for stackstac as well.
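
As a rough sketch of what that could look like (an assumption, not code from this PR), the bounding box of a GeoJSON geometry can be computed in a few lines. The helper name `polygon_bbox` and the sample polygon are made up for illustration; the sketch only looks at exterior rings and does not handle geometries crossing the antimeridian.

```python
def polygon_bbox(geometry: dict) -> list:
    """Return [west, south, east, north] for a GeoJSON Polygon or MultiPolygon.

    Hypothetical helper, written for illustration only.
    """
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"Unsupported geometry type: {geometry['type']}")

    # The first ring of each polygon is its exterior; holes cannot extend the bbox.
    points = [point for polygon in polygons for point in polygon[0]]
    xs = [point[0] for point in points]
    ys = [point[1] for point in points]
    return [min(xs), min(ys), max(xs), max(ys)]


# Made-up triangle, in lon/lat order as GeoJSON requires.
geometry = {
    "type": "Polygon",
    "coordinates": [[[11.26, 46.46], [11.41, 46.47], [11.35, 46.52], [11.26, 46.46]]],
}
bbox = polygon_bbox(geometry)
print(bbox)
# The bbox could then be given to stackstac.stack via its bounds_latlon argument.
```

Masking pixels outside the polygon itself would still be a separate step after loading.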

@LukeWeidenwalker
Contributor

Ah, gotcha, thanks for the example!
Changes look good to me now, but I just realized that we're still missing the process spec. It would be great if that could be added; then I'm happy to merge this!

@clausmichele
Member Author

@LukeWeidenwalker PR for adding the spec is here: eodcgmbh/openeo-processes#9

@LukeWeidenwalker LukeWeidenwalker merged commit ffd06ad into Open-EO:main Jul 13, 2023
2 of 4 checks passed
@clausmichele clausmichele deleted the load_stac branch November 2, 2023 14:32
Development

Successfully merging this pull request may close these issues.

Implement load_stac
3 participants