✨ PySTACAPIItemLister to list STAC Items matching STAC API search #111

Merged (4 commits) on Jun 20, 2023

@weiji14 weiji14 commented Jun 20, 2023

An iterable-style DataPipe to list STAC Items matching a STAC API search query! Calls pystac_client.ItemSearch.items() to yield pystac.Item instances.

Preview at https://zen3geo--111.org.readthedocs.build/en/111/api.html#zen3geo.datapipes.PySTACAPIItemLister

Usage:

import pystac_client

from torchdata.datapipes.iter import IterableWrapper
from zen3geo.datapipes import PySTACAPIItemLister

# List STAC Items from a STAC API query
query = dict(
    bbox=[57.2, -20.6, 57.9, -19.9],  # xmin, ymin, xmax, ymax
    datetime=["2023-01-01T00:00:00Z", "2023-01-31T00:00:00Z"],
    collections=["s2_l2a"],
)
dp = IterableWrapper(iterable=[query])
dp_pystac_client = dp.search_for_pystac_item(
    catalog_url="https://explorer.digitalearth.africa/stac/"
)
dp_pystac_item_list = dp_pystac_client.list_pystac_items_by_search()

# Loop or iterate over the DataPipe stream
it = iter(dp_pystac_item_list)
stac_item = next(it)

print(stac_item)
# <Item id=ec16dbf6-9729-5a8f-9d72-5e83a8b9f30d>

print(stac_item.properties)
# {'title': 'S2B_MSIL2A_20230103T062449_N0509_R091_T40KED_20230103T075000',
#  'gsd': 10,
#  'proj:epsg': 32740,
#  'platform': 'sentinel-2b',
#  'view:off_nadir': 0,
#  'instruments': ['msi'],
#  'eo:cloud_cover': 0.02,
#  'odc:file_format': 'GeoTIFF',
#  'odc:region_code': '40KED',
#  'constellation': 'sentinel-2',
#  'sentinel:sequence': '0',
#  'sentinel:utm_zone': 40,
#  'sentinel:product_id': 'S2B_MSIL2A_20230103T062449_N0509_R091_T40KED_20230103T075000',
#  'sentinel:grid_square': 'ED',
#  'sentinel:data_coverage': 28.61,
#  'sentinel:latitude_band': 'K',
#  'created': '2023-01-03T06:24:53Z',
#  'sentinel:valid_cloud_cover': True,
#  'sentinel:boa_offset_applied': True,
#  'sentinel:processing_baseline': '05.09',
#  'proj:shape': [10980, 10980],
#  'proj:transform': [10.0, 0.0, 499980.0, 0.0, -10.0, 7900000.0, 0.0, 0.0, 1.0],
#  'datetime': '2023-01-03T06:24:53Z',
#  'cubedash:region_code': '40KED'}

TODO:

  • Initial implementation with a doctest and unit test
  • Some minor documentation fixes

Notes:

  • Why not just use something like the following?
    def get_all_items(item_search: pystac_client.ItemSearch) -> pystac.ItemCollection:
        return item_search.items()
    
    dp_pystac_item_list = dp_pystac_client.flatmap(fn=get_all_items)
    The issue is that FlatMapper doesn't implement the __len__ method (see https://github.com/pytorch/data/blob/v0.6.1/torchdata/datapipes/iter/transform/callable.py#L163-L164), which would break some downstream DataPipes that rely on having a proper __len__.
  • Originally wanted a single DataPipe that would produce a list of pystac.Item objects from either ItemSearch or ItemCollection. However, pystac_client.ItemSearch uses .items() (a callable) while pystac.ItemCollection uses .items (not callable), which would necessitate some messy if-then/try-except statements. Hence the functional name list_pystac_items_by_search: a separate list_pystac_items_by_collection could be added for pystac.ItemCollection.items in the future (if needed).
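
The first note above can be illustrated without any network access. This is only a minimal sketch: FakeItemSearch, flatmap_items and ItemLister are hypothetical stand-ins (not part of zen3geo, torchdata or pystac_client), and plain strings stand in for pystac.Item objects. It shows that a generator-based flattening can be iterated but has no length, while a small class can offer both by summing matched().

```python
from typing import Iterable, Iterator, List


class FakeItemSearch:
    """Hypothetical stand-in for pystac_client.ItemSearch (no network access)."""

    def __init__(self, item_ids: List[str]) -> None:
        self._item_ids = item_ids

    def items(self) -> Iterator[str]:
        # The real method yields pystac.Item objects; strings are used here.
        yield from self._item_ids

    def matched(self) -> int:
        # The real method reports the match count returned by the STAC API.
        return len(self._item_ids)


def flatmap_items(searches: Iterable[FakeItemSearch]) -> Iterator[str]:
    """Generator-based flattening: fine for iteration, but len() is unsupported."""
    for search in searches:
        yield from search.items()


class ItemLister:
    """Class-based flattening that also reports a total length."""

    def __init__(self, source: List[FakeItemSearch]) -> None:
        self.source = source

    def __iter__(self) -> Iterator[str]:
        for search in self.source:
            yield from search.items()

    def __len__(self) -> int:
        # Sum the per-search match counts instead of iterating over every item.
        return sum(search.matched() for search in self.source)


searches = [FakeItemSearch(["a", "b"]), FakeItemSearch(["c"])]
lister = ItemLister(searches)
flat = flatmap_items(searches)
```

Calling len(lister) works here, whereas len(flat) raises a TypeError because generators have no length, which is the same kind of gap that motivates a dedicated DataPipe over flatmap.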

Part of #48. Extends #59.

An iterable-style DataPipe to list STAC Items matching a STAC API search query! Calls pystac_client.ItemSearch.items() to yield pystac.Item instances. Included a doctest and a unit test that produces a list of STAC Items from a STAC API search that can be iterated over. Added a new section in the API docs too.
@weiji14 weiji14 added the feature New feature or request label Jun 20, 2023
@weiji14 weiji14 added this to the 0.6.x milestone Jun 20, 2023
@weiji14 weiji14 self-assigned this Jun 20, 2023
            yield from item_search.items()

    def __len__(self):
        return sum(item_search.matched() for item_search in self.source_datapipe)

Considered using this try-except statement in case item_search.matched() returns None and raises a TypeError: unsupported operand type(s) for +: 'int' and 'NoneType', but couldn't find a suitable STAC collection at https://radiantearth.github.io/stac-browser/ that doesn't implement .matched(). Leaving this out for now, but may add this back in if someone encounters the TypeError (and files a bug report that can be included in a regression test)!

Suggested change
-        return sum(item_search.matched() for item_search in self.source_datapipe)
+        try:
+            return sum(item_search.matched() for item_search in self.source_datapipe)
+        except TypeError:  # unsupported operand type(s) for +: 'int' and 'NoneType'
+            return sum(len(item_search.items()) for item_search in self.source_datapipe)
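
A possible variant of the fallback discussed above, sketched with hypothetical stand-ins (FakeItemSearch and count_items are illustrative names, not zen3geo or pystac_client APIs). It assumes that items() returns an iterator rather than a sized collection, in which case len() cannot be applied to it and the items have to be counted by iterating. It also checks for None on each search individually instead of catching a TypeError for the whole sum.

```python
from typing import Iterable, Iterator, List, Optional


class FakeItemSearch:
    """Hypothetical stand-in for pystac_client.ItemSearch (no network access)."""

    def __init__(self, item_ids: List[str], report_matched: bool = True) -> None:
        self._item_ids = item_ids
        self._report_matched = report_matched

    def items(self) -> Iterator[str]:
        # Strings stand in for pystac.Item objects.
        yield from self._item_ids

    def matched(self) -> Optional[int]:
        # Mimic an API that may not report a match count.
        return len(self._item_ids) if self._report_matched else None


def count_items(searches: Iterable[FakeItemSearch]) -> int:
    """Total number of items, counting by iteration where matched() is None."""
    total = 0
    for search in searches:
        matched = search.matched()
        if matched is None:
            # Fallback: walk through the results once and count them.
            matched = sum(1 for _ in search.items())
        total += matched
    return total


total = count_items(
    [FakeItemSearch(["a", "b"]), FakeItemSearch(["c"], report_matched=False)]
)
```

The trade-off is that the fallback path fetches every item just to count it, which could be slow against a real STAC API with many results.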

Should be referencing `zen3geo.datapipes.pystac_client.PySTACAPIItemListerIterDataPipe`
@weiji14 weiji14 marked this pull request as ready for review June 20, 2023 05:18
@weiji14 weiji14 merged commit 342e43f into main Jun 20, 2023
@weiji14 weiji14 deleted the pystac_client/item_lister branch June 20, 2023 05:26