Add guidance on how to best use STAC and Zarr together to the Cloud-Optimized Geospatial Formats Guide #134
Comments
OK, here is what I have thought through so far: STAC (SpatioTemporal Asset Catalogs) is a specification for defining and searching any type of data that has spatial and temporal dimensions. STAC has seen significant adoption in the earth observation community. Zarr is a specification for storing groups of cloud-optimized arrays. Zarr has been adopted by the earth modeling community (led by Pangeo). Both STAC and Zarr offer a flexible nested structure with arbitrary metadata at a variety of levels. For STAC: catalog, collection, item, asset. For Zarr: group, array (and maybe in the future chunk, with icechunk/kerchunk?). This flexibility has contributed to their popularity, but can also make it hard to tell what they are designed to be particularly good at.
Comparison table
Note: This mostly has to do with the STAC spec and the STAC API spec, but STAC also has extensions, which include an ontology of domain-specific metadata as well as a mechanism for adding new extensions. This is similar to CF conventions on STAC but
Some additional thoughts:
Interface
People familiar with xarray are likely to want to get to xarray as quickly as possible and don't mind doing more filtering once they are there. People more familiar with STAC are likely to want to use STAC tooling to do searching, so it might make sense to push STAC filters down into Zarr (maybe using a tool like xpublish?). So there should be good tooling to go back and forth between STAC and Zarr. Here is what exists so far:
Accessing Zarr with STAC-like patterns
Think of this as an n-dimensional data cube of regularly gridded model output.
For data producers:
For data consumers:
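To make the STAC-like access pattern for Zarr concrete, here is a minimal sketch using only plain Python dictionaries. It assumes the whole data cube is described by one STAC Collection with a collection-level asset pointing at the Zarr store. The collection id, the bucket URL, and the helper name `zarr_hrefs` are all hypothetical, and the media type `application/vnd+zarr` is an assumption about how the asset is labeled, not something this thread specifies.

```python
# Hypothetical STAC Collection describing one Zarr store.
# Every id and URL below is made up for illustration.
collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-model-output",
    "description": "Regularly gridded model output stored as a single Zarr store",
    "license": "proprietary",
    "extent": {
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        "temporal": {"interval": [["2000-01-01T00:00:00Z", "2020-12-31T00:00:00Z"]]},
    },
    "links": [],
    "assets": {
        "zarr": {
            "href": "s3://example-bucket/model-output.zarr",
            "type": "application/vnd+zarr",  # assumed media type for Zarr assets
            "roles": ["data"],
        }
    },
}


def zarr_hrefs(stac_object):
    """Return the hrefs of all assets whose media type marks them as Zarr."""
    assets = stac_object.get("assets", {})
    return [
        asset["href"]
        for asset in assets.values()
        if asset.get("type") == "application/vnd+zarr"
    ]


hrefs = zarr_hrefs(collection)
print(hrefs)
```

A consumer who found this collection through STAC tooling could then hand the href to xarray (for example with `xarray.open_zarr`) and do any further filtering there.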
Accessing STAC with xarray patterns
Think of this as a bunch of COGs at the same place over a period of time.
For data producers:
For data consumers:
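The "bunch of COGs over time" picture can be sketched as follows: each STAC item contributes one time step, and sorting the items by their `datetime` property gives the time coordinate along which the files would be stacked. This is only an illustration with the standard library. The item ids, URLs, the `data` asset key, and the helper name `time_index` are hypothetical, and real tooling that builds an xarray object from STAC items does considerably more (reprojection, chunking, lazy loading).

```python
from datetime import datetime

# Hypothetical STAC items: one COG per acquisition over the same place.
items = [
    {
        "id": "scene-b",
        "properties": {"datetime": "2021-02-01T00:00:00Z"},
        "assets": {"data": {"href": "https://example.com/scene-b.tif"}},
    },
    {
        "id": "scene-a",
        "properties": {"datetime": "2021-01-01T00:00:00Z"},
        "assets": {"data": {"href": "https://example.com/scene-a.tif"}},
    },
    {
        "id": "scene-c",
        "properties": {"datetime": "2021-03-01T00:00:00Z"},
        "assets": {"data": {"href": "https://example.com/scene-c.tif"}},
    },
]


def time_index(items, asset_key="data"):
    """Return (timestamp, href) pairs sorted by time, one pair per item."""
    pairs = []
    for item in items:
        stamp = datetime.strptime(
            item["properties"]["datetime"], "%Y-%m-%dT%H:%M:%SZ"
        )
        pairs.append((stamp, item["assets"][asset_key]["href"]))
    pairs.sort(key=lambda pair: pair[0])
    return pairs


index = time_index(items)
for stamp, href in index:
    print(stamp.isoformat(), href)
```

The sorted timestamps play the role of the `time` dimension, and the hrefs are the files that would be read lazily for each step.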
Storing results:
This is a newer area of development. The core idea is that instead of repeatedly querying, you could store the results for easy access.
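As a rough illustration of the "store the results" idea, the sketch below writes a list of STAC items to a local JSON file as a GeoJSON-style FeatureCollection and reads it back, so a later session can skip the search. The storage format is an assumption chosen for simplicity, not the approach discussed in this thread, and the function names, item contents, and file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_items(items, path):
    """Write STAC items to disk as a single FeatureCollection document."""
    feature_collection = {"type": "FeatureCollection", "features": items}
    Path(path).write_text(json.dumps(feature_collection))


def load_items(path):
    """Read back the items previously written by save_items."""
    return json.loads(Path(path).read_text())["features"]


# Hypothetical search results; in practice these would come from a STAC API.
search_results = [
    {"type": "Feature", "id": "scene-a", "properties": {"datetime": "2021-01-01T00:00:00Z"}},
    {"type": "Feature", "id": "scene-b", "properties": {"datetime": "2021-02-01T00:00:00Z"}},
]

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "search-results.json"
    save_items(search_results, cache)
    restored = load_items(cache)

print(len(restored))
```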
Based off of the following discussions:
Other Comments:
Sean Harkins
Martin Durant
Duplicate of NASA-IMPACT/veda-odd#81 at @wildintellect's request
@gadomski's talk at the Pangeo showcase prompted a great question and subsequent discussion about how best to use Zarr with STAC. Questions about using Zarr with STAC have come up many times. Therefore, I think that guidance on this topic would really help the migration to cloud-native file formats. I think it would be great to use notes from today's discussion as the starting point for a PR to the Cloud-Optimized Geospatial Formats Guide. The PR would add a framework for deciding when and how to use Zarr with STAC. We could share the PR with the broader STAC, Pangeo, and CNG communities for comments, feedback, and additions.