Support Kerchunk indices embedded in STAC items #32

Open
TomAugspurger opened this issue Oct 18, 2023 · 1 comment · May be fixed by #33
@TomAugspurger

stac-utils/xstac#38 is prototyping how we might store Kerchunk indices in STAC items. Storing Kerchunk metadata in STAC items removes the need to put that metadata in some sidecar file: https://tomaugspurger.net/posts/stac-updates/#stac-and-kerchunk.

The high-level goal is to store the metadata needed for Kerchunk under the fields added by the datacube extension. This lets us deduplicate a few fields (like the attrs, and maybe others). I'm not sure whether this is worth doing, because now you need a function to translate between Kerchunk in STAC and the plain Kerchunk references. But I don't think we should be putting JSON strings like .zarray in the STAC objects, so I think we'll need something like that anyway.
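To make the translation step concrete, here is a minimal sketch of what such a function might look like. It is not the xstac prototype: the `kerchunk:zarray` and `kerchunk:chunks` field names and the `stac_to_kerchunk_sketch` function are hypothetical, chosen only for illustration. The `cube:variables`, `dimensions`, and `unit` fields come from the datacube extension, and the output follows the plain Kerchunk version 1 layout, where `.zarray` and `.zattrs` are JSON strings. Coordinates under `cube:dimensions` are left out to keep it short.

```python
import json


def stac_to_kerchunk_sketch(item: dict) -> dict:
    """Build plain Kerchunk (version 1) references from a STAC item dict.

    The "kerchunk:zarray" and "kerchunk:chunks" keys are hypothetical names
    used for illustration, not fields defined by any published extension.
    """
    refs = {".zgroup": json.dumps({"zarr_format": 2})}
    for name, var in item["properties"]["cube:variables"].items():
        # Array metadata lives in STAC as a real JSON object and is only
        # serialized to a string when building the Kerchunk references.
        refs[f"{name}/.zarray"] = json.dumps(var["kerchunk:zarray"])

        # Deduplication: derive the Zarr attrs from datacube fields instead
        # of storing a second copy of them in the item.
        attrs = {"_ARRAY_DIMENSIONS": var["dimensions"]}
        if "unit" in var:
            attrs["units"] = var["unit"]
        refs[f"{name}/.zattrs"] = json.dumps(attrs)

        # Chunk references map a chunk key to [url, offset, length].
        for chunk_key, ref in var["kerchunk:chunks"].items():
            refs[f"{name}/{chunk_key}"] = ref
    return {"version": 1, "refs": refs}


# A made-up item with a single variable and a single chunk.
item = {
    "type": "Feature",
    "properties": {
        "cube:variables": {
            "tas": {
                "type": "data",
                "dimensions": ["time", "x"],
                "unit": "K",
                "kerchunk:zarray": {
                    "shape": [1, 2], "chunks": [1, 2], "dtype": "<f4",
                    "compressor": None, "fill_value": None, "filters": None,
                    "order": "C", "zarr_format": 2,
                },
                "kerchunk:chunks": {"0.0": ["s3://example-bucket/tas.nc", 1024, 8]},
            }
        }
    },
}

references = stac_to_kerchunk_sketch(item)
```

The point of the sketch is that the STAC item carries structured JSON and reuses the datacube fields, while the JSON-string form only appears in the generated references.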

Here's a hacky version of what I have in mind, using this item collection: https://gist.github.com/TomAugspurger/5b5f40c34212b8302e824e66b477062a.

import pystac
import xstac
import kerchunk.combine
import fsspec
import xarray as xr

class STACKerchunkBackend(xr.backends.BackendEntrypoint):
    open_dataset_parameters = ["filename_or_obj", "drop_variables"]

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
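        if isinstance(filename_or_obj, (list, pystac.ItemCollection)):
            # One set of references per item, combined along the time dimension.
            refs = [xstac.kerchunk.stac_to_kerchunk(item) for item in filename_or_obj]
            refs2 = kerchunk.combine.MultiZarrToZarr(refs, concat_dims="time").translate()
        else:
            # A single item translates directly to one set of references.
            refs2 = xstac.kerchunk.stac_to_kerchunk(filename_or_obj)

        # Expose the references as a Zarr store and open it with xarray.
        mapper = fsspec.filesystem("reference", fo=refs2).get_mapper()
        return xr.open_dataset(mapper, engine="zarr", consolidated=False, drop_variables=drop_variables)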

ic = pystac.ItemCollection.from_file("item_collection.json")

ds = xr.open_dataset(list(ic), engine=STACKerchunkBackend, chunks={})
ds
@jsignell
Member

Thanks for pointing me to that Tom. I guess I should be watching xstac!
