Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added the reinflate api #499

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

Anu-Ra-g
Copy link
Contributor

Added the reinflate function to reinflate the index.

zarr_ref_store: dict,
) -> dict:
"""
Given a zarr_store hierarchy, pull out the variables present in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a one-line summary at the start of the docstring, since this is what help() and sphynx use as the title describing the method.

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions here are identical to the originals by @emfdavid ?

I am OK to include these now, but I think the next PR should move all of the tree/idx-related functions to another module (_grib_idx ?), and only import the public-facing functions into this module. As it is, the module is now extremely long compared to other formats, and it's not so obvious where a user should start.

@emfdavid
Copy link
Contributor

emfdavid commented Sep 4, 2024

Separate module and only exposing public functions sounds good to me.

@emfdavid
Copy link
Contributor

emfdavid commented Sep 4, 2024

Is there a better name for this function? I am terrible at names...

@martindurant
Copy link
Member

Is there a better name for this function?

You mean "reinflate"? Maybe if you write a 2-sentence description, the name will present itself :)

@Anu-Ra-g
Copy link
Contributor Author

Anu-Ra-g commented Sep 4, 2024

@martindurant what should be the module name? The module should be included in a sub-package like kerchunk.grib.<module> or directly in the kerchunk.<module name>?

@martindurant
Copy link
Member

I suggested _grib_idx above. Making kerchunk.grib into a package sounds like a bad idea.

@emfdavid
Copy link
Contributor

emfdavid commented Sep 4, 2024

Comments on naming (not my strong suit)... the Camus team had suggested "outbasing" as the name for this general approach of ingesting an index of chunks to a database and constructing the dataset/tree on the fly.

The "reinflate" method in particular builds the refspec dictionary for a zarr/xarray tree representing a set of grib data. The inputs are the static metadata for the refspec (everything but the chunks for variables with a time dimension), a set of chunk indexes for specific variables (usually stored in a DB), the aggregation type, and the set of axes that define how to arrange those chunks.

I wanted to be general, supporting all the FMRC slices which are represented in the AggregationType enum. The api allows passing a list of axes such that you can express a set of runtimes, a set of valid times, a set of horizons, or the 'best available' as of a certain time.

I hope there is some simpler expression of these concepts that might use this method as a backend, but I suspect you would have to make assumptions about the backend database/schema to wrap it up nicely.

@martindurant
Copy link
Member

  • This code should be in _grib_idx.py, as discussed
  • There should be tests
  • There needs to be far more documentation on how to use the code (again), else no one will ever touch these functions.

@emfdavid
Copy link
Contributor

emfdavid commented Oct 1, 2024

Thank you for getting the process started. I will plan to make progress this month, but it will be a nights and weekends effort. If that is not progressing by the end of October we can regroup.

@martindurant
Copy link
Member

Sure, I was simply book-keeping :)

@rsignell
Copy link

rsignell commented Oct 28, 2024

@emfdavid I was trying to figure out where the 2024 GSoC work stands, and it looks like this is the blocker? Is that correct?

@martindurant
Copy link
Member

Check out #523 - I think that's intended to finish off the topic.

@rsignell
Copy link

@martindurant ooh, yes indeed, that looks like it! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants