
Cookbooks in need of maintenance #200

Open

brian-rose opened this issue May 21, 2024 · 10 comments

Comments

@brian-rose
Member

brian-rose commented May 21, 2024

As part of our planning for the 2024 Pythia Cook-off, this issue will serve to track progress in identifying and maybe fixing issues that are causing some currently published Cookbooks to fail their nightly builds.

The top priority here is to identify causes of failure that are likely to block progress during the hackathon, either because we plan to develop those particular cookbooks further, or because the failures stem from systemic issues that we expect to affect some new cookbooks.

@brian-rose
Member Author

brian-rose commented May 21, 2024

Currently failing cookbooks are listed here in checklist format. Check them off if/when the nightly build is passing again. For each cookbook, it's best to open specific issues in their own repos and link them here.

@brian-rose
Member Author

Xbatcher failures related to upstream changes to Intake are discussed over at ProjectPythia/xbatcher-ML-1-cookbook#12.

The failure of the Intake Cookbook is discussed at ProjectPythia/intake-cookbook#33; it is related to the same upstream change to Intake.

@brian-rose
Member Author

A data access issue with the NASA GISTEMP dataset is affecting the CESM LENS cookbook: ProjectPythia/cesm-lens-aws-cookbook#27

@brian-rose
Member Author

The Kerchunk cookbook is failing with a data access issue: ProjectPythia/kerchunk-cookbook#61

@erogluorhan
Member

Only Xbatcher is left from this list.

@ktyle
Contributor

ktyle commented Dec 3, 2024

The xbatcher cookbook build issue may be a tough one to resolve in its current form. It reads an Intake catalog that requires requester-pays access.
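
For context, a requester-pays bucket can only be read when a billable Google Cloud project is supplied to cover the egress charges, which is why an anonymous CI or Binder build fails on it. A minimal sketch with gcsfs, where the project ID and bucket path are placeholders:

```python
import gcsfs

# Anonymous access works for public buckets but fails on requester-pays ones,
# typically with an error like "Bucket is a requester pays bucket but no user project provided".
# fs = gcsfs.GCSFileSystem(token="anon")

# Reading requester-pays data requires supplying a billed project to absorb the costs.
# "my-billed-gcp-project" and the bucket below are hypothetical.
fs = gcsfs.GCSFileSystem(project="my-billed-gcp-project", requester_pays=True)
print(fs.ls("gs://some-requester-pays-bucket"))
```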

@clyne
Contributor

clyne commented Dec 3, 2024

Are the datasets small enough (on the order of a few terabytes, not petabytes) that we could copy them to Jetstream or GDEX?

@ktyle
Contributor

ktyle commented Dec 9, 2024

There are 11 float32 variables with dimensions nlat: 2400, nlon: 3600, time: 14965, and 59 coordinate variables of various dimensions/types. So I think we're talking about 10-100 TB ...
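
For reference, a quick back-of-envelope estimate of the uncompressed data variables alone (coordinate variables and any storage overhead not included):

```python
# Uncompressed size of the 11 float32 data variables on the full nlat x nlon x time grid.
nlat, nlon, ntime = 2400, 3600, 14965
n_vars = 11
bytes_per_value = 4  # float32

total_bytes = n_vars * nlat * nlon * ntime * bytes_per_value
print(f"{total_bytes / 1e12:.1f} TB")  # ~5.7 TB before coordinates and overhead
```

Either way it would be a multi-terabyte transfer.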

@brian-rose
Member Author

Where is the data hosted? Is it possible in principle for us to set up a BinderHub service in the same cloud region and sidestep the requester-pays issue? We do have some commercial cloud credit resources that we're not currently using.

@ktyle
Contributor

ktyle commented Dec 10, 2024

It's on Google Cloud. I think in principle one could set up a BinderHub on GCP, though I imagine that would take some time to set up and then administer. I think a quicker and better solution would be to re-work the xbatcher cookbook so that it uses a dataset that doesn't have this requester-pays burden.
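
If the cookbook were reworked that way, a public (non-requester-pays) Zarr store could be read anonymously. A minimal sketch, with a placeholder store path rather than an actual replacement dataset:

```python
import xarray as xr

# "gs://some-open-bucket/store.zarr" is a placeholder; any openly hosted Zarr store
# without the requester-pays flag can be read without a billed GCP project.
ds = xr.open_zarr(
    "gs://some-open-bucket/store.zarr",
    storage_options={"token": "anon"},  # anonymous access, no billing project needed
    consolidated=True,
)
print(ds)
```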
