Store test datasets in repo #235

Open: 5 commits to merge into base: main

@norlandrhagen
Collaborator, commented Aug 31, 2024

Adds a way to build test netCDF files and store them within the repo.

@norlandrhagen
Collaborator Author

The test test_numpy_arrays_to_inlined_kerchunk_refs is failing:

refs["refs"]["lon/0"]
'base64:AABIQwCASkMAAE1DAIBPQwAAUkMAgFRDAABXQwCAWUMAAFxDAIBeQwAAYUMAgGNDAABmQwCAaEMAAGtDAIBtQwAAcEMAgHJDAAB1QwCAd0MAAHpDAIB8QwAAf0MAwIBDAACCQwBAg0MAgIRDAMCFQwAAh0MAQIhDAICJQwDAikMAAIxDAECNQwCAjkMAwI9DAACRQwBAkkMAgJNDAMCUQwAAlkMAQJdDAICYQwDAmUMAAJtDAECcQwCAnUMAwJ5DAACgQwBAoUMAgKJDAMCjQwAApUM='
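
For anyone reading the failure output, the inlined value can be decoded with the standard library to see what was actually stored. This is a minimal sketch that assumes the chunk is uncompressed little-endian float32 (dtype "<f4"), which the thread does not state; the variable names are illustrative only.

```python
import base64
import struct

# The reference string exactly as posted above.
ref = (
    "base64:AABIQwCASkMAAE1DAIBPQwAAUkMAgFRDAABXQwCAWUMAAFxDAIBeQwAAYUMAgGND"
    "AABmQwCAaEMAAGtDAIBtQwAAcEMAgHJDAAB1QwCAd0MAAHpDAIB8QwAAf0MAwIBD"
    "AACCQwBAg0MAgIRDAMCFQwAAh0MAQIhDAICJQwDAikMAAIxDAECNQwCAjkMAwI9D"
    "AACRQwBAkkMAgJNDAMCUQwAAlkMAQJdDAICYQwDAmUMAAJtDAECcQwCAnUMAwJ5D"
    "AACgQwBAoUMAgKJDAMCjQwAApUM="
)

# Strip the "base64:" prefix and decode to raw bytes.
payload = base64.b64decode(ref[len("base64:"):])

# Assumption: 4-byte little-endian floats with no compression or filters.
values = struct.unpack(f"<{len(payload) // 4}f", payload)

print(len(payload), len(values), values[0], values[-1])
# 212 53 200.0 330.0
```

Under that assumption the payload is 212 bytes holding 53 longitudes from 200.0 to 330.0 in steps of 2.5. The byte count is the part that matters when reasoning about inline_threshold.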

@TomNicholas
Member

Great idea! Presumably we could make the file even smaller by downsampling spatially.

The failure shows that a variable that was previously inlined is no longer being inlined. This makes sense: which variables are inlined in the test is set by kerchunk's inline_threshold kwarg, and the result is then compared against the variables manually specified with loadable_variables. The parameter values (500.0 etc.) were just chosen to be bigger or smaller than certain variables in the Xarray test dataset. Now that you've changed those variables they have different sizes, so some will be inlined that were not previously (and vice versa). It's a janky setup, but I wasn't sure how to do it more neatly because that's the only way kerchunk allows you to control inlining.

@norlandrhagen
Collaborator Author

Thanks! Although I can't take credit for the idea, haha.

Late-night head scratcher: I can't seem to find an inline_threshold that satisfies both the lat and the time assert statements. 😕

# for loadable_vars = ['lat', 'lon']
# lat comparison fails when inline_threshold is below 101
# time comparison fails when inline_threshold is above 16
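
One way to see why no value works is to brute-force the threshold against the chunk sizes. The sketch below is not the project's test code. It assumes kerchunk inlines a chunk when its size in bytes is strictly below inline_threshold, that lat is 100 bytes and time is 16 bytes (both inferred from the failure boundaries reported above), that lon is 212 bytes (the length of the base64 payload posted earlier), and that the test expects lat and lon inlined with time left as a reference. The helper name `inlined_at` is hypothetical.

```python
# Chunk sizes in bytes. lat and time are inferred from the reported
# boundaries (101 and 16); lon is the decoded length of the posted payload.
chunk_nbytes = {"lat": 100, "lon": 212, "time": 16}

# What the test is assumed to expect for loadable_vars = ['lat', 'lon'].
want_inlined = {"lat", "lon"}


def inlined_at(threshold, sizes):
    """Names that would be inlined, assuming the rule `nbytes < threshold`."""
    return {name for name, nbytes in sizes.items() if nbytes < threshold}


# Try every integer threshold in a generous range.
workable = [t for t in range(0, 1001) if inlined_at(t, chunk_nbytes) == want_inlined]

print(inlined_at(101, chunk_nbytes) == {"lat", "time"})  # True
print(workable)  # []
```

Under these assumptions the search comes back empty: time is smaller than lat, so any threshold large enough to inline lat also inlines time. A purely size-based knob cannot pick out lat and lon while skipping a smaller variable.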

@norlandrhagen
Collaborator Author

The mypy errors seem to be unrelated; see #249.

@TomNicholas
Member

Looking at this again, I think storing the datasets as files is great for roundtrip tests, but we should also strive to make other tests start from a point that doesn't require reading netCDF (and hence doesn't rely on kerchunk). That could be in-memory kerchunk references, or on-disk kerchunk references that we then interpret using #251, or maybe something even simpler in some cases.
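
As a rough illustration of the in-memory option, a kerchunk-style reference dict for one small coordinate can be built with the standard library alone, with no netCDF file and no kerchunk import. This is only a sketch: `make_inlined_refs` is a hypothetical helper, not an existing fixture in this repo, and the layout assumes the version 1 reference format with Zarr v2 metadata stored as JSON strings and an uncompressed "<f4" array in a single chunk.

```python
import base64
import json
import struct


def make_inlined_refs(name, values):
    """Hypothetical helper: references for one 1D float32 array, fully inlined."""
    raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32 bytes
    zarray = {
        "chunks": [len(values)],
        "compressor": None,
        "dtype": "<f4",
        "fill_value": None,
        "filters": None,
        "order": "C",
        "shape": [len(values)],
        "zarr_format": 2,
    }
    return {
        "version": 1,
        "refs": {
            ".zgroup": json.dumps({"zarr_format": 2}),
            f"{name}/.zarray": json.dumps(zarray),
            f"{name}/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": [name]}),
            # Inlined chunk: the bytes travel inside the reference itself.
            f"{name}/0": "base64:" + base64.b64encode(raw).decode("ascii"),
        },
    }


# Longitudes matching the values in the failing reference posted earlier.
lon = [200.0 + 2.5 * i for i in range(53)]
refs = make_inlined_refs("lon", lon)

print(refs["refs"]["lon/0"][:23])  # base64:AABIQwCASkMAAE1D
```

A test could start from a dict like this and assert on how it is interpreted, leaving the stored netCDF files for the roundtrip tests.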

Successfully merging this pull request may close these issues.

Store test datasets in this repo