Add Zarr v3 dependency #182
Sloppy copy-paste, thank you Tom
Co-authored-by: Tom Nicholas <[email protected]>
Adding zarr as a dependency would also allow us to un-vendor this small bit of code: https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/vendor/zarr/utils.py
I ran into an issue with a behavior change in the call to
@TomNicholas maybe it might be less work globally if VirtualiZarr didn't depend on Kerchunk? I know there's #78 and #87 as prototypes, maybe they are prerequisites to making VirtualiZarr depend on ZarrV3. The process of using Kerchunk to generate ChunkManifests can be delegated to some other package that can continue depending on ZarrV2.
@ghidalgo3 that sounds like a gnarly bug! (It would be nice if python allowed you to have different versions imported for different packages, so that kerchunk could depend on zarr v2 whilst virtualizarr imported zarr v3.)
@ghidalgo3 many things in this library would be less work if we didn't have to depend on kerchunk, and we're working towards that, but it's not going to be done quickly (especially not for other non-HDF filetypes that kerchunk can also read). Making kerchunk work with zarr v3 would be the cleanest solution, but it's difficult to know how long that might take too (I know Joe tried it and found it to be a rabbit hole).
I'm not sure I understand this suggestion. Wouldn't that have exactly the same problem as now?
Not if you have 2 Python environments:
It's an ugly solution yes, but compared to making Kerchunk work with ZarrV3 maybe it's less work? I'm not in a rush so probably makes sense to make Kerchunk work with V3 and V2, and that's only going to happen by upgrading Kerchunk to Zarr>=3.0.0.
That would be nice! But even before then it would be better if Zarr v2 -> Zarr v3 had easy breaking changes :( This specific one is really subtle, there's no API change to I'll focus my effort on trying to make Kerchunk work with Zarr V3 then.
Ohhh right now I get it - you literally split your workflow up into two separate steps using different environments each time.
I mean to me that sounds like it's kerchunk's fault for using internal zarr API...
Amazing - let us know how that goes (good or bad) so we can re-evaluate here.
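For reference, the two-environment split discussed above could be sketched roughly like this. It is only an illustration: `require_major`, `step_one_write`, and `step_two_read` are hypothetical names, the handoff file is plain JSON, and the version string is passed in by hand (in a real script it would presumably come from `importlib.metadata.version("zarr")`).

```python
import json
from pathlib import Path


def require_major(installed_version: str, wanted_major: int) -> None:
    """Fail fast if a step is run in the wrong environment."""
    major = int(installed_version.split(".")[0])
    if major != wanted_major:
        raise RuntimeError(
            f"expected zarr {wanted_major}.x, found {installed_version}"
        )


def step_one_write(refs: dict, handoff: Path, zarr_version: str) -> None:
    # Step 1: runs in the environment that has kerchunk and zarr v2.
    require_major(zarr_version, 2)
    handoff.write_text(json.dumps(refs, sort_keys=True))


def step_two_read(handoff: Path, zarr_version: str) -> dict:
    # Step 2: runs in the separate environment that has zarr v3.
    require_major(zarr_version, 3)
    return json.loads(handoff.read_text())
```

The only thing the two steps share is the file on disk, so neither environment ever needs both zarr versions importable at once.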
virtualizarr/zarr.py
Outdated
@@ -12,8 +12,7 @@
 import ujson  # type: ignore
 import xarray as xr
 from pydantic import BaseModel, ConfigDict, field_validator

-from virtualizarr.vendor.zarr.utils import json_dumps
+from zarr.v2.util import json_dumps
I would avoid this import if possible. We are likely to remove the v2 namespace before the 3.0 release.
I'll revert and continue to vendor this bit of code.
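For context, a vendored helper of this kind can be very small. The sketch below is an assumption about its general shape (deterministic, human-readable JSON encoded as bytes) using only the standard library; the actual code in `virtualizarr/vendor/zarr/utils.py` may differ, for example in how it encodes NumPy number types.

```python
import json
from typing import Any


def json_dumps(obj: Any) -> bytes:
    """Serialize to deterministic, human-readable JSON bytes."""
    return json.dumps(
        obj,
        indent=4,                # readable layout
        sort_keys=True,          # stable key order across runs
        ensure_ascii=True,       # output is safe to encode as ASCII
        separators=(",", ": "),  # no trailing spaces after commas
    ).encode("ascii")
```

Keeping a copy like this means the import does not break if the upstream `v2` namespace is removed.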
This reverts commit 8c2a7a0.
@TomNicholas this work did lead down a pretty deep rabbit hole, probably following in Joe's footsteps. My conclusion is that Specifically here is the problem chain this led me down:
Does that make sense? If so, I'll abandon both this PR and #175.
That makes complete sense, and you've really done a great job here going down that rabbit hole and clearly communicating all the steps required. I don't know that we need to close this PR, but it does sound like it will need to be paused for a while before it could be picked up again. I also agree that in the meantime we can aim for v3 support without a dependency on zarr-python v3. cc @jhamman
Coming back to this now as a lot of things have changed, so we might be able to make progress.
This is still the case.
This is about to be the case, and there is a branch we can test it on. pydata/xarray#9552
VirtualiZarr now does not depend on Kerchunk! #259 Though if you want to read netCDF files you do have to be able to import kerchunk, the dependency is just optional.
This is now maybe within reach? The writing to disk could actually be done using the kerchunk format, as virtualizarr can write to and read from the kerchunk json/parquet format without ever importing kerchunk (reading is new, thanks to @norlandrhagen in #251). At the very least now we should be able to see if new
Adding a reader that can create virtual references from a standard zarr v3 store is tracked in #262.
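To make the "without ever importing kerchunk" point concrete, here is a minimal sketch of round-tripping kerchunk-style JSON references with only the standard library. It assumes the version 1 reference layout (a top-level `refs` mapping whose chunk entries are `[path, offset, length]` lists); the bucket path, variable name, and the `chunk_entries` helper are made up for illustration and are not VirtualiZarr's API.

```python
import json


def chunk_entries(store: dict) -> dict:
    """Collect byte-range references as key -> (path, offset, length)."""
    out = {}
    for key, value in store["refs"].items():
        # Chunk references are 3-element lists; metadata entries are strings.
        if isinstance(value, list) and len(value) == 3:
            path, offset, length = value
            out[key] = (path, offset, length)
    return out


refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),        # inline metadata
        "air/0.0": ["s3://bucket/file.nc", 8000, 1024],   # hypothetical chunk
        "air/0.1": ["s3://bucket/file.nc", 9024, 1024],   # hypothetical chunk
    },
}

# Serialize and parse again, as a writer and a later reader would.
roundtripped = json.loads(json.dumps(refs))
entries = chunk_entries(roundtripped)
```

Because the on-disk format is just JSON, the writing and reading sides can live in environments with different zarr versions installed.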
This PR is a precursor to #175, and will be blocked on at least Kerchunk supporting ZarrV3.
TODOs:
Checklist:
docs/releases.rst
api.rst