-
Notifications
You must be signed in to change notification settings - Fork 49
Include serializer from ManifestStore in Icechunk virtual references #766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@maxrjones I ran my workflow and still got gibberish: https://nbviewer.org/gist/rsignell/32f563cfbe83c20ac01aba00190e5912 |
Yeah, this fix is only for Icechunk. Kerchunk uses Zarr specification 2, which will require separate fix or adapting our Kerchunk writer to use Zarr format 3. I'd be tempted to go with the second option because converting from Zarr V3 to V2 for Kerchunk adds a lot of surface area for potential bugs, but that would be a separate PR. |
@maxrjones cool, I wanted to use icechunk anyway! And as you said, it does solve my use case: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to check my understanding, the reason this failed was because the default array->bytes codec (i.e. the "serializer") in zarr v3 terms was for little endian? Which is fine until you use it to decode big-endian data, when it will give you gibberish?
That's right. We still likely have bugs because a user can define their own codec that doesn't subclass from Zarr python. In that case, their custom codec will be dropped by VirtualiZarr/virtualizarr/codecs.py Lines 49 to 63 in f3149d6
I expect a lot of our codec handling will get much easier with @d-v-b's refactor in Zarr-Python, which is why I have been proposing small patch fixes rather than more fundamental changes. |
Could we at least raise in that scenario? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a changelog entry and this is good to go, thank you!
yeah we definitely should raise in that scenario. |
I think we need much more work around our handling of codecs (which would get much easier with zarr-developers/zarr-python#3276), but this is a more immediate fix to support parsers that specify any serializer other than the default (e.g., big-endian bytes).
This makes your NetCDF3 example work @rsignell
docs/releases.rst
api.rst