Include serializer from ManifestStore in Icechunk virtual references #766

maxrjones · 2025-08-09T22:38:42Z

I think we need much more work around our handling of codecs (which would get much easier with zarr-developers/zarr-python#3276), but this is a more immediate fix to support parsers that specify any serializer other than the default (e.g., big-endian bytes).

This makes your NetCDF3 example work @rsignell

Closes #762 (ArrayBytesCodec not included in Icechunk serialization)
Tests added
Tests passing
Full type hint coverage
Changes are documented in docs/releases.rst
New functions/methods are listed in api.rst
New functionality has documentation

rsignell · 2025-08-10T19:28:25Z

@maxrjones I ran my workflow and still got gibberish: https://nbviewer.org/gist/rsignell/32f563cfbe83c20ac01aba00190e5912
Did I need to update some other stuff in addition to virtualizarr?
Or perhaps more likely I did something else wrong. 😕

maxrjones · 2025-08-11T19:15:13Z

> @maxrjones I ran my workflow and still got gibberish: nbviewer.org/gist/rsignell/32f563cfbe83c20ac01aba00190e5912 Did I need to update some other stuff in addition to virtualizarr? Or perhaps more likely I did something else wrong. 😕

Yeah, this fix is only for Icechunk. Kerchunk uses the Zarr v2 specification, so it will require either a separate fix or adapting our Kerchunk writer to use Zarr format 3. I'd be tempted to go with the second option because converting from Zarr V3 to V2 for Kerchunk adds a lot of surface area for potential bugs, but that would be a separate PR.
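For context on why the V3 to V2 conversion is a place where byte order can get lost: in Zarr v2 metadata the byte order is part of the dtype string (for example `>f4`), whereas in v3 it lives in the configuration of the `bytes` codec. The following is only a rough sketch of that mapping, not the code in VirtualiZarr. The function name `v2_dtype_from_v3`, the lookup table, and the little-endian fallback when no `endian` is given are all assumptions made for illustration.

```python
# Hypothetical lookup from Zarr v3 data type names to NumPy-style type codes.
_V3_TO_V2_CODE = {
    "int8": "i1",
    "uint8": "u1",
    "int16": "i2",
    "int32": "i4",
    "int64": "i8",
    "float32": "f4",
    "float64": "f8",
}


def v2_dtype_from_v3(data_type: str, codecs: list[dict]) -> str:
    """Build a v2-style dtype string from v3 metadata (illustrative only)."""
    code = _V3_TO_V2_CODE[data_type]
    if code.endswith("1"):
        # Single-byte types have no byte order; NumPy writes them with "|".
        return "|" + code
    endian = "little"  # assumption: fall back to little-endian if unspecified
    for codec in codecs:
        if codec.get("name") == "bytes":
            endian = codec.get("configuration", {}).get("endian", endian)
    return (">" if endian == "big" else "<") + code


big = [{"name": "bytes", "configuration": {"endian": "big"}}]
print(v2_dtype_from_v3("float32", big))
print(v2_dtype_from_v3("float32", []))
```

If the `bytes` codec is ignored during conversion, the result silently falls back to little-endian, which is the kind of bug discussed in this thread.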

rsignell · 2025-08-11T22:31:04Z

@maxrjones cool, I wanted to use icechunk anyway! And as you said, it does solve my use case:
https://nbviewer.org/gist/rsignell/480cf6ac5142ce2b828199bcc601a93e

TomNicholas

So to check my understanding, the reason this failed was because the default array->bytes codec (i.e. the "serializer") in zarr v3 terms was for little endian? Which is fine until you use it to decode big-endian data, when it will give you gibberish?

virtualizarr/manifests/store.py

maxrjones · 2025-08-12T20:51:50Z

> So to check my understanding, the reason this failed was because the default array->bytes codec (i.e. the "serializer") in zarr v3 terms was for little endian? Which is fine until you use it to decode big-endian data, when it will give you gibberish?

That's right.
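To make the failure mode concrete, here is a small standalone illustration using only the standard library `struct` module rather than Zarr itself. It is a sketch of the general effect, not of the VirtualiZarr code path: bytes written as big-endian 32-bit integers are decoded once with the matching byte order and once as little-endian.

```python
import struct

values = (1, 2, 3)

# Serialize as big-endian 32-bit integers, the byte order NetCDF3 uses on disk.
big_endian_bytes = struct.pack(">3i", *values)

# Decoding with the matching byte order recovers the original values.
decoded_ok = struct.unpack(">3i", big_endian_bytes)

# Decoding as little-endian raises no error; it just yields wrong numbers.
decoded_wrong = struct.unpack("<3i", big_endian_bytes)

print(decoded_ok)     # (1, 2, 3)
print(decoded_wrong)  # (16777216, 33554432, 50331648)
```

Nothing fails loudly here, which is why a dropped serializer shows up as gibberish values rather than as an exception.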

We likely still have bugs, because a user can define their own codec that doesn't subclass one of the Zarr-Python codec base classes. In that case, their custom codec will be dropped by

VirtualiZarr/virtualizarr/codecs.py

Lines 49 to 63 in f3149d6

    
    def extract_codecs(
        codecs: CodecPipeline,
    ) -> DeconstructedCodecPipeline:
        """Extracts various codec types."""
        arrayarray_codecs: tuple[ArrayArrayCodec, ...] = ()
        arraybytes_codec: ArrayBytesCodec | None = None
        bytesbytes_codecs: tuple[BytesBytesCodec, ...] = ()
        for codec in codecs:
            if isinstance(codec, ArrayArrayCodec):
                arrayarray_codecs += (codec,)
            if isinstance(codec, ArrayBytesCodec):
                arraybytes_codec = codec
            if isinstance(codec, BytesBytesCodec):
                bytesbytes_codecs += (codec,)
        return (arrayarray_codecs, arraybytes_codec, bytesbytes_codecs)

if they use create_v3_array_metadata. This may be the cause of #770.

I expect a lot of our codec handling will get much easier with @d-v-b's refactor in Zarr-Python, which is why I have been proposing small patch fixes rather than more fundamental changes.

TomNicholas · 2025-08-12T20:54:37Z

> their custom codec will be dropped by

Could we at least raise in that scenario?

TomNicholas

Add a changelog entry and this is good to go, thank you!

maxrjones · 2025-08-12T21:09:57Z

> > their custom codec will be dropped by
>
> Could we at least raise in that scenario?

Yeah, we definitely should raise in that scenario.
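One possible shape for that, sketched here with stand-in classes so it runs without Zarr-Python installed. The three base classes below are placeholders for `ArrayArrayCodec`, `ArrayBytesCodec`, and `BytesBytesCodec` from `zarr.abc.codec`, and `extract_codecs_strict` is a hypothetical name. This is not the change that was eventually made in VirtualiZarr, and it uses `elif` where the quoted function uses independent `if` checks.

```python
# Stand-ins for the Zarr-Python codec base classes (assumption: the real
# function would import these from zarr.abc.codec instead).
class ArrayArrayCodec: ...
class ArrayBytesCodec: ...
class BytesBytesCodec: ...


def extract_codecs_strict(codecs):
    """Split codecs by type, raising on anything unrecognised."""
    arrayarray_codecs: tuple[ArrayArrayCodec, ...] = ()
    arraybytes_codec: ArrayBytesCodec | None = None
    bytesbytes_codecs: tuple[BytesBytesCodec, ...] = ()
    for codec in codecs:
        if isinstance(codec, ArrayArrayCodec):
            arrayarray_codecs += (codec,)
        elif isinstance(codec, ArrayBytesCodec):
            arraybytes_codec = codec
        elif isinstance(codec, BytesBytesCodec):
            bytesbytes_codecs += (codec,)
        else:
            # Previously such a codec would have been skipped silently.
            raise TypeError(
                f"Unsupported codec type {type(codec).__name__!r}: "
                "expected an ArrayArrayCodec, ArrayBytesCodec or BytesBytesCodec"
            )
    return (arrayarray_codecs, arraybytes_codec, bytesbytes_codecs)


class MyCustomCodec:
    """A user-defined codec that does not subclass any known base class."""


try:
    extract_codecs_strict([ArrayBytesCodec(), MyCustomCodec()])
except TypeError as err:
    print(err)
```

Raising a `TypeError` is just one choice; a warning or a dedicated exception class would work as well.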

Include serializer from ManifestStore in Icechunk virtual references

7bf60c0

maxrjones temporarily deployed to test-release August 9, 2025 22:39 — with GitHub Actions Inactive

Add a test

2433372

maxrjones temporarily deployed to test-release August 9, 2025 22:49 — with GitHub Actions Inactive

maxrjones mentioned this pull request Aug 11, 2025

VirtualiZarr bug fixes NASA-IMPACT/veda-odd#222

Closed

maxrjones marked this pull request as ready for review August 11, 2025 22:20

maxrjones mentioned this pull request Aug 11, 2025

Extract endianness from Bytes codec in V2 metadata conversion #769

Merged

7 tasks

maxrjones requested a review from a team August 11, 2025 22:26

TomNicholas reviewed Aug 12, 2025

virtualizarr/manifests/store.py (outdated)

Remove unrelated fix

e3b7c29

maxrjones temporarily deployed to test-release August 12, 2025 20:40 — with GitHub Actions Inactive

TomNicholas approved these changes Aug 12, 2025

TomNicholas mentioned this pull request Aug 12, 2025

Raise instead of silently dropping custom codec #773

Closed

TomNicholas added the "Icechunk 🧊" label (Relates to Icechunk library / spec) Aug 13, 2025

release note

a19c883

TomNicholas temporarily deployed to test-release August 13, 2025 20:41 — with GitHub Actions Inactive

remove trailing )

38873e8

TomNicholas temporarily deployed to test-release August 13, 2025 20:42 — with GitHub Actions Inactive

TomNicholas merged commit a8e0ac7 into zarr-developers:main Aug 13, 2025
13 checks passed
