Draft: jigsaw codecs: allow codecs to specify the buffers they work on #3529

keewis · 2025-10-16T22:46:24Z

For my work on the sparse codec (and after discussing with @d-v-b and @jhamman at the zarr summit), I've noticed that it should be possible to have the codecs declare their input and output buffer types. The codec pipeline can then verify that the codecs form a chain of buffer types (kind of like jigsaw puzzle pieces), and infer the codec pipeline's buffer prototype as the input of the first array-to-array codec and the output of the last bytes-to-bytes codec.
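To make the jigsaw idea concrete, here is a minimal sketch of the chain check and prototype inference. It is an illustration only: `CodecSpec`, `infer_prototype`, and the three buffer classes are hypothetical names invented for this example, not zarr's actual API or the code in this PR.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for buffer types; these are not zarr classes.
class DenseArray: ...
class SparseArray: ...
class RawBytes: ...


@dataclass(frozen=True)
class CodecSpec:
    """A codec reduced to the buffer types it declares (assumed shape)."""
    name: str
    input_type: type
    output_type: type


def infer_prototype(codecs):
    """Verify that adjacent codecs fit together like jigsaw pieces.

    Returns the (input, output) buffer types of the whole chain: the input
    of the first codec and the output of the last one.
    """
    if not codecs:
        raise ValueError("empty codec chain")
    for left, right in zip(codecs, codecs[1:]):
        if left.output_type is not right.input_type:
            raise TypeError(
                f"{left.name} outputs {left.output_type.__name__}, "
                f"but {right.name} expects {right.input_type.__name__}"
            )
    return codecs[0].input_type, codecs[-1].output_type


chain = [
    CodecSpec("transpose", DenseArray, DenseArray),  # array-to-array
    CodecSpec("bytes", DenseArray, RawBytes),        # array-to-bytes
    CodecSpec("gzip", RawBytes, RawBytes),           # bytes-to-bytes
]
prototype = infer_prototype(chain)
```

A chain whose pieces do not fit, for example a codec producing `SparseArray` followed by one expecting `DenseArray`, raises a `TypeError` in this sketch instead of failing later during encoding.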

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.md
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)


keewis · 2025-10-16T22:53:24Z

It just dawned on me that we can potentially split up the sparse codec (which is an array-to-bytes codec) into an array-to-array codec that extracts the metadata and component arrays of the sparse array and creates a specialized "multi-array buffer" for sparse arrays, and a generalized array-to-bytes codec that takes the "multi-array buffer" and packs it into bytes. This obviously means that the metadata we extracted has to live in the array-to-array codec's configuration.

Then, should we want a similar procedure for a different array type (e.g. masked arrays or geoarrow-encoded geometry arrays), we can just create a specialized pair of an array-to-array codec and a "multi-array buffer" type, and reuse the "multi-array to bytes" codec.
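A rough sketch of that split, for a 1-D sparse array with plain Python lists standing in for arrays. Everything here is an assumption made for illustration: `MultiArrayBuffer`, `SparseToMulti`, `multi_to_bytes`, and `bytes_to_multi` are hypothetical names, and the byte layout is invented rather than taken from zarr or zarr-sparse.

```python
import json
import struct
from dataclasses import dataclass, field


@dataclass
class MultiArrayBuffer:
    """Hypothetical buffer: named components, each a (struct format, values) pair."""
    components: dict


@dataclass
class SparseToMulti:
    """Hypothetical array-to-array codec: sparse array -> multi-array buffer."""
    # Extracted metadata lives in the codec's configuration.
    config: dict = field(default_factory=dict)

    def encode(self, dense, fill_value=0.0):
        idx = [i for i, v in enumerate(dense) if v != fill_value]
        self.config = {"length": len(dense), "fill_value": fill_value}
        return MultiArrayBuffer(
            {"indices": ("q", idx), "data": ("d", [dense[i] for i in idx])}
        )

    def decode(self, buf):
        out = [self.config["fill_value"]] * self.config["length"]
        _, idx = buf.components["indices"]
        _, data = buf.components["data"]
        for i, v in zip(idx, data):
            out[i] = v
        return out


def multi_to_bytes(buf):
    """Generic packer: knows nothing about sparsity, only named components."""
    parts = []
    for name, (fmt, values) in buf.components.items():
        header = json.dumps({"name": name, "fmt": fmt, "n": len(values)}).encode()
        parts.append(struct.pack("<I", len(header)) + header)
        parts.append(struct.pack(f"<{len(values)}{fmt}", *values))
    return b"".join(parts)


def bytes_to_multi(raw):
    """Inverse of multi_to_bytes."""
    components, offset = {}, 0
    while offset < len(raw):
        (hlen,) = struct.unpack_from("<I", raw, offset)
        offset += 4
        header = json.loads(raw[offset:offset + hlen])
        offset += hlen
        fmt = f"<{header['n']}{header['fmt']}"
        values = list(struct.unpack_from(fmt, raw, offset))
        offset += struct.calcsize(fmt)
        components[header["name"]] = (header["fmt"], values)
    return MultiArrayBuffer(components)


dense = [0.0, 1.5, 0.0, 0.0, -2.0]
codec = SparseToMulti()
raw = multi_to_bytes(codec.encode(dense))
restored = codec.decode(bytes_to_multi(raw))
```

In this sketch a masked-array variant would only need its own array-to-array codec producing, say, `data` and `mask` components; `multi_to_bytes` and `bytes_to_multi` would be reused unchanged.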

keewis added 6 commits October 15, 2025 12:26

array registry infrastructure

d57b383

infer the prototype from the array type

b35030e

add buffer declarations to codecs

b36c1a1

use the codec pipeline's prototype instead

1f718a8

Revert "array registry infrastructure"

ca12282

This reverts commit d57b383.

get typing to pass

7ba5ced

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Oct 16, 2025

keewis mentioned this pull request Oct 16, 2025

configure zarr to use the sparse buffer for the sparse codec keewis/zarr-sparse#15

Open

keewis added 3 commits October 17, 2025 11:10

also make the v2 codec declare its buffers

d53ca8d

more input / output declarations on codecs

78287f9

more buffer declarations

b9b7058
