Supporting OME-zarr? #16

olimcc · 2024-12-04T04:41:41Z

Understanding what it would take to support ome-zarr is a good way for us to test how domain-specific our thinking/building is. That said, supporting it is certainly not critical to having a useful tool.

Discovery

OME-zarr is a format built on top of zarr to support microscopy imagery. For core imagery/array storage it leverages the same chunking strategies as other array problems that use zarr. However, it has domain-specific approaches to metadata and dataset organization. This makes sense - the problem it is going after is much more refined than "earth sciences" writ large. Some areas where ome-zarr and weather-zarr (let's call it) differ:

dimension configuration: weather-zarr, often written by xarray, stores information in attrs._ARRAY_DIMENSIONS or via consolidated metadata. In contrast, ome-zarr stores an omero.json file at different levels of a store hierarchy, or uses different conventions inside attrs.
ome-zarr appears to have a stricter structure for imagery, which is typically of a more consistent format than wildly varied earth sciences data. You can see more assumptions being made about x/y/z in the visualization space. Furthermore, the domain model is embedded in the ome-zarr API, with references to Plates, PlateLabels, Wells, etc.
ome data has a practice of storing multiple resolutions of an image as part of a dataset (similar to pyramiding in geo imagery), and has conventions around this.
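
To make the dimension-metadata difference concrete, here is a minimal sketch of how a viewer might look up dimension names under both conventions. It assumes the attributes have already been parsed from JSON into a dict. The function name `dimension_names` and the `dim_N` fallback are hypothetical, and the `multiscales`/`axes` layout reflects my reading of the OME-NGFF metadata rather than anything verified against this dataset.

```python
from __future__ import annotations

from typing import Any, Mapping


def dimension_names(attrs: Mapping[str, Any], ndim: int) -> list[str]:
    """Best-effort dimension names for an array with `ndim` dimensions."""
    # xarray convention: a list of names stored on the array's own attributes.
    dims = attrs.get("_ARRAY_DIMENSIONS")
    if isinstance(dims, list) and len(dims) == ndim:
        return [str(d) for d in dims]

    # OME convention (assumed): axes listed on the first "multiscales" entry
    # of the parent group's attributes.
    multiscales = attrs.get("multiscales") or []
    if multiscales:
        axes = multiscales[0].get("axes") or []
        # Axes may be {"name": ...} objects or plain strings depending on version.
        names = [a["name"] if isinstance(a, dict) else str(a) for a in axes]
        if len(names) == ndim:
            return names

    # Nothing usable: fall back to positional placeholders.
    return [f"dim_{i}" for i in range(ndim)]
```

The positional fallback matters for stores like the sample dataset below, where no dimension names may be present at all.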

There exists custom tooling to read/write ome-zarr. There is a desire to have closer alignment between xarray and ome (ome/ngff#48). There's also a reasonably recent (2023) paper building momentum towards ome-zarr: "OME-Zarr: a cloud-optimized bioimaging file format with international community support".

Looking at data

This is a nice example of an ome-zarr dataset being rendered in the browser.

The array in question is available here: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.1/4495402.zarr/0
And a slice compatible with our tool: 0,0,0,130000:134000,450000:454000
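
For readers unfamiliar with the slice syntax, the sketch below shows one way such a string could be turned into an index tuple. `parse_selection` is a hypothetical helper written for illustration, not the parser our tool actually uses, and it only handles integers and `start:stop` ranges.

```python
from __future__ import annotations


def parse_selection(text: str) -> tuple[int | slice, ...]:
    """Parse '0,0,0,130000:134000,450000:454000' into ints and slices."""
    parts: list[int | slice] = []
    for token in text.split(","):
        token = token.strip()
        if ":" in token:
            start, _, stop = token.partition(":")
            # An empty side (e.g. ':100') means an open-ended bound.
            parts.append(slice(int(start) if start else None,
                               int(stop) if stop else None))
        else:
            parts.append(int(token))
    return tuple(parts)


selection = parse_selection("0,0,0,130000:134000,450000:454000")
# Extent of each sliced axis: a 4000 x 4000 window into the last two dimensions.
window = [s.stop - s.start for s in selection if isinstance(s, slice)]
```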

This data is served via an S3-compatible API. Here's how you can list it using the AWS CLI:

aws s3 \
  --no-sign-request \
  --endpoint-url https://uk1s3.embassy.ebi.ac.uk \
  ls s3://idr/zarr/v0.1/4495402.zarr/

                           PRE 0/
                           PRE 1/
                           PRE 2/
                           PRE 3/
                           PRE 4/
                           PRE 5/
                           PRE 6/
                           PRE 7/
2021-08-25 08:11:38        153 .zattrs
2021-08-25 08:11:39         17 .zgroup
2021-08-25 08:11:39       2233 omero.json

Note we are looking at the 0th array - there are 8 total. My understanding (informed by this) is that these represent different resolutions, along the lines of:

└─ group
    ├─── s0 {} 
    ├─── s1 {"downsamplingFactors": [2, 2, 2]}
    ├─── s2 {"downsamplingFactors": [4, 4, 4]}
    ...
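
If that reading is right, a viewer could pick a resolution level to match the screen instead of always loading level 0. The following is a rough sketch under two loud assumptions: every level downsamples each spatial axis by a constant factor of 2 (as in the tree above), and only the spatial shape is passed in. The shape used in the example is made up, and both function names are hypothetical.

```python
from __future__ import annotations


def level_shapes(base_shape: tuple[int, ...], n_levels: int, factor: int = 2) -> list[tuple[int, ...]]:
    """Spatial shape at each level, assuming a constant per-level downsampling factor."""
    shapes = []
    for level in range(n_levels):
        scale = factor ** level
        # Integer ceiling division so a partial trailing pixel still counts.
        shapes.append(tuple(max(1, -(-n // scale)) for n in base_shape))
    return shapes


def pick_level(base_shape: tuple[int, ...], n_levels: int, max_pixels: int, factor: int = 2) -> int:
    """Finest level whose largest axis fits in max_pixels, else the coarsest level."""
    for level, shape in enumerate(level_shapes(base_shape, n_levels, factor)):
        if max(shape) <= max_pixels:
            return level
    return n_levels - 1


# Hypothetical 1000 x 600 image with 3 levels, rendered into a 512-pixel target.
level = pick_level((1000, 600), n_levels=3, max_pixels=512)
```

For the sample store, `n_levels` would be 8, matching the eight prefixes in the listing above.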

Tooling

The sample data is being rendered by vizarr. It's clearly not made for general-purpose zarr viewing, but it does a great job of viewing this domain-specific model. The primary contributor to vizarr wrote zarrita.js, and it's likely one motivated the other. If we ever want to get into it, this codebase does a nice job of navigating the ome-zarr model, and we could learn from it.

There are many viz tools for the microscopy space.

What should we do?
With a few constraints loosened, I was able to get an ome-zarr dataset to render in zarr-viewer. My position is that we should do the minimum possible to support rendering the core array with index-based selection, but not bother leaning into the ome data model for now. It would be amazing if an ome-zarr user could render and link to an image in this tool - let them ask for more!

Attempting to render the ome-zarr arrays exposed some general shortcomings in our tool today that would benefit all rendering situations:

Elegantly handle cases where dimension data is not available for an array.
Support non float arrays as input (test dataset is a UIntArray).
Support rendering potentially large arrays (where large means larger than the regl/GPU render target space). AFAICT it's common to look at entire ome-zarr images rather than small slices, and it was pretty easy to hit regl errors with a large slice into the test dataset. Without getting too far into implementation, this probably means rendering chunks rather than the fully materialized array slice.
Support navigating the group hierarchy to get to an array. We cast this aside because of the mess that is hrrrzarr, but for anything without consolidated metadata (like ome-zarr) it's nice to have some way to navigate to your array.
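
As a starting point for the chunk-based rendering idea, here is a small sketch that lists which zarr v2 chunk keys a selection touches, so each chunk could be fetched and drawn on its own. It assumes integer indices and non-empty, bounded, step-1 slices. `chunks_for_selection` is a hypothetical name, and the chunk shape in the example is invented; the real one would come from the `chunks` field of the array's `.zarray`.

```python
from __future__ import annotations

from itertools import product
from typing import Iterator, Sequence


def chunks_for_selection(
    selection: Sequence[int | slice],
    chunk_shape: Sequence[int],
    separator: str = ".",
) -> Iterator[str]:
    """Yield the chunk keys that intersect a selection of ints and bounded slices."""
    ranges = []
    for sel, size in zip(selection, chunk_shape):
        if isinstance(sel, slice):
            # First and last chunk index covered by [start, stop).
            first, last = sel.start // size, (sel.stop - 1) // size
        else:
            first = last = sel // size
        ranges.append(range(first, last + 1))
    # Chunk keys are the per-axis chunk indices joined by the store's separator.
    for index in product(*ranges):
        yield separator.join(str(i) for i in index)


# Hypothetical array with 1 x 256 x 256 chunks.
keys = list(chunks_for_selection((0, slice(0, 300), slice(256, 512)), (1, 256, 256)))
```

Rendering per key keeps each texture upload bounded by the chunk size rather than by the size of the requested slice.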


olimcc mentioned this issue Dec 4, 2024

Scratchpad #8

Open

3 tasks
