
Feature/ome zarr writer v2 #48

Merged: 28 commits merged into main from feature/ome-zarr-writer_v2 on Jul 17, 2024
Conversation

@toloudis (Contributor) commented Jul 3, 2024

Link to Relevant Issue

Resolves #46

We have been developing a new ome-zarr-writer that works better for our long time-series data. We want to bring that code into this repo and make it part of bioio.

Description of Changes

Add new ome zarr writer. Do some code cleanup, and make sure it is well commented and documented.
Add unit tests for it.

This PR is intended as an "initial add": it brings the new code in without functional changes, apart from code formatting and commenting, so that it can be a direct replacement in our ome-zarr-conversion codebase.

.pre-commit-config.yaml
Comment on lines 5 to 12
from .ome_zarr_writer import OmeZarrWriter as OmeZarrWriter_old
from .ome_zarr_writer_2 import OmeZarrWriter

__all__ = [
"OmeTiffWriter",
"OmeZarrWriter",
"OmeZarrWriter_old",
]
Contributor:

Is our old writer better for large Z-stack data while this one is better for time series, or is this one just better overall?

Contributor Author:

Good question. (Note this is still a draft PR, and I am not expecting that this _old suffix is really best. It's less of a breakage to call the new one OmeZarrWriter2 and leave the old one with the same name.)

Here are some of the key differences:

  • The new one is built on tooling for computing multiscales that is more stable than what the old one uses
  • The new one works better for bigger data because it does things in smaller steps. There is no one-size-fits-all "write this data array to this url" function in the new writer -- at least not yet.
  • The new code is coming from another repo where it's been used to do some bulk data conversions internally already.
  • The new one is currently providing functions that expect to deal with time series of zstacks where the zstacks still might reasonably fit in memory.
  • The new one is currently assuming 5D data coming from bioio.BioImage, or sequences of images (one per T), or (in progress) a 5D ArrayLike.

@toloudis toloudis marked this pull request as ready for review July 8, 2024 21:11
@toloudis toloudis requested a review from a team as a code owner July 8, 2024 21:11
"numpy>=1.21.0,<2.0.0",
"ome-types[lxml]>=0.4.0",
"ome-zarr>=0.6.1",
"semver>=3.0.1",
"tifffile>=2021.8.30",
"zarr>=2.6.0,<3.0.0",
# this pin <2.18.0 should be temporary and is maybe related to https://github.com/zarr-developers/zarr-python/issues/1891
"zarr>=2.6.0,<2.18.0",
Contributor Author:

Sadly, without this version pin, unit tests were failing. I don't know what the fix is, but we should probably file a "reminder" issue to try to unpin it again later.

channel_colors=my_channel_colors,
)
writer.write_metadata(meta)
"""
Contributor Author:

The above example shows the typical usage/code flow for this writer.

Contributor:

Does this also resolve the desire to be able to append new data to the zarr store over time? I.e., while a microscope is taking a timelapse, can we store each timepoint as it comes in?

If it's not possible, no worries; if it is possible, I would say that is worth highlighting.

Contributor Author (@toloudis, Jul 15, 2024):

This code isn't made for modifying an existing store; I believe it's only been tested for when you are ready to write everything all at once, i.e. the use case of converting pre-existing files in other formats. But it might work in the scenario you describe! Part of the workflow does involve knowing the total shape ahead of time.
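To make the "total shape known ahead of time" point concrete, here is a minimal sketch of that workflow, using a plain NumPy array as a stand-in for a preallocated zarr store (illustrative only; this does not use this writer's API):

```python
import numpy as np

# The full TCZYX shape must be known before acquisition starts.
shape = (10, 2, 4, 32, 32)
store = np.zeros(shape, dtype=np.uint16)  # stand-in for a preallocated store

# Each timepoint is then written into its slot as it arrives.
for t in range(shape[0]):
    timepoint = np.full(shape[1:], t, dtype=np.uint16)  # fake acquired data
    store[t] = timepoint
```

With a real zarr array the per-timepoint writes would land on disk incrementally, but the preallocation step is what requires knowing the final T dimension up front.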

# Check tests pass on multiple Python and OS combinations
test:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
python-version: [3.9, "3.10", "3.11"]
os: [ubuntu-latest, macOS-latest, windows-latest]
python-version: [3.9, "3.10", "3.11", "3.12"]
Contributor:

Nit: I believe the quotations are only necessary for versions with a trailing zero?

Contributor Author:

I can remove them; I was following the pre-existing pattern, which already had 3.11 in quotes.
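For background on the nit above: YAML resolves an unquoted 3.10 as a number, and numbers drop the trailing zero, so only trailing-zero versions strictly need quotes. A quick illustration of the underlying float behavior in Python:

```python
# What YAML hands to CI without quotes is a number, not text.
unquoted = 3.10             # a float: the ".10" is not preserved
quoted = "3.10"             # a string: literal text preserved

assert unquoted == 3.1      # 3.10 has silently become 3.1
assert quoted == "3.10"     # quoting keeps the exact version label
```

3.9 and 3.11 survive either way, which is why quoting them is purely stylistic.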

# Rechunk the input blocks so that the factors achieve an output
# block size of whole numbers.
better_chunksize = tuple(
np.maximum(1, np.round(np.array(image.chunksize) * factors) / factors).astype(
Contributor:

Nit: different methods of finding the maximum are used throughout the file.

Contributor Author (@toloudis, Jul 15, 2024):

I'm not going to modify this one in this PR, as it is very specific to the downsampling code and is borrowed from a trusted implementation.
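For reference, the rounding in the snippet above snaps each chunk dimension to a size the scale factor divides evenly. A standalone illustration (assuming factors are per-axis output/input ratios, which the excerpt does not spell out):

```python
import numpy as np

chunksize = np.array([1, 1, 5, 600, 600])          # input dask chunk sizes
factors = np.array([1.0, 1.0, 0.5, 0.5, 0.5])      # assumed output/input ratio

# Round each dim so that dim * factor is a whole number, with a floor of 1.
better_chunksize = tuple(
    np.maximum(1, np.round(chunksize * factors) / factors).astype(np.int64)
)
# The Z chunk of 5 becomes 4, since 4 * 0.5 downsamples cleanly to 2,
# while 5 * 0.5 would leave a fractional 2.5-voxel output block.
```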

omero = {
"id": 1, # ID in OMERO
"name": image_name, # Name as shown in the UI
"version": "0.4", # Current version
Contributor:

What is this versioning from?

Contributor:

We may want to make this a constant at the top of the file for easier checking of what omero version our writer uses via import rather than loading of file metadata

Contributor Author:

This is intended to be the ome-ngff spec version number. I agree I can hoist it up to a symbolic constant, but it is also potentially intimately connected to the names in this exact dict.

# let's start by just mandating that chunks have to be no more than
# 1 T and 1 C
chunk_size = (1, 1, shape[2], shape[3], shape[4])
while prod(chunk_size) * itemsize > memory_target:
Contributor:

I think this could be simplified to

while prod(chunk_size) * itemsize > memory_target:
    chunk_size = tuple(max(cs // 2, 1) for cs in chunk_size)

Contributor Author:

Nope; mypy hates that due to the enforcement of 5D tuples here.
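For context, a variant of the one-liner that keeps the fixed-length tuple type is to rebuild the 5-tuple explicitly. A minimal sketch (the function name and the memory_target/itemsize parameters are hypothetical, mirroring the excerpt; the all-ones guard is an addition so the loop cannot spin forever):

```python
from math import prod
from typing import Tuple

Shape5D = Tuple[int, int, int, int, int]

def shrink_chunks(chunk_size: Shape5D, itemsize: int,
                  memory_target: int) -> Shape5D:
    # Halve the Z/Y/X dims until the chunk fits under the memory target.
    # Building the tuple element by element preserves the fixed-length
    # Tuple[int, int, int, int, int] annotation that mypy enforces here,
    # where a generator expression would widen it to Tuple[int, ...].
    while prod(chunk_size) * itemsize > memory_target:
        if chunk_size[2:] == (1, 1, 1):
            break  # nothing left to halve
        chunk_size = (
            chunk_size[0],
            chunk_size[1],
            max(chunk_size[2] // 2, 1),
            max(chunk_size[3] // 2, 1),
            max(chunk_size[4] // 2, 1),
        )
    return chunk_size
```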



def _pop_metadata_optionals(metadata_dict: dict) -> dict:
for ax in metadata_dict["axes"]:
Contributor:

Maybe:
metadata_dict["axes"] = [ax for ax in metadata_dict["axes"] if ax.get("unit") is not None]

Contributor Author:

If I did that, it would change the behavior of this code: your suggested change would keep only the axes that have a unit, while the point of this code is to go through each "axis" and remove the "unit" key if it happened to have a None value. This function is called "pop metadata optionals"; it's just trying to delete keys that aren't holding meaningful data and that the spec says are optional.

List of shapes of all nlevels.
"""
shapes = [lvl0shape]
for i in range(nlevels - 1):
Contributor:

Maybe

for _ in range(nlevels - 1):
  shapes.append(tuple(max(int(s / sc), 1) for s, sc in zip(shapes[-1], scaling)))
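A runnable version of the suggestion above (the function name compute_level_shapes is hypothetical; parameter names follow the docstring fragment):

```python
from typing import List, Sequence, Tuple

def compute_level_shapes(
    lvl0shape: Sequence[int], scaling: Sequence[float], nlevels: int
) -> List[Tuple[int, ...]]:
    # Start from the level-0 shape and divide by the per-axis scaling
    # at each level, never letting a dimension shrink below 1.
    shapes = [tuple(lvl0shape)]
    for _ in range(nlevels - 1):
        shapes.append(
            tuple(max(int(s / sc), 1) for s, sc in zip(shapes[-1], scaling))
        )
    return shapes
```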

Contributor (@BrianWhitneyAI) left a comment:

This is awesome! Nice work!

Contributor (@evamaxfield) left a comment:

Generally all looks good to me! Nice work. My biggest nitpick is to store this in a different location rather than calling it "ome_zarr_writer_2" maybe bioio.experimental.writers.ome_zarr_writer?

def resize(
image: da.Array, output_shape: Tuple[int, ...], *args: Any, **kwargs: Any
) -> da.Array:
r"""
Contributor:

Do we need the r""" at the front here? That prefix marks a raw string, typically used for regex patterns or docstrings containing backslashes.

Contributor Author:

I can remove it


Contributor Author (@toloudis):
> Generally all looks good to me! Nice work. My biggest nitpick is to store this in a different location rather than calling it "ome_zarr_writer_2" maybe bioio.experimental.writers.ome_zarr_writer?

Actually I would suggest that this writer code is LESS experimental than the other, since it's been used (from a different repo) to do quite a lot of bulk conversions already.

@toloudis toloudis merged commit d1d98c5 into main Jul 17, 2024
20 checks passed
@toloudis toloudis deleted the feature/ome-zarr-writer_v2 branch July 17, 2024 22:46
Successfully merging this pull request may close these issues.

Add new ome-zarr-writer
3 participants