
attempts to create a writer #69

Merged
merged 26 commits into from
Mar 24, 2021

Conversation

glyg
Contributor

@glyg glyg commented Dec 4, 2020

This code base does not offer a clear way to create zarr images from scratch (apart from the function in the data module). Although bioformats2raw works fine for already existing data, at some point data will be recorded directly in zarr, especially in custom microscope setups.

So, following a discussion with Josh on image.sc, here is an attempt at a write_image function.

This is very early; in particular, I think the ZarrLocation classes should be adapted for writing and should better manage paths and groups.

Thanks for any input!

@glyg
Contributor Author

glyg commented Dec 4, 2020

looks like I broke some references, let me do this again ^^'

Member

@joshmoore joshmoore left a comment

Thanks, @glyg. Generally looks good. Let me know if you want to try the integration into ome_zarr.io or not. I like the metadata argument 👍 The byte_order argument may go through a refactoring soon as we loosen the dimension order early next year, but this makes good sense for the moment.

I should have the GitHub Actions happy again after lots of work in #51. I'll try to push something today to make it go green.

Cheers.

"""
zarr_location = parse_url(path)

## I assume this should be dealt with in
Member

Agreed. I think what's needed first is a way to pass mode through to the parse_url and *Location classes.


root = zarr.group(store)
if group is not None:
    grp = root.create_group(group)
Member

require_group should do what you want.

@joshmoore
Member

Pushed commits to:

  • merge master to disable test_napari.py for zarr-dev
  • fix the output of pre-commit run -a

@joshmoore
Member

Thanks for the update, @glyg. Do you want to run pre-commit run -a and push, or shall I? (I can recommend running pre-commit install in any repo that requires it)

@glyg
Contributor Author

glyg commented Dec 7, 2020

Hi @joshmoore, I did not know about pre-commit; I'll apply it. I'm trying to integrate the ZarrLocation mechanisms into the writer, and will push (after running pre-commit :) once I have something functional.

@joshmoore
Member

Sounds great! Thanks again, @glyg.

@glyg
Contributor Author

glyg commented Dec 7, 2020

Ok, I have some issues

First pre-commit fails but I don't know how to make mypy happy

Second, it seems that when only OMERO specs are found by the reader, node.data is not set (although the array is there).
I'm not sure whether this is intended or how to manage it.

Finally I assume the writer should be coded as an object, but I'm not sure how.

@joshmoore
Member

First pre-commit fails but I don't know how to make mypy happy

ome_zarr/writer.py:52: error: Argument "zarr" to "Node" has incompatible type "Optional[BaseZarrLocation]"; expected "BaseZarrLocation"

This means that the variable somewhere could conceivably be None. You can either perform an if zarr is None check, or make sure it's never None.

ome_zarr/writer.py:65: error: Incompatible types in assignment (expression has type "Tuple[Any, ...]", variable has type "Union[Tuple[int], int, None]")

You might want to create a new variable like _chunks: Tuple[int] = ... then the "compiler" should know what you're trying to do.
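Both fixes can be sketched in a few lines; the names narrow_optional and normalize_chunks below are made up for illustration, only the typing pattern matters:

```python
from typing import Optional, Tuple, Union


def narrow_optional(value: Optional[str]) -> str:
    # Error 1: an Optional[...] cannot be passed where the plain type
    # is expected. Narrowing with an explicit None check satisfies mypy.
    if value is None:
        raise ValueError("expected a value, got None")
    return value  # mypy now knows value is str


def normalize_chunks(
    chunks: Union[Tuple[int, ...], int, None], ndim: int
) -> Tuple[int, ...]:
    # Error 2: assigning a Tuple[Any, ...] back into the Union-typed
    # parameter confuses the checker; a fresh, explicitly typed local
    # variable makes the intent clear.
    _chunks: Tuple[int, ...]
    if chunks is None:
        _chunks = (64,) * ndim
    elif isinstance(chunks, int):
        _chunks = (chunks,) * ndim
    else:
        _chunks = tuple(chunks)
    return _chunks
```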

@glyg
Contributor Author

glyg commented Dec 11, 2020

  • I had to tell the OMERO spec to put the data in the node at read time; I'm not sure it doesn't break anything (although the tests pass)

  • Is a class needed for the writer?

  • random channel colors might be ugly

@joshmoore
Member

random channel colors might be ugly

Seems like that can be handled in the future.

Is a class needed for the writer?

I was wondering about that. I don't think it's a MUST. Did you find any state that you would have felt useful to keep in such an object?

I had to tell the OMERO spec to put the data in the node at read time, I'm not sure it does not break anything

I'll take a look.

@joshmoore
Member

I had to tell the OMERO spec to put the data in the node at read time,

Do you think this is a problem with the order of operations? i.e. if the data nodes ran before the metadata nodes?

@glyg
Contributor Author

glyg commented Dec 16, 2020

Hi Josh,

Do you think this is a problem with the order of operations?

Yes, it is not clear to me when data is appended to node.data. If you have pyramid images, then the Multiscale spec will access the data, but the OMERO spec will also be present. If it's always accessed last, then testing if node.data is empty as I did should be enough?

Did you find any state that you would have felt useful to keep in such an object?

The scenario I was thinking of that might justify a Writer class instance is needing to append children / nodes at data generation time, e.g. during the course of an acquisition. Then maybe some kind of object persistence would be interesting.

Member

@joshmoore joshmoore left a comment

Hi @glyg. I just tried this out while trying to come up with an example for @oeway, do you mind if I push a few commits for your review? Alternatively, do you have any changes that are coming yourself?



def write_image(
    path: str,
Member

I could imagine passing a zarr.Group here rather than path+name+group

Contributor Author

Yes, that makes sense (I realize the group argument here is not used).
Do you think both ways of specifying the output should co-exist, or is it better to always leave the job of creating the group to the user?

Member

Let's start with leaving it to the user since there are so many options (local v. remote, etc.) but if we realize we're doing the same thing over and over again, we'll provide a helper.

Contributor Author

@glyg glyg Jan 11, 2021

sounds reasonable :)

    image: np.ndarray,
    name: str = "0",
    group: str = None,
    chunks: Union[Tuple[int], int] = None,
Member

If this method is to handle downsampling internally, I wonder if we want to pass multiple chunk sizes or keep them consistent.

Contributor Author

Seems to me it's easier to keep the same chunks for all sample sizes. Or scale the number of chunks (maybe in the x, y dimensions) to have a consistent chunk size across scales? I'd assume the specified chunk size is for the 1:1 scale.

It sounds complicated to ask the user to input one chunk size for each scale; imho it should be as transparent as possible.

Maybe it will get clearer when write_multiscales is integrated here?

Member

Yeah, I had gotten that far ;) I think you're right that it's more difficult than one would want in general. Is it worth providing all the options, though? i.e.

Union[Tuple[Tuple[int]], Tuple[int], int]
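A sketch of what accepting all three forms might look like, normalized to one chunk tuple per pyramid level (chunks_per_level is a hypothetical helper, not part of the PR):

```python
from typing import Sequence, Tuple, Union

# One int, one tuple shared by all levels, or one tuple per level.
ChunkSpec = Union[Sequence[Sequence[int]], Sequence[int], int]


def chunks_per_level(
    chunks: ChunkSpec, n_levels: int, ndim: int
) -> Tuple[Tuple[int, ...], ...]:
    """Expand a chunk spec into one chunk tuple per pyramid level."""
    if isinstance(chunks, int):
        # a single edge length, used for every dimension at every level
        return ((chunks,) * ndim,) * n_levels
    if isinstance(chunks[0], int):
        # a single tuple, reused for every level
        return (tuple(chunks),) * n_levels
    # already one tuple per level
    levels = tuple(tuple(c) for c in chunks)
    if len(levels) != n_levels:
        raise ValueError("need exactly one chunk tuple per pyramid level")
    return levels
```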

image = image.reshape(shape_5d)

if chunks is None:
    image = da.from_array(image)
Member

If no reshaping is necessary, it might be preferred to leave the input array as is and use a convenience zarr function to improve speed, but I'd need to do some testing.

Contributor Author

I'll let you be the judge of that :)

]
omero["rdefs"] = {"model": "color"}

metadata["omero"] = omero
Member

This is currently missing the multiscales metadata. I'd propose moving write_multiscales from ome_zarr.data to your writer module.
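For reference, the multiscales entry under discussion is a small block in the group's .zattrs with one datasets entry per resolution level; a minimal sketch following the 0.1 version of the spec in use at the time (paths illustrative):

```python
# Minimal "multiscales" attrs block, one "datasets" entry per pyramid
# level; version "0.1" matches the spec at the time of this PR.
paths = ["0", "1", "2"]  # one sub-array per resolution level
multiscales = [
    {
        "version": "0.1",
        "datasets": [{"path": p} for p in paths],
    }
]
# A writer would then set: group.attrs["multiscales"] = multiscales
```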

@glyg
Contributor Author

glyg commented Jan 8, 2021

Hi @glyg. I just tried this out while trying to come up with an example for @oeway, do you mind if I push a few commits for your review? Alternatively, do you have any changes that are coming yourself?

Hi @joshmoore of course feel free to push on this PR. I was actually waiting for your feedback before pursuing this, so there is nothing coming from me.
G.

@glyg
Contributor Author

glyg commented Jan 11, 2021

Hi @joshmoore - finally, should I apply the changes you suggest? I can do that today or early tomorrow.

Best,

Guillaume

@joshmoore
Member

Hi Guillaume, sorry, meetings have run long today. 🙂 Let me open a PR with my work-in-progress and you can give me feedback, trash it, etc. ~J.

Goal is to re-use write_multiscale previously from ome_zarr.data
and to remove hard-coding of the storage logic by passing a pre-
existing zarr.Group.
@@ -189,7 +193,7 @@ def get_json(self, subpath: str) -> JSONDict:
return {}


-def parse_url(path: str) -> Optional[BaseZarrLocation]:
+def parse_url(path: str, mode: str = "r") -> Optional[BaseZarrLocation]:
Contributor Author

@glyg glyg Jan 12, 2021

As this function is not used in the writer module, maybe I should revert the changes in io.py

Member

I was working on a simple example for Wei (his use case was "how to turn an existing zarr array into an ome-zarr"). Let's keep this until we have a simple working user example. Maybe parse_url can still be the starting point.
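Conceptually the change just threads a read/write mode flag from parse_url down to the location object; a simplified, self-contained sketch (this is not the real ome_zarr implementation, which also does format detection and more):

```python
from typing import Optional


class ZarrLocationSketch:
    """Stand-in for the *Location classes; name and fields assumed."""

    def __init__(self, path: str, mode: str = "r") -> None:
        self.path = path
        self.mode = mode  # "r" to read, "w"/"a" to allow writing


def parse_url(path: str, mode: str = "r") -> Optional[ZarrLocationSketch]:
    # Only the mode pass-through is shown here; the real parse_url
    # also decides between local and remote locations.
    if mode not in ("r", "w", "a"):
        raise ValueError(f"unsupported mode: {mode}")
    return ZarrLocationSketch(path, mode=mode)
```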

Previous example:
#!/usr/bin/env python
import numpy as np
import zarr as zr

from ome_zarr.io import parse_url
from ome_zarr.reader import OMERO, Reader
from ome_zarr.writer import write_image

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("path")
ns = parser.parse_args()

dtype = np.uint8
mean = 10
shape = (1, 3, 1, 256, 256)
rng = np.random.default_rng(0)
data = rng.poisson(mean, size=shape).astype(dtype)
store = zr.DirectoryStore(ns.path)
grp = zr.group(store)
write_image(data, grp, chunks=(128, 128))

* fix warning on mypy `exclude`
* fix .isort.cfg modification
* temporarily disable blind-except due to false positives
@codecov

codecov bot commented Feb 4, 2021

Codecov Report

Merging #69 (f22c564) into master (589cb52) will increase coverage by 0.48%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #69      +/-   ##
==========================================
+ Coverage   66.37%   66.85%   +0.48%     
==========================================
  Files          10       11       +1     
  Lines        1023     1056      +33     
==========================================
+ Hits          679      706      +27     
- Misses        344      350       +6     
Impacted Files Coverage Δ
ome_zarr/reader.py 58.86% <ø> (ø)
ome_zarr/io.py 77.77% <60.00%> (-2.06%) ⬇️
ome_zarr/writer.py 93.75% <93.75%> (ø)
ome_zarr/data.py 98.64% <100.00%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 729ab5b...f22c564. Read the comment docs.

@joshmoore
Member

@glyg very sorry to have let this slip. Getting back to my example for @oeway:

#!/usr/bin/env python
import numpy as np
import zarr as zr
from ome_zarr.writer import write_image

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("path")
ns = parser.parse_args()

# Set up your numpy, whatever it is
dtype = np.int8
mean = 1000
shape = (1, 3, 10, 4096, 4096)
rng = np.random.default_rng(0)
data = rng.poisson(mean, size=shape).astype(dtype)

# Write it out to a Zarr group
store = zr.DirectoryStore(ns.path)
grp = zr.group(store)
write_image(data, grp, chunks=(128, 128))

Looks like the only thing missing was the downsampling. I went ahead and added that in a commit. We may need to change the API slightly to let someone choose not to downsample, e.g. by passing scaler=None or by passing a pre-computed pyramid, but as far as I can tell, things are all working.

If my last commit goes green, let's get this in and start hammering on it!

Thanks for your patience. (Though feel free to kick me in the future.)

Member

@will-moore will-moore left a comment

Looks good. Running the tests locally and inspecting the output, everything looks as expected.
Happy for this to be merged, and I'll try using it 'for real'.

paths = []
for path, dataset in enumerate(pyramid):
    # TODO: chunks here could be different per layer
    group.create_dataset(str(path), data=pyramid[path], chunks=chunks)
Member

The dataset variable is not used; perhaps you want to do data=dataset?

chunks = _retuple(chunks, shape_5d)
omero = metadata.get("omero", {})
# Update the size entry anyway
omero["size"] = {
Member

I don't think we need to have size metadata here since we have that info in the .zarray and it's not part of the OME-Zarr spec.

omero["channels"] = [
    {
        "color": "".join(f"{i:02x}" for i in color),
        "window": {"start": 0, "end": 1},
Member

It's probably better to omit all the omero['channels'] metadata if we're not provided with it.
Generating these numbers will cause viewers to assume they are based on the data, which could lead to poor visualisation (e.g. render levels 0 - 1 will likely be saturated)

Member

@glyg : any thoughts from your initial use case?

Contributor Author

I think I just tried to emulate the code I saw in the tests or elsewhere; I agree that by default the metadata entries should be as small as possible.

@will-moore
Member

In testing this, I adapted omero-cli-zarr to use it for writing - See ome/omero-cli-zarr#58.
Not sure we want to use this approach (see PR) but it seems to be working.

except Exception as e:
    raise e
Contributor Author

Should this be there? The LOGGER line below will never be called

Member

Pretty likely debugging code that crept in. Thanks, @glyg
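For reference, replacing except Exception as e: raise e with a log-then-bare-raise keeps the LOGGER line reachable and preserves the original traceback; a small sketch (write_safely is a made-up name):

```python
import logging

LOGGER = logging.getLogger(__name__)


def write_safely(write, data):
    # ``except Exception as e: raise e`` would skip everything after
    # it; logging first and re-raising with a bare ``raise`` keeps the
    # log line reachable and preserves the original traceback.
    try:
        return write(data)
    except Exception:
        LOGGER.exception("failed to write %r", data)
        raise
```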

else:
    _chunks = chunks

return (*shape[: (5 - len(_chunks))], *_chunks)
Contributor Author

This is all based on the assumption that we have 5D data, which is being relaxed in ome/ngff#39. Should there be a default retuple to 5D, or no attempt at all to reshape the data?

Member

Good point, but let's handle that along with #39. Ultimately I think the rules will have to be laid out, the most flexible being: once the shape is defined (e.g. len(shape)==4), then everything else (e.g. chunks) should stay in that shape & order.
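The quoted _retuple tail can be read as: pad a shorter chunk tuple on the left with the array's own dimension sizes. A hedged, standalone sketch of that rule (hard-coding nothing to 5D):

```python
from typing import Tuple


def retuple(chunks: Tuple[int, ...], shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Pad ``chunks`` on the left with the array's own sizes so the
    # result has one entry per dimension of ``shape``; once the data's
    # dimensionality is fixed, chunks follow the same shape and order.
    if len(chunks) > len(shape):
        raise ValueError("more chunk entries than dimensions")
    return (*shape[: len(shape) - len(chunks)], *chunks)
```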

Contributor Author

Ok great!

@joshmoore
Member

@will-moore @glyg : pushed a few fixes:

  • Remove the metadata as agreed
  • Fixed the raise e issue
  • Allowed scaler to be set to None

A few things that I think need to happen, but can be outside this PR:

  • Pyramids should be pre-computable, but that conflicts with the dimension detection, so we can do that with the other dimension fixes.
  • I think the Scaler class needs a __call__ method which defaults to using nearest() internally. So then a scaler is basically just a function which takes an array and generates a pyramid.
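The __call__ idea from the last bullet can be sketched as follows; this toy Scaler subsamples with numpy striding purely for illustration (the real ome_zarr Scaler resizes properly and has more options):

```python
import numpy as np


class Scaler:
    """Toy callable scaler: array in, pyramid (list of arrays) out."""

    def __init__(self, downscale: int = 2, max_layer: int = 4) -> None:
        self.downscale = downscale
        self.max_layer = max_layer

    def nearest(self, base: np.ndarray) -> list:
        # Nearest-neighbour by striding the trailing (y, x) axes;
        # purely illustrative, real downsampling resizes properly.
        pyramid = [base]
        for _ in range(self.max_layer):
            pyramid.append(pyramid[-1][..., :: self.downscale, :: self.downscale])
        return pyramid

    def __call__(self, base: np.ndarray) -> list:
        # Defaulting __call__ to nearest() means any scaler is just a
        # function from an array to a pyramid, so an API could accept
        # scaler=None, a Scaler(), or any plain callable.
        return self.nearest(base)


pyramid = Scaler(max_layer=2)(np.zeros((1, 1, 1, 64, 64)))
```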

@will-moore
Member

@joshmoore You said "pushed a few fixes", but the last commit I see is b832df5 (1 commit yesterday)?
Am I missing some?

@joshmoore
Member

Doh! Sorry, @will-moore. I've made that mistake several times now. Pushed to my writer branch, rather than @glyg's. Commits are there now.

@will-moore
Member

Looks good.
Tests are passing etc.
Tried with/without omero metadata and with scaler=None from my omero-cli-zarr PR above. All works as expected.

@joshmoore
Member

Whew. Merging 🎉 Thanks for your effort (& patience!) here, @glyg.

@joshmoore joshmoore merged commit c3f641d into ome:master Mar 24, 2021
@glyg
Contributor Author

glyg commented Mar 25, 2021

Thanks a lot @joshmoore and @will-moore, very happy to be of some help :)
