attempts to create a writer #69
Conversation
looks like I broke some references, let me do this again ^^'
Thanks, @glyg. Generally looks good. Let me know if you want to try the integration into ome_zarr.io or not. I like the metadata argument 👍 The byte_order argument may go through a refactoring soon as we loosen the dimension order early next year, but this makes good sense for the moment.
I should have the github actions happy again after lots of work in #51. I'll try to push something today to make it go green.
Cheers.
ome_zarr/writer.py
Outdated
""" | ||
zarr_location = parse_url(path) | ||
|
||
## I assume this should be dealt with in |
Agreed. I think what's needed first is a way to pass mode through to the parse_url and *Location classes.
ome_zarr/writer.py
Outdated
    root = zarr.group(store)
    if group is not None:
        grp = root.create_group(group)
require_group should do what you want.
Pushed commits to:
Thanks for the update, @glyg. Do you want to run
Hi @joshmoore I did not know
Sounds great! Thanks again, @glyg.
Ok, I have some issues.
First, pre-commit fails but I don't know how to make mypy happy.
Second, it seems that when only OMERO specs are found by the reader,
Finally, I assume the writer should be coded as an object, but I'm not sure how.
This means that the variable somewhere could conceivably be
You might want to create a new variable like
Seems like that can be handled in the future.
I was wondering about that. I don't think it's a MUST. Did you find any state that you would have felt useful to keep in such an object?
I'll take a look.
Do you think this is a problem with the order of operations? i.e. if the data nodes ran before the metadata nodes?
Hi Josh,
Yes, it is not clear to me when data is appended to
The scenario I was thinking might justify a
ome_zarr/writer.py
Outdated
def write_image(
    path: str,
I could imagine passing a zarr.Group here rather than path+name+group.
Yes, makes sense (I realize the group argument here is not used). Do you think both ways of specifying the output should co-exist, or is it better to leave the job of creating the group to the user all the time?
Let's start with leaving it to the user since there are so many options (local v. remote, etc.) but if we realize we're doing the same thing over and over again, we'll provide a helper.
sounds reasonable :)
ome_zarr/writer.py
Outdated
    image: np.ndarray,
    name: str = "0",
    group: str = None,
    chunks: Union[Tuple[int], int] = None,
If this method is to handle downsampling internally, I wonder if we want to pass multiple chunk sizes or keep them consistent.
Seems to me it's easier to keep the same chunks for all scale levels. Or scale the number of chunks (maybe in the x, y dimensions) to have a consistent chunk size across scales? I'd assume the specified chunk size is for the 1:1 scale.
It sounds complicated to ask the user to input one chunk size for each scale; imho it should be as transparent as possible.
Maybe it will get clearer when write_multiscales is integrated here?
Yeah, I had gotten that far ;) I think you're right that it's more difficult than one would want in general. Is it worth providing all the options, though, i.e. Union[Tuple[Tuple[int]], Tuple[int], int]?
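One simple policy along the lines discussed above is to keep a single chunk size and clamp it to each level's shape (a hypothetical helper, not part of the PR):

```python
from typing import List, Tuple

def chunks_per_level(shapes: List[Tuple[int, ...]],
                     base_chunks: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Reuse the 1:1-scale chunk size at every level, clamping each
    chunk dimension to the level's shape so small levels stay valid."""
    return [tuple(min(c, s) for c, s in zip(base_chunks, shape))
            for shape in shapes]

# a 3-level xy pyramid with 128x128 chunks at full resolution
levels = [(512, 512), (256, 256), (64, 64)]
print(chunks_per_level(levels, (128, 128)))
# → [(128, 128), (128, 128), (64, 64)]
```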
ome_zarr/writer.py
Outdated
    image = image.reshape(shape_5d)

    if chunks is None:
        image = da.from_array(image)
If no reshaping is necessary, it might be preferable to leave the input array as is and use a convenience zarr function to improve speed, but I'd need to do some testing.
I'll let you be the judge of that :)
ome_zarr/writer.py
Outdated
    ]
    omero["rdefs"] = {"model": "color"}

    metadata["omero"] = omero
This is currently missing the multiscales metadata. I'd propose moving write_multiscales from ome_zarr.data to your writer module.
Hi @joshmoore, of course feel free to push on this PR. I was actually waiting for your feedback before pursuing this, so there is nothing coming from me.
Hi @joshmoore - finally, should I apply the changes you suggest? I can do that today or early tomorrow. Best, Guillaume
Hi Guillaume, sorry, meetings have run long today. 🙂 Let me open a PR with my work-in-progress and you can give me feedback, trash it, etc. ~J.
Goal is to re-use write_multiscale, previously from ome_zarr.data, and to remove hard-coding of the storage logic by passing a pre-existing zarr.Group.
@@ -189,7 +193,7 @@ def get_json(self, subpath: str) -> JSONDict:
         return {}


-def parse_url(path: str) -> Optional[BaseZarrLocation]:
+def parse_url(path: str, mode: str = "r") -> Optional[BaseZarrLocation]:
As this function is not used in the writer module, maybe I should revert the changes in io.py
I was working on a simple example for Wei (his use case was "how to turn an existing zarr array into an ome-zarr"). Let's keep this until we have a simple working user example. Maybe parse_url can still be the starting point.
previous example
#!/usr/bin/env python
import numpy as np
import zarr as zr
from ome_zarr.io import parse_url
from ome_zarr.reader import OMERO, Reader
from ome_zarr.writer import write_image
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("path")
ns = parser.parse_args()
dtype = np.uint8
mean = 10
shape = (1, 3, 1, 256, 256)
rng = np.random.default_rng(0)
data = rng.poisson(mean, size=shape).astype(dtype)
store = zr.DirectoryStore(ns.path)
grp = zr.group(store)
write_image(data, grp, chunks=(128, 128))
* fix warning on mypy `exclude`
* fix .isort.cfg modification
* temporarily disable blind-except due to false positives
Codecov Report
@@ Coverage Diff @@
## master #69 +/- ##
==========================================
+ Coverage 66.37% 66.85% +0.48%
==========================================
Files 10 11 +1
Lines 1023 1056 +33
==========================================
+ Hits 679 706 +27
- Misses 344 350 +6
Continue to review full report at Codecov.
@glyg very sorry to have let this slip. Getting back to my example for @oeway:
looks like the only thing missing was the downsampling. I went ahead and added that in a commit. We may need to change the API slightly to let someone choose not to downsample, e.g. by passing
If my last commit goes green, let's get this in and start hammering on it! Thanks for your patience. (Though feel free to kick me in the future.)
Looks good. Running the tests locally and inspecting the output looks as expected.
Happy for this to be merged, and I'll try using it 'for real'.
ome_zarr/writer.py
Outdated
    paths = []
    for path, dataset in enumerate(pyramid):
        # TODO: chunks here could be different per layer
        group.create_dataset(str(path), data=pyramid[path], chunks=chunks)
The dataset variable is not used, unless you want to do data=dataset?
ome_zarr/writer.py
Outdated
    chunks = _retuple(chunks, shape_5d)
    omero = metadata.get("omero", {})
    # Update the size entry anyway
    omero["size"] = {
I don't think we need to have size metadata here since we have that info in the .zarray and it's not part of the OME-Zarr spec.
ome_zarr/writer.py
Outdated
    omero["channels"] = [
        {
            "color": "".join(f"{i:02x}" for i in color),
            "window": {"start": 0, "end": 1},
It's probably better to omit all the omero['channels'] metadata if we're not provided with it.
Generating these numbers will cause viewers to assume they are based on the data, which could lead to poor visualisation (e.g. render levels 0-1 will likely be saturated).
@glyg : any thoughts from your initial use case?
I think I just tried to emulate the code I saw in the tests or elsewhere; I agree that by default metadata entries should be as small as possible.
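If channel metadata is generated at all, deriving the window from the data itself would avoid the saturation concern raised above (a hypothetical helper, not part of the PR):

```python
import numpy as np

def channel_window(plane: np.ndarray) -> dict:
    """Derive a render window from the channel's actual intensity
    range instead of hard-coding {"start": 0, "end": 1}."""
    return {"start": float(plane.min()), "end": float(plane.max())}

plane = np.array([[10, 200], [35, 90]], dtype=np.uint8)
print(channel_window(plane))
# → {'start': 10.0, 'end': 200.0}
```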
In testing this, I adapted
ome_zarr/reader.py
Outdated
    except Exception as e:
        raise e
Should this be there? The LOGGER line below will never be called
Pretty likely debugging code that crept in. Thanks, @glyg
    else:
        _chunks = chunks

    return (*shape[: (5 - len(_chunks))], *_chunks)
This is all based on the assumption that we have 5D data, which is being relaxed in ome/ngff#39. Should there be a default retuple to 5D, or no attempt at all to reshape the data?
Good point, but let's handle that along with #39. Ultimately I think the rules will have to be laid out, the most flexible being: once the shape is defined (e.g. len(shape)==4), then everything else (e.g. chunks) should stay in that shape & order.
Ok great!
@will-moore @glyg : pushed a few fixes:
A few things that I think need to happen, but can be outside this PR:
@joshmoore You said "pushed a few fixes", but the last commit I see is b832df5 (1 commit yesterday)?
Doh! Sorry, @will-moore. I've made that mistake several times now. Pushed to my
Looks good.
Whew. Merging 🎉 Thanks for your effort (& patience!) here, @glyg.
thanks a lot @joshmoore and @will-moore, very happy to be of some help :)
This code base does not offer a clear way to create zarr images from scratch (apart from the function in the data module). Although bioformats2raw works fine for already existing data, at some point data will be recorded directly in zarr, especially in custom microscope setups.
So, following a discussion with Josh on image.sc, here is an attempt at a write_image function.
This is very early; in particular I think the ZarrLocation classes should be adapted to writing and to better manage paths and groups.
Thanks for any input!