-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add axes field to multiscale metadata #46
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few thoughts (sorry if you're still in the process of updating)
I synced the latest changes to 0.2 now and updated all version numbers (I could find) to 0.3 now. |
When migrating #39 I forgot to update the dimensionality requirements, i.e. allowing for image data with less than 5 dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 @constantinpape explicit dimensions are helpful to make the dimension identity self-evident and make the format easy to understand and use.
In the interest of compatibility with xarray, we could update the way the axes are encoded with:
"0/example/.zattrs": {
"_ARRAY_DIMENSIONS": [
"t",
"c",
"z",
"y",
"x",
],
per https://xarray.pydata.org/en/stable/internals.html#zarr-encoding-specification
latest/index.bs
Outdated
│ └── t.c.z.y.x # a "chunk coordinate", where the maximum coordinate | ||
│ # will be `dimension_size / chunk_size`. | ||
│ └─ t # Chunks are stored with the nested directory layout. | ||
│ └─ c # All but the last chunk element are stored as directories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we stick with a flat directory? The motivation is compatibility with xarray and avoiding unnecessary overhead with IPFS 👾
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @thewtex. Yes & no.
Short-term
The switch in v0.2 was to let us start actually creating these datasets before all implementations start supporting zarr-developers/zarr-python#715, i.e. ome-zarr-*
implementations for v0.2 can "blindly" assert the use of "/" as the dimension separator. Python implementations should be in the (enviable) position that it doesn't matter. If you put the metadata in place, it should work for either assuming your zarr-python version is >=2.8.0 (note: ongoing bug fixes) Does that track with what you are seeing or am I missing something on the xarray front?
Longer-term
Once all of this has propagated through the ecosystem I think this spec shouldn't really care about which array layout is used. So a v0.3 or later can drop the requirement all together. That, however, does not get you want you want in the general case with IPFS. Do you foresee it really being that bad? (I ask because I know that the flat directory is unusable at some scales.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshmoore ah, thanks for the information! I was not aware of zarr-developers/zarr-python#715 .
On the xarray front, they are not tied to a specific zarr spec, as far as I know, so it should cause an issue. But, it seems that we should do the same here, i.e. both layouts, .
or /
, would be acceptable, as you suggest, correct?
I ask because I know that the flat directory is unusable at some scales.
I can see how this would be the case. In which store types do you experience this?
For IPFS, each entity is identified with a cryptographic-hash. Updating a chunk in a file means updating its identifier, but it also means updating the identifiers for all parent directories in the directory tree; so, deeply nested trees are not desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both layouts,
.
or/
, would be acceptable, as you suggest, correct?
That's the plan.
In which store types do you experience this?
DirectoryStore & FSStore
so, deeply nested trees are not desirable.
Understood. Thanks for the info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think is the best way to move forward here, @thewtex @joshmoore?
I would rather not change this now (especially since this change was done in a previous PR already and the changes here are just about syncing the different versions of the spec). Also, like @joshmoore said, not supporting hiearchical storage for the chunks can be a big issue for large datasets on the file-system.
So I would propose that we leave this as it is for now and eventually change the spec once most implementations are agnostic to the actual separator.
If really necessary and there are some implementations that require a flat hierarchy, this could then be added as an extra implementation specific requirement. (I.e. these implementations would not be able to read all ngff formats, but only a subset.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd propose we leave 0.2 as hierarchical, focus 0.3 on axes, and then we can try to remove the 0.2 restriction ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Sorry, it took me a while to catch up with this. Holidays and then a fairly busy week otherwise.
Edit: Sorry, I did not think the above comment fully through. After looking at your example again, I realized that we can easily support this if we just require each "scale array" to contain the |
@joshmoore I think this is good to be merged, but I would suggest we make a 0.2.0 release in the main repository before that. |
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/ngff-status-update-may-2021/52918/1 |
│ | ||
├── 0 # Each multiscale level is stored as a separate Zarr array, | ||
│ ... # which is a folder containing chunk files which compose the array. | ||
├── n # The name of the array is arbitrary with the ordering defined by | ||
│ │ # by the "multiscales" metadata, but is often a sequence starting at 0. | ||
│ │ | ||
│ ├── .zarray # All image arrays are 5-dimensional | ||
│ ├── .zarray # All image arrays must be up to 5-dimensional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshmoore shouldn't this be:
must be 5+ dimensional
? OME-Tiff has the 8D / n-D spec and it would be a shame to lose that 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR is to allow for less(!) than 5D, so that it's not necessary to introduce singleton dimensions for 2D/3D/4D data.
There is an ongoing discussion about allowing for more than 5D, see #35. But that is out of scope for the current PR.
"axes": [ | ||
"t", "c", "z", "y", "x" | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do support n-dimensional, (OME 8D spec / OME n-D spec) then this simply expands to include the dimension shorthands.
@constantinpape, a few points before I disappear on holidays:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few last minute items to go on top of #50
latest/index.bs
Outdated
@@ -2,8 +2,7 @@ | |||
Title: Next-generation file formats (NGFF) | |||
Shortname: ome-ngff | |||
Level: 1 | |||
Status: LS-COMMIT | |||
Status: w3c/ED | |||
Status: w3c/CG-FINAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you likely want to revert these top-matter items
latest/index.bs
Outdated
@@ -18,12 +17,12 @@ Editor: Sébastien Besson, Open Microscopy Environment (OME) https://www.openmic | |||
Abstract: This document contains next-generation file format (NGFF) | |||
Abstract: specifications for storing bioimaging data in the cloud. | |||
Abstract: All specifications are submitted to the https://image.sc community for review. | |||
Status Text: The current released version of this specification is | |||
Status Text: <a href="../0.1/index.html">0.1</a>. Migration scripts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text should stay as is but point to the 0.3
that you will create when latest/index.bs
is done.
@@ -551,6 +597,11 @@ Version History {#history} | |||
<td>Description</td> | |||
</tr> | |||
</thead> | |||
<tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus add in your own changelog.
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: |
I've dealt with merge conflicts and handled the "publication steps" (such as they are) if anyone would like to take a look. |
If no objections, I'll merge this and move forward with https://github.com/orgs/ome/projects/17?add_cards_query=is%3Aopen |
Sorry, just a thought.... Do we want limit the last two axes to be
But do we want all clients that support the spec to also support all of those? |
Ah - sorry, I missed the existing caveat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new version of the specification has already been tested in various read and write scenarios across different implementations including omero-cli-zarr, vizarr, MoBIE. The specification changes look good so I am in favor of getting this published and completing the various tasks associated with OME-NGFF 0.3.
Add axes field to multiscale metadata SHA: 2986bbe Reason: push, by @joshmoore Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Just spinning back up after my hiking-trip-turned-lockdown-staycation and I'm so delighted to find this merged! 🎉 Thank you SO much to everyone for your patience and perseverance with this! ❤️🔥 ( |
Add the
axes
field tomultiscales
and updatelatest
to v0.2. Pending #44 and release of 0.2 (that's why I marked it as a draft).The discussion whether
axes
is mandatory (MUST) or encouraged (SHOULD) is still pending. cc @joshmoore @sbesson @jni