Add axes field to multiscale metadata #46
Changes from 7 commits
bba6940
6b8a868
a345247
9c57396
e60ec27
cf76570
af7b947
c46d9d5
47fa3dc
3e9908c
a981f98
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,8 +2,7 @@ | |
Title: Next-generation file formats (NGFF) | ||
Shortname: ome-ngff | ||
Level: 1 | ||
Status: LS-COMMIT | ||
Status: w3c/ED | ||
Status: w3c/CG-FINAL | ||
Group: ome | ||
URL: https://ngff.openmicroscopy.org/latest/ | ||
Repository: https://github.com/ome/ngff | ||
|
@@ -18,12 +17,12 @@ Editor: Sébastien Besson, Open Microscopy Environment (OME) https://www.openmic | |
Abstract: This document contains next-generation file format (NGFF) | ||
Abstract: specifications for storing bioimaging data in the cloud. | ||
Abstract: All specifications are submitted to the https://image.sc community for review. | ||
Status Text: The current released version of this specification is | ||
Status Text: <a href="../0.1/index.html">0.1</a>. Migration scripts | ||
Review comment: This text should stay as is but point to the
||
Status Text: will be provided between numbered versions. Data written with these latest changes | ||
Status Text: This is the 0.3 release of this specification. Migration scripts | ||
Status Text: will be provided between numbered versions. Data written with the latest version | ||
Status Text: (an "editor's draft") will not necessarily be supported. | ||
</pre> | ||
|
||
|
||
Introduction {#intro} | ||
===================== | ||
|
||
|
@@ -117,20 +116,22 @@ multiple levels of resolutions and optionally associated labels. | |
│ | ||
├── .zgroup # Each image is a Zarr group, or a folder, of other groups and arrays. | ||
├── .zattrs # Group level attributes are stored in the .zattrs file and include | ||
│ # "multiscales" and "omero" below) | ||
│ # "multiscales" and "omero" (see below). In addition, the group level attributes | ||
│ # must also contain "_ARRAY_DIMENSIONS" if this group directly contains multi-scale arrays. | ||
│ | ||
├── 0 # Each multiscale level is stored as a separate Zarr array, | ||
│ ... # which is a folder containing chunk files which compose the array. | ||
├── n # The name of the array is arbitrary with the ordering defined by | ||
│ │ # the "multiscales" metadata, but is often a sequence starting at 0. | ||
│ │ | ||
│ ├── .zarray # All image arrays are 5-dimensional | ||
│ ├── .zarray # All image arrays must be up to 5-dimensional | ||
Review comment: @joshmoore shouldn't this be:

Review comment: The goal of this PR is to allow for less(!) than 5D, so that it's not necessary to introduce singleton dimensions for 2D/3D/4D data.
||
│ │ # with dimension order (t, c, z, y, x). | ||
│ │ | ||
│ ├── 0.0.0.0.0 # Chunks are stored with the flat directory layout. | ||
│ │ ... # Each dotted component of the chunk file represents | ||
│ └── t.c.z.y.x # a "chunk coordinate", where the maximum coordinate | ||
│ # will be `dimension_size / chunk_size`. | ||
│ └─ t # Chunks are stored with the nested directory layout. | ||
constantinpape marked this conversation as resolved.
|
||
│ └─ c # All but the last chunk element are stored as directories. | ||
Review comment: Can we stick with a flat directory? The motivation is compatibility with xarray and avoiding unnecessary overhead with IPFS 👾

Review comment: Hi @thewtex. Yes & no.

Short-term: The switch in v0.2 was to let us start actually creating these datasets before all implementations start supporting zarr-developers/zarr-python#715, i.e.

Longer-term: Once all of this has propagated through the ecosystem I think this spec shouldn't really care about which array layout is used. So a v0.3 or later can drop the requirement altogether. That, however, does not get you what you want in the general case with IPFS. Do you foresee it really being that bad? (I ask because I know that the flat directory is unusable at some scales.)

Review comment: @joshmoore ah, thanks for the information! I was not aware of zarr-developers/zarr-python#715. On the xarray front, they are not tied to a specific zarr spec, as far as I know, so it should not cause an issue. But, it seems that we should do the same here, i.e. both layouts,

I can see how this would be the case. In which store types do you experience this? For IPFS, each entity is identified with a cryptographic hash. Updating a chunk in a file means updating its identifier, but it also means updating the identifiers for all parent directories in the directory tree; so, deeply nested trees are not desirable.

Review comment: That's the plan.

DirectoryStore & FSStore

Understood. Thanks for the info.

Review comment: What do you think is the best way to move forward here, @thewtex @joshmoore? So I would propose that we leave this as it is for now and eventually change the spec once most implementations are agnostic to the actual separator. If really necessary and there are some implementations that require a flat hierarchy, this could then be added as an extra implementation-specific requirement. (I.e. these implementations would not be able to read all ngff formats, but only a subset.)

Review comment: Yes, I'd propose we leave 0.2 as hierarchical, focus 0.3 on axes, and then we can try to remove the 0.2 restriction ASAP.

Review comment: 👍
||
│ └─ z # The terminal chunk is a file. Together the directory and file names | ||
│ └─ y # provide the "chunk coordinate" (t, c, z, y, x), where the maximum coordinate | ||
│ └─ x # will be `dimension_size / chunk_size`. | ||
│ | ||
└── labels | ||
│ | ||
|
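To make the two chunk layouts discussed in this hunk concrete, here is a minimal sketch of how a chunk coordinate maps to a storage key under each one. The helper names `chunk_key` and `max_chunk_coord` are hypothetical and not part of the specification or of zarr-python. The sketch assumes zero-based chunk indices, so it reads `dimension_size / chunk_size` as the number of chunks along an axis (rounded up) and reports the largest index as one less than that.

```python
import math


def chunk_key(coord, separator="/"):
    """Join a chunk coordinate such as (t, c, z, y, x) into a storage key.

    "/" gives the nested layout (all but the last element are directories);
    "." gives the flat layout used before the change in this diff.
    """
    return separator.join(str(index) for index in coord)


def max_chunk_coord(shape, chunks):
    """Largest zero-based chunk index along each axis (assumes sizes > 0)."""
    return tuple(math.ceil(size / chunk) - 1 for size, chunk in zip(shape, chunks))


# Illustrative shape and chunk sizes, not taken from the specification.
shape = (1, 2, 10, 512, 512)
chunks = (1, 1, 5, 256, 256)

last = max_chunk_coord(shape, chunks)
print(chunk_key(last))       # nested layout key
print(chunk_key(last, "."))  # flat layout key
```

The same function covers arrays with fewer than five dimensions, since the key is built from however many indices the coordinate contains.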
@@ -207,8 +208,56 @@ keys as specified below for discovering certain types of data, especially images | |
|
||
Metadata about the multiple resolution representations of the image can be | ||
found under the "multiscales" key in the group-level metadata. | ||
The specification for the multiscale (i.e. "resolution") metadata is provided | ||
in [zarr-specs#50](https://github.com/zarr-developers/zarr-specs/issues/50). | ||
|
||
"multiscales" contains a list of dictionaries where each entry describes a multiscale image. | ||
|
||
Each dictionary contained in the list MUST contain the field "datasets", which is a list of dictionaries describing | ||
the arrays storing the individual resolution levels. | ||
Each dictionary in "datasets" MUST contain the field "path", whose value contains the path to the array for this resolution relative | ||
to the current zarr group. The "path"s MUST be ordered from largest (i.e. highest resolution) to smallest. | ||
|
||
It MUST contain the field "axes", which is a list of dimension names of the axes. | ||
The values MUST be unique and one of `{"t", "c", "z", "y", "x"}`. | ||
The number of values MUST be the same as the number of dimensions of the arrays corresponding to this image. | ||
In addition, the "axes" values MUST be repeated in the field "_ARRAY_DIMENSIONS" of all scale groups | ||
(i.e. groups containing arrays with the multiscale data). | ||
This ensures compatibility with the [xarray zarr encoding](http://xarray.pydata.org/en/stable/internals/zarr-encoding-spec.html#zarr-encoding). | ||
|
||
It SHOULD contain the field "name". | ||
|
||
It SHOULD contain the field "version", which indicates the version of the | ||
multiscale metadata of this image (current version is 0.3). | ||
|
||
It SHOULD contain the field "type", which gives the type of downscaling method used to generate the multiscale image pyramid. | ||
|
||
It SHOULD contain the field "metadata", which contains a dictionary with additional information about the downscaling method. | ||
|
||
```json | ||
{ | ||
"multiscales": [ | ||
{ | ||
"version": "0.3", | ||
"name": "example", | ||
"datasets": [ | ||
{"path": "0"}, | ||
{"path": "1"}, | ||
{"path": "2"} | ||
], | ||
"axes": [ | ||
"t", "c", "z", "y", "x" | ||
], | ||
Review comment on lines +248 to +250: If we do support n-dimensional, (OME 8D spec / OME n-D spec) then this simply expands to include the dimension shorthands.
||
"type": "gaussian", | ||
"metadata": { # the fields in metadata depend on the downscaling implementation | ||
"method": "skimage.transform.pyramid_gaussian", # here, the parameters passed to the skimage function are given | ||
"version": "0.16.1", | ||
"args": "[true]", | ||
"kwargs": {"multichannel": true} | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
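The MUST rules added in this hunk can be checked mechanically. The following is a rough sketch of such a check, not a reference validator: the function name `validate_multiscale` and the `ndim` parameter are hypothetical, and the caller is assumed to have read the number of dimensions from the array metadata already. It does not check the "_ARRAY_DIMENSIONS" attribute or the resolution ordering of the paths, both of which require reading the arrays themselves.

```python
ALLOWED_AXES = {"t", "c", "z", "y", "x"}


def validate_multiscale(entry, ndim):
    """Return a list of violated MUST rules; an empty list means none were found."""
    errors = []

    # "datasets" MUST be present, and every dataset MUST have a "path".
    datasets = entry.get("datasets")
    if not isinstance(datasets, list):
        errors.append('missing "datasets" list')
    else:
        for i, dataset in enumerate(datasets):
            if not isinstance(dataset, dict) or "path" not in dataset:
                errors.append(f'datasets[{i}] has no "path"')

    # "axes" MUST be present, unique, drawn from t/c/z/y/x, and match ndim.
    axes = entry.get("axes")
    if not isinstance(axes, list):
        errors.append('missing "axes" list')
    else:
        unknown = [a for a in axes if not (isinstance(a, str) and a in ALLOWED_AXES)]
        if unknown:
            errors.append(f"axes outside t/c/z/y/x: {unknown}")
        elif len(set(axes)) != len(axes):
            errors.append('"axes" values are not unique')
        if len(axes) != ndim:
            errors.append(f'"axes" has {len(axes)} entries but arrays have {ndim} dimensions')

    return errors


example = {
    "version": "0.3",
    "name": "example",
    "datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}],
    "axes": ["t", "c", "z", "y", "x"],
}
print(validate_multiscale(example, ndim=5))  # → []
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation in one pass.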
|
||
If only one multiscale is provided, use it. Otherwise, the user can choose by | ||
name, using the first multiscale as a fallback: | ||
|
||
|
@@ -223,9 +272,6 @@ if not datasets: | |
datasets = [x["path"] for x in multiscales[0]["datasets"]] | ||
``` | ||
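Because the diff shows only the tail of the selection snippet, here is a self-contained sketch of the rule stated above: use the multiscale matching a requested name, and fall back to the first entry otherwise. The function name `select_dataset_paths` is hypothetical, and this is one possible reading of the rule rather than the snippet from the specification.

```python
def select_dataset_paths(multiscales, name=None):
    """Return the array paths of the multiscale called `name`.

    Falls back to the first multiscale when no name is given or no entry
    matches; with a single entry this always returns that entry's paths.
    """
    if not multiscales:
        raise ValueError('"multiscales" is empty')

    chosen = multiscales[0]
    if name is not None:
        for multiscale in multiscales:
            if multiscale.get("name") == name:
                chosen = multiscale
                break

    # Paths are listed from highest to lowest resolution.
    return [dataset["path"] for dataset in chosen["datasets"]]
```

A viewer could call `select_dataset_paths(attrs["multiscales"], name)` after loading the group attributes into a dictionary `attrs`, which is itself an assumed variable.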
|
||
The subresolutions in each multiscale are ordered from highest-resolution | ||
to lowest. | ||
|
||
"omero" metadata {#omero-md} | ||
---------------------------- | ||
|
||
|
@@ -235,7 +281,7 @@ can be found under the "omero" key in the group-level metadata: | |
```json | ||
"id": 1, # ID in OMERO | ||
"name": "example.tif", # Name as shown in the UI | ||
"version": "0.1", # Current version | ||
"version": "0.3", # Current version | ||
"channels": [ # Array matching the c dimension size | ||
{ | ||
"active": true, | ||
|
@@ -312,7 +358,7 @@ above). | |
```json | ||
"image-label": | ||
{ | ||
"version": "0.1", | ||
"version": "0.3", | ||
"colors": [ | ||
{ | ||
"label-value": 1, | ||
|
@@ -424,7 +470,7 @@ For example the following JSON object defines a plate with two acquisition and | |
"name": "B" | ||
} | ||
], | ||
"version": "0.1", | ||
"version": "0.3", | ||
"wells": [ | ||
{ | ||
"path": "2020-10-10/A/1" | ||
|
@@ -491,7 +537,7 @@ the last two fields of view were part of the second acquisition. | |
"path": "3" | ||
} | ||
], | ||
"version": "0.1" | ||
"version": "0.3" | ||
} | ||
``` | ||
|
||
|
@@ -534,9 +580,9 @@ Note: If you would like to see your project listed, please open an issue or PR o | |
Citing {#citing} | ||
================ | ||
|
||
[Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.](https://ngff.openmicroscopy.org/0.1) | ||
[Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.](https://ngff.openmicroscopy.org/0.3) | ||
J. Moore, *et al*. Editors. Open Microscopy Environment Consortium, 20 November 2020. | ||
This edition of the specification is [https://ngff.openmicroscopy.org/0.1/](https://ngff.openmicroscopy.org/0.1/]). | ||
This edition of the specification is [https://ngff.openmicroscopy.org/0.3/](https://ngff.openmicroscopy.org/0.3/). | ||
The latest edition is available at [https://ngff.openmicroscopy.org/latest/](https://ngff.openmicroscopy.org/latest/). | ||
[(doi:10.5281/zenodo.4282107)](https://doi.org/10.5281/zenodo.4282107) | ||
|
||
|
@@ -551,6 +597,11 @@ Version History {#history} | |
<td>Description</td> | ||
</tr> | ||
</thead> | ||
<tr> | ||
Review comment: Plus add in your own changelog.
||
<td>0.2.0</td> | ||
<td>2021-03-29</td> | ||
<td>Change chunk dimension separator to "/" </td> | ||
</tr> | ||
<tr> | ||
<td>0.1.4</td> | ||
<td>2020-11-26</td> | ||
|
Review comment: You likely want to revert these top-matter items.