
Add axes field to multiscale metadata #46

Merged: 11 commits, Aug 24, 2021
latest/index.bs: 95 changes (73 additions, 22 deletions)

@@ -2,8 +2,7 @@
Title: Next-generation file formats (NGFF)
Shortname: ome-ngff
Level: 1
Status: LS-COMMIT
Status: w3c/ED
Status: w3c/CG-FINAL
Member: you likely want to revert these top-matter items

Group: ome
URL: https://ngff.openmicroscopy.org/latest/
Repository: https://github.com/ome/ngff
@@ -18,12 +17,12 @@ Editor: Sébastien Besson, Open Microscopy Environment (OME) https://www.openmic
Abstract: This document contains next-generation file format (NGFF)
Abstract: specifications for storing bioimaging data in the cloud.
Abstract: All specifications are submitted to the https://image.sc community for review.
Status Text: The current released version of this specification is
Status Text: <a href="../0.1/index.html">0.1</a>. Migration scripts
Member: This text should stay as is but point to the 0.3 that you will create when latest/index.bs is done.

Status Text: will be provided between numbered versions. Data written with these latest changes
Status Text: This is the 0.3 release of this specification. Migration scripts
Status Text: will be provided between numbered versions. Data written with the latest version
Status Text: (an "editor's draft") will not necessarily be supported.
</pre>


Introduction {#intro}
=====================

@@ -117,20 +116,22 @@ multiple levels of resolutions and optionally associated labels.
├── .zgroup # Each image is a Zarr group, or a folder, of other groups and arrays.
├── .zattrs # Group level attributes are stored in the .zattrs file and include
│ # "multiscales" and "omero" below)
│ # "multiscales" and "omero" (see below). In addition, the group level attributes
│ # must also contain "_ARRAY_DIMENSIONS" if this group directly contains multi-scale arrays.
├── 0 # Each multiscale level is stored as a separate Zarr array,
│ ... # which is a folder containing chunk files which compose the array.
├── n # The name of the array is arbitrary with the ordering defined by
│ │ # by the "multiscales" metadata, but is often a sequence starting at 0.
│ │
│ ├── .zarray # All image arrays are 5-dimensional
│ ├── .zarray # All image arrays must be up to 5-dimensional

Comment: @joshmoore shouldn't this be "must be 5+ dimensional"? OME-Tiff has the 8D / n-D spec and it would be a shame to lose that 😉

Contributor Author: The goal of this PR is to allow for less(!) than 5D, so that it's not necessary to introduce singleton dimensions for 2D/3D/4D data. There is an ongoing discussion about allowing for more than 5D, see #35. But that is out of scope for the current PR.

│ │ # with dimension order (t, c, z, y, x).
│ │
│ ├── 0.0.0.0.0 # Chunks are stored with the flat directory layout.
│ │ ... # Each dotted component of the chunk file represents
│ └── t.c.z.y.x # a "chunk coordinate", where the maximum coordinate
│ # will be `dimension_size / chunk_size`.
│ └─ t # Chunks are stored with the nested directory layout.
│ └─ c # All but the last chunk element are stored as directories.
Contributor: Can we stick with a flat directory? The motivation is compatibility with xarray and avoiding unnecessary overhead with IPFS 👾

Member: Hi @thewtex. Yes & no.

Short-term

The switch in v0.2 was to let us start actually creating these datasets before all implementations start supporting zarr-developers/zarr-python#715, i.e. ome-zarr-* implementations for v0.2 can "blindly" assert the use of "/" as the dimension separator. Python implementations should be in the (enviable) position that it doesn't matter. If you put the metadata in place, it should work for either, assuming your zarr-python version is >=2.8.0 (note: ongoing bug fixes). Does that track with what you are seeing, or am I missing something on the xarray front?

Longer-term

Once all of this has propagated through the ecosystem, I think this spec shouldn't really care about which array layout is used. So a v0.3 or later can drop the requirement altogether. That, however, does not get you what you want in the general case with IPFS. Do you foresee it really being that bad? (I ask because I know that the flat directory is unusable at some scales.)

Contributor: @joshmoore ah, thanks for the information! I was not aware of zarr-developers/zarr-python#715.

On the xarray front, they are not tied to a specific zarr spec, as far as I know, so it should not cause an issue. But it seems that we should do the same here, i.e. both layouts, . or /, would be acceptable, as you suggest, correct?

> I ask because I know that the flat directory is unusable at some scales.

I can see how this would be the case. In which store types do you experience this?

For IPFS, each entity is identified with a cryptographic hash. Updating a chunk in a file means updating its identifier, but it also means updating the identifiers for all parent directories in the directory tree; so, deeply nested trees are not desirable.

Member:

> both layouts, . or /, would be acceptable, as you suggest, correct?

That's the plan.

> In which store types do you experience this?

DirectoryStore & FSStore

> so, deeply nested trees are not desirable.

Understood. Thanks for the info.

Contributor Author: What do you think is the best way to move forward here, @thewtex @joshmoore?

I would rather not change this now (especially since this change was done in a previous PR already and the changes here are just about syncing the different versions of the spec). Also, like @joshmoore said, not supporting hierarchical storage for the chunks can be a big issue for large datasets on the file system.

So I would propose that we leave this as it is for now and eventually change the spec once most implementations are agnostic to the actual separator.

If really necessary and there are some implementations that require a flat hierarchy, this could then be added as an extra implementation-specific requirement. (I.e. these implementations would not be able to read all ngff formats, but only a subset.)

Member: Yes, I'd propose we leave 0.2 as hierarchical, focus 0.3 on axes, and then we can try to remove the 0.2 restriction ASAP.

Contributor: 👍

│ └─ z # The terminal chunk is a file. Together the directory and file names
│ └─ y # provide the "chunk coordinate" (t, c, z, y, x), where the maximum coordinate
│ └─ x # will be `dimension_size / chunk_size`.
└── labels
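
For illustration only (not part of the diff): a minimal zarr-python sketch that produces the nested chunk layout shown in the tree above. The group name, shapes, chunk sizes and dtype are arbitrary assumptions; the required "multiscales" metadata is written separately (see the sketch in the metadata section below).

```python
import zarr

# NestedDirectoryStore lays chunks out as t/c/z/y/x directories and files,
# matching the nested directory layout described in the tree above.
store = zarr.NestedDirectoryStore("example_image.zarr")
root = zarr.group(store=store, overwrite=True)

# One 5D array per resolution level, named "0", "1", ... as in the tree.
root.create_dataset("0", shape=(1, 2, 32, 512, 512),
                    chunks=(1, 1, 16, 128, 128), dtype="uint16")
root.create_dataset("1", shape=(1, 2, 32, 256, 256),
                    chunks=(1, 1, 16, 128, 128), dtype="uint16")

# Writing data creates the nested chunk files,
# e.g. example_image.zarr/0/0/0/0/0/0
root["0"][:] = 1
```
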
@@ -207,8 +208,56 @@ keys as specified below for discovering certain types of data, especially images

Metadata about the multiple resolution representations of the image can be
found under the "multiscales" key in the group-level metadata.
The specification for the multiscale (i.e. "resolution") metadata is provided
in [zarr-specs#50](https://github.com/zarr-developers/zarr-specs/issues/50).

"multiscales" contains a list of dictionaries where each entry describes a multiscale image.

Each dictionary contained in the list MUST contain the field "datasets", which is a list of dictionaries describing
the arrays storing the individual resolution levels.
Each dictionary in "datasets" MUST contain the field "path", whose value contains the path to the array for this resolution relative
to the current zarr group. The "path"s MUST be ordered from largest (i.e. highest resolution) to smallest.
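
A small hedged sketch of how a reader might check this ordering rule; the names `multiscale` (one entry of "multiscales") and `group` (the opened zarr group) are assumptions, not part of the spec:

```python
def check_dataset_order(multiscale, group):
    # The first "path" must point at the largest (highest resolution) array.
    sizes = [group[d["path"]].size for d in multiscale["datasets"]]
    assert sizes == sorted(sizes, reverse=True), "datasets must go largest to smallest"
```
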

It MUST contain the field "axes", which is a list of dimension names of the axes.
The values MUST be unique and one of `{"t", "c", "z", "y", "x"}`.
The number of values MUST be the same as the number of dimensions of the arrays corresponding to this image.
In addition, the "axes" values MUST be repeated in the field "_ARRAY_DIMENSIONS" of all scale groups
(i.e. groups containing arrays with the multiscale data).
This ensures compatibility with the [xarray zarr encoding](http://xarray.pydata.org/en/stable/internals/zarr-encoding-spec.html#zarr-encoding).
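
Similarly, a hedged validation sketch for the "axes" rules above; `multiscale` and `group` are assumed names, and the "_ARRAY_DIMENSIONS" check follows this draft's group-level placement:

```python
def check_axes(multiscale, group):
    axes = multiscale["axes"]
    assert len(set(axes)) == len(axes), "axes values must be unique"
    assert set(axes) <= {"t", "c", "z", "y", "x"}, "unknown axis name"
    for d in multiscale["datasets"]:
        # Each resolution level must have one dimension per axis name.
        assert group[d["path"]].ndim == len(axes), "array rank must match axes"
    # The same values must be repeated for xarray compatibility.
    assert group.attrs.get("_ARRAY_DIMENSIONS") == axes
```
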

It SHOULD contain the field "name".

It SHOULD contain the field "version", which indicates the version of the
multiscale metadata of this image (current version is 0.3).

It SHOULD contain the field "type", which gives the type of downscaling method used to generate the multiscale image pyramid.

It SHOULD contain the field "metadata", which contains a dictionary with additional information about the downscaling method.

```json
{
    "multiscales": [
        {
            "version": "0.3",
            "name": "example",
            "datasets": [
                {"path": "0"},
                {"path": "1"},
                {"path": "2"}
            ],
            "axes": [
                "t", "c", "z", "y", "x"
            ],
Comment on lines +248 to +250: If we do support n-dimensional (OME 8D spec / OME n-D spec), then this simply expands to include the dimension shorthands.

"type": "gaussian",
"metadata": { # the fields in metadata depend on the downscaling implementation
"method": "skimage.transform.pyramid_gaussian", # here, the paramters passed to the skimage function are given
"version": "0.16.1",
"args": "[true]",
"kwargs": {"multichannel": true}
}
}
]
}
```
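
As a non-normative sketch, this metadata might be attached with zarr-python roughly as follows; the group path is an assumption, the paths and axes mirror the example above, and "_ARRAY_DIMENSIONS" is written at the group level as this draft's text describes:

```python
import zarr

root = zarr.open_group("example_image.zarr", mode="a")
axes = ["t", "c", "z", "y", "x"]
root.attrs["multiscales"] = [{
    "version": "0.3",
    "name": "example",
    "datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}],
    "axes": axes,
    "type": "gaussian",
}]
# Repeat the axes values for xarray compatibility (see above).
root.attrs["_ARRAY_DIMENSIONS"] = axes
```
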

If only one multiscale is provided, use it. Otherwise, the user can choose by
name, using the first multiscale as a fallback:

@@ -223,9 +272,6 @@ if not datasets:

```python
    datasets = [x["path"] for x in multiscales[0]["datasets"]]
```
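
The diff collapses most of that snippet; a sketch of the full selection logic it describes, assuming `multiscales` is the parsed "multiscales" list and `name` the requested image name, might read:

```python
def choose_datasets(multiscales, name=None):
    """Return the dataset paths for the named multiscale, falling back to the first."""
    for named in multiscales:
        if named.get("name") == name:
            return [d["path"] for d in named["datasets"]]
    # Fall back to the first multiscale if no name matches.
    return [d["path"] for d in multiscales[0]["datasets"]]
```
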

The subresolutions in each multiscale are ordered from highest-resolution
to lowest.

"omero" metadata {#omero-md}
----------------------------

@@ -235,7 +281,7 @@ can be found under the "omero" key in the group-level metadata:
```json
"id": 1, # ID in OMERO
"name": "example.tif", # Name as shown in the UI
"version": "0.1", # Current version
"version": "0.3", # Current version
"channels": [ # Array matching the c dimension size
{
"active": true,
@@ -312,7 +358,7 @@ above).
```json
"image-label":
{
"version": "0.1",
"version": "0.3",
"colors": [
{
"label-value": 1,
@@ -424,7 +470,7 @@ For example the following JSON object defines a plate with two acquisition and
"name": "B"
}
],
"version": "0.1",
"version": "0.3",
"wells": [
{
"path": "2020-10-10/A/1"
@@ -491,7 +537,7 @@ the last two fields of view were part of the second acquisition.
"path": "3"
}
],
"version": "0.1"
"version": "0.3"
}
```

@@ -534,9 +580,9 @@ Note: If you would like to see your project listed, please open an issue or PR o
Citing {#citing}
================

[Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.](https://ngff.openmicroscopy.org/0.1)
[Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.](https://ngff.openmicroscopy.org/0.3)
J. Moore, *et al*. Editors. Open Microscopy Environment Consortium, 20 November 2020.
This edition of the specification is [https://ngff.openmicroscopy.org/0.1/](https://ngff.openmicroscopy.org/0.1/).
This edition of the specification is [https://ngff.openmicroscopy.org/0.3/](https://ngff.openmicroscopy.org/0.3/).
The latest edition is available at [https://ngff.openmicroscopy.org/latest/](https://ngff.openmicroscopy.org/latest/).
[(doi:10.5281/zenodo.4282107)](https://doi.org/10.5281/zenodo.4282107)

@@ -551,6 +597,11 @@ Version History {#history}
<td>Description</td>
</tr>
</thead>
<tr>
Member: Plus add in your own changelog.

<td>0.2.0</td>
<td>2021-03-29</td>
<td>Change chunk dimension separator to "/" </td>
</tr>
<tr>
<td>0.1.4</td>
<td>2020-11-26</td>