Added Dask/pyramid support. Split tiff/imagej/ome reader at metadata level instead. #22

folterj · 2024-02-25T08:09:51Z

WIP, TODO: processing ome metadata

jni · 2024-02-25T23:48:36Z

napari_tiff/_tests/test_tiff_reader.py

                                            tifffile_reader,
                                            zip_reader)
-import pytest
-import tifffile


Why did you remove the imagej_reader from tests?

Oh, I see, you're trying to just unify everything under one reader that automagically does the right thing. That's fine, but I think it's worth keeping and testing the other readers as independent functions. This makes it easier to understand when one particular functionality is broken.

This PR removes imagej_reader entirely, and instead has separate functions for finding imagej/ome/tiff metadata (instead of having separate functions for reading each of those types of tiff files).

EDIT: sorry Juan, for some reason your second comment didn't show up for me. I didn't mean to post redundant information.

Ok it also saves imagej metadata - I didn't know that was supported. I'll put back the test.

jni · 2024-02-26T00:12:36Z

napari_tiff/napari_tiff_reader.py

+    if 'OME' in metadata:
+        metadata = metadata['OME']
+
+    # TODO: process ome metadata


It might be worth depending on @tlambert03's ome-types for this.

I understood this is mainly used for formatting/writing as opposed to reading. We have a dictionary at this point so not sure what including the ome-types package would add here?

ome-types does both, and inasmuch as you're only going to be pulling a couple things like channel names, colors, and scales from the metadata, I agree that it's probably not necessary to bring on an additional dependency just to grab a few things.

You will, however, likely run into malformed XML eventually that doesn't strictly meet the OME data model. In that case, you'll want to assume very little about the structure of the dict you get back (and fail gracefully). ome-types tests itself against a ton of XML examples and might do slightly better in that regard; the object you get back will have a guaranteed structure, and it fixes a handful of commonly seen errors in various OME-XML implementations, like Micro-Manager's.

I'd say go with what tifffile provides for now, and if you get any bug reports with poorly-mined metadata, then try out ome-types on those files and see if it makes things any more robust
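The advice above (assume little about the dict, fail gracefully) might look like the following minimal sketch. It assumes a nested dict in the shape that tifffile's `xml2dict` typically produces for OME-XML, where an element that occurs once is a dict and one that occurs several times is a list. The helper names `ensure_list` and `get_channel_names` are hypothetical and not part of this PR.

```python
from typing import Any, List


def ensure_list(value: Any) -> list:
    """Wrap a lone element in a list; treat None as no elements."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def get_channel_names(metadata: Any) -> List[str]:
    """Collect channel names of the first image, or [] if anything is missing."""
    if not isinstance(metadata, dict):
        return []
    # Same unwrapping as in the PR: the root element may or may not be present
    metadata = metadata.get("OME", metadata)
    if not isinstance(metadata, dict):
        return []
    images = ensure_list(metadata.get("Image"))
    if not images or not isinstance(images[0], dict):
        return []
    pixels = images[0].get("Pixels")
    if not isinstance(pixels, dict):
        return []
    names = []
    for index, channel in enumerate(ensure_list(pixels.get("Channel"))):
        name = channel.get("Name") if isinstance(channel, dict) else None
        # Fall back to a placeholder rather than raising on a nameless channel
        names.append(str(name) if name else f"channel {index}")
    return names
```

Every lookup goes through `.get()` plus a type check, so a malformed file yields an empty or partial result instead of a `KeyError` inside the reader.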

GenevieveBuckley · 2024-02-26T00:36:14Z

napari_tiff/_tests/test_tiff_reader.py

@@ -67,7 +66,6 @@ def test_reader(tmp_path, data_fixture, original_data):

 @pytest.mark.parametrize("reader, data_fixture, original_data", [
    (imagecodecs_reader, example_data_filepath, np.random.random((20, 20))),
-    (imagej_reader, example_data_tiff,  np.random.randint(0, 255, size=(20, 20)).astype(np.uint8)),


It's important that we don't lose functionality, so we'll need to keep this test.

...ah, I think I see what's happening with this restructure now

GenevieveBuckley · 2024-02-26T00:44:10Z

pyproject.toml

@@ -29,7 +29,8 @@ requires-python = '>=3.10'
 dependencies = [
   'imagecodecs',
   'numpy',
-   'tifffile>=2020.5.7',
+   'tifffile>=2023.9.26',


Is there a specific feature/requirement causing the bump in version number?

Yes, loading (and lazy writing) levels / zarr / dask

GenevieveBuckley · 2024-02-26T05:13:58Z

Could this be a possible failure case? https://forum.image.sc/t/tifffile-opening-individual-ome-tiff-files-as-single-huge-array-even-when-isolated/77701

@jni do you happen to remember or have a link to those files, so we can check?

GenevieveBuckley · 2024-02-26T06:36:57Z

napari_tiff/napari_tiff_reader.py

+    if nlevels > 1:
+        data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)]
+    else:
+        data = da.from_zarr(tif.aszarr())


I'm not entirely sure how I feel about making both dask and zarr hard requirements for opening any tiff, even the non-pyramidal ones

This is what allows lazy loading & pyramid sizes.
Although this works regardless, I'll split it up and put back asarray() for single series.
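One possible shape for that split, as a sketch only: `tif` is assumed to be an open `tifffile.TiffFile`, the function name `load_tiff_data` is hypothetical, and the level count uses `tif.series[0].levels` as corrected later in this thread. Importing dask inside the pyramid branch is an assumption about how the dependency could be made optional, not something the PR does.

```python
def load_tiff_data(tif):
    """Return lazy per-level arrays for pyramids, else a single in-memory array."""
    nlevels = len(tif.series[0].levels)
    if nlevels > 1:
        # Imported here so that dask (and zarr, via aszarr) are only needed
        # when the file actually contains multiple resolution levels
        import dask.array as da

        return [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)]
    # Single-level files are read eagerly, as before this PR
    return tif.asarray()
```

With this layout, plain TIFF files never touch the dask/zarr code path.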

GenevieveBuckley · 2024-02-26T06:38:42Z

napari_tiff/napari_tiff_reader.py

+    elif tif.is_imagej:
+        kwargs = get_imagej_metadata(tif)
+    else:
+        kwargs = get_tiff_metadata(tif)


I expected that tiff files might include some combination of these types of metadata, not only one. Would it be better to combine the dictionaries instead?

Yes that's also possible. This is exactly what I'm doing in our image conversion pipeline e.g.:

https://github.com/FrancisCrickInstitute/OmeSliCC/blob/f6e2aecd5713bbbff7e10e04eebe5ede4bace1cc/OmeSliCC/TiffSource.py#L113
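Combining the dictionaries could be sketched as below. This is not the OmeSliCC implementation linked above; the helper name `merge_layer_kwargs`, the precedence order, and the sample values are all assumptions for illustration.

```python
def merge_layer_kwargs(*sources: dict) -> dict:
    """Merge kwargs dicts; later sources win, but None never overwrites a value."""
    merged: dict = {}
    for source in sources:
        for key, value in source.items():
            if value is not None or key not in merged:
                merged[key] = value
    return merged


# Hypothetical outputs of the three metadata functions, least specific first
tiff_kwargs = {"name": "myfile.tif", "scale": None, "rgb": False}
imagej_kwargs = {"scale": (1.0, 0.5, 0.5), "colormap": None}
ome_kwargs = {"name": "Image 0", "scale": None}

kwargs = merge_layer_kwargs(tiff_kwargs, imagej_kwargs, ome_kwargs)
```

Here the OME name replaces the file name, while the ImageJ scale survives because the OME entry is `None`.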

GenevieveBuckley · 2024-02-26T06:40:53Z

napari_tiff/napari_tiff_reader.py

+    # TODO: process ome metadata
+
+    kwargs = dict(
+        rgb=rgb,


It looks like these variables are not defined in this function? (rgb, channel_axis, name, scale, colormap, contrast_limits, blending, visible, etc.)

GenevieveBuckley · 2024-02-26T06:41:56Z

napari_tiff/napari_tiff_reader.py

+def get_ome_tiff_metadata(tif):
+    metadata = xml2dict(tif.ome_metadata)
+    if 'OME' in metadata:
+        metadata = metadata['OME']


The metadata variable doesn't seem to get used in this function after this?

GenevieveBuckley · 2024-02-26T06:46:08Z

napari_tiff/napari_tiff_reader.py

@@ -72,14 +70,30 @@ def zip_reader(path: PathLike) -> List[LayerData]:


 def tifffile_reader(tif):
-    """Return napari LayerData from largest image series in TIFF file."""
+    """Return napari LayerData from image series in TIFF file."""
+    nlevels = len(tif.series[0])


I'm still a little confused about the difference between tiff series and tiff pages. Is anyone else able to explain it?

@cgohlke is the expert as the creator of tifffile, but as I understand it, pages are the internal structure of a TIFF file, while series/levels provide higher-level access (pointing to the underlying pages).

GenevieveBuckley · 2024-02-26T06:49:49Z

napari_tiff/napari_tiff_reader.py

+    """Return napari LayerData from image series in TIFF file."""
+    nlevels = len(tif.series[0])
+    if nlevels > 1:
+        data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)]


This assumes that the tiff series are arranged in a pyramid format. Is that something that is absolutely guaranteed by the tiff specification?
If not, we may need to check we actually have a multi-resolution file with all the resolution levels in the right order before passing it to napari.

I've not seen this written otherwise in any WSI file format. I imagine it's spec.

The docstring for series says: "Series of pages with compatible shape and data type." So if the length of series is not 1, different shapes are present.

However, I'm not sure it should always be read as pyramidal. As far as I know, TIFF allows storing a thumbnail, for example, but I do not have such data available.

@Czaki this is testing the length / number of levels inside the first series. Would a thumbnail be the same or a separate series? What would the best way be to test for pyramid - ome metadata?

I do not know. I worked on writing my own TIFF parser a long time ago and cannot find the data I used then. My point is rather that it would be nice to check this (for example, using some public datasets). I will also try to dig into it later.

I just realised this should be: len(tif.series[0].levels)
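A cheap sanity check along the lines discussed above, as a heuristic sketch only: it is not something the TIFF specification guarantees, and it does not settle the thumbnail question. The function name `looks_like_pyramid` is hypothetical; the shapes could be gathered with something like `[level.shape for level in tif.series[0].levels]`.

```python
from math import prod


def looks_like_pyramid(level_shapes) -> bool:
    """True if there are 2+ levels of equal rank, each smaller than the one before."""
    shapes = [tuple(shape) for shape in level_shapes]
    if len(shapes) < 2:
        return False
    # All levels should have the same number of dimensions
    if len({len(shape) for shape in shapes}) != 1:
        return False
    # Total element count must strictly decrease from one level to the next
    sizes = [prod(shape) for shape in shapes]
    return all(larger > smaller for larger, smaller in zip(sizes, sizes[1:]))
```

If the check fails, the reader could fall back to loading only the first level instead of handing napari a multiscale list.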

jni · 2024-02-29T02:11:40Z

> Could this be a possible failure case? forum.image.sc/t/tifffile-opening-individual-ome-tiff-files-as-single-huge-array-even-when-isolated/77701

Yes it could.

> @jni do you happen to remember or have a link to those files, so we can check?

Here's the data, for manual testing, not broad distribution:

https://www.dropbox.com/scl/fo/rqopeqntdsgzep4r10qbs/h?rlkey=k931yg05j4sbnqlh60z343k74&dl=0

jni · 2024-02-29T02:12:57Z

Having said this, since @folterj is using dask/zarr and not imread, it probably would work but would return empty slices where the data is missing. And that is arguably the correct behaviour here.

Czaki · 2024-03-05T08:48:17Z

pyproject.toml

   'vispy',
+   'zarr',
 ]

 [project.optional-dependencies]


Just below this line there are testing dependencies. Please add napari there.

Czaki · 2024-03-05T08:53:55Z

napari_tiff/_tests/test_tiff_reader.py

+
+def example_data_imagej(tmp_path, original_data):
+    filepath = str(tmp_path / "myfile.tif")
+    tifffile.imwrite(filepath, original_data, imagej=True)


Is this WIP, or should some metadata be added? I know it will contain some ImageJ tag, but it will not contain any metadata to check that reading works properly.

This is a good point. The original test code read a plain tif as imagej, so I'm trying to expand it with some simple imagej (and ome-tiff for ome-tiff reading) metadata that tifffile supports.
I'm happy to leave open whether and against which metadata the imagej part of the code is tested, as I'm focussing on the ome-tiff reading.
In general, a concise, non-generated dataset is needed for more thorough testing.

The same remark applies to the ome function: it also provides no metadata.

Czaki · 2024-03-05T08:55:38Z

napari_tiff/napari_tiff_reader.py

+
+
+def get_value_units_micrometer(value: float, unit: str = None) -> float:
+    conversions = {'nm': 1e-3, 'µm': 1, 'um': 1, 'micrometer': 1, 'mm': 1e3, 'cm': 1e4, 'm': 1e6}


I think this should be a global variable.
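Hoisting the table to module level could look like the sketch below. Only the signature and the `conversions` dict appear in the diff; the constant name `CONVERSIONS_TO_MICROMETER` and the function body (including the fallback for unknown units) are assumptions.

```python
from typing import Optional

# Factors that convert a length in the given unit to micrometers
CONVERSIONS_TO_MICROMETER = {
    'nm': 1e-3, 'µm': 1, 'um': 1, 'micrometer': 1,
    'mm': 1e3, 'cm': 1e4, 'm': 1e6,
}


def get_value_units_micrometer(value: float, unit: Optional[str] = None) -> float:
    # Assumption: a missing or unknown unit leaves the value unchanged
    return value * CONVERSIONS_TO_MICROMETER.get(unit, 1)
```

The dict is then built once at import time rather than on every call, and tests can reference it directly.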

Czaki · 2024-03-05T09:14:25Z

napari_tiff/napari_tiff_reader.py

+    """Return napari LayerData from image series in TIFF file."""
+    nlevels = len(tif.series[0])
+    if nlevels > 1:
+        data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)]


The docstring for series says: "Series of pages with compatible shape and data type." So if the length of series is not 1, different shapes are present.

However, I'm not sure it should always be read as pyramidal. As far as I know, TIFF allows storing a thumbnail, for example, but I do not have such data available.

GenevieveBuckley · 2024-03-26T06:09:17Z

Superseded by #24

Split tiff/imagej/ome reader at metadata level instead

d1290fd

TODO: processing ome metadata

jni reviewed Feb 25, 2024

jni reviewed Feb 26, 2024

GenevieveBuckley reviewed Feb 26, 2024

Add zarr to dependencies

fecc64c

GenevieveBuckley reviewed Feb 26, 2024

folterj added 2 commits February 26, 2024 12:51

Restored imagej metadata test

da56519

Minor: avoid variables with same name as functions (example_data_filepath)

Implemented ome metadata processing

ab4f521

Czaki reviewed Mar 5, 2024

folterj added 5 commits March 5, 2024 13:31

Added metadata testing

ee9c216

Create split channel images, simplify testing

ccbc28f

ome-tiff: Improved data and metadata creation, extended testing

967d3e4

Moved napari tests from auto-tests

17af7eb

Testing test

616a7ab

GenevieveBuckley closed this Mar 26, 2024


