Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Dask/pyramid suport. Split tiff/imagej/ome reader at metadata level instead. #22

Closed
wants to merge 9 commits into from

Conversation

folterj
Copy link
Contributor

@folterj folterj commented Feb 25, 2024

WIP, TODO: processing ome metadata

tifffile_reader,
zip_reader)
import pytest
import tifffile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the imagej_reader from tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, you're trying to just unify everything under one reader that automagically does the right thing. That's fine, but I think it's worth keeping and testing the other readers as independent functions. This makes it easier to understand when one particular functionality is broken.

Copy link
Collaborator

@GenevieveBuckley GenevieveBuckley Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR removes imagej_reader entirely, and instead has separate functions for finding imagej/ome/tiff metadata (instead of having separate functions for reading each of those types of tiff files).

EDIT: sorry Juan, for some reason your second comment didn't show up for me. I didn't mean to post redundant information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it also saves imagej metadata - I didn't know that was supported. I'll put back the test.

if 'OME' in metadata:
metadata = metadata['OME']

# TODO: process ome metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth depending on @tlambert03's ome-types for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood this is mainly used for formatting/writing as opposed to reading. We have a dictionary at this point so not sure what including the ome-types package would add here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ome-types does both, and inasmuch as you're only going to be pulling a couple things like channel names, colors, and scales from the metadata, I agree that it's probably not necessary to bring on an additional dependency just to grab a few things.

you will, however, likely eventually run into malformed xml that doesn't strictly meet the ome data model. And in that case you'll just want to make sure that you essentially assume very little about the structure of the dict you get back (and fail gracefully). ome-types tests itself against a ton of xml examples and might do slightly better in that regard, and the object you get back will have a guaranteed structure (and fixes a handful of commonly seen errors in various ome xml implementations, like micro-manager).

I'd say go with what tifffile provides for now, and if you get any bug reports with poorly-mined metadata, then try out ome-types on those files and see if it makes things any more robust

@@ -67,7 +66,6 @@ def test_reader(tmp_path, data_fixture, original_data):

@pytest.mark.parametrize("reader, data_fixture, original_data", [
(imagecodecs_reader, example_data_filepath, np.random.random((20, 20))),
(imagej_reader, example_data_tiff, np.random.randint(0, 255, size=(20, 20)).astype(np.uint8)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important that we don't lose functionality, so we'll need to keep this test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...ah, I think I see what's happening with this restructure now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

@@ -29,7 +29,8 @@ requires-python = '>=3.10'
dependencies = [
'imagecodecs',
'numpy',
'tifffile>=2020.5.7',
'tifffile>=2023.9.26',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific feature/requirement causing the bump in version number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, loading (and lazy writing) levels / zarr / dask

@GenevieveBuckley
Copy link
Collaborator

Could this be a possible failure case? https://forum.image.sc/t/tifffile-opening-individual-ome-tiff-files-as-single-huge-array-even-when-isolated/77701

@jni do you happen to remember or have a link to those files, so we can check?

if nlevels > 1:
data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)]
else:
data = da.from_zarr(tif.aszarr())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure how I feel about making both dask and zarr hard requirements for opening any tiff, even the non-pyramidal ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what allows lazy loading & pyramid sizes.
Although this works regardless, I'll split it up and put back asarray() for single series.

elif tif.is_imagej:
kwargs = get_imagej_metadata(tif)
else:
kwargs = get_tiff_metadata(tif)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected that tiff files might include some combination of these types of metadata, not only one. Would it be better to combine the dictionaries instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's also possible. This is exactly what I'm doing in our image conversion pipeline e.g.:

https://github.com/FrancisCrickInstitute/OmeSliCC/blob/f6e2aecd5713bbbff7e10e04eebe5ede4bace1cc/OmeSliCC/TiffSource.py#L113

# TODO: process ome metadata

kwargs = dict(
rgb=rgb,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these variables are not defined in this function? (rgb, channel_axis, name, scale, colormap, contrast_limits, blending, visible, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP

def get_ome_tiff_metadata(tif):
metadata = xml2dict(tif.ome_metadata)
if 'OME' in metadata:
metadata = metadata['OME']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata variable doesn't seem to get used in this function after this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP

@@ -72,14 +70,30 @@ def zip_reader(path: PathLike) -> List[LayerData]:


def tifffile_reader(tif):
"""Return napari LayerData from largest image series in TIFF file."""
"""Return napari LayerData from image series in TIFF file."""
nlevels = len(tif.series[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little confused about the difference between tiff series and tiff pages. Is anyone else able to explain it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although @cgohlke is the expert as creator of tifffile, as I understand it pages is the internal structure of tiff, and the series/levels provide higher level access (which point to the underlying pages).

"""Return napari LayerData from image series in TIFF file."""
nlevels = len(tif.series[0])
if nlevels > 1:
data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that the tiff series are arranged in a pyramid format. Is that something that is absolutely guaranteed by the tiff specification?
If not, we may need to check we actually have a multi-resolution file with all the resolution levels in the right order before passing it to napari.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not seen this written otherwise in any WSI file format. I imagine it's spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctring from series Series of pages with compatible shape and data type. So if the length of series is different from 1 then different shapes are present.

However, I'm not sure if it always should be read as pyramidal. As I know that tiff allows storing thumbnail for example. But I do not have such data available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Czaki this is testing the length / number of levels inside the first serie. Would a thumbnail be the same or a separate series? What would the best way be to test for pyramid - ome metadata?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know. I work with writing my own tiff parser a long time ago and cannot find the data that I used then. This is rather point that it will be nice to check this (for example using some public datasets). I will also try to digg it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised this should be: len(tif.series[0].levels)

Minor: avoid variables with same name as functions (example_data_filepath)
@jni
Copy link
Member

jni commented Feb 29, 2024

Could this be a possible failure case? forum.image.sc/t/tifffile-opening-individual-ome-tiff-files-as-single-huge-array-even-when-isolated/77701

Yes it could.

@jni do you happen to remember or have a link to those files, so we can check?

Here's the data, for manual testing, not broad distribution:

https://www.dropbox.com/scl/fo/rqopeqntdsgzep4r10qbs/h?rlkey=k931yg05j4sbnqlh60z343k74&dl=0

@jni
Copy link
Member

jni commented Feb 29, 2024

Having said this, since @folterj is using dask/zarr and not imread, it probably would work but would return empty slices where the data is missing. And that is arguably the correct behaviour here.

'vispy',
'zarr',
]

[project.optional-dependencies]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just below this line there are testing dependencies. Please add napari there


def example_data_imagej(tmp_path, original_data):
filepath = str(tmp_path / "myfile.tif")
tifffile.imwrite(filepath, original_data, imagej=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is WIP? Or some metadata should be added? I know that it will contain some imagej tag but, will not contain any metadata to check if reading is proper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. The original test code read plain tif as imagej, so trying to expand with some simple imagej (and ome-tiff for ome-tiff reading) metadata that tifffile supports.
I'm happy to leave if/what metadata the imagej part of the code would be tested against, as I'm focussing on the ome-tiff reading.
In general a concise, non-generated dataset is needed for more thorough testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark is to ome function. It is also without providing any metadata.



def get_value_units_micrometer(value: float, unit: str = None) -> float:
conversions = {'nm': 1e-3, 'µm': 1, 'um': 1, 'micrometer': 1, 'mm': 1e3, 'cm': 1e4, 'm': 1e6}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this should be global variable

"""Return napari LayerData from image series in TIFF file."""
nlevels = len(tif.series[0])
if nlevels > 1:
data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctring from series Series of pages with compatible shape and data type. So if the length of series is different from 1 then different shapes are present.

However, I'm not sure if it always should be read as pyramidal. As I know that tiff allows storing thumbnail for example. But I do not have such data available.

@GenevieveBuckley
Copy link
Collaborator

Superseded by #24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants