-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Dask/pyramid suport. Split tiff/imagej/ome reader at metadata level instead. #22
Conversation
TODO: processing ome metadata
tifffile_reader, | ||
zip_reader) | ||
import pytest | ||
import tifffile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove the imagej_reader from tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, you're trying to just unify everything under one reader that automagically does the right thing. That's fine, but I think it's worth keeping and testing the other readers as independent functions. This makes it easier to understand when one particular functionality is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR removes imagej_reader entirely, and instead has separate functions for finding imagej/ome/tiff metadata (instead of having separate functions for reading each of those types of tiff files).
EDIT: sorry Juan, for some reason your second comment didn't show up for me. I didn't mean to post redundant information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok it also saves imagej metadata - I didn't know that was supported. I'll put back the test.
napari_tiff/napari_tiff_reader.py
Outdated
if 'OME' in metadata: | ||
metadata = metadata['OME'] | ||
|
||
# TODO: process ome metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth depending on @tlambert03's ome-types for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood this is mainly used for formatting/writing as opposed to reading. We have a dictionary at this point so not sure what including the ome-types package would add here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ome-types does both, and inasmuch as you're only going to be pulling a couple things like channel names, colors, and scales from the metadata, I agree that it's probably not necessary to bring on an additional dependency just to grab a few things.
you will, however, likely eventually run into malformed xml that doesn't strictly meet the ome data model. And in that case you'll just want to make sure that you essentially assume very little about the structure of the dict you get back (and fail gracefully). ome-types tests itself against a ton of xml examples and might do slightly better in that regard, and the object you get back will have a guaranteed structure (and fixes a handful of commonly seen errors in various ome xml implementations, like micro-manager).
I'd say go with what tifffile provides for now, and if you get any bug reports with poorly-mined metadata, then try out ome-types on those files and see if it makes things any more robust
@@ -67,7 +66,6 @@ def test_reader(tmp_path, data_fixture, original_data): | |||
|
|||
@pytest.mark.parametrize("reader, data_fixture, original_data", [ | |||
(imagecodecs_reader, example_data_filepath, np.random.random((20, 20))), | |||
(imagej_reader, example_data_tiff, np.random.randint(0, 255, size=(20, 20)).astype(np.uint8)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important that we don't lose functionality, so we'll need to keep this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...ah, I think I see what's happening with this restructure now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
@@ -29,7 +29,8 @@ requires-python = '>=3.10' | |||
dependencies = [ | |||
'imagecodecs', | |||
'numpy', | |||
'tifffile>=2020.5.7', | |||
'tifffile>=2023.9.26', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific feature/requirement causing the bump in version number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, loading (and lazy writing) levels / zarr / dask
Could this be a possible failure case? https://forum.image.sc/t/tifffile-opening-individual-ome-tiff-files-as-single-huge-array-even-when-isolated/77701 @jni do you happen to remember or have a link to those files, so we can check? |
napari_tiff/napari_tiff_reader.py
Outdated
if nlevels > 1: | ||
data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)] | ||
else: | ||
data = da.from_zarr(tif.aszarr()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure how I feel about making both dask and zarr hard requirements for opening any tiff, even the non-pyramidal ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what allows lazy loading & pyramid sizes.
Although this works regardless, I'll split it up and put back asarray() for single series.
napari_tiff/napari_tiff_reader.py
Outdated
elif tif.is_imagej: | ||
kwargs = get_imagej_metadata(tif) | ||
else: | ||
kwargs = get_tiff_metadata(tif) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expected that tiff files might include some combination of these types of metadata, not only one. Would it be better to combine the dictionaries instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's also possible. This is exactly what I'm doing in our image conversion pipeline e.g.:
napari_tiff/napari_tiff_reader.py
Outdated
# TODO: process ome metadata | ||
|
||
kwargs = dict( | ||
rgb=rgb, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these variables are not defined in this function? (rgb
, channel_axis
, name
, scale
, colormap
, contrast_limits
, blending
, visible
, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP
def get_ome_tiff_metadata(tif): | ||
metadata = xml2dict(tif.ome_metadata) | ||
if 'OME' in metadata: | ||
metadata = metadata['OME'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata
variable doesn't seem to get used in this function after this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP
napari_tiff/napari_tiff_reader.py
Outdated
@@ -72,14 +70,30 @@ def zip_reader(path: PathLike) -> List[LayerData]: | |||
|
|||
|
|||
def tifffile_reader(tif): | |||
"""Return napari LayerData from largest image series in TIFF file.""" | |||
"""Return napari LayerData from image series in TIFF file.""" | |||
nlevels = len(tif.series[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little confused about the difference between tiff series and tiff pages. Is anyone else able to explain it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although @cgohlke is the expert as creator of tifffile, as I understand it pages is the internal structure of tiff, and the series/levels provide higher level access (which point to the underlying pages).
napari_tiff/napari_tiff_reader.py
Outdated
"""Return napari LayerData from image series in TIFF file.""" | ||
nlevels = len(tif.series[0]) | ||
if nlevels > 1: | ||
data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that the tiff series are arranged in a pyramid format. Is that something that is absolutely guaranteed by the tiff specification?
If not, we may need to check we actually have a multi-resolution file with all the resolution levels in the right order before passing it to napari.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not seen this written otherwise in any WSI file format. I imagine it's spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doctring from series Series of pages with compatible shape and data type.
So if the length of series is different from 1 then different shapes are present.
However, I'm not sure if it always should be read as pyramidal. As I know that tiff allows storing thumbnail for example. But I do not have such data available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Czaki this is testing the length / number of levels inside the first serie. Would a thumbnail be the same or a separate series? What would the best way be to test for pyramid - ome metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know. I work with writing my own tiff parser a long time ago and cannot find the data that I used then. This is rather point that it will be nice to check this (for example using some public datasets). I will also try to digg it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised this should be: len(tif.series[0].levels)
Minor: avoid variables with same name as functions (example_data_filepath)
Yes it could.
Here's the data, for manual testing, not broad distribution: https://www.dropbox.com/scl/fo/rqopeqntdsgzep4r10qbs/h?rlkey=k931yg05j4sbnqlh60z343k74&dl=0 |
Having said this, since @folterj is using dask/zarr and not imread, it probably would work but would return empty slices where the data is missing. And that is arguably the correct behaviour here. |
'vispy', | ||
'zarr', | ||
] | ||
|
||
[project.optional-dependencies] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just below this line there are testing dependencies. Please add napari
there
|
||
def example_data_imagej(tmp_path, original_data): | ||
filepath = str(tmp_path / "myfile.tif") | ||
tifffile.imwrite(filepath, original_data, imagej=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is WIP? Or some metadata should be added? I know that it will contain some imagej tag but, will not contain any metadata to check if reading is proper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. The original test code read plain tif as imagej, so trying to expand with some simple imagej (and ome-tiff for ome-tiff reading) metadata that tifffile supports.
I'm happy to leave if/what metadata the imagej part of the code would be tested against, as I'm focussing on the ome-tiff reading.
In general a concise, non-generated dataset is needed for more thorough testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark is to ome function. It is also without providing any metadata.
napari_tiff/napari_tiff_reader.py
Outdated
|
||
|
||
def get_value_units_micrometer(value: float, unit: str = None) -> float: | ||
conversions = {'nm': 1e-3, 'µm': 1, 'um': 1, 'micrometer': 1, 'mm': 1e3, 'cm': 1e4, 'm': 1e6} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this should be global variable
napari_tiff/napari_tiff_reader.py
Outdated
"""Return napari LayerData from image series in TIFF file.""" | ||
nlevels = len(tif.series[0]) | ||
if nlevels > 1: | ||
data = [da.from_zarr(tif.aszarr(level=level)) for level in range(nlevels)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doctring from series Series of pages with compatible shape and data type.
So if the length of series is different from 1 then different shapes are present.
However, I'm not sure if it always should be read as pyramidal. As I know that tiff allows storing thumbnail for example. But I do not have such data available.
Superseded by #24 |
WIP, TODO: processing ome metadata