Extend image metadata #18951

kostrykin · 2024-10-08T14:59:11Z

This PR adds a series of basic metadata elements for image data, including:

The width, height, and depth of the image, as well as the number of channels and frames (the terminology is consistent with "Add support for arbitrarily ordered image axes in image content assertions", #18891)
The axes of the image (e.g., YXC or ZCYX)
dtype: The data type of the image pixels or voxels (e.g., uint8 or float64)
num_unique_values: The number of unique values in the image

This is useful to define validators for input data when working with images. Some examples of when this will be useful:

Require that an image is a binary image by validating that num_unique_values is 1 or 2.
Validate dtype: Some tools might not support float or int image data.
Validate that channels is 0 or 1: Restrict input data to single-channel images.
Validate axes, depth, channels, frames: Require that an image has one or more z-slices / channels / time steps.
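
As a rough illustration of the checks listed above, here is a minimal Python sketch. It is not the validator implementation in Galaxy: the helper names and the ALLOWED_DTYPES set are hypothetical, and it assumes the metadata is available as a plain mapping with integer values for single-page images.

```python
from typing import Any, Mapping

# Assumption: the dtypes that some example tool supports.
ALLOWED_DTYPES = {"uint8", "uint16"}


def is_binary(metadata: Mapping[str, Any]) -> bool:
    """A binary image has one or two distinct pixel values."""
    return metadata.get("num_unique_values") in (1, 2)


def is_single_channel(metadata: Mapping[str, Any]) -> bool:
    """Single-channel images report 0 or 1 channels."""
    return metadata.get("channels") in (0, 1)


def has_supported_dtype(metadata: Mapping[str, Any]) -> bool:
    """Reject pixel types the example tool cannot handle."""
    return metadata.get("dtype") in ALLOWED_DTYPES


# Hypothetical metadata for a binary mask and for a float RGB image.
mask = {"axes": "YX", "dtype": "uint8", "channels": 0, "num_unique_values": 2}
rgb = {"axes": "YXC", "dtype": "float64", "channels": 3, "num_unique_values": 4711}
```

In a tool, each of these checks would correspond to one dataset_metadata_* validator on the input parameter.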

TIFF files are read using the tifffile library; for all other image types, reading is attempted using Pillow. The new metadata is defined as optional, because Pillow might not be installed, or it might not be possible to read an image using Pillow (e.g., due to an image format that Pillow does not support).
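
To illustrate why the metadata has to be optional, here is a minimal sketch of the Pillow fallback only. The function name read_basic_metadata is hypothetical, and counting bands via getbands() is an assumption for illustration, not necessarily how this PR determines the number of channels.

```python
from typing import Any, Dict

try:
    import PIL.Image  # optional dependency, might not be installed
except ImportError:
    PIL = None


def read_basic_metadata(path: str) -> Dict[str, Any]:
    """Return width, height, and channels via Pillow, or {} if that is not possible."""
    if PIL is None:
        # Pillow is not installed: leave the metadata unset.
        return {}
    try:
        # Image.open() is lazy: it reads the header, not the pixel data.
        with PIL.Image.open(path) as im:
            width, height = im.size
            return {"width": width, "height": height, "channels": len(im.getbands())}
    except Exception:
        # Missing, unsupported, or corrupted file: leave the metadata unset.
        return {}
```

An empty result means "metadata not available", which is the case that validators have to cope with.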

For multi-page TIFF files, the metadata is determined for each page individually and then joined into a comma-separated string (with the order corresponding to the order of the pages in the series).
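
The joining step can be sketched as follows. This is a simplified illustration, assuming that each page yields a dictionary with the same keys; the function name join_page_metadata is hypothetical.

```python
from typing import Any, Dict, List


def join_page_metadata(pages: List[Dict[str, Any]]) -> Dict[str, str]:
    """Join per-page values into one comma-separated string per key, in page order."""
    keys = list(pages[0].keys()) if pages else []
    return {key: ",".join(str(page[key]) for page in pages) for key in keys}


# Hypothetical metadata of a TIFF file with two pages.
pages = [
    {"axes": "YX", "dtype": "uint8", "width": 32},
    {"axes": "YXC", "dtype": "uint16", "width": 64},
]
joined = join_page_metadata(pages)
```

For the two pages above, the axes entry becomes "YX,YXC" and the width entry becomes "32,64".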

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

bernt-matthias · 2024-10-08T20:52:30Z

The problem with black will be solved here #18955

bgruening · 2024-10-09T09:35:25Z

make format might help you

bernt-matthias · 2024-10-09T11:20:49Z

One thing that I was thinking about is resource usage (time and memory) for setting the metadata. Is this limited in the current implementation?

For many other data types we restrict the amount of data that we process (often we just read 1MB prefix) -- but I guess this is not useful here.
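
For reference, the prefix-based approach mentioned above could look roughly like this. It is a generic sketch and not Galaxy code; the names read_prefix and PREFIX_SIZE are hypothetical.

```python
import os
import tempfile

PREFIX_SIZE = 1024 * 1024  # 1 MB


def read_prefix(path: str, limit: int = PREFIX_SIZE) -> bytes:
    """Read at most `limit` bytes from the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(limit)


# Small demonstration with a temporary file and a 10-byte limit.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 100)
prefix = read_prefix(tmp.name, limit=10)
os.remove(tmp.name)
```

For images this is of limited use, since values such as the number of unique pixel values depend on the full pixel data.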

kostrykin · 2024-10-09T11:58:15Z

Another thing to consider is what happens if a tool requires certain metadata, but it is missing because the data was uploaded into Galaxy before that metadata was introduced. This issue probably arises whenever new metadata is added in Galaxy. Are there any established procedures to cope with that?

If not, two possible solutions come to my mind. Either the tool should also accept a dataset for which the metadata is missing. Or Galaxy should automatically recognize that the metadata of a dataset is outdated and rerun the respective metadata extraction methods.

The former seems easier to implement. It essentially means that a validator of the form <validator type="dataset_metadata_equal" metadata_name="key" value="val" /> should not only be successful if the metadata element key is set to val, but also if key is set to "". This means that we would have to add an optional attribute to all validators of type dataset_metadata_*, which is false by default for backwards compatibility, but should always be set to true in the future when "new" metadata is validated. Makes sense?
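
The proposed semantics can be sketched in Python as follows. This is not the existing validator code: the function name metadata_equal and the flag name allow_missing are hypothetical, since no attribute name has been decided here.

```python
from typing import Any, Mapping


def metadata_equal(
    metadata: Mapping[str, Any], key: str, value: Any, allow_missing: bool = False
) -> bool:
    """Sketch of a dataset_metadata_equal check with a hypothetical opt-in flag."""
    actual = metadata.get(key, "")
    if actual in ("", None):
        # Metadata not set, e.g. the dataset predates this metadata element.
        return allow_missing
    return actual == value


old_dataset: dict = {}  # uploaded before the metadata existed
new_dataset = {"channels": 1}
```

With the default allow_missing=False the behavior stays as it is today, so existing tools are not affected.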

bernt-matthias · 2024-10-09T12:39:36Z

I think re-running metadata extraction is the way to go (users can trigger that -- if I'm not wrong).

With the metadata validator you can check if a specific metadata is set and add a message asking users to re-trigger setting metadata.

Adding an optional attribute to metadata validators might also be an option.

kostrykin · 2024-10-09T14:23:13Z

> One thing that I was thinking about is resource usage (time and memory) for setting the metadata. Is this limited in the current implementation?
>
> For many other data types we restrict the amount of data that we process (often we just read 1MB prefix) -- but I guess this is not useful here.

Definitely something to consider.

Given these considerations, I have made several changes in 1e6701f:

Pillow should now avoid loading the full image. Instead, most of the metadata is inferred directly from the PIL.Image object. I expect that .histogram() will still iterate over all pixel values; however, this could be more efficient than loading the full image data into memory (the documentation isn't specific on this).
With tifffile, the width, height, depth, frames, channels are now inferred without loading the full image data.
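
To make the histogram idea concrete, here is a minimal sketch of how the number of unique values can be derived from a histogram. It assumes a single-band 8-bit image, for which Pillow's Image.histogram() returns a list of 256 pixel counts; for multi-band images the per-band histograms are concatenated, so this count would not be the number of unique pixel values. The function name is hypothetical.

```python
from typing import Sequence


def count_unique_from_histogram(histogram: Sequence[int]) -> int:
    """Count the intensity bins that contain at least one pixel."""
    return sum(1 for count in histogram if count > 0)


# Histogram of a hypothetical 8-bit binary mask: only values 0 and 255 occur.
histogram = [0] * 256
histogram[0] = 900
histogram[255] = 124
num_unique_values = count_unique_from_histogram(histogram)
```

Only the 256 counts need to be kept in memory, regardless of the image size.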

kostrykin · 2024-10-10T08:35:05Z

There is something strange going on with the tests, they hang kind of randomly.

Most of the time, both when running locally using run_tests.sh -framework -id validation_image_metadata and when running in CI, it hangs on Test 3: https://github.com/galaxyproject/galaxy/actions/runs/11256985091/job/31300182332?pr=18951#step:9:2930

However, after removing the first two tests, Test 3 passes (this is then the first test), and it hangs on Test 4 instead (this is then the second test).

bernt-matthias · 2024-10-10T08:39:36Z

Does it work locally with run_tests.sh? Have you tried planemo test?

kostrykin · 2024-10-10T08:56:05Z

> Does it work locally with run_tests.sh? Have you tried planemo test?

Nope, same behavior using run_tests.sh -framework -id validation_image_metadata locally. If I remove all tests except the last one, it hangs on the last test. More specifically, the execution reaches this line but doesn't go beyond:

with tifffile.TiffFile(dataset.get_file_name()) as tif:

(note that this line was already there before this PR)

Haven't tried planemo test yet.

kostrykin · 2024-10-10T09:20:24Z

> Does it work locally with run_tests.sh? Have you tried planemo test?
>
> Nope, same behavior using run_tests.sh -framework -id validation_image_metadata locally. If I remove all tests except the last one, it hangs on the last test. More specifically, the execution reaches this line but doesn't go beyond:
> with tifffile.TiffFile(dataset.get_file_name()) as tif:

The issue with the last test is fixed in f3e20d9. Actually this should be totally unrelated to the other tests, but somehow this also fixes their hanging (running both locally and in CI). Maybe a bug in the test execution?

kostrykin added 7 commits October 8, 2024 15:05

Add tests for axes metadata

a65b72f

Reduce boilerplate code in tests

8c9ba38

Add dtype metadata and tests

29dc831

Add num_unique_values metadata and tests

d350cca

Add width and height metadata and tests

7fa6796

Add channels metadata and tests

2414039

Add depth and frames metadata and tests

ac29d17

github-actions bot added area/testing area/datatypes labels Oct 8, 2024

github-actions bot added this to the 24.2 milestone Oct 8, 2024

Fix mypy check

4386580

kostrykin marked this pull request as draft October 8, 2024 20:01

kostrykin added 2 commits October 8, 2024 22:27

Fix support for TIFF files with unsupported compression formats

e030464

Fix black linting

de7b37b

kostrykin added 3 commits October 8, 2024 23:44

Add support for TIFF files with multiple series

f9251ee

Fix black linting

119dc78

Add type hint for mypy

669e3db

kostrykin marked this pull request as ready for review October 8, 2024 21:54

kostrykin marked this pull request as draft October 9, 2024 05:49

kostrykin added 2 commits October 9, 2024 07:51

Fix tests

0f7f64c

Rename series -> page

48b1808

This is more consistent with the terminology used in https://github.com/cgohlke/tifffile/blob/8a25a0d4738390af0a1f693705f29875d88fc320/tifffile/tifffile.py#L4676

bgruening reviewed Oct 9, 2024

test/functional/tools/validation_image_metadata.xml

kostrykin added 6 commits October 9, 2024 08:45

Add test for empty TIFF file (no metadata available)

6117398

Add test for corrupted TIFF file and fix metadata extraction for that case

685f653

Fix linting

39a726b

Fix linting

4510f27

Fix linting

b199061

Fix linting

f2c874f

kostrykin marked this pull request as ready for review October 9, 2024 11:48

kostrykin added 2 commits October 9, 2024 16:09

Reduce utilization of full image data

1e6701f

make format

6459225

kostrykin marked this pull request as draft October 10, 2024 08:53

Fix for corrupted TIF images

f3e20d9

kostrykin marked this pull request as ready for review October 10, 2024 09:47

Merge remote-tracking branch 'upstream/dev' into image-metadata/dev

8ea2ef3
