Generalize well organization in high-content screening: field of view => image #137

Open · wants to merge 3 commits into base: main

jluethi
Contributor

@jluethi jluethi commented Aug 29, 2022

I would like to suggest a change to the wording of the OME-NGFF HCS plate specification and to add some recommendations about how the structure of image pyramids per well affects visualization performance. Specifically, I propose that the OME-NGFF spec explicitly allow whole wells to be saved as a single image. As a consequence, the components of a well would be images, not fields of view (because an image could already consist of multiple fields of view stitched together).

Motivation

We would like to use OME-Zarr files to store TB-sized, multi-channel, 3D high-content imaging data in the HCS format. We are building Fractal, an open-source image processing pipeline that processes data in HCS OME-Zarr. One of the benefits of saving such large datasets as OME-Zarr is the possibility of interactive image visualization, e.g. in the napari viewer. When we tested how this approach scales to large HCS plates, we discovered issues with saving every field of view of the microscope as a separate image in each well of the OME-Zarr file.
We started the discussion about this topic here: ome/ome-zarr-py#200
The discussion on the approach of saving single images per well starts here in more detail: ome/ome-zarr-py#200 (comment)

To very briefly summarize it:
Saving many fields of view (FOVs) per well as separate images, each with its whole pyramid hierarchy, leads to very poor IO performance. To visualize plates at low resolution, a tiny pyramid file needs to be loaded for each field of view. When a plate has >1000 fields of view across all its wells, this becomes very slow. Even for a case with just 72 fields of view and just 3 pyramid levels, loading was already 8 times slower with the FOVs saved as separate image pyramids than with a single image pyramid. This seems to be a fundamental issue of how fast many small files vs. a single large file can be accessed, and it would likely get worse on object storage than on classical file systems. See further details in the issues above.
[Image: OME_Zarr_Pyramids]
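
To make the scaling argument concrete, here is a rough sketch that only counts how many chunks have to be read to show one well at a given pyramid level. It is not a benchmark and does not reproduce the 8x measurement above. The FOV size, the 9 x 8 grid, the chunk size and the downsampling factor of 2 are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def chunks_at_level(shape, chunk, level, factor=2):
    """Number of chunks covering a 2D image at a given pyramid level."""
    # Downsample each axis by factor**level (at least 1 pixel remains).
    h, w = (max(1, math.ceil(s / factor**level)) for s in shape)
    return math.ceil(h / chunk[0]) * math.ceil(w / chunk[1])


def reads_per_well(fov_shape, grid, chunk, level, fused):
    """Chunk reads needed to display one well at `level`.

    fused=False: every FOV is its own image with its own pyramid.
    fused=True:  all FOVs are stitched into one image with one pyramid.
    """
    rows, cols = grid
    if fused:
        well_shape = (fov_shape[0] * rows, fov_shape[1] * cols)
        return chunks_at_level(well_shape, chunk, level)
    # Each separate FOV image needs at least one chunk read, however small.
    return rows * cols * chunks_at_level(fov_shape, chunk, level)


# Assumed example values: 2160 x 2560 px FOVs, 9 x 8 FOVs per well,
# chunks as large as one FOV.
FOV = (2160, 2560)
GRID = (9, 8)
CHUNK = (2160, 2560)

for level in (0, 2, 4):
    separate = reads_per_well(FOV, GRID, CHUNK, level, fused=False)
    fused = reads_per_well(FOV, GRID, CHUNK, level, fused=True)
    print(f"level {level}: separate={separate} reads, fused={fused} reads")
```

Under these assumptions the two layouts need the same number of reads at full resolution, but the per-FOV layout stays at one read per FOV at every lower level, while the fused layout shrinks towards a single chunk.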

Thus, our solution has been to store each well as a single, fused image. In the discussions on this issue, there was openness to this approach being part of the spec. I have therefore created this PR to suggest a change that explicitly allows this and mentions the trade-offs. I hope this PR can be the place to discuss this further and see whether it can make it into the OME-NGFF spec.
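
As an illustration of what the two layouts could look like in the well metadata, here is a small sketch. The key names (`well`, `images`, `path`, `version`) follow my reading of the v0.4 well metadata and should be checked against the spec. The image paths and the helper `well_attrs` are hypothetical.

```python
import json


def well_attrs(image_paths, version="0.4"):
    """Build well-level attributes listing the images stored in one well.

    Key names are an assumption based on the v0.4 well metadata.
    """
    return {
        "well": {
            "images": [{"path": path} for path in image_paths],
            "version": version,
        }
    }


# One image group per field of view (72 FOVs, paths "0" .. "71").
per_fov = well_attrs([str(i) for i in range(72)])

# One fused image per well: the list has a single entry.
fused = well_attrs(["0"])

print(len(per_fov["well"]["images"]), "images in the per-FOV layout")
print(json.dumps(fused, indent=2))
```

In both cases the entries are simply images of the well, which is why "image" reads more naturally than "field of view" once a well can hold a stitched image.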

Open questions

How should we specify the trade-offs? I'm proposing a "Note" here, but open to other implementations. Also, is this specification of Note correct? Does it work for multi-line paragraphs?

Is the explanation of the trade-offs understandable? See here: 20261ac

Note: Trade-offs on how data is structured per well:
Field of views of the microscope MAY be saved as individual images in each
well to allow for maximal flexibility regarding translations between field of views.
Having wells with many individual images does not scale for visualisation of
large plates. Visualisation tools would then need to read all the tiny pyramid
files for each field of view to create overviews and this IO performance becomes
a big limiting factor. In that case, all the field of views SHOULD be saved as
a single, combined image. In that way, the pyramid chunks can be kept at a
reasonable size for low-resolution representations of a well.

I think it is important to move away from the field-of-view naming in the spec when wells can be collections of images. But there are two keys in the plate metadata whose names contain “field”. How should one proceed with these?
Specifically, maximumfieldcount (does it describe the maximum number of fields of view per well, or in total? ⇒ Is the wording “images per well” correct, or would it be images in the whole plate (though then what is “max”; isn’t that just a count)?) and field_count (is that per well or per plate? It says “fields per view” ⇒ what is a view?)

@will-moore
Member

Thanks for that.
I feel that MAY and SHOULD terms are about the rules of the Spec itself and probably shouldn't be used in this context? I think you can drop 1 or 2 sentences and be a bit less explicit, and users will still understand. How about this:

"Field of views of the microscope may be saved as individual images in each
well to allow for maximal flexibility regarding translations between field of views.
However, having wells with many individual images does not scale well for visualisation of
large plates. In that case, combining the fields and saving as a single image per Well is likely to
improve performance."

@will-moore
Member

will-moore commented Aug 29, 2022

maximumfieldcount is the largest number of fields in any single Well. Please feel free to modify the description of this term to clarify this in the spec. It comes from the OME model: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome.html
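
Read that way, the value is a maximum over wells rather than a plate-wide total. A minimal sketch, assuming a plain mapping from well path to the list of image paths in that well (the mapping and the helper name are hypothetical, not part of any library):

```python
def maximum_field_count(wells):
    """Largest number of fields (images) found in any single well."""
    return max((len(images) for images in wells.values()), default=0)


# Hypothetical plate with three wells holding 3, 1 and 2 images.
wells = {
    "A/1": ["0", "1", "2"],
    "A/2": ["0"],
    "B/1": ["0", "1"],
}

print(maximum_field_count(wells))  # 3, not the plate-wide total of 6
```

With one fused image per well, the same computation gives 1 for every plate.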

@jluethi
Contributor Author

jluethi commented Aug 29, 2022

Thanks @will-moore

"How about this"

Sounds great, I shortened it that way.

"modify the description"

Thanks for the confirmation. In that case, I guess it needs to keep the name maximumfieldcount, and my wording change should be correct. I also slightly updated the description of field_count to be (hopefully) clearer.

Member

@will-moore will-moore left a comment

Looks good, thx 👍

@jluethi
Contributor Author

jluethi commented Oct 26, 2022

@will-moore Just checking in: What is the process or timeline to get this change into the OME-NGFF spec? Is there a chance it will be part of the 0.5 spec? Do I need to talk to some people or convince someone else first that this would be a good idea?
No stress at all, just wanted to check in whether I should be doing something about this PR :)

@will-moore
Member

I would expect this to be included in v0.5 spec, especially since it's more like advice than a change in spec.
Anything else needed here @sbesson?

Member

@sbesson sbesson left a comment

No objection from my side. We might want whoever will be driving the 0.5 roadmap to also quickly sign off.

From a naming perspective:

  • the usage of images increases the consistency with the terminology used in the well specification
  • from the closest equivalent model, a WellSample in the OME model is defined as an image captured within a well

Regarding the discussion between alternative layouts and their suitability for different application contexts, I do not have a better suggestion than the note. Two comments:
1- this discussion applies outside the context of HCS data i.e. storing unstitched vs stitched images,
2- other decisions have similar trade-offs (chunking size, chunk dimensions, resolution granularity).
I anticipate the information about these trade-offs might be reworked as the specification evolves.

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/faim-hcs-functions-to-work-with-hcs-data/78868/11

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/using-naparis-new-not-yet-released-async-functionality-to-browse-large-ome-zarr-hcs-plates/86984/1

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/best-approach-for-appending-to-ome-ngff-datasets/89070/3

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/fractal-framework-zarr-compatibility/92536/2
