RFC-3: more dimensions for thee #239
base: main
Automated Review URLs
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/ome-ngff-update-postponing-transforms-previously-v0-5/95617/5
PS: I asked @joshmoore whether whimsy was allowed and he said yes, hence the title. (This comes after I realised I couldn't have "RFC-2: dimensional hullabaloo" because @normanrz had taken that number already. 😂)
Full endorsement. While I absolutely recognize the significant challenge that lifting the strict dimensionality model may pose for mapping arbitrary future usage onto legacy code bases that have been built around XYZCT, I fully agree that a true next-generation format is going to have to lift it. I have personally experienced a number of use cases and applications where the current restrictions have led me to delay adopting ngff in my own work, and this RFC would allow me to more enthusiastically consider adoption. I agree with @jni that concerns around communicating the semantics of specific axes (i.e. formally named "X", "Y" and "Z") are better addressed by additional keys in the axis metadata, such as
For comparison, https://datatracker.ietf.org/doc/html/rfc2549 ("IP over Avian Carriers")
Would you be able/willing to contribute those, perhaps even for a section in the RFC?
Sure, the most direct stories I can share are from implementing writers for data coming off microscopes (code in pymmcore-plus/mda/handlers). There I essentially have a
It's possible that @nclack and/or @aliddell would have opinions here as well, as I know they've spent a fair amount of time thinking about how to map a variety of custom experiment types to the ngff format in the acquire-python schema.
@tlambert03 thanks for the links! I'll add these to the background section, but could you point me to where in the code
would fail? The smoking gun would be:
Maybe it's not as easy as that to define these things compactly, but if it is, I think it would be a worthwhile detail for this RFC's motivation.
A few quick clarifications, @jni:
Re: NGFF readers: cc @manzt - https://github.com/hms-dbmi/vizarr - Any idea how much work it would be to support n-dimensional NGFF data? cc @dgault - https://github.com/ome/ZarrReader/ - Since the OME data model is very much 5D, it is going to take a bit of thought to decide how to handle n-dimensional NGFF data.
The space restrictions, and all other axis restrictions (other than the requirement that axes have unique names), are removed in #235.
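With the other axis restrictions gone, uniqueness of axis names is the one structural check a reader still has to perform on `axes`. A minimal sketch in Python, assuming the `axes` metadata has already been parsed from JSON into a list of dicts with a `"name"` key; `check_unique_axis_names` is a hypothetical helper, not part of any existing NGFF library:

```python
from collections import Counter


def check_unique_axis_names(axes: list[dict]) -> None:
    """Raise ValueError if any axis name appears more than once.

    Hypothetical helper: `axes` is assumed to be the parsed "axes" list,
    i.e. dicts with at least a "name" key.
    """
    counts = Counter(axis["name"] for axis in axes)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate axis names: {duplicates}")


# Any number of axes, with any names, passes as long as names are unique.
check_unique_axis_names([
    {"name": "angle", "type": "angle"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
])
```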
Webknossos already supports an arbitrary number of dimensions. However, it assumes that there are only 3 space dimensions to map to xyz. I think the spec should provide guidance to visualization tools on what to do with >3 space dimensions.
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/request-for-info-ngff-use-cases-that-are-currently-blocked/96641/1
As part of the [proposed implementation][implementation], Davis Bennett has created pydantic models that validate the proposed schema. These are actually new additions to the NGFF specification, surfaced pre-existing errors in the schema, and should prevent new errors from appearing.
I don't think we need this text. Those pydantic models are merely a convenient way to write JSON schema. They don't express anything that's not already written in the prose of the spec. Also, I am planning on removing those models from the PR, because they add an undocumented build step that I don't have the energy to document.
As of aa5c953, those models are gone.
@d-v-b I really loved the models! 😭
They will live at https://github.com/janeliascicomp/pydantic-ome-ngff if 0.5 comes out.
The PR at #235 mentioned above seems to go a bit further than this RFC in that it removes restrictions on the ordering of dimensions, whereas this proposal only mentions removing the restriction on the number of dimensions. I imagine that supporting arbitrary dimension order is a fair bit more work for implementers than supporting n dimensions, so endorsement of this proposal may not signal endorsement of #235?
Regarding advice for partial implementations (e.g., implementations that only support a fixed number of dimensions, or a fixed order), I included the following section in the PR: https://github.com/ome/ngff/pull/235/files#diff-ffe6148e5d9f47acc4337bb319ed4503a810214933e51f5f3e46a227b10e3fcdR565-R580. Please let me know if this guidance is sufficient or if we should say more (and let's have that conversation over in #235 instead of here, so that we can keep synchronized with the actual changes to the spec).
I probably need to update the summary at the top, but under "proposal" I write:
If the names are arbitrary, the ordering must also be arbitrary, surely? But I can make it explicit.
A draft proposal for [coordinate transformations][trafo spec] already includes most of the changes proposed here, so we envision that this RFC is compatible with future plans for the format. The proposal does currently limit the number of dimensions of type "space" to at most 3, but that limit [could be removed][space dims comment]. If this RFC is approved, the transformation specification would need to be updated to reflect this. However, that is an easy change, and there seems to be sufficient support in the community for this idea.
Do we need to talk about that (stalled) PR at all? I don't see why it's relevant here.
It's relevant because it speaks to the forward compatibility of this RFC — ie it is in line with existing proposals for the format. That the PR is stalled is not really relevant — it's stalled because of minor details (e.g. array order) that don't have a bearing on this PR. Based on the discussion, other aspects, and certainly the ones relevant to this RFC, have quite broad consensus.
In that case, isn't it sufficient to just state that there are no known conflicts with other active proposals?
In my opinion the spec should leave this question undefined. The mapping can be direct (x=x, y=y, z=z), user defined (give users options of how to map axes), or arbitrary (x=foo, y=bar, z=baz). In practice, I think it is not going to be an issue; I am just wary of restricting the number for the same reason that I was wary of restricting the total number of dimensions, which indeed caused problems. If it helps this RFC move forward, I can bring back the "maximum three spatial dimensions" limit from #138, and we can have the discussion in a later RFC. Unlike the other changes in this RFC, the removal of the maximum number of "space" dimensions is purely speculative on my end, and not motivated by a concrete use case. Action requested:
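A minimal sketch of how a viewer might combine the three mapping strategies listed above: a user-defined mapping first, then a direct mapping by name, then an arbitrary fallback. The function name `choose_display_axes`, the precedence order, and the choice to take the last three `"space"` axes in the fallback are all assumptions for illustration, not spec text:

```python
def choose_display_axes(axes, user_mapping=None, max_space=3):
    """Return {display_axis: dimension_index} for a viewer's x/y/z.

    Hypothetical precedence (one of many a viewer could choose):
      1. a user-supplied mapping {display_axis: axis_name}, if given;
      2. a direct mapping when axes literally named "x", "y", "z" exist;
      3. otherwise the last `max_space` axes of type "space", in array order.
    """
    names = [a["name"] for a in axes]
    if user_mapping is not None:
        return {disp: names.index(name) for disp, name in user_mapping.items()}

    direct = {n: names.index(n) for n in ("x", "y", "z") if n in names}
    if direct:
        return direct

    space = [i for i, a in enumerate(axes) if a.get("type") == "space"]
    chosen = space[-max_space:]
    # Map the last (fastest-varying) spatial dimension to "x", then "y", then "z".
    return {disp: idx for disp, idx in zip(("x", "y", "z"), reversed(chosen))}
```

For example, an array with four `"space"` axes named `foo`, `bar`, `baz` and `qux` would show `qux`, `baz` and `bar` as x, y and z unless the user picks otherwise.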
Implementations that do not support some aspect of user data should clearly communicate that to users. Users can then decide which implementation to use, given the data they have stored. We should not try to limit the data that users can store simply because some implementations cannot represent that data. This is a broader issue: as of 0.4, there are lots of OME-NGFF tools that don't support big images on cloud storage (of which I have plenty). Should we change the spec to limit the size or location of images just because some implementations can't load my big ones? I don't think so. So, for the same reason, we should not restrict what axes users have just because some implementations are opinionated about axes.
I'm in favour of removing as many restrictions as possible. Fewer restrictions generally lead to simpler standards. Why limit how many dimensions can be labelled "spatial"? It's not necessary, and so it shouldn't be done. Software may want to limit how many dimensions it can handle, since that makes the code simpler. But we have plenty of toolboxes/libraries that can handle arrays with an arbitrary number of dimensions, so the standard shouldn't limit the number. Instead, software that wants a limit on the number of dimensions can simply refuse to read files with too many dimensions. Don't standardize on the lowest common denominator.
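Refusing to read a file, and saying why, takes only a few lines. A hypothetical sketch, assuming the `axes` metadata is parsed into a list of dicts with a `"name"` key and an optional `"type"` key; `UnsupportedAxesError`, `max_ndim` and `supported_types` describe one tool's own limits and are not defined by the spec:

```python
class UnsupportedAxesError(ValueError):
    """Hypothetical: the file may be valid, but this tool cannot handle its axes."""


def require_supported_axes(axes, max_ndim=5,
                           supported_types=("time", "channel", "space")):
    """Fail loudly, instead of silently misreading, on axes this tool can't handle.

    `max_ndim` and `supported_types` are this (hypothetical) reader's limits.
    Axes without a "type" key are treated as unsupported.
    """
    if len(axes) > max_ndim:
        raise UnsupportedAxesError(
            f"this reader supports at most {max_ndim} dimensions, file has {len(axes)}"
        )
    unknown = [a["name"] for a in axes if a.get("type") not in supported_types]
    if unknown:
        raise UnsupportedAxesError(
            f"axes {unknown} have types this reader does not support; "
            f"supported types are {list(supported_types)}"
        )
```

Subclassing `ValueError` keeps existing error handling working while letting callers distinguish "valid but unsupported" from "malformed".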
I have a similar question to @will-moore's: what exactly is the change to the specification in this RFC? Is it exactly the same as #235? If so, I think the RFC should be more explicit about its implications.
@ziw-liu I'm not the author of this RFC, but as the author of #235 I can say that in that PR there are no restrictions on the So, if applications previously relied on the
The above example has two axes that are spatial, but they use a different Does this summarize your concern? Because if so, I agree with this concern, but I think the actual problem is the I would welcome discussion in #215 on what the axis I am happy to amend #235 to add recommendations for the
That is the intent.
I disagree with @d-v-b and have commented on #215, but to record the objection here and keep a semi-complete record of discussion on the RFC PR itself: a unit does not unambiguously specify a type. For example, "wavelength" and "space" are both measured in meters, to say nothing of "stage position". Therefore, I suggest that we use SHOULD as guidance for the special cases of "space", "time", and "channel". But I don't want to use MUST here because, as mentioned in the discussion above, I think it's OK for software to not support all ome-ngff files. For example, I think ome-ngff should be usable to store Fourier-transformed data (type "spatial-frequency", and data type complex64/complex128), but many viewers won't be able to work with that immediately or ever, and I think that's OK.
Thanks @d-v-b and @jni. For me something like this would be useful:
I would also appreciate a similar recommendation for visualization implementations about choosing 'special dimensions' (e.g. first or last 1
But they shift the burden to implementations, which then reduces interoperability because loose specs have ambiguities. I am not against removing restrictions, but then there should be strict guidance on what implementations must do, so probably MUST rather than SHOULD.
Personally I would benefit from some concrete examples of implementations that rely on the current axis restrictions, so that we can better appreciate what impact these proposed changes would have, and potentially how to mitigate those impacts.
@d-v-b, my own implementation relies on the order t, c, z, y, x. This has the advantage of clear semantics, which is useful not just for visualization but also for computing. What I worry about is that making the specs too flexible will lead different implementations to diverge, eventually producing variant formats, just as happened with TIFF. I'd prefer specs that cover 90% of the use cases in a clear and unambiguous way. But maybe that's an issue of scope, i.e. what is NGFF supposed to cover? I haven't encountered microscopy images that don't fit the current dimension pattern, so concrete examples (e.g. from papers) of microscopy images with more than 5 axes would help to understand the need and reason about it.
Our implementation of Webknossos expects that there are 2-3 space axes. We don't rely on ordering, and all other axes can be arbitrary.
@d-v-b Here's a list of implementations I'm aware of that rely on current axis restrictions. In general these tools handle 2D data (or a stack of 2D planes) so they expect the last 2 dimensions to be
I disagree that allowing flexible axes will cause fragmentation. I think it's the opposite, actually. @jkh1, if your use case relies on a strict TCZYX model, then your primary concern should be whether, given a dataset, you can unambiguously find those 5 axes in the dataset (and then transpose them as needed to fit your required dimension ordering), not whether the specification technically allows someone else to do something you're not interested in doing. (It's the restriction of those other use cases that causes fragmentation.) I absolutely agree (and I think we all do?) that, inasmuch as a dataset does have a standard 3-space + 1-time + 1-channel dimensional model, the spec should make it unambiguous how to find those axes. I think it already does that.
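A hypothetical sketch of "unambiguously find those 5 axes": locate time and channel by `type`, and z/y/x by `type` plus name, and fail if any of them is missing or duplicated. The matching rule and the name `tczyx_permutation` are assumptions for illustration, not spec text:

```python
def tczyx_permutation(axes):
    """Return the dimension indices of t, c, z, y, x (in that order).

    Assumed matching rule: t and c are the unique axes of type "time" and
    "channel"; z, y, x are the unique axes of type "space" with those names.
    Raises ValueError if a required axis is missing or ambiguous.
    """
    def find(predicate, label):
        hits = [i for i, a in enumerate(axes) if predicate(a)]
        if len(hits) != 1:
            raise ValueError(f"expected exactly one {label} axis, found {len(hits)}")
        return hits[0]

    perm = [
        find(lambda a: a.get("type") == "time", "time"),
        find(lambda a: a.get("type") == "channel", "channel"),
    ]
    for name in ("z", "y", "x"):
        # Bind `name` as a default argument so each lambda keeps its own value.
        perm.append(find(lambda a, n=name: a.get("type") == "space" and a["name"] == n, name))
    return perm


# Data stored in TZCYX order:
axes = [{"name": "t", "type": "time"}, {"name": "z", "type": "space"},
        {"name": "c", "type": "channel"}, {"name": "y", "type": "space"},
        {"name": "x", "type": "space"}]
print(tczyx_permutation(axes))  # [0, 2, 1, 3, 4]
```

If the dataset has no other axes, the returned list can be passed straight to `numpy.transpose` to get TCZYX order; extra axes would need to be sliced away or carried along separately.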
I agree that use cases that are not covered will use something else and hence also lead to fragmentation, but I think preserving interoperability for the greater number of cases should have priority. I am not against changing the model; my primary concern is preserving the unambiguous semantics of the axes when this is done, and maybe the issue is that it isn't clear (at least to me) that this will be the case. The new specs should include the current one as a valid subset and also be semantically unambiguous. This probably means standardizing the vocabulary and defining what implementations should do with what they don't understand.
Here is a quick rundown of imaging modalities that I think would have trouble fitting into the
The new spec does include the current one as a valid subset, and I would argue that the current spec is actually semantically very ambiguous, because it doesn't define what the different types of axes mean. I think contributions to improve this would be welcome.
Another important example, on the lower end of the dimensionality spectrum: the output of a line-scanning camera is a 1-dimensional array. As OME-NGFF 0.4 cannot represent 1D data, the format cannot represent a single frame of a line-scanning camera image. NGFF users with such data would have to pad their data with fake dimensions, which is data mangling and a very bad user experience.
Another example I've been working with lately: electron backscatter diffraction (EBSD), which stores a 2D diffraction pattern for each pixel of a material. The summarised data is still xy+(3D angle) or xy+(quaternion). (The latter could be stored as "channel", but that's a little bit of an abuse of the spec, imho.) My reading of the discussion above is "loosen restrictions, but offer guidance with SHOULD as to how tczyx SHOULD be stored". Imho, though, we should drop the order requirement: it is not hard to transpose an array, and there are good reasons (e.g. performance) why, during acquisition, one might want to store bytes in TZCYX order. Of course, that could be done in some "transitional" format, but I think it would be super nice for everyone if that was also a valid OME-NGFF file!
Strongly agree that order shouldn't matter. It's trivial to create transposed views of when reading
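To illustrate that reordering is cheap: in NumPy, `numpy.transpose` returns a view, so data stored in TZCYX order (as in the acquisition example above) can be presented in TCZYX order without copying any pixels. The array shape is an arbitrary toy example, and NumPy stands in here for whatever array library a reader actually uses:

```python
import numpy as np

# Toy acquisition buffer in TZCYX order:
# 2 timepoints, 3 z-planes, 2 channels, 4 x 5 pixels.
tzcyx = np.arange(2 * 3 * 2 * 4 * 5, dtype=np.uint16).reshape(2, 3, 2, 4, 5)

# Swap the z and channel dimensions to get TCZYX.
# np.transpose only rearranges strides; the pixel buffer is shared.
tczyx = np.transpose(tzcyx, (0, 2, 1, 3, 4))

print(tczyx.shape)                     # (2, 2, 3, 4, 5)
print(np.shares_memory(tczyx, tzcyx))  # True
```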
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/python-bioformats-not-able-to-correctly-open-an-image/96600/14
Hi all, very important discussion. For us (from a microscope vendor perspective), the old TCZYX 5D approach is already one of the biggest limitations, and obviously we hope that OME-NGFF will allow for arbitrary dimensions that can be identified and read easily. If this is not the case, there will again be various workarounds, etc., which in my opinion will defeat the idea behind OME-NGFF. For us this would again mean that OME-NGFF is just another format that we need to figure out how to convert our CZI format to. But if it supports many dimensions, etc., it becomes a valid option to be used even by vendors.
Thanks for the input @sebi06! Would you be happy to be listed as an endorser of the RFC? (If so, please 👍 the original post at the top.) I use libczi as an example in the RFC, but if it were endorsed by you directly that example would probably hold more weight. 😊
I don't understand this point. I think I've addressed all other points in my recent commits. To summarise:
Overall, to recap my last message, I think this bit:
is the most important part of the discussion: it lets the proposal move forward by providing the flexibility that is clearly needed by many relevant parties, while assuaging concerns about unnecessary fragmentation. (I.e., I think most datasets that folks will want to deal with will still have tzyx in some order.) I have tried to capture this idea in the most recent revision. Sorry about the radio silence in the meantime. 😅 But I hope we can move this forward now and get on with implementations!
i.e. that we introduce nomenclature for what is and what is not supported. Sorry, that's outside your PR specifically. In this case perhaps it suffices to use "RFC-3 supported", but that may eventually become unwieldy, in which case we could have "profiles" or sets of features which can be clearly advertised by libraries and clients.
What's the next step here, @joshmoore? Do we merge and then enter review? How are reviewers decided and how much review is "enough" review?
To make sure we have consensus, I'm opening this RFC in the style of RFC-2. (I'm aware that RFC-1 has some pending issues to be resolved, but when consensus is possible 🤞 this is a useful way to document the history of past decisions.) Please add a thumbs up if you want to be listed as an endorser. Please reply if you have concerns.
@d-v-b @joshmoore @normanrz @bogovicj @will-moore @ziw-liu @tlambert03
Please add pings for authors of libraries implementing ome-ngff readers and writers, as the main effect here is not on existing files but on implementations that may implement too-restrictive a spec.
My goal is to get this and #235 merged before the upcoming 0.5 release. 🤞 (I think that is being targeted for late June/early July? @joshmoore?)
Review URL: https://ngff--239.org.readthedocs.build/rfc/3/