[MISC] Generalize rules that labels must be used consistently #1328
Remi-Gau
commented
Oct 17, 2022
- relates to Clarification of task entity in the appendix #1314
Something I am not sure about:
For example if we have 2 anat files
It may be the case that |
Co-authored-by: Taylor Salo <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #1328 +/- ##
=======================================
Coverage 87.79% 87.79%
=======================================
Files 16 16
Lines 1360 1360
=======================================
Hits 1194 1194
Misses 166 166
☔ View full report in Codecov by Sentry.
I think we would make our lives easier by requiring that labels are unique throughout the dataset, not only within individual modalities 🤔 Otherwise it seems like a huge source of confusion for little benefit.
Probably a PR to revisit early 2024.
I stand by my point in #1328 (review) (require labels to be consistent throughout the dataset, rather than only modalities) ... BUT I think the present changes are more in the spirit of the current spec, and thus I deem this PR mergeable as a nice "refactoring".
(Note: not sure if the `[ENH]` is fitting, therefore ... maybe `[MISC]`?)
Updated to have consistency across datatypes too.
Done
The changes look good to me. One thing I do wonder, though, is if we should make it clear that "consistency" is at the discretion of the dataset curator. The distinction between `lores` and `hires` might be different for different curators.
For example, one might have T1ws at 1 mm³ and 0.8 mm³, and T2ws at 1 mm³ and 0.6 mm³, and I think it's up to the curator to decide if they want `acq-lores` (1 mm³), `acq-hires` (0.8 mm³), and `acq-veryhires` (0.6 mm³) or just `acq-lores` (1 mm³) and `acq-hires` (0.8 mm³ and 0.6 mm³). In both cases "lores" means low resolution and "hires" means high resolution, but the exact definitions differ.
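To make the two curation choices above concrete, here is a minimal Python sketch. The function names are hypothetical, and the thresholds simply encode the resolutions from the example, treated as a single voxel-size number in mm (an assumption); nothing here comes from the spec or the validator.

```python
# Hypothetical illustration: two curators binning the same voxel sizes differently.
# Voxel size is treated as one number in mm (an assumption for this sketch).

def label_three_bins(voxel_mm: float) -> str:
    """Curator A: lores (1 mm), hires (0.8 mm), veryhires (0.6 mm)."""
    if voxel_mm >= 1.0:
        return "lores"
    if voxel_mm >= 0.8:
        return "hires"
    return "veryhires"


def label_two_bins(voxel_mm: float) -> str:
    """Curator B: lores (1 mm), hires (anything finer than 1 mm)."""
    return "lores" if voxel_mm >= 1.0 else "hires"


voxel_sizes = [1.0, 0.8, 0.6]
three = {v: label_three_bins(v) for v in voxel_sizes}
two = {v: label_two_bins(v) for v in voxel_sizes}
```

Both schemes are internally consistent within a dataset, yet `hires` covers a different range of resolutions in each, which is the sense in which "consistency" stays at the curator's discretion.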
I overall approve of the centralization and the spirit of this, but this can lead to a problem:
Yeah, when rereading this I was not sure about the MUST, and I also had doubts about how to even validate this. Maybe something less constraining: turn this into an admonition telling users they should be careful about consistency (so no SHOULD or MAY, but try to bring attention to this in general).
Rereading this after a couple months (looking through the 1.10.0 milestone), I think I agree with this.
@@ -102,7 +102,8 @@ Consistent with other data types in BIDS, the session entity is optional.

 The [`sample-<label>`](../appendices/entities.md#sample) entity is REQUIRED for
 Microscopy data and is used to distinguish between different samples from the same subject.
-The label MUST be unique per subject and is RECOMMENDED to be unique throughout the dataset.
+Contrary to other labels, the `sample-<label>` MUST be unique per subject
+and is RECOMMENDED to be unique throughout the dataset.
I do not see why I should be disallowed from using `sample-1`, `sample-2` and so on across my subjects if I do not have any other logical description for samples besides some index (`label` is a superset of `index` AFAIK).
It seems that the intent here is to demand some UUIDs, but I do not really see why here, and why only for `sample`s.
Moreover, as we ATM have a mandatory leading `sub-` folder anyway, I do not really understand what "unique per subject" means.
Labels MUST be consistent across subjects and sessions and data types.
For example, an `acq` entity label `hires` used on the `anat` data of a given subject
MUST mean the same thing as `acq-hires` in `func` files of any other subject.
MUST is IMHO too strong of a word here since we do not even define the meaning of "mean" in this case -- is it EXACTLY the same resolution? The same shimming? Clearly it cannot be "the same everything" since then a functional would have to be an anatomical ;) Another example could be `_desc`, for which we provision `descriptions.tsv` per each subject/session, hence allowing for different "meaning"s (e.g. exact filtering parameters) for the same overall descriptor `filt`.
Hence
-Labels MUST be consistent across subjects and sessions and data types.
-For example, an `acq` entity label `hires` used on the `anat` data of a given subject
-MUST mean the same thing as `acq-hires` in `func` files of any other subject.
+Labels SHOULD be consistent across subjects and sessions and data types.
+For example, an `acq` entity label `hires` used on the `anat` data of a given subject
+SHOULD mean the same thing as `acq-hires` in `func` files of any other subject.
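Since "mean the same thing" is not something a tool can decide, the most a script can do is show a curator where each label is used. A rough Python sketch of that idea follows; `parse_entities` and `label_usage` are hypothetical helpers written for this illustration (not part of bids-validator or any BIDS library), and the file paths are made up.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_entities(path: str) -> dict:
    """Extract key-label entity pairs from a BIDS-like filename (simplified)."""
    stem = PurePosixPath(path).name.split(".", 1)[0]
    entities = {}
    for part in stem.split("_"):
        key, sep, label = part.partition("-")
        if sep:  # skip the suffix (e.g. "T1w"), which has no hyphen
            entities[key] = label
    return entities


def label_usage(paths, entity: str) -> dict:
    """Map each label of `entity` to the (subject, datatype) pairs using it."""
    usage = defaultdict(set)
    for path in paths:
        entities = parse_entities(path)
        if entity in entities:
            datatype = PurePosixPath(path).parent.name
            usage[entities[entity]].add((entities.get("sub"), datatype))
    return dict(usage)


# Made-up example paths
paths = [
    "sub-01/anat/sub-01_acq-hires_T1w.nii.gz",
    "sub-02/func/sub-02_task-rest_acq-hires_bold.nii.gz",
    "sub-02/anat/sub-02_acq-lores_T1w.nii.gz",
]
usage = label_usage(paths, "acq")
```

Here `usage["hires"]` lists both the `anat` file of `sub-01` and the `func` file of `sub-02`, which are exactly the pairs the rule text asks to be comparable; whether they are is still a human judgment.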
This might be an additional factor -- IIRC bids-validator would issue a WARNING, not an ERROR, whenever the "same" (e.g. `_task-movie_run_1`) files across subjects have different "qualities" (e.g. length). If changed to `MUST`, it then MUST issue an ERROR, rendering some previously valid BIDS datasets invalid, which would be a breaking change.
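The breaking-change argument can be sketched in a few lines of Python. The keyword-to-severity mapping below is an assumption made for illustration (it is not taken from bids-validator's code), and `validate` is a hypothetical stand-in for a validator run.

```python
# Assumed convention for this sketch: MUST violations are errors,
# SHOULD violations are warnings. Not bids-validator's actual implementation.
SEVERITY = {"MUST": "error", "SHOULD": "warning"}


def validate(findings, keyword):
    """Return (is_valid, issues) given inconsistency findings and the rule keyword."""
    severity = SEVERITY[keyword]
    issues = [(severity, finding) for finding in findings]
    # A dataset is invalid only if at least one issue is an error.
    is_valid = not any(level == "error" for level, _ in issues)
    return is_valid, issues


# Made-up finding of the kind described above
findings = ["task-movie run files have different lengths across subjects"]
valid_under_should, _ = validate(findings, "SHOULD")
valid_under_must, _ = validate(findings, "MUST")
```

With the same finding, the dataset stays valid under SHOULD and becomes invalid under MUST, which is the regression for existing datasets that the comment warns about.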
Coming late to this, but IMHO
To address similar issue at the level of columns (in particular for those to allow for such explicit annotation. In principle (TODO: find an issue we must have already), for every