Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MISC] Generalize rules that labels must be used consistently #1328

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Remi-Gau
Copy link
Collaborator

@Remi-Gau Remi-Gau commented Oct 17, 2022

@Remi-Gau
Copy link
Collaborator Author

Something I am not sure about:

  • no idea how to validate for this
  • this rule may be too restrictive

For example if we have 2 anat files

  • sub-01_acq-highRes_T1w.nii
  • sub-01_acq-highRes_T2w.nii

It may be the case that highRes refers to different things for those different types of suffixes.

@codecov
Copy link

codecov bot commented Oct 18, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.79%. Comparing base (1cb92eb) to head (fe5397b).

❗ Current head fe5397b differs from pull request most recent head b165609. Consider uploading reports for the commit b165609 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1328   +/-   ##
=======================================
  Coverage   87.79%   87.79%           
=======================================
  Files          16       16           
  Lines        1360     1360           
=======================================
  Hits         1194     1194           
  Misses        166      166           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would make our lives easier to require that labels are unique throughout the dataset, not only in individual modalities 🤔 it seems like a huge source of confusion for little benefit otherwise

@Remi-Gau Remi-Gau added this to the 1.10.0 milestone Dec 22, 2023
@Remi-Gau Remi-Gau added the consistency Spec is (potentially) inconsistent label Dec 22, 2023
@Remi-Gau
Copy link
Collaborator Author

Probably a PR to revisit early 2024

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand by my point in #1328 (review) (require labels to be consistent throughout the dataset, rather than only modalities) ... BUT I think the present changes are more in the spirit of the current spec, and thus I deem this PR mergeable as a nice "refactoring".

(Note: not sure if the [ENH] is fitting, therefore ... maybe [MISC]?)

@Remi-Gau Remi-Gau changed the title [ENH] Generalize rules that labels must be used consistently [MISC] Generalize rules that labels must be used consistently Apr 19, 2024
@Remi-Gau
Copy link
Collaborator Author

I stand by my point in #1328 (review) (require labels to be consistent throughout the dataset, rather than only modalities) ... BUT I think the present changes are more in the spirit of the current spec, and thus I deem this PR mergeable as a nice "refactoring".

Updated to have consistency across datatypes too.

(Note: not sure if the [ENH] is fitting, therefore ... maybe [MISC]?)

Done

@sappelhoff sappelhoff requested a review from effigies April 19, 2024 07:49
Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me. One thing I do wonder, though, is if we should make it clear that "consistency" is at the discretion of the dataset curator. The distinction between lores and hires might be different for different curators.

For example, one might have T1ws at 1 mm3 and 0.8 mm3, and T2ws at 1 mm3 and 0.6 mm3, and I think it's up to the curator to decide if they was acq-lores (1 mm3), acq-hires (0.8 mm3), and acq-veryhires (0.6 mm3) or just acq-lores (1 mm3) and acq-hires (0.8 mm3 and 0.6 mm3). In both cases "lores" means low resolution and "hires" means high resolution, but the exact definitions differ.

@effigies
Copy link
Collaborator

I overall approve of the centralization and the spirit of this, but this can lead to a problem:

  1. Validation of this would be computationally painful, even if it were well defined. However, without metadata tags tied to each entity, how would we validate this?
  2. If a tool reads this "MUST", assumes it means something is validated and can be depended on, and decides to optimize by not rereading metadata it thinks it knows, you could end up with invalid processing.

@Remi-Gau
Copy link
Collaborator Author

Yeah I was not sure when rereading this about the MUST and also I had my doubt about to even validate this.

Maybe less constraining: turning this into an admonition to tell users they should be careful about consistency (so no SHOULD or MAY but try to bring attention to this in general).

@effigies
Copy link
Collaborator

Maybe less constraining: turning this into an admonition to tell users they should be careful about consistency (so no SHOULD or MAY but try to bring attention to this in general).

Rereading this after a couple months (looking through the 1.10.0 milestone), I think I agree with this.

@effigies effigies removed the request for review from erdalkaraca August 16, 2024 13:57
@effigies effigies modified the milestones: 1.10.0, 1.11.0 Aug 27, 2024
@@ -102,7 +102,8 @@ Consistent with other data types in BIDS, the session entity is optional.

The [`sample-<label>`](../appendices/entities.md#sample) entity is REQUIRED for
Microscopy data and is used to distinguish between different samples from the same subject.
The label MUST be unique per subject and is RECOMMENDED to be unique throughout the dataset.
Contrary to other labels, the `sample-<label>` MUST be unique per subject
and is RECOMMENDED to be unique throughout the dataset.
Copy link
Collaborator

@yarikoptic yarikoptic Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see why I am to be disallowed to use sample-1, sample-2 and so on across my subjects if I do not have any other logical description for samples besides some index (label is superset of index AFAIK).
It seems that the intent here to demand some UUIDs, but I do not see really why here and why only for samples.

Moreover, as we ATM have mandatory sub- leading folder anyways, I do not understand really what "unique per subject" means really.

Comment on lines +90 to +92
Labels MUST be consistent across subjects and sessions and data types.
For example, an `acq` entity label `hires` used on the `anat` data of a given subject
MUST mean the same thing as `acq-hires` in `func` files of any other subject.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MUST is IMHO too strong of a word here since we do not even define the meaning of "mean" in this case -- is it EXACTLY the same resolution? the same shimming? clearly it cannot be "the same everything" since then functional must be an anatomical ;) Other example could be _desc for which we provision descriptions.tsv per each subject/session hence allowing for different "meaning"s (e.g. exact filtering parameters) for the same overall descriptor filt.

Hence

Suggested change
Labels MUST be consistent across subjects and sessions and data types.
For example, an `acq` entity label `hires` used on the `anat` data of a given subject
MUST mean the same thing as `acq-hires` in `func` files of any other subject.
Labels SHOULD be consistent across subjects and sessions and data types.
For example, an `acq` entity label `hires` used on the `anat` data of a given subject
SHOULD mean the same thing as `acq-hires` in `func` files of any other subject.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be additional factor -- IIRC bids-validator would issue a WARNING whenever "same" (e.g. _task-movie_run_1) files across subjects have different "qualities" (e.g. length), not ERROR. If changed to MUST, it then MUST issue an ERROR rendering some previously valid BIDS datasets invalid, breaking change.

@yarikoptic
Copy link
Collaborator

Coming late to this, but IMHO

  • we indeed should indeed strive to generalize e.g. all labels by default SHOULD to be consistent within dataset.
    • but then, making some entities "special" adds cognitive burden/complexity/inflexibility etc. (referring to sample here)
  • BUT also to allow for flexibility to accommodate different "contexts" to allow explicitly define how any particular label/index should be treated

To address similar issue at the level of columns (in particular for those {entity}_id, like participant_id, session_id), I proposed

to allow for such explicit annotation.

In principle (TODO: find an issue we must have already), for every {entity_single} we could have {entity_plural}.[tsv,json] which would have that {entity_single}_id column (and at least "Description" column) with {entity_plural}.json providing such ContextURI to state that it is either this dataset unique (bids::), or may be even bids:raw: pointing that it is the same as of the original BIDS dataset, and then come up with URI identifiers for other contexts, e.g. to treat values within {} as entity names, hence bids::sub-{subject} could be used to state that it is per-subject entity and bids::sub-{subject}/ses-{session} and alike. And then with Description to actually describe the "meaning" behind the label (relating to my inline comment), even if just for a human -- is it "highres" as the same resolution, or "highres" flavor of T1w (0.2mm) or BOLD (1mm), and so on. task_id "rest" meaning "subjects laying down with eyes closed, allowed to mind wonder, etc", "movie" meaning "watching some movie" etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consistency Spec is (potentially) inconsistent
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

5 participants