Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INFRA] Convert entity table to yaml #475

Merged
merged 53 commits into from
Aug 11, 2020

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented May 17, 2020

Closes #466, closes #343, and closes #290. Can be used, once an entity-table rendering script is written, to deal with #290.

The yaml/json structure will need to be have sufficient information for tools like bids-validator, heudiconv, and pybids to be able to use them. I've somewhat based the files on the bids-validator rules files.

Changes proposed:

  • Entities and datatypes have been described with yaml files, stored in the new schema/ folder.
    • This structure reflects more a proof of concept than a final version for the schema.
  • Datatypes are broadly split into standard datatypes and "auxiliary" datatypes (i.e., datatypes that are associated with multiple modalities, and which don't contain imaging data directly).
  • @yarikoptic wrote code to generate the Entity Table from the schema files, which I have modified to some extent.
  • The Entity Table differs in some respects:
    • Run is optional for anat data. See Update entity table #343.
    • Rows are in alphabetical order, based on the Entity string.
    • Under DWI, bvec and bval have been dropped and sbref has been added, since bvec and bval are extensions, not suffixes. The addition of sbref is related to Update entity table #343.
    • Under beh, the suffix beh has been added. Related to Update entity table #343.
    • Suffixes may be listed in slightly different orders, because in the new version they're listed based on subgroups within datatypes.
    • Single-suffix datatypes include the suffix in parentheses.
    • Auxiliary datatypes have the suffix followed by associated datatypes in parentheses (like the current table), but the associated datatypes are space-separated instead of slash-separated.
    • Headshape and markers are labeled like standard suffixes, rather than auxiliary datatypes.

edit @sappelhoff 2020-06-07: add to do list:

to do

Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't decided on inclusion mechanism yet, did you?
Left one more comment.
I think concentrating on entity table first would be best (just my reaction when I saw also descriptions of top level files/directories which I think could just go into a single yaml).

src/schema/top_level_files.yml Outdated Show resolved Hide resolved
Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments throughout

src/schema/datatypes/anat.yml Outdated Show resolved Hide resolved
src/schema/associated_data.yml Outdated Show resolved Hide resolved
src/schema/datatypes/anat.yml Outdated Show resolved Hide resolved
@yarikoptic
Copy link
Collaborator

re "inclusion" -- I guess terminology is not clear. The recent commit is adding required field -- so it is about stating the requirement to include any given file in BIDS dataset. I meant "inclusion" of one .yaml file into another. I.e. we need to establish a hierarchy with a single YAML file as an entry point, so loading that one would load all of them. I would just use https://github.com/tanbro/pyyaml-include which is inline with other ad-hoc solutions proposed on https://stackoverflow.com/questions/528281/how-can-i-include-a-yaml-file-inside-another .

I could take care about it whenever I get to work on rendering the entity table.

src/schema/datatypes/anat.yaml Outdated Show resolved Hide resolved
@tsalo
Copy link
Member Author

tsalo commented May 22, 2020

re "inclusion" -- I guess terminology is not clear. The recent commit is adding required field -- so it is about stating the requirement to include any given file in BIDS dataset. I meant "inclusion" of one .yaml file into another. I.e. we need to establish a hierarchy with a single YAML file as an entry point, so loading that one would load all of them. I would just use https://github.com/tanbro/pyyaml-include which is inline with other ad-hoc solutions proposed on https://stackoverflow.com/questions/528281/how-can-i-include-a-yaml-file-inside-another .

I could take care about it whenever I get to work on rendering the entity table.

That would be great, thank you. I'm just not that familiar with yaml.

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "git-sedi suffices suffixes",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "src/schema"
}
^^^ Do not change lines above ^^^
- json
- ctf/
- fif
- 4d/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<rant> wow, I didn't know that (sub)folders are allowed here. I truly think this was a step backward for meeg proposal to just allow all those formats -- there is no 'standardization' now, tools will just support some but not the other formats etc. </rant>

- eeg
- ieeg
suffixes:
- photo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @tsalo -- Thank you for leading this effort! It is really fun to look at BIDS in such as structured way!

So within this datatypes/ we indeed have datatypes which gained their folder/ within BIDS hierarchy (they also have themselves duplicated now in datatypes) such as anat/, func/, fmap/, etc. (+ beh/ which isn't imaging), and then we have another group -- those which could have indeed been a modality of their own, but they are not "imaging data". They became auxiliary files which could go along with base imaging (and beh/) datatypes. I think that

  • all the photo, channels etc should get their own "folder" in this schema, e.g. auxiliary or auxdatatypes, and be moved there (instead of being under datatypes)
  • datatypes/ entries can get rid of the datatypes field in their .yaml files -- they should then have 1-to-1 correspondence from the filename to corresponding datatype.

On the top level (for inclusion) I see smth like

datatypes:
  anat: !include datatypes/anat.yaml
  func: !include datatypes/func.yaml
  ...
auxdatatypes:
  channels: !include auxdatatypes/channels.yaml
  event: !include auxdatatypes/events.yaml

This would make it easier to tell one (which gets its own folder in BIDS) from another (just provides additional suffixed files).

Alternative could be - keep as is, and then add additional field (in a yet to be created higher level "inclusion" file) to signal which datatype is auxiliary, but then it would be a bit less structured IMHO and most likely we would need to maintain that redundant self mentioning within datatypes for the base data types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split auxdatatypes from datatypes and dropped the datatypes key from the datatype yamls. I have to say, it looks much cleaner.

@vsoch
Copy link

vsoch commented Jun 2, 2020

@tsalo is there something I can help with here?

@tsalo
Copy link
Member Author

tsalo commented Jun 3, 2020

@vsoch Absolutely! At this point, I think any input would be helpful, to ensure that the yaml/json files properly reflect the specification. Once the yaml/json files are more finalized, though, we will need scripts to build the tables for the specification, and bids-validator and pybids will also need interfaces. I know that @yarikoptic can help with that, but I won't be much use at that point, so your help would be amazing.

@yarikoptic I didn't realize I hadn't touched this in two weeks. Sorry about that! I will try to start responding more over the next couple of days.

@vsoch
Copy link

vsoch commented Jun 3, 2020

@tsalo I helped with the original development of BIDS long ago and far away, but haven't really looked at or used it in years, so I probably am not one to give feedback on that. Once that is done and you need to build tables, however, that might be something I can help with, as long as it's possible to scope out what exactly is needed (e.g., an example output table to produce).

@effigies
Copy link
Collaborator

effigies commented Aug 7, 2020

First thing on Monday, if I don't end up working this weekend. :-)

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, a couple small issues. And obviously need to merge in or rebase onto master and rerun.

CONTRIBUTING.md Outdated Show resolved Hide resolved
src/schema/datatypes/func.yaml Outdated Show resolved Hide resolved
src/schema/datatypes/beh.yaml Show resolved Hide resolved
@tsalo
Copy link
Member Author

tsalo commented Aug 10, 2020

Thanks @effigies! I've merged your changes, updated from master, and added in the remaining associated changes from #556.

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for this huge effort!

@effigies effigies dismissed stale reviews from nicholst and yarikoptic August 10, 2020 19:13

All suggestions addressed

acq:
name: Acquisition
description: |
The OPTIONAL acq-<label> key/value pair corresponds to a custom label the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if `TXT` is allowed in yaml, we better make those acq-<label> into `acq-<label>` right away (as well as sample filenames which could have some characters which markdown could interpret for formatting) to simplify any possible future rendering... Since we do not render them currently though -- I think it would be ok to proceed as is (unless some other really needed changes are suggested) and just defray it to a separate PR

src/schema/entities.yaml Show resolved Hide resolved
ce:
name: Contrast Enhancing Agent
description: |
Similarly the OPTIONAL ce-<label> key/value can be used to distinguish
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional or not is a property per specific modality etc.

"Similarly ..." is the paragraph opening .

ce-<label> is duplicate with the record itself.

"can be used to distinguish" is also somewhat "to make it a sentence".

Here and in others I would have just kept it to the point, and start from the 2nd sentence: "The label is the name of the contrast agent.".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind if we push updating entity definitions until a PR for #567? I'm planning to make a version of #568 that is compatible with this PR's changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not mind at all! Just wanted to leave a note ;-)

mod:
name: Corresponding Modality
description: |
In such cases the OPTIONAL `mod-<label>` key/value pair corresponds to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in what "Such cases"?

@yarikoptic
Copy link
Collaborator

Thank you @tsalo ! It looks great!
While approving I have clicked too fast though -- I want to note that ideally descriptions in entities.yaml should be adjusted. But that could be done even after this PR since they aren't used ATM anywhere yet.

@tsalo
Copy link
Member Author

tsalo commented Aug 10, 2020

Now that there are two approvals, should we wait five days before merging (pending any requested changes, of course)?

@effigies
Copy link
Collaborator

We don't really count cleanup changes as restarting the clock, usually. I would say this can be merged at @sappelhoff's convenience.

@yarikoptic
Copy link
Collaborator

One quick click for @sappelhoff , one giant leap for BIDS!

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this thorough work that will pay off in the future!

Also thanks to the reviewers and contributors :-)

I found that LICENSE is missing next to README and CHANGES in the top_level_files.yaml, so I am just going to add it and then merge. I agree that everything else can be done in follow up PRs.

name: Split
description: |
In the case of long data recordings that exceed a file size of 2Gb, the
.fif files are conventionally split into multiple parts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the future, the split entity could be used by any file format IMHO

src/schema/top_level_files.yaml Show resolved Hide resolved
@sappelhoff sappelhoff merged commit a3f92ce into bids-standard:master Aug 11, 2020
@tsalo tsalo deleted the ref/json-entity branch August 11, 2020 14:02
@tsalo tsalo added schema Issues related to the YAML schema representation of the specification. Patch version release. schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release. labels Apr 11, 2022
@tsalo tsalo removed the help wanted Extra attention is needed label Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatting Aesthetics and formatting of the spec schema Issues related to the YAML schema representation of the specification. Patch version release. schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release.
Projects
9 participants