Convert entity table to machine-readable format #466

tsalo · 2020-05-10T18:13:59Z

I'm not sure if it's feasible, but it would be nice if the entity table was stored as a json file, in order to make it both programmatically accessible and centralized. I know that there is an equivalent file in bids-validator and pybids, but if the filename construction rules were centralized under the actual specification, then it would be much easier to update the specification across the ecosystem without having to update a range of other packages as well.

jbteves · 2020-05-11T15:25:16Z

I would like to second this as someone largely outside the ecosystem. I wanted to make a program which could generate filenames programmatically since we have a big, complicated study, but without a file like this it's hard to build tooling to that effect IMO.

yarikoptic · 2020-05-11T15:46:52Z

I think YAML would be a better fit because it allows comments and is friendlier to humans (con: there is no de facto validator, so the file should remain simple, and validation should probably still use a JSON schema validator).
It could also make the data structure useful beyond the entity table, e.g. with a listing of the corresponding terms used for a given entity in a given modality (e.g. for this effort: Repository of BIDS terms and supporting the BIDS community #423).
We could also use it in heudiconv to construct filenames following the standard order, etc.
It might be worth keeping such specifications in a separate lightweight repository, which would then be either "linked" here via git submodules or included via the git subtree mechanism. That would allow similar inclusion of it into the repositories of the corresponding software projects.
A similar desire/related discussion came up recently in Repository of BIDS terms and supporting the BIDS community #423 (comment)

jbteves · 2020-05-11T15:56:25Z

As an outsider, .json is still preferable as more people are familiar with it and it is in line with many other choices in the specification, including the fact that metadata is stored in that same format.
I think Repository of BIDS terms and supporting the BIDS community #423 is a good example of why approaches like this are useful, thanks for linking that.
Using heudiconv in tandem with this would be wonderful
I have no opinions on how it's stored, provided it can easily be fetched with a git clone or curl operation.

tsalo · 2020-05-15T20:48:20Z

What about something like the following? I've formatted it as YAML because that was easier to write freehand into a file, but I could easily switch to JSON for the real thing. The suffixes are organized into groups, like the entity table, to keep it reasonably short, but I could drop the groups and make each suffix a dictionary under the datatypes.

entities:
  sub:
    description: Subject
    format: label
  ses:
    description: Session
    format: label
  task:
    description: Task
    format: label
  acq:
    description: Acquisition
    format: label
  ce:
    description: Contrast Enhancing Agent
    format: label
  rec:
    description: Reconstruction
    format: label
  dir:
    description: Phase-Encoding Direction
    format: label
  run:
    description: Run
    format: index
  mod:
    description: Corresponding Modality
    format: label
  echo:
    description: Echo
    format: index
  recording:
    description: Recording
    format: label
  proc:
    description: Processed (on device)
    format: label
  space:
    description: Space
    format: label
datatypes:
  anat:
    group1:
      suffices:
        - T1w
        - T2w
        - T1rho
        - T1map
        - T2map
        - T2star
        - FLAIR
        - FLASH
        - PD
        - PDmap
        - PDT2
        - inplaneT1
        - inplaneT2
        - angio
      extensions:
        - nii.gz
        - nii
        - json
      entities:
        sub: required
        ses: optional
        acq: optional
        ce: optional
        rec: optional
    group2:
      suffices:
        - defacemask
      extensions:
        - nii.gz
        - nii
        - json
      entities:
        sub: required
        ses: optional
        acq: optional
        ce: optional
        rec: optional
        mod: optional
  func:
    group1:
      suffices:
        - bold
        - cbv
        - phase
        - sbref
      extensions:
        - nii.gz
        - nii
        - json
      entities:
        sub: required
        ses: optional
        task: required
        acq: optional
        ce: optional
        rec: optional
        dir: optional
        run: optional
        echo: optional
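
To make the filename-construction use case concrete, here is a minimal Python sketch of how a tool might consume the draft. It assumes that the order of keys in the `entities` block is the order in which entities appear in a filename, and it copies only the `func`/`group1` rules. The names `ENTITY_ORDER`, `FUNC_GROUP1`, and `build_filename` are hypothetical and not part of pybids, heudiconv, or bids-validator.

```python
# Entity order and rules copied from the draft above (func/group1 only).
# Assumption: key order in the `entities` block is the filename order.
ENTITY_ORDER = ["sub", "ses", "task", "acq", "ce", "rec", "dir", "run",
                "mod", "echo", "recording", "proc", "space"]

FUNC_GROUP1 = {
    "suffices": ["bold", "cbv", "phase", "sbref"],
    "extensions": ["nii.gz", "nii", "json"],
    "entities": {
        "sub": "required", "ses": "optional", "task": "required",
        "acq": "optional", "ce": "optional", "rec": "optional",
        "dir": "optional", "run": "optional", "echo": "optional",
    },
}


def build_filename(group, values, suffix, extension):
    """Return a filename whose entities follow ENTITY_ORDER (hypothetical helper)."""
    if suffix not in group["suffices"]:
        raise ValueError(f"suffix {suffix!r} not allowed in this group")
    if extension not in group["extensions"]:
        raise ValueError(f"extension {extension!r} not allowed in this group")
    unknown = set(values) - set(group["entities"])
    if unknown:
        raise ValueError(f"entities not allowed here: {sorted(unknown)}")
    missing = [name for name, level in group["entities"].items()
               if level == "required" and name not in values]
    if missing:
        raise ValueError(f"missing required entities: {missing}")
    # Emit entities in schema order, regardless of the order they were passed in.
    parts = [f"{name}-{values[name]}" for name in ENTITY_ORDER if name in values]
    return "_".join(parts + [suffix]) + "." + extension


print(build_filename(FUNC_GROUP1, {"run": 1, "task": "rest", "sub": "01"},
                     "bold", "nii.gz"))
# prints: sub-01_task-rest_run-1_bold.nii.gz
```

The caller passes entities in any order and the schema decides the final ordering, which is the property that would let several tools share one set of rules.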

yarikoptic · 2020-05-15T21:34:37Z

@tsalo -- this looks beautiful to me!

@jbteves : I do agree that consistency which could be achieved by using .json is indeed a benefit. But IMHO YAML is so much nicer and human friendly that I simply can't resist it. It also got a feature of XXI century -- support for # comments!

Here is a JSON view of the above YAML for comparison. It is not too bad yet, but as it grows I would find it easier and easier to orient myself in YAML than in JSON, and all the clutter from everything being in "" really makes it less readable to me.

{
  "entities": {
    "task": {
      "description": "Task", 
      "format": "label"
    }, 
    "ses": {
      "description": "Session", 
      "format": "label"
    }, 
    "sub": {
      "description": "Subject", 
      "format": "label"
    }, 
    "space": {
      "description": "Space", 
      "format": "label"
    }, 
    "ce": {
      "description": "Contrast Enhancing Agent", 
      "format": "label"
    }, 
    "echo": {
      "description": "Echo", 
      "format": "index"
    }, 
    "recording": {
      "description": "Recording", 
      "format": "label"
    }, 
    "acq": {
      "description": "Acquisition", 
      "format": "label"
    }, 
    "rec": {
      "description": "Reconstruction", 
      "format": "label"
    }, 
    "run": {
      "description": "Run", 
      "format": "index"
    }, 
    "proc": {
      "description": "Processed (on device)", 
      "format": "label"
    }, 
    "dir": {
      "description": "Phase-Encoding Direction", 
      "format": "label"
    }, 
    "mod": {
      "description": "Corresponding Modality", 
      "format": "label"
    }
  }, 
  "datatypes": {
    "anat": {
      "group1": {
        "suffices": [
          "T1w", 
          "T2w", 
          "T1rho", 
          "T1map", 
          "T2map", 
          "T2star", 
          "FLAIR", 
          "FLASH", 
          "PD", 
          "PDmap", 
          "PDT2", 
          "inplaneT1", 
          "inplaneT2", 
          "angio"
        ], 
        "extensions": [
          "nii.gz", 
          "nii", 
          "json"
        ], 
        "entities": {
          "rec": "optional", 
          "acq": "optional", 
          "ses": "optional", 
          "sub": "required", 
          "ce": "optional"
        }
      }, 
      "group2": {
        "suffices": [
          "defacemask"
        ], 
        "extensions": [
          "nii.gz", 
          "nii", 
          "json"
        ], 
        "entities": {
          "acq": "optional", 
          "ses": "optional", 
          "sub": "required", 
          "rec": "optional", 
          "ce": "optional", 
          "mod": "optional"
        }
      }
    }, 
    "func": {
      "group1": {
        "suffices": [
          "bold", 
          "cbv", 
          "phase", 
          "sbref"
        ], 
        "extensions": [
          "nii.gz", 
          "nii", 
          "json"
        ], 
        "entities": {
          "task": "required", 
          "ses": "optional", 
          "sub": "required", 
          "ce": "optional", 
          "echo": "optional", 
          "acq": "optional", 
          "rec": "optional", 
          "run": "optional", 
          "dir": "optional"
        }
      }
    }
  }
}

yarikoptic · 2020-05-15T21:38:32Z

Unrelated to this issue, just wanted to share 1c of no value here: a not really widely known fact is that YAML 1.2 is a superset of JSON (any JSON is also valid YAML). I.e. if at some point we decide "let's prepare for migration to YAML", conversion of .json into .yaml could be as easy as mv blah.json blah.yaml (although not as beneficial as proper re-serialization ;-)).
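
A quick way to check the superset claim is to feed the same text to a JSON parser and a YAML parser. The sketch below assumes PyYAML is installed for the YAML half (the import is guarded so the JSON half still runs without it). Note that PyYAML targets YAML 1.1, so unusual JSON corner cases may behave differently; the simple fragment used here is not one of them.

```python
import json

try:
    import yaml  # PyYAML, third-party; assumed available for the YAML half
except ImportError:
    yaml = None

# A shortened fragment of the JSON view above.
text = '{"entities": {"run": {"description": "Run", "format": "index"}}}'

from_json = json.loads(text)
print(from_json["entities"]["run"]["format"])  # prints: index

if yaml is not None:
    # The same text, untouched, parsed as YAML.
    from_yaml = yaml.safe_load(text)
    print(from_yaml == from_json)
```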

yarikoptic · 2020-05-16T00:52:17Z

Sorry for spamming... but I am just too excited! Such a spec could then be used to produce almost all, if not all, of the term tables we have. It could be used to produce target filename patterns. We could even manage to programmatically validate example filenames! It would reduce duplication and thus possible errors. Validators could avoid hardcoding, and there would be no need to change the validator upon addition of a term, entity, etc. It would make it possible for the validator to validate against a specific version of BIDS, not just the latest!
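
As a rough illustration of validating example filenames from the spec rather than from hardcoded rules, the Python sketch below compiles one regular expression from the `func`/`group1` entry of the draft. It assumes that a `label` is one or more alphanumeric characters and an `index` is one or more digits, and that entities must appear in the order of the `entities` block. `compile_pattern` and `is_valid` are hypothetical names, not functions of bids-validator or pybids.

```python
import re

# Subset of the draft relevant to func/group1; order taken from the `entities` block.
ENTITY_ORDER = ["sub", "ses", "task", "acq", "ce", "rec", "dir", "run", "echo"]
ENTITY_FORMAT = {name: "label" for name in ENTITY_ORDER}
ENTITY_FORMAT.update({"run": "index", "echo": "index"})

# Assumption: what "label" and "index" mean as character classes.
VALUE_PATTERN = {"label": "[a-zA-Z0-9]+", "index": "[0-9]+"}

FUNC_GROUP1 = {
    "suffices": ["bold", "cbv", "phase", "sbref"],
    "extensions": ["nii.gz", "nii", "json"],
    "entities": {
        "sub": "required", "ses": "optional", "task": "required",
        "acq": "optional", "ce": "optional", "rec": "optional",
        "dir": "optional", "run": "optional", "echo": "optional",
    },
}


def compile_pattern(group, entity_order, entity_format):
    """Build a regex that accepts only filenames allowed by `group`."""
    pieces = []
    for name in entity_order:
        level = group["entities"].get(name)
        if level is None:
            continue  # entity is not allowed for this group
        piece = f"{name}-{VALUE_PATTERN[entity_format[name]]}_"
        pieces.append(piece if level == "required" else f"(?:{piece})?")
    suffixes = "|".join(map(re.escape, group["suffices"]))
    extensions = "|".join(map(re.escape, group["extensions"]))
    return re.compile("".join(pieces) + f"(?:{suffixes})\\.(?:{extensions})")


PATTERN = compile_pattern(FUNC_GROUP1, ENTITY_ORDER, ENTITY_FORMAT)


def is_valid(filename):
    """True if the whole filename matches the compiled pattern."""
    return PATTERN.fullmatch(filename) is not None


print(is_valid("sub-01_task-rest_run-1_bold.nii.gz"))  # prints: True
print(is_valid("sub-01_run-1_task-rest_bold.nii.gz"))  # prints: False (wrong order)
print(is_valid("sub-01_bold.nii"))                     # prints: False (task missing)
```

Because the pattern is derived from data, adding an entity or suffix would mean editing the schema only, and pinning a schema version would pin the validation rules with it.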

sappelhoff · 2020-05-16T07:02:50Z

Thanks for throwing in some ideas to improve the entity table @tsalo -> these are some related issues: #289 #290

re: the current proposal

I agree that YAML is more readable, but am torn between that and the arguments that @jbteves brought forth in Convert entity table to machine-readable format #466 (comment)
@tsalo's YAML example (Convert entity table to machine-readable format #466 (comment)) would be very long if it were complete, wouldn't it? Perhaps Split up the entity table into smaller sections #290 can be discussed in this regard

tsalo · 2020-05-16T15:47:20Z

@yarikoptic I was thinking the same thing! The versioning aspect will be awesome!

@sappelhoff I agree that the file will end up being prohibitively long in its current form. What about splitting the files into the following:

entities: Entities, their full names, their values, and their order.
[modality]_[datatype]: Like the above, but only for a single datatype. Possibly with lists of metadata fields as well?
top_level/associated_data/any other files that should be based on bids-validator rules.

I was also a little stuck on how the json/yaml file would be rendered as a table on the site. Will whatever rendering function is used need to be in a specific language?

yarikoptic · 2020-05-16T16:38:57Z

Re length: We can partition at the top level into separate files. Unfortunately YAML, like JSON, has no native include mechanism, but solutions exist that would let us avoid implementing one ourselves: https://stackoverflow.com/questions/528281/how-can-i-include-a-yaml-file-inside-another . A similar approach is taken by the NWB standard, see https://github.com/NeurodataWithoutBorders/nwb-schema/blob/24fba6174ddbad171ee5bb824edfa31f86b1b16d/core/nwb.namespace.yaml which defines includes for different modalities. I am not yet sure if we want to partition by modality; I feel that we might better partition by concept/structure: entities, datatypes, terms, ... as prototyped by @tsalo.
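
One simple alternative to an include mechanism is to let the loader assemble the pieces: each top-level concept lives in its own file, and the file name becomes the key. The Python sketch below shows the idea under stated assumptions. It uses JSON files only so that it runs with the standard library (with PyYAML installed, `yaml.safe_load` could take the place of `json.load`), and the file names and the `load_schema` helper are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_schema(directory):
    """Merge every top-level *.json file in `directory` into one dict.

    Each file becomes one key, named after the file stem.
    """
    schema = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            schema[path.stem] = json.load(handle)
    return schema


# Demonstration with two throwaway files standing in for the split schema.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "entities.json").write_text(
        json.dumps({"sub": {"description": "Subject", "format": "label"}}),
        encoding="utf-8")
    Path(tmp, "datatypes.json").write_text(
        json.dumps({"func": {"group1": {"suffices": ["bold"]}}}),
        encoding="utf-8")
    schema = load_schema(tmp)

print(sorted(schema))  # prints: ['datatypes', 'entities']
```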

yarikoptic · 2020-05-16T16:43:41Z

And then partition per datatypes (modality!) ;-)

yarikoptic · 2020-05-16T18:05:53Z

@tsalo we will not render this structure directly. We would code a helper tool that renders from it all the .md tables etc. to include in the spec upon compilation.

edit 1: we could use something like https://pypi.org/project/tabulate/ to prepare such tables.
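
A helper of that kind could be quite small. The Python sketch below renders the `anat` groups of the draft (with the suffix list shortened) as a Markdown pipe table using plain string joins; the `tabulate` package mentioned above could replace the joining. The table layout and the `render_table` name are assumptions for illustration, not the layout of the current entity table.

```python
# Column order follows the `entities` block of the draft.
ENTITY_ORDER = ["sub", "ses", "task", "acq", "ce", "rec", "dir", "run", "mod", "echo"]

# anat groups from the draft; the group1 suffix list is shortened here.
ANAT = {
    "group1": {
        "suffices": ["T1w", "T2w"],
        "entities": {"sub": "required", "ses": "optional", "acq": "optional",
                     "ce": "optional", "rec": "optional"},
    },
    "group2": {
        "suffices": ["defacemask"],
        "entities": {"sub": "required", "ses": "optional", "acq": "optional",
                     "ce": "optional", "rec": "optional", "mod": "optional"},
    },
}


def render_table(datatype, groups, entity_order):
    """Return a Markdown table with one row per suffix group."""
    # Keep only the entity columns that at least one group uses.
    used = [name for name in entity_order
            if any(name in group["entities"] for group in groups.values())]
    header = ["datatype", "suffixes"] + used
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    for group in groups.values():
        cells = [datatype, ", ".join(group["suffices"])]
        cells += [group["entities"].get(name, "") for name in used]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


table = render_table("anat", ANAT, ENTITY_ORDER)
print(table)
```

Since the output is ordinary Markdown text, the rendering script can be in any language as long as it runs before the spec is compiled.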

yarikoptic · 2020-05-17T14:33:15Z

To not derail the discussion here, but to outline a possible mechanism for establishing historical versions of the schema etc. suitable for reuse by BIDS-aware tools, I have initiated https://github.com/bids-standard/bids-schema -- see its README.md, and you are welcome to open issues (probably there is nothing really to be contributed in PRs until we get a schema going here) with questions/suggestions/notes.

tsalo · 2020-05-17T15:20:45Z

I started working on the files in tsalo/bids-specification@ref/json-entity. The datatypes are split up in the datatypes/ folder by row in the entity table. I know that the divisions in there aren't actually the same as the datatypes, but I figured it's a good start. We can figure out how to restructure them from there (including changing how they're partitioned).

@yarikoptic If we'll be using a Python script to handle the rendering then that alleviates my concerns. Thanks!

Regarding releases, I had assumed that we'd use the releases in the specification repository, but since the specification for the YAML/JSON files will probably change, it only makes sense to back up the schemas elsewhere and allow maintainers to adjust them as needed.

yarikoptic · 2020-05-17T16:09:39Z

Yeap, that is the purpose of that bids-schema. Also for it to be more lightweight and not carry all the bids-specification history/images etc., so it could be included in tool distributions where desired... That thought triggered the need to file bids-standard/bids-schema#1 ;-)

yarikoptic · 2020-05-17T16:11:40Z

Re your branch - please place all of the produced yamls into a dedicated folder (eg schema).

Edit: I think it will be useful beyond appendices, so I would have placed it on top level in the hierarchy.

tsalo · 2020-05-17T16:38:19Z

Done!

yarikoptic · 2020-05-17T21:41:57Z

Awesome! If it were a PR here, I could try my hand at an entity table generation/embedding script (unless you just do it) ;-)

tsalo · 2020-05-17T21:52:38Z

I just opened #475 as a draft PR.

tsalo added the formatting Aesthetics and formatting of the spec label May 10, 2020

tsalo mentioned this issue May 17, 2020

[INFRA] Convert entity table to yaml #475

Merged

5 tasks

tsalo changed the title ~~Convert entity table to json~~ Convert entity table to machine-readable format Jun 13, 2020

effigies mentioned this issue Jun 29, 2020

[ENH] BEP001 - Quantitative MRI #508

Closed

tsalo mentioned this issue Jul 23, 2020

Convert specification to schema format #540

Closed

sappelhoff closed this as completed in #475 Aug 11, 2020

tsalo added the schema Issues related to the YAML schema representation of the specification. Patch version release. label Sep 23, 2020

sappelhoff added this to Conversion to schema Jun 8, 2024

sappelhoff moved this to Done in Conversion to schema Jun 8, 2024
