Releases: dcppc/crosscut-metadata

v0.6 release of crosscut metadata model

28 Sep 14:03
  • v0.6 release of KC7 crosscut metadata model instance for the following datasets:
    • GTEx v7 public RNA-Seq and WGS metadata from the GTEx Portal and dbGaP study phs000424.v7.p2
    • TOPMed public metadata for dbGaP studies phs001024.v3.p1, phs000951.v2.p2, and phs000179.v5.p2
    • Alliance/AGR gene location and ortholog data for mouse and rat reference genomes
  • Updated AGR ER diagram to reflect new AGR-specific DATS schema extensions

v0.5 release of crosscut metadata model

07 Sep 21:20
  • v0.5 release of KC7 crosscut metadata model instance for the following datasets:
    • GTEx v7 public RNA-Seq metadata from the GTEx Portal and dbGaP study phs000424.v7.p2 (single file)
    • TOPMed public metadata for dbGaP studies phs001024.v3.p1, phs000951.v2.p2, and phs000179.v5.p2
  • Combined two public GTEx JSON-LD files into a single file with subjects linked to StudyGroup
    and sample-level WGS and RNA-Seq CRAM files represented as individual Datasets.
  • Added scripts to emulate SPARQL queries directly with RDFLib API calls, to address
    SPARQL query performance issues (at least in RDFLib).
  • Added example RDFLib script to parse GTEx JSON-LD document and generate tab-delimited summary file.
  • Identified and added workarounds for limitations of the DATS validator.
  • Updated ER diagram with v0.5 StudyGroup and file/sample-level Dataset encoding.

v0.4 release of crosscut metadata model

23 Aug 23:54
  • v0.4 release of KC7 crosscut metadata model instance for the following datasets:
    • GTEx v7 public RNA-Seq metadata from the GTEx Portal
    • GTEx v7 public metadata from dbGaP study phs000424.v7.p2
    • TOPMed public metadata for dbGaP studies phs001024.v3.p1, phs000951.v2.p2, and phs000179.v5.p2
  • Added context files for all DATS types that did not already have them.
  • Fixed broken JSON-LD context file references (issue #16, issue #17).
  • Added OBO Foundry context files (issue #23).
  • Added missing OBO IRIs for several terms.
  • Added bearerOfDisease for hypertension status, using new diseaseStatus property.
  • Moved GTEx subject/sample Materials from top-level Dataset to 2nd-level Datasets for consistency with TOPMed DATS JSON encoding and published ER diagrams (issue #26).
  • Started using JSON-LD id references to reduce redundancy in the GTEx instance.
  • Added a set of example Python RDFLib SPARQL queries showing how to extract study, subject, and sample metadata from the DATS instances (issue #27).
  • Assigned a random UUID to any DATS entity without an existing JSON-LD identifier.
  • Added Dataset level dimensions for each of the variables declared in a study.
  • Converted subject and sample Material extraProperties to characteristics (issue #24, issue #29).
  • Modified conversion code to handle the 8 saliva samples in TOPMed.
  • Added proposed pattern for representing the links between individual samples and sequence (or other types of) data files.
  • Added --max_output_samples option to bin/gtex_v7_to_dats.py for testing purposes.
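
Assigning a random identifier to every entity that lacks one could look roughly like the sketch below. It is not the conversion code from this repository: the function name `assign_missing_ids`, the use of an `@type` key to recognize an entity, and the `urn:uuid:` identifier form are all assumptions made for illustration.

```python
import uuid


def assign_missing_ids(node):
    """Recursively add a random @id to entities that do not have one.

    Assumption: any dict carrying an "@type" key is treated as an entity.
    """
    if isinstance(node, dict):
        if "@type" in node and "@id" not in node:
            # uuid4() is random; .urn renders it as "urn:uuid:<hex>".
            node["@id"] = uuid.uuid4().urn
        for value in node.values():
            assign_missing_ids(value)
    elif isinstance(node, list):
        for item in node:
            assign_missing_ids(item)
    return node


doc = assign_missing_ids({
    "@type": "Dataset",
    "@id": "existing-id",
    "isAbout": [{"@type": "Material", "name": "subject 1"}],
    "extra": {"note": "not an entity"},
})
```

Existing identifiers are left untouched, so entities that already have an `@id` can still be referenced by it elsewhere in the document.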

v0.3 release of crosscut metadata model

28 Jun 01:24
  • v0.3 release of KC7 crosscut metadata model instance for the following datasets:
    • GTEx v7 public RNA-Seq metadata
    • TOPMed public metadata for a single study, phs000946.v3
    • MGI C57BL/6J reference genome, annotation, and human orthologs
  • v0.3 release of KC7 crosscut metadata model for all of the above, plus:
    • TOPMed/dbGaP access-restricted metadata for a single study, phs000946.v3
  • Incorporated KC2-provided GUIDs for public GTEx v7 files
  • Added new script and preliminary DATS encoding for MGI mouse reference genome, including mouse genes and links to HomoloGene-derived human homologs.
  • Fixed broken alternateIdentifiers in TOPMed instance.

v0.2 release of crosscut metadata model

01 Jun 02:14
  • v0.2 release of KC7 crosscut metadata model for GTEx v7 public RNA-Seq metadata and TOPMed public metadata for a single sample study, phs000946.v3
  • RNA extract Materials are now linked to the top-level Dataset via the isAbout property.
  • New DataAnalysis objects are now linked to Datasets via the producedBy property.
  • All applicable object types now have @id and @context properties.
  • JSON-LD contexts have only been defined for a subset of the DATS object types.
  • See https://github.com/datatagsuite/schema and https://github.com/datatagsuite/context
  • GTEx subject Materials now use characteristics, not extraProperties, to store gender, age range, etc.
  • The corresponding CV definitions, which used to be in characteristics, have been removed, as they are arguably part of the crosscut metadata model, but not the metadata instance per se.
  • The extraProperties property is now used to store the original, un-harmonized metadata.
  • The public DATS JSON for TOPMed study phs000946.v3 was constructed by picking values from the publicly available dbGaP variable summaries to produce a single "dummy" subject/sample entry:
    • For numeric values the median value was picked.
    • For enumerated types the most common value was picked, with ties broken alphanumerically.
  • The Python script that produces the TOPMed DATS JSON can also be run on the access-restricted/protected metadata (not included in this distribution), in which case it will produce DATS JSON for the actual metadata rather than dummy values based on the variable summaries.
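
The rule used to pick the "dummy" values can be sketched as below. This is an illustration of the stated rule only, not the script from this release: the function names and input shapes (a list of numbers, and a dict mapping each enumerated value to its count) are hypothetical, and plain string ordering is assumed as the "alphanumeric" tie-break.

```python
from statistics import median


def pick_numeric_value(values):
    """Return the median of a list of numeric values."""
    return median(values)


def pick_enum_value(counts):
    """Return the most common value from a {value: count} mapping.

    Ties are broken by taking the first value in sorted string order
    (an assumed reading of "alphanumerically").
    """
    top = max(counts.values())
    return min(value for value, n in counts.items() if n == top)


age = pick_numeric_value([61, 47, 55])
sex = pick_enum_value({"male": 3, "female": 3, "unknown": 1})
```

Here `age` is 55, and `sex` is "female" because it ties with "male" on count and sorts first.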

v0.1 KC7 internal-only release of crosscut metadata model

04 May 20:19
  • Initial KC7 internal-only release of KC7 crosscut metadata model for GTEx v7 public RNA-Seq metadata.
  • DATS JSON files validate against https://github.com/biocaddie/WG3-MetadataSpecifications master branch
  • DATS JSON files do NOT validate against current (v2.2) DATS release
  • This is a preview release for internal consumption only and it is subject to change.