
TeMeta/Dataset-JSON_hackathon


CDISC Over Linked Data (JSON-LD)

This repository supports the CDISC Open Source Alliance's Dataset-JSON Hackathon. It holds code, minutes, notes, outcomes, discussions and more, so that anyone working with the results can collaborate flexibly and start follow-up projects.


The main purpose of this project is to investigate ways in which the JSON-LD format can be leveraged in tandem with CDISC datasets as JSON.

  • Dataset-JSON is a new format being designed for a more interoperable way to communicate clinical data.
  • JSON-LD may be able to complete the picture.

By treating Dataset-JSON as the compacted form of a JSON-LD graph, a single machine-readable reference included in the Dataset-JSON can provide a complete description of the meaning behind the data in your transfer:

"@context": "https://mdr.cdisc.org/transfer_104ab4/define_BS1234_v2#"

The referenced address would be the Define (or a transfer manifest referencing the Define) as a JSON-LD document, which contextualises the Dataset-JSON contents into ODMv2 graph form.

By changing the format of Define from XML to a JSON-LD context and graph, the Define is served as a common spec that Dataset-JSON can reference explicitly (as opposed to implicitly via metadataVersionOID).
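The difference between an explicit and an implicit Define reference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the payload layout, the `MDV.BS1234.v2` value, the `metaDataVersionOID` casing, and all function names are hypothetical and are not taken from this repository's code.

```python
from typing import Optional

# Assumed key casing; the text above spells it "metadataVersionOID".
OID_KEY = "metaDataVersionOID"

# Hypothetical Dataset-JSON-like payload. The @context URL is the one shown
# above; everything else is made up for illustration.
dataset = {
    "@context": "https://mdr.cdisc.org/transfer_104ab4/define_BS1234_v2#",
    "clinicalData": {
        "studyOID": "BS1234",
        OID_KEY: "MDV.BS1234.v2",
        "itemGroupData": {"IG.DM": {"name": "DM", "label": "Demographics"}},
    },
}


def explicit_define_reference(payload: dict) -> Optional[str]:
    """Return the Define URL when the payload carries a string @context."""
    context = payload.get("@context")
    return context if isinstance(context, str) else None


def implicit_define_reference(payload: dict) -> Optional[str]:
    """Return the version OID, which identifies a Define only by convention."""
    # Assumed top-level sections that may carry the OID.
    for section in ("clinicalData", "referenceData"):
        oid = payload.get(section, {}).get(OID_KEY)
        if oid:
            return oid
    return None


def describe_reference(payload: dict) -> str:
    """Prefer the explicit, dereferenceable link over the implicit OID."""
    url = explicit_define_reference(payload)
    if url is not None:
        return f"explicit: {url}"
    oid = implicit_define_reference(payload)
    return f"implicit: {oid}" if oid else "no Define reference"


print(describe_reference(dataset))
```

The explicit form is a URL a consumer can fetch, whereas the OID only helps a consumer who already holds the matching Define file.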

Dataset-JSON can reference Define via an explicit reference to your transfer manifest (using JSON-LD)

Video showing Dataset-JSON referencing Define explicitly via a manifest

Have your cake and eat it! Simple streamable datasets for transport, linked explicitly to a single source of truth. A complete metadata picture that allows the data to be queried and exported as a graph.

See Instructions to set up and run the demo server

See JSON-LD Overview to learn more about how this project proposes to apply JSON-LD to CDISC data and metadata

See Define via Manifest Demo to see how Dataset-JSON can reference the Define spec through a single explicit reference to its transfer manifest (replacing the Define-XML file with a URL)

See JSON-LD Demo to go through some examples and interact with them via the JSON-LD Playground tool

See Define-LD Overview (in progress) to see how JSON-LD can be applied to Define to make it more interoperable

Experiment with the Streaming JSON-LD in Python test to benchmark streaming Dataset-JSON files of various sizes, with and without import from the JSON-LD-powered Define API
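As a rough idea of what timing a streamed read can look like, the sketch below reads rows one at a time and measures a single pass. It is not the repository's benchmark: the newline-delimited layout (one JSON array per line), the generated sample rows, and the function names are all assumptions made for illustration.

```python
import io
import json
import time
from typing import Iterator, TextIO


def stream_rows(lines: TextIO) -> Iterator[list]:
    """Yield one row at a time from newline-delimited JSON, skipping blanks."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def time_stream(lines: TextIO) -> tuple:
    """Return (row_count, elapsed_seconds) for one pass over the stream."""
    start = time.perf_counter()
    count = sum(1 for _ in stream_rows(lines))
    return count, time.perf_counter() - start


# Hypothetical in-memory stand-in for a file: one JSON array per line.
sample = io.StringIO(
    "\n".join(json.dumps([i, f"SUBJ{i:03d}"]) for i in range(1, 1001))
)
row_count, elapsed = time_stream(sample)
print(f"{row_count} rows in {elapsed:.4f}s")
```

Because `stream_rows` is a generator, only one row is held in memory at a time, which is the property a streaming benchmark would want to compare across file sizes.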

Transform to compacted, expanded, and RDF formats by including a JSON-LD @context

Click the respective formats to see this in action (as shown in the video below).

Video showing Define and Dataset transformations to linked data

Dataset-JSON transformed to RDF via JSON-LD

<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/VariableMeasured> <http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/description> "Demographics"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/maxValue> "600"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/name> "DM"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://schema.org/DataType> "string"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://schema.org/description> "Unique Subject Identifier"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://schema.org/name> "USUBJID"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://www.w3.org/2001/XMLSchema#length> "3"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> <http://schema.org/DataType> "integer"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> <http://schema.org/description> "Record identifier"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> <http://schema.org/name> "ITEMGROUPDATASEQ"^^<http://schema.org/PropertyValue> .
_:b0 <http://schema.org/Dataset> <http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> .
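Triples of this general shape can be approximated with a toy transformation: replace each term with the IRI its context assigns, resolve the local ID against a base, and print one line per property. The sketch below is not a JSON-LD processor and does not reproduce the demo's exact output (it emits plain string literals rather than the typed literals shown above). The `CONTEXT` mapping, the `BASE` constant, and the function names are assumptions.

```python
import json

# Toy context: term -> IRI. The schema.org IRIs mirror the sample above;
# the mapping itself is an illustrative assumption, not the demo's context.
CONTEXT = {
    "name": "http://schema.org/name",
    "description": "http://schema.org/description",
}
BASE = "http://localhost:4000/transfer_104ab4/define_BS1234_v2/"


def expand(node: dict, context: dict, base: str) -> dict:
    """Replace terms with IRIs and resolve @id against base (simple cases only)."""
    expanded = {"@id": base + node["@id"]}
    for key, value in node.items():
        # Keys without a context entry are dropped in this toy version.
        if key != "@id" and key in context:
            expanded[context[key]] = value
    return expanded


def to_ntriples(expanded: dict) -> list:
    """Serialize one expanded node with simple string values as sorted lines."""
    subject = expanded["@id"]
    return sorted(
        # json.dumps is used as an approximate literal escaper for plain strings.
        f"<{subject}> <{predicate}> {json.dumps(value)} ."
        for predicate, value in expanded.items()
        if predicate != "@id"
    )


compacted = {
    "@id": "IT.USUBJID",
    "name": "USUBJID",
    "description": "Unique Subject Identifier",
}
for line in to_ntriples(expand(compacted, CONTEXT, BASE)):
    print(line)
```

A real pipeline would hand this work to a conforming JSON-LD library; the point here is only that the compacted document plus a context carries enough information to produce the triples.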

TLDR: Define is a detailed machine-readable dataset spec that can be used as part of a data contract.

Archived trial data is useless unless it can be understood and recreated, so the role of CDISC is to define a shared Findable, Accessible, Interoperable and Reusable (FAIR) way of communicating and storing research.

The CDISC data model includes a communication format called "Define" that formally describes the included datasets, i.e. what was included in this submission/transfer and what it means.

Regulatory authorities globally have mandated the first released format "Define-XML" to accompany any submission data.

Many have approached Define as an annoying piece of bureaucracy that adds work at the end of the trial. Those people are causing their own problems by focusing on the requirement itself rather than asking the more important question: what is it that regulators find so useful about Define?

Define is not

  • a post-mortem
  • esoteric, only interesting to librarians
  • different from your dataset specifications
  • a single-use, unobtainable .xml file in a .zip file in a secure transfer to regulators

Define is

  • the blueprint for your trial outputs
  • instructions for creating datasets, i.e. it should come before dataset creation and can drive automation
  • useful config and interface between apps and data
  • a universal, non-proprietary language for dataset specification that is understandable by both machines and humans (with the right tools)
  • instructions for re-use and recreation of trial data for scientists of the future
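To make the "drive automation" point concrete, the sketch below checks a data row against item specifications of the kind a Define carries (name, data type, length). The spec values echo the RDF sample earlier in this README, but the `ITEM_SPECS` structure and the `check_row` helper are hypothetical and not part of any CDISC tooling.

```python
# Hypothetical item specs, loosely modelled on Define item definitions.
ITEM_SPECS = [
    {"name": "ITEMGROUPDATASEQ", "type": "integer"},
    {"name": "USUBJID", "type": "string", "length": 3},
]

# Assumed mapping from spec data types to Python types.
PYTHON_TYPES = {"integer": int, "string": str}


def check_row(row: list, specs: list) -> list:
    """Return readable violations for one row; an empty list means it conforms."""
    if len(row) != len(specs):
        return [f"expected {len(specs)} values, got {len(row)}"]
    problems = []
    for value, spec in zip(row, specs):
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{spec['name']}: expected {spec['type']}")
        elif isinstance(value, str) and "length" in spec and len(value) > spec["length"]:
            problems.append(f"{spec['name']}: longer than {spec['length']}")
    return problems


print(check_row([1, "001"], ITEM_SPECS))
print(check_row([1, "0001"], ITEM_SPECS))
```

The same specification could in principle drive dataset creation, validation, and documentation, which is the sense in which Define acts as config between apps and data.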

Conclusions and next steps

JSON-LD is a powerful means of transforming JSON into a graph by adding an explicit @context referencing served semantic definitions.

JSON-LD brings Dataset-JSON and Define-JSON together explicitly by turning IDs and references into graph nodes and connections with universally unique IDs.
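The step from local IDs to graph nodes can be sketched as follows: each OID is prefixed with the Define's base address to form a globally unique IRI, and each reference from an item group to an item becomes an edge. The document layout, the `hasItem` edge label, and the helper names are hypothetical; only the base address and OIDs come from the RDF sample earlier in this README.

```python
BASE = "http://localhost:4000/transfer_104ab4/define_BS1234_v2/"

# Hypothetical Define-JSON-like fragment: an item group lists its items by OID.
define = {
    "itemGroups": [{"OID": "IG.DM", "items": ["ITEMGROUPDATASEQ", "IT.USUBJID"]}],
    "items": [{"OID": "ITEMGROUPDATASEQ"}, {"OID": "IT.USUBJID"}],
}


def to_iri(base: str, oid: str) -> str:
    """Turn a document-local OID into an absolute, globally unique IRI."""
    return base + oid


def build_graph(doc: dict, base: str) -> tuple:
    """Return (nodes, edges); edges are (subject, label, object) triples."""
    nodes = {to_iri(base, item["OID"]) for item in doc["items"]}
    edges = []
    for group in doc["itemGroups"]:
        group_iri = to_iri(base, group["OID"])
        nodes.add(group_iri)
        for item_oid in group["items"]:
            # "hasItem" is an illustrative edge label, not a real vocabulary term.
            edges.append((group_iri, "hasItem", to_iri(base, item_oid)))
    return nodes, edges


nodes, edges = build_graph(define, BASE)
print(len(nodes), "nodes,", len(edges), "edges")
```

Once every ID is an IRI, two documents that mention `IT.USUBJID` under the same base refer to the same node, which is what lets Dataset-JSON and Define be merged into one graph.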

This project shows that, combined with Dataset-JSON and JSON-LD, Define could be

  • a Data Contract a.k.a. Data Transfer Agreement, DTA
  • a single source of truth accessible over API
  • a transfer manifest
  • a graph

Next research direction: by expressing Define as a JSON-LD context that accompanies Dataset-JSON, can the Define become the semantic context for any size, shape, type or source of Dataset-JSON that references it? For example, a JSON-LD @graph and context that map data into well-defined nodes, with support for linking to the Biomedical Concepts metamodel.

Contribution

Contributions are very welcome. When you contribute to this repository you are doing so under the licenses below. Please check out Contribution for additional information. All contributions must adhere to the following Code of Conduct.

License

License: MIT License: CC BY 4.0

Code & Scripts

This project is using the MIT license (see LICENSE) for code and scripts.

Content

The content files like documentation and minutes are released under CC-BY-4.0. This does not include trademark permissions.

Re-use

When you re-use the source, keep or copy the license information in the source code files as well. When you re-use the source in proprietary software or distribute binaries (derived or underived), additionally copy the license text to a third-party-licenses file or similar.

When you want to re-use and refer to the content, please do so as follows:

Content based on Dataset-JSON Define-LD Demo (GitHub) used under the CC-BY-4.0 license.
