This repository is used for the CDISC Open Source Alliance's Dataset-JSON Hackathon and contains code, minutes, notes, outcomes, discussions and more. It is meant to let anyone working with the results and discussions collaborate flexibly and start follow-up projects.
The main purpose of this project is to investigate ways in which the JSON-LD format can be leveraged in tandem with CDISC Datasets as JSON.
- Dataset-JSON is a new format being designed for a more interoperable way to communicate clinical data.
- JSON-LD may be able to complete the picture.
By imagining Dataset-JSON as the compacted form of a JSON-LD graph, a single machine-readable reference included in the Dataset-JSON provides a complete description of the meaning behind the data in your transfer:
"@context": "https://mdr.cdisc.org/transfer_104ab4/define_BS1234_v2#"
The referenced address would be the Define (or a transfer manifest referencing the Define) in the form of JSON-LD, contextualising the Dataset-JSON contents into ODMv2 graph form.
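As a minimal sketch of the idea, the snippet below adds that single `@context` reference to a payload. The payload is a hypothetical, heavily simplified stand-in and not the official Dataset-JSON schema, and `with_context` is an illustrative helper name.

```python
import json

# Context URL taken from the example above.
CONTEXT_URL = "https://mdr.cdisc.org/transfer_104ab4/define_BS1234_v2#"


def with_context(payload: dict, context_url: str = CONTEXT_URL) -> dict:
    """Return a copy of the payload with a JSON-LD @context as its first key."""
    return {"@context": context_url, **payload}


# Hypothetical payload for illustration only (not the real Dataset-JSON layout).
payload = {
    "name": "DM",
    "label": "Demographics",
    "rows": [[1, "001"], [2, "002"]],
}

linked = with_context(payload)
print(json.dumps(linked, indent=2))
```

The original payload is left untouched, so the same data can be shipped with or without the context.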
By changing the format of Define from XML to a JSON-LD context and graph, the Define is served as a common spec that can be referenced explicitly from Dataset-JSON (as opposed to implicitly via `metadataVersionOID`).
Dataset-JSON can reference Define via an explicit reference to your transfer manifest (using JSON-LD).
Have your cake and eat it! Simple streamable datasets for transport, linked explicitly to a single source of truth. A complete metadata picture that allows the data to be queried and exported as a graph.
See Instructions to set up and run the demo server
See JSON-LD Overview to learn more about how this project proposes to apply JSON-LD to CDISC data and metadata
See Define via Manifest Demo to see how Dataset-JSON can reference Define spec explicitly via a single explicit reference to its transfer manifest (replace Define-XML file with a URL)
See JSON-LD Demo to go through some examples and interact with them via the JSON-LD Playground tool
See Define-LD Overview (in progress) to see how JSON-LD can be applied to Define to make it more interoperable
Experiment with Streaming JSON-LD in Python test to benchmark streaming various sizes of Dataset-JSON files, with and without import from the JSON-LD-powered Define API
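A minimal sketch of what row-by-row streaming could look like, assuming a newline-delimited layout in which the first line holds the dataset metadata and every following line holds one row as a JSON array. The layout, the sample values, and the `stream_rows` helper are assumptions for illustration, not the repository's benchmark code.

```python
import io
import json
from typing import Iterator, TextIO, Tuple


def stream_rows(handle: TextIO) -> Tuple[dict, Iterator[list]]:
    """Read the metadata line eagerly, then yield rows lazily one at a time."""
    metadata = json.loads(handle.readline())

    def rows() -> Iterator[list]:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

    return metadata, rows()


# Hypothetical sample; a real file would be opened with open(path, encoding="utf-8").
sample = io.StringIO(
    '{"name": "DM", "columns": ["ITEMGROUPDATASEQ", "USUBJID"]}\n'
    '[1, "001"]\n'
    '[2, "002"]\n'
)

metadata, rows = stream_rows(sample)
records = [dict(zip(metadata["columns"], row)) for row in rows]
print(records)
```

Because rows are produced by a generator, memory use stays roughly constant regardless of file size, which is the property a streaming benchmark would exercise.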
Click the respective formats to see this in action (as shown in video below)
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/VariableMeasured> <http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/description> "Demographics"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/maxValue> "600"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> <http://schema.org/name> "DM"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://schema.org/DataType> "string"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://schema.org/description> "Unique Subject Identifier"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://schema.org/name> "USUBJID"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/IT.USUBJID> <http://www.w3.org/2001/XMLSchema#length> "3"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> <http://schema.org/DataType> "integer"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> <http://schema.org/description> "Record identifier"^^<http://schema.org/PropertyValue> .
<http://localhost:4000/transfer_104ab4/define_BS1234_v2/ITEMGROUPDATASEQ> <http://schema.org/name> "ITEMGROUPDATASEQ"^^<http://schema.org/PropertyValue> .
_:b0 <http://schema.org/Dataset> <http://localhost:4000/transfer_104ab4/define_BS1234_v2/IG.DM> .
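The statements above can be read back into a simple per-subject structure. The sketch below is a deliberately small parser that handles only the "subject predicate object ." shape shown here; it is not a full N-Quads implementation, and `group_by_subject` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches only the simple "subject predicate object ." shape shown above.
STATEMENT = re.compile(
    r'^(?P<s><[^>]*>|_:\S+)\s+'
    r'<(?P<p>[^>]*)>\s+'
    r'(?P<o><[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>)?)'
    r'\s*\.$'
)


def group_by_subject(text: str) -> dict:
    """Group predicate/object pairs under each subject."""
    graph = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        match = STATEMENT.match(line.strip())
        if match:
            subject = match["s"].strip("<>")
            graph[subject][match["p"]].append(match["o"])
    return {subject: dict(props) for subject, props in graph.items()}


BASE = "http://localhost:4000/transfer_104ab4/define_BS1234_v2/"
sample = "\n".join([
    f'<{BASE}IT.USUBJID> <http://schema.org/name> "USUBJID"^^<http://schema.org/PropertyValue> .',
    f'<{BASE}IT.USUBJID> <http://schema.org/DataType> "string"^^<http://schema.org/PropertyValue> .',
    f'_:b0 <http://schema.org/Dataset> <{BASE}IG.DM> .',
])

graph = group_by_subject(sample)
for subject, props in graph.items():
    print(subject, sorted(props))
```

For anything beyond a quick inspection, a dedicated RDF library would be the safer choice.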
TLDR: Define is a detailed machine-readable dataset spec that can be used as part of a data contract.
Archived trial data is useless unless it can be understood and recreated, so the role of CDISC is to define a shared Findable, Accessible, Interoperable and Reusable way of communicating and storing research.
The data model behind CDISC has a format for communication called "Define" that formally describes the included datasets, i.e. what was included in this submission/transfer and what it means.
Regulatory authorities globally have mandated the first released format "Define-XML" to accompany any submission data.
Many have approached Define as an annoying piece of bureaucracy that adds work at the end of the trial. Those people are causing their own problems by focusing on the requirement itself rather than asking the more important question: what is it that regulators find so useful about Define?
Define is not
- a post-mortem
- esoteric, only interesting to librarians
- different from your dataset specifications
- a single-use, unobtainable .xml file in a .zip file in a secure transfer to regulators
Define is
- the blueprint for your trial outputs
- instructions for creation of datasets i.e. should come before dataset creation, can drive automation
- useful config and interface between apps and data
- a universal, non-proprietary language for dataset specification that is understandable by both machines and humans (with the right tools)
- instructions for re-use and recreation of trial data for scientists of the future
JSON-LD is a powerful means of transforming JSON into a graph by adding an explicit `@context` referencing served semantic definitions.
JSON-LD brings Dataset-JSON and Define-JSON together explicitly by turning IDs and references into graph nodes and connections with universally unique IDs
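To make the idea concrete, the sketch below turns local OIDs into absolute IRIs by prefixing a base address. The base is the demo server address that appears in the demo output above; `to_iri` and `link_items` are hypothetical helper names, and a real JSON-LD processor would derive the IRIs from the `@context` instead.

```python
# Base address as it appears in the demo output (an assumption for other setups).
BASE = "http://localhost:4000/transfer_104ab4/define_BS1234_v2/"


def to_iri(oid: str, base: str = BASE) -> str:
    """Expand a local OID such as 'IT.USUBJID' into an absolute IRI."""
    return oid if "://" in oid else base + oid


def link_items(item_group_oid: str, item_oids: list) -> dict:
    """Describe an item group as a node whose references are also IRIs."""
    return {
        "@id": to_iri(item_group_oid),
        "items": [{"@id": to_iri(oid)} for oid in item_oids],
    }


node = link_items("IG.DM", ["ITEMGROUPDATASEQ", "IT.USUBJID"])
print(node["@id"])
for item in node["items"]:
    print("  ->", item["@id"])
```

Once every identifier is an absolute IRI, two files that mention `IT.USUBJID` under the same base refer to the same node without any extra lookup table.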
This project shows that combined with Dataset-JSON and JSON-LD, Define could be
- a Data Contract a.k.a. Data Transfer Agreement, DTA
- a single source of truth accessible over API
- a transfer manifest
- a graph
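As a hedged illustration of the data-contract idea, the sketch below checks rows against item definitions. The `ITEMS` dictionary is a hypothetical, hand-written stand-in for what a Define API might return (names, data types and the length mirror the demo output above), and `check_row` is an illustrative helper, not part of any CDISC tooling.

```python
# Hypothetical item definitions, hand-copied from the demo output for illustration.
ITEMS = {
    "ITEMGROUPDATASEQ": {"dataType": "integer"},
    "USUBJID": {"dataType": "string", "length": 3},
}

PYTHON_TYPES = {"integer": int, "string": str}


def check_row(row: dict, items: dict = ITEMS) -> list:
    """Return a list of human-readable contract violations for one row."""
    problems = []
    for name, spec in items.items():
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        expected = PYTHON_TYPES[spec["dataType"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name}: expected {spec['dataType']}")
        elif isinstance(value, str) and "length" in spec and len(value) > spec["length"]:
            problems.append(f"{name}: longer than {spec['length']}")
    return problems


good = check_row({"ITEMGROUPDATASEQ": 1, "USUBJID": "001"})
bad = check_row({"ITEMGROUPDATASEQ": "1", "USUBJID": "0001"})
print(good)
print(bad)
```

The same definitions could be checked before a transfer by the sender and after it by the receiver, which is what makes the spec usable as a contract.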
Next research direction: by expressing Define as a JSON-LD context that accompanies Dataset-JSON, can the Define become the semantic context for any size/shape/type/source of Dataset-JSON that references it? E.g. JSON-LD `@graph` and context that maps data into well-defined nodes, with support for linking to Biomedical Concepts metamodel.
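One possible shape for this direction, sketched under stated assumptions: each row becomes a node in a JSON-LD `@graph`, keyed by column name, with the Define address as `@context`. The node `@id` scheme (`row/<sequence>`), the sample rows, and the `rows_to_graph` helper are all hypothetical.

```python
import json

# Context URL reused from the example near the top of this document.
CONTEXT_URL = "https://mdr.cdisc.org/transfer_104ab4/define_BS1234_v2#"


def rows_to_graph(columns: list, rows: list, context_url: str = CONTEXT_URL) -> dict:
    """Turn tabular rows into a JSON-LD document with one node per row."""
    nodes = []
    for row in rows:
        node = dict(zip(columns, row))
        # Assumption: the first column is a record sequence usable as a local ID.
        node["@id"] = f"row/{row[0]}"
        nodes.append(node)
    return {"@context": context_url, "@graph": nodes}


doc = rows_to_graph(["ITEMGROUPDATASEQ", "USUBJID"], [[1, "001"], [2, "002"]])
print(json.dumps(doc, indent=2))
```

Whether the column names resolve to meaningful terms depends entirely on what the referenced context defines, which is the open question stated above.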
Contributions are very welcome. When you contribute to this repository, you are doing so under the licenses below. Please check out Contribution for additional information. All contributions must adhere to the following Code of Conduct.
This project is using the MIT license (see `LICENSE`) for code and scripts.
The content files like documentation and minutes are released under CC-BY-4.0. This does not include trademark permissions.
When you re-use the source, keep or copy the license information in the source code files as well. When you re-use the source in proprietary software or distribute binaries (derived or underived), additionally copy the license text to a third-party-licenses file or similar.
When you want to re-use and refer to the content, please do so like the following:
Content based on Dataset-JSON Define-LD Demo (GitHub) used under the CC-BY-4.0 license.