DCP Technical Architecture

Madison Dunitz edited this page Oct 21, 2019 · 2 revisions

Component Specific Documentation

Each component below is documented under the following fields, in this order: Description, Repo(s), API documentation, Charter, Contact, Internal Architecture, Data Model, State, Inputs, Transformation, Outputs, Dependencies, Other important documentation.
Metadata
Description: The Human Cell Atlas (HCA) metadata schemas are JSON-format schemas. They are designed to capture and structure the descriptive scientific metadata associated with HCA datasets, and they aim to ensure the FAIRness of HCA data.
Repo(s): Metadata Schema, Metadata Schema Publisher
API documentation: N/A
Charter: Metadata Charter
Contact: Laura Clarke, Norman Morrison, Mark Diekhans
Internal Architecture: Link
Data Model: The metadata component produces three kinds of JSON schema. Core schemas define the core entities (project, biomaterial, protocol, process and file). Type schemas declare specific subtypes of those entities (a biomaterial can be a donor_organism, specimen_from_organism, cell_suspension, cell_line, organoid or imaged_specimen). Schema modules provide attributes for a specific use case (e.g. mouse-specific fields or 10x-specific fields).
These schemas are individually versioned following semantic versioning with major, minor and patch version numbers, e.g. biomaterial_core is currently 8.2.0. Which changes trigger which type of version increment is documented in https://github.com/HumanCellAtlas/metadata-schema/blob/master/docs/evolution.md#schema-versioning. The basic rule: the major version changes if the change breaks backwards compatibility, minor changes are attribute changes that do not break backwards compatibility, and patch changes are documentation changes or bug fixes.
Metadata release process: Schema changes are made via PRs into develop. Releases from develop to integration do not follow a fixed schedule; they happen on Thursdays and Fridays so as not to interfere with the DCP-wide release schedule. Releases from integration to staging and from staging to prod follow the DCP-wide release schedule.
The metadata release process is documented here: https://github.com/HumanCellAtlas/metadata-schema/blob/master/docs/release_process.md#steps-of-the-pre-release-process
The DCP-wide release SOP is here: https://allspark.dev.data.humancellatlas.org/dcp-ops/docs/wikis/SOP:%20Releasing%20new%20Versions%20of%20DCP%20Software
State: This service does not need to track state.
Inputs: This service does not accept input.
Transformation: This service does not transform its data.
Outputs: This service distributes JSON schemas (draft 7). The schemas themselves are stored in git, but they are released via https://schema.humancellatlas.org/, and the publishing process is run by the code in https://github.com/HumanCellAtlas/metadata-schema-publisher
Dependencies: Any other DCP service that collects, processes, stores, queries or presents HCA data will create or read metadata instances that use the JSON schemas.
Other important documentation: https://github.com/HumanCellAtlas/metadata-schema/blob/master/README.md is a good starting point. It isn't 100% complete; please reach out to Laura/Norman/Mark if you identify clear gaps.
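The basic versioning rule described for the metadata schemas (major for breaking changes, minor for backwards-compatible attribute changes, patch for documentation changes or bug fixes) can be sketched as a small function. This is an illustrative sketch only: the `Change` enum and the `bump` function are hypothetical names, not part of the metadata-schema repository, and resetting the lower-order numbers to zero is assumed from the general semantic versioning convention rather than taken from the HCA documentation.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical change categories mirroring the basic rule above."""
    BREAKING = "breaking"        # breaks backwards compatibility
    ATTRIBUTE = "attribute"      # attribute change that stays backwards compatible
    DOCS_OR_FIX = "docs_or_fix"  # documentation change or bug fix


def bump(version: str, change: Change) -> str:
    """Return the next version string for a schema given the kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.BREAKING:
        # Assumed semver convention: minor and patch reset to zero.
        return f"{major + 1}.0.0"
    if change is Change.ATTRIBUTE:
        # Assumed semver convention: patch resets to zero.
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, starting from the biomaterial_core version quoted above, `bump("8.2.0", Change.ATTRIBUTE)` returns `"8.3.0"` and `bump("8.2.0", Change.BREAKING)` returns `"9.0.0"`. The authoritative rules remain those in the evolution.md document linked above.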
Ingest
Description: The Ingest Service is responsible for the intake and validation of metadata and data (through the Upload service) and for persisting them into the Data Storage System (DSS) of the Human Cell Atlas (HCA) Data Coordination Platform (DCP).
Repo(s): Ingest Core, Ingest State Tracking, Ingest Validator, Ingest Client, Ingest UI, Ingest Broker, Ingest Staging Manager, Ingest Exporter, Ingest Deployment
API documentation: Ingest HAL Browser, Primary Submission, Secondary Submission
Charter: Missing
Contact: Missing
Internal Architecture: Missing
Data Model: Lucid Chart Data Model
State: Missing
Inputs: Missing
Transformation: Missing
Outputs: Missing
Dependencies: Missing
Other important documentation: Missing. Note: Ingest creates the graph immediately from any user-supplied content. It uses the graph to calculate the contents to add to any bundles it creates, and it serializes the graph into links.json by walking through the Ingest API after validation and upon submission.
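The Ingest note about walking the graph and serializing it into links.json can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the in-memory dictionaries stand in for what Ingest would obtain through its API, the function name `walk_links` is hypothetical, and the record layout (`process`, `inputs`, `outputs`) is a simplified stand-in, not the actual links.json schema.

```python
import json
from collections import deque


def walk_links(consumed_by: dict[str, list[str]],
               produces: dict[str, list[str]],
               roots: list[str]) -> list[dict]:
    """Breadth-first walk from root entities, building one record per process.

    consumed_by maps an entity ID to the processes that take it as input;
    produces maps a process ID to the entities it outputs. Both are
    hypothetical stand-ins for data held by Ingest.
    """
    links: dict[str, dict] = {}
    seen = set(roots)
    queue = deque(roots)
    while queue:
        entity = queue.popleft()
        for process in consumed_by.get(entity, []):
            link = links.setdefault(
                process, {"process": process, "inputs": [], "outputs": []}
            )
            link["inputs"].append(entity)
            for output in produces.get(process, []):
                if output not in link["outputs"]:
                    link["outputs"].append(output)
                if output not in seen:
                    # Visit each entity once, even if reachable by several paths.
                    seen.add(output)
                    queue.append(output)
    return list(links.values())


# Toy graph: donor -> (p1) -> specimen -> (p2) -> sequence file
consumed_by = {"donor": ["p1"], "specimen": ["p2"]}
produces = {"p1": ["specimen"], "p2": ["reads.fastq"]}
links_document = json.dumps({"links": walk_links(consumed_by, produces, ["donor"])}, indent=2)
```

In this toy graph the walk yields two records, one for `p1` (donor to specimen) and one for `p2` (specimen to file), in the order the processes are first reached. The real exporter works against the Ingest API after validation and upon submission, so its traversal and output format may differ from this sketch.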
Upload
DataStore (DSS)
Secondary Analysis
Azul
DataBrowser
Matrix Service
Query Service
Authentication and Authorization