Add support for CMG Sequencing table data #1

torstees · 2020-09-16T15:52:31Z

For the first round of integration testing with Kids First, we need the priority 1 fields from the sequencing table (and maybe priority 2). These fields are:

Priority 1

seq_filename
analyte_type
sequencing_assay
library_prep_kit_method
reference_genome_build
alignment_method
data_processing_pipeline
functional_equivalence_standard
date_data_generation

Priority 2

exome_capture_platform
capture_region_bed_file
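A minimal sketch of how an ingest step might check a sequencing row against the two priority lists. The field names are copied from the lists above; the function name `missing_fields` and the rule that a blank string counts as missing are assumptions for illustration, not part of any existing code.

```python
# Field names come from the issue; the validation policy is an assumption.
PRIORITY_1 = [
    "seq_filename",
    "analyte_type",
    "sequencing_assay",
    "library_prep_kit_method",
    "reference_genome_build",
    "alignment_method",
    "data_processing_pipeline",
    "functional_equivalence_standard",
    "date_data_generation",
]
PRIORITY_2 = ["exome_capture_platform", "capture_region_bed_file"]


def missing_fields(row, required=PRIORITY_1):
    """Return the required field names that are absent or blank in a row."""
    missing = []
    for name in required:
        value = row.get(name)
        # None and empty/whitespace-only strings count as missing;
        # a boolean False is a real value and is kept.
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing
```

Calling `missing_fields(row, PRIORITY_1 + PRIORITY_2)` would extend the check to the priority 2 fields if they turn out to be needed.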

Our solution will likely be borrowed mostly from the Kids First team's discussion of kidsfirst-sequence-experiment.

katiebanaz · 2020-09-16T17:27:54Z

Test comment

torstees · 2020-09-16T22:14:37Z

To establish uniqueness, a quick look at the data suggests it is safe to key each sequencing object on its filename.
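A rough sketch of what keying on the filename could look like. The `urn:ncpi:unique-string` system and the `Task|<filename>` value format are taken from the example payload further down this thread; the helper names `task_identifier` and `duplicate_filenames` are hypothetical, and the duplicate check is only one way to confirm that the "quick look" holds for a full table.

```python
from collections import Counter

# System string as it appears in the example payload in this thread.
ID_SYSTEM = "urn:ncpi:unique-string"


def task_identifier(seq_filename):
    """Build the identifier entry for a sequencing Task from its filename."""
    return {"system": ID_SYSTEM, "value": f"Task|{seq_filename}"}


def duplicate_filenames(rows):
    """Return, sorted, any seq_filename that appears in more than one row."""
    counts = Counter(row["seq_filename"] for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)
```

If `duplicate_filenames` returns anything for the real table, the filename alone is not a safe key and another column would have to be added to it.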

torstees · 2020-09-17T21:58:19Z

First pass is complete, with a handful of general assignments borrowed primarily from the KF docs for similar data.

resourceType: Task
owner => Sequencing Center (Organization)
authoredOn => date_data_generation

Most of the details can be found in either the input or the output array. Currently, most of these are simple strings, but they can be switched to codes once we have a clear terminology to use. Which variables go into input versus output is largely arbitrary, but I believe the KF team was thinking in terms of what goes into the actual genotyping device, rather than treating the process as a black box in which samples go in and final products come out. As a result, the inputs to the actual pipelines currently sit in the output array.

Inputs:

Sample
Analyte Type
Library Prep Kit
Exome Capture Platform
Capture Region Bed File

Outputs:

Reference Genome Build
Alignment Method
Data Processing Pipeline
Functional Equivalence Standard (I honestly have no idea what this is)

`  {
    "host": "http://localhost:8000",
    "type": "sequencing_data",
    "body": {
      "resourceType": "Task",
      "id": "38924.merged.matefixed.sorted.markeddups.recal.bam",
      "status": "completed",
      "description": "Generate sequence data for use by researchers",
      "owner": {
        "reference": "Organization/FD",
        "display": "FD"
      },
      "meta": {
        "profile": [
          "http://hl7.org/fhir/StructureDefinition/Task"
        ]
      },
      "identifier": [
        {
          "system": "urn:ncpi:unique-string",
          "value": "Task|38924.merged.matefixed.sorted.markeddups.recal.bam"
        }
      ],
      "output": [
        {
          "type": {
            "text": "Reference Genome Build"
          },
          "valueString": "GRCh38DH"
        },
        {
          "type": {
            "text": "Alignment Method"
          },
          "valueString": "bwa-0.7.15"
        },
        {
          "type": {
            "text": "Data Processing Pipeline"
          },
          "valueString": "3.0_DNA_Pipeline"
        },
        {
          "type": {
            "text": "Functional Equivalence Standard"
          },
          "valueBoolean": false
        }
      ],
      "input": [
        {
          "type": {
            "text": "Sample"
          },
          "valueReference": {
            "reference": "Specimen/4774"
          }
        },
        {
          "type": {
            "text": "Analyte Type"
          },
          "valueString": "DNA"
        },
        {
          "type": {
            "text": "Library Prep Kit"
          },
          "valueString": "DNA_3.0_library_prep"
        },
        {
          "type": {
            "text": "Exome Capture Platform"
          },
          "valueString": "nimblegen_solution_bigexome_2011"
        },
        {
          "type": {
            "text": "Capture Region Bed File"
          },
          "valueString": "nimblegen_solution_bigexome_2011.hg19.list.bed"
        }
      ],
      "authoredOn": "2016-05-26"
    }
  },`
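For illustration, a sketch of how one row of the sequencing table could be turned into the `body` shown above. The mapping from column names to the display labels is inferred from the names and is an assumption, as are the function names `to_bool` and `build_task`. `sequencing_assay` is left out because it does not appear in the example payload.

```python
# (label, column) pairs; the pairing is inferred from the names, not confirmed.
INPUT_COLUMNS = [
    ("Analyte Type", "analyte_type"),
    ("Library Prep Kit", "library_prep_kit_method"),
    ("Exome Capture Platform", "exome_capture_platform"),
    ("Capture Region Bed File", "capture_region_bed_file"),
]
OUTPUT_COLUMNS = [
    ("Reference Genome Build", "reference_genome_build"),
    ("Alignment Method", "alignment_method"),
    ("Data Processing Pipeline", "data_processing_pipeline"),
]


def to_bool(value):
    """Convert a bool or a 'true'/'false' string to a Python bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text not in ("true", "false"):
        raise ValueError(f"not a boolean: {value!r}")
    return text == "true"


def build_task(row, specimen_id, center_id):
    """Assemble a Task dict shaped like the example payload in this thread."""
    filename = row["seq_filename"]
    inputs = [{
        "type": {"text": "Sample"},
        "valueReference": {"reference": f"Specimen/{specimen_id}"},
    }]
    for label, column in INPUT_COLUMNS:
        if row.get(column):  # skip optional fields that are absent or empty
            inputs.append({"type": {"text": label}, "valueString": row[column]})
    outputs = [
        {"type": {"text": label}, "valueString": row[column]}
        for label, column in OUTPUT_COLUMNS
        if row.get(column)
    ]
    fes = row.get("functional_equivalence_standard")
    if fes is not None:
        # A real bool here serializes as JSON false/true, not as a string.
        outputs.append({
            "type": {"text": "Functional Equivalence Standard"},
            "valueBoolean": to_bool(fes),
        })
    return {
        "resourceType": "Task",
        "id": filename,
        "status": "completed",
        "description": "Generate sequence data for use by researchers",
        "owner": {"reference": f"Organization/{center_id}", "display": center_id},
        "identifier": [{"system": "urn:ncpi:unique-string", "value": f"Task|{filename}"}],
        "output": outputs,
        "input": inputs,
        "authoredOn": row["date_data_generation"],
    }
```

Passing the result through `json.dumps` should give a document of the same shape as the example, with the priority 2 entries omitted whenever the row does not carry them.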

torstees · 2020-09-24T18:28:49Z

Closed by accident

torstees · 2020-10-09T18:25:39Z

Things have changed since this was originally described, largely as a result of further discussions with the folks from the KF team. For our current use, because the number of attributes is small, all of the inputs still reasonably apply to the Sequencing Task itself. The output, however, has been stripped down to the actual Document Reference, which represents the byproduct of the sequencing process. We then attach an Observation to that Doc Ref; its components describe the contents of the document, such as the Reference Sequence, Alignment Method, etc.
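A rough sketch of the revised layout, under stated assumptions. The comment above does not say how the Observation is attached to the DocumentReference, so linking it through `Observation.focus` is a guess. The type and code labels ("Sequencing Output", "Sequencing file details") and the function name `build_revised_output` are made up for illustration.

```python
def build_revised_output(doc_ref_id, details):
    """Return (task_output, observation) for one sequencing file.

    details maps a label such as "Alignment Method" to its string value.
    """
    # The Task's output array now holds only a pointer to the file.
    task_output = [{
        "type": {"text": "Sequencing Output"},  # hypothetical label
        "valueReference": {"reference": f"DocumentReference/{doc_ref_id}"},
    }]
    # The descriptive fields move into components of an Observation.
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Sequencing file details"},  # hypothetical label
        # Assumption: focus is used to point at the described document.
        "focus": [{"reference": f"DocumentReference/{doc_ref_id}"}],
        "component": [
            {"code": {"text": label}, "valueString": value}
            for label, value in details.items()
        ],
    }
    return task_output, observation
```

The Task's input array would stay as in the earlier example; only the output side changes in this sketch.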

katiebanaz added the sequencing label Sep 24, 2020

torstees closed this as completed Sep 24, 2020

torstees reopened this Sep 24, 2020

torstees mentioned this issue Oct 9, 2020

Profile ResearchStudy to include DocumentReference NIH-NCPI/ncpi-model-forge#29

Open
