Add support for CMG Sequencing table data #1

Open
torstees opened this issue Sep 16, 2020 · 5 comments

Comments

@torstees
Collaborator

torstees commented Sep 16, 2020

For the first round of integration testing with Kids First, we need the priority 1 fields from the sequencing table (and possibly the priority 2 fields). These fields are:

Priority 1

seq_filename
analyte_type
sequencing_assay
library_prep_kit_method
reference_genome_build
alignment_method
data_processing_pipeline
functional_equivalence_standard
date_data_generation

Priority 2

exome_capture_platform
capture_region_bed_file
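
To make the two tiers concrete, here is a minimal sketch that checks whether a row from the sequencing table carries every priority 1 field. The field names are copied from the lists above; the helper names (`is_blank`, `missing_fields`) and the assumption that a row arrives as a plain dict are hypothetical and not taken from the project code.

```python
# Field names come from the issue text; everything else is illustrative.
PRIORITY_1 = [
    "seq_filename",
    "analyte_type",
    "sequencing_assay",
    "library_prep_kit_method",
    "reference_genome_build",
    "alignment_method",
    "data_processing_pipeline",
    "functional_equivalence_standard",
    "date_data_generation",
]
PRIORITY_2 = ["exome_capture_platform", "capture_region_bed_file"]


def is_blank(value):
    """True when a cell is absent or contains only whitespace."""
    return value is None or str(value).strip() == ""


def missing_fields(row, required=PRIORITY_1):
    """Return the required field names that are absent or blank in a row."""
    return [name for name in required if is_blank(row.get(name))]
```

Passing `required=PRIORITY_1 + PRIORITY_2` would apply the same check once the priority 2 fields are in scope.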

Our solution will likely borrow heavily from the Kids First team's discussion of kidsfirst-sequence-experiment.

@katiebanaz
Collaborator

Test comment

@torstees
Collaborator Author

To establish uniqueness: based on a quick look at the data, it seems safe to key each sequencing object on its filename.
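
Since that conclusion rests on a quick look at the data, a small check along these lines could confirm it before the filename is used as a key. This is only a sketch: the function name is hypothetical, and it assumes the rows are dicts keyed by column name, with the filename in `seq_filename`.

```python
from collections import Counter


def duplicate_filenames(rows, key="seq_filename"):
    """Return, in sorted order, every filename that appears in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result means the filename is unique across the rows that were checked.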

@torstees
Collaborator Author

torstees commented Sep 17, 2020

The first pass is complete, with a handful of general assignments borrowed primarily from the KF docs for similar data.

resourceType: Task
owner => Sequencing Center (Organization)
authoredOn => date_data_generation

Most of the details can be found in either the input or the output array. Currently most of these are simple strings, but they can be switched to codes once we have a clear terminology to use. Which variables go into input versus output is largely arbitrary, but I believe the KF team was thinking in terms of what goes into the actual genotyping device, rather than treating the process as a black box in which samples go in and final products come out. As a result, the inputs to the actual pipelines are currently sitting in the output array.

Inputs:

  • Sample
  • Analyte Type
  • Library Prep Kit
  • Exome Capture Platform
  • Capture Region Bed File

Outputs:

  • Reference Genome Build
  • Alignment Method
  • Data Processing Pipeline
  • Functional Equivalence Standard (I honestly have no idea what this is)

```json
{
  "host": "http://localhost:8000",
  "type": "sequencing_data",
  "body": {
    "resourceType": "Task",
    "id": "38924.merged.matefixed.sorted.markeddups.recal.bam",
    "status": "completed",
    "description": "Generate sequence data for use by researchers",
    "owner": {
      "reference": "Organization/FD",
      "display": "FD"
    },
    "meta": {
      "profile": [
        "http://hl7.org/fhir/StructureDefinition/Task"
      ]
    },
    "identifier": [
      {
        "system": "urn:ncpi:unique-string",
        "value": "Task|38924.merged.matefixed.sorted.markeddups.recal.bam"
      }
    ],
    "output": [
      {
        "type": {
          "text": "Reference Genome Build"
        },
        "valueString": "GRCh38DH"
      },
      {
        "type": {
          "text": "Alignment Method"
        },
        "valueString": "bwa-0.7.15"
      },
      {
        "type": {
          "text": "Data Processing Pipeline"
        },
        "valueString": "3.0_DNA_Pipeline"
      },
      {
        "type": {
          "text": "Functional Equivalence Standard"
        },
        "valueBoolean": false
      }
    ],
    "input": [
      {
        "type": {
          "text": "Sample"
        },
        "valueReference": {
          "reference": "Specimen/4774"
        }
      },
      {
        "type": {
          "text": "Analyte Type"
        },
        "valueString": "DNA"
      },
      {
        "type": {
          "text": "Library Prep Kit"
        },
        "valueString": "DNA_3.0_library_prep"
      },
      {
        "type": {
          "text": "Exome Capture Platform"
        },
        "valueString": "nimblegen_solution_bigexome_2011"
      },
      {
        "type": {
          "text": "Capture Region Bed File"
        },
        "valueString": "nimblegen_solution_bigexome_2011.hg19.list.bed"
      }
    ],
    "authoredOn": "2016-05-26"
  }
}
```
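
For anyone following along, the mapping above can be sketched as a small builder that turns one sequencing row into the Task body. This is not the project's actual transform. The column-to-label pairing is inferred by matching the field names in this issue to the labels in the example, and `build_task`, `string_entry`, `specimen_id` and `center_id` are hypothetical names. Treating only the string "true" (case-insensitive) as a true Functional Equivalence Standard is also an assumption.

```python
# Assumed pairing of table columns with the labels used in the example Task.
INPUT_LABELS = {
    "analyte_type": "Analyte Type",
    "library_prep_kit_method": "Library Prep Kit",
    "exome_capture_platform": "Exome Capture Platform",
    "capture_region_bed_file": "Capture Region Bed File",
}
OUTPUT_LABELS = {
    "reference_genome_build": "Reference Genome Build",
    "alignment_method": "Alignment Method",
    "data_processing_pipeline": "Data Processing Pipeline",
}


def string_entry(label, value):
    """One Task.input or Task.output element carrying a plain string."""
    return {"type": {"text": label}, "valueString": value}


def build_task(row, specimen_id, center_id):
    """Build a Task body shaped like the example, keyed on the filename."""
    filename = row["seq_filename"]

    inputs = [{
        "type": {"text": "Sample"},
        "valueReference": {"reference": f"Specimen/{specimen_id}"},
    }]
    inputs += [string_entry(label, row[col])
               for col, label in INPUT_LABELS.items() if row.get(col)]

    outputs = [string_entry(label, row[col])
               for col, label in OUTPUT_LABELS.items() if row.get(col)]
    fes = row.get("functional_equivalence_standard")
    if fes not in (None, ""):
        outputs.append({
            "type": {"text": "Functional Equivalence Standard"},
            # Emit a real JSON boolean rather than the string "false".
            "valueBoolean": str(fes).strip().lower() == "true",
        })

    return {
        "resourceType": "Task",
        "id": filename,
        "status": "completed",
        "description": "Generate sequence data for use by researchers",
        "owner": {"reference": f"Organization/{center_id}", "display": center_id},
        "meta": {"profile": ["http://hl7.org/fhir/StructureDefinition/Task"]},
        "identifier": [{
            "system": "urn:ncpi:unique-string",
            "value": f"Task|{filename}",
        }],
        "output": outputs,
        "input": inputs,
        "authoredOn": row["date_data_generation"],
    }
```

Optional columns that are empty are simply skipped, so a row without the priority 2 fields still produces a Task with the Sample, Analyte Type and Library Prep Kit inputs.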

@torstees
Collaborator Author

Closed by accident

@torstees
Collaborator Author

torstees commented Oct 9, 2020

Things have changed since this was originally described, largely as a result of further discussions with the KF team. For our current use, given the small number of attributes, all of the inputs still reasonably apply to the Sequencing Task itself. The output, however, has been stripped down to the Document Reference alone, which represents the actual byproduct of the sequencing process. We then attach an Observation to that Doc Ref, with components describing the contents of the document, such as the Reference Sequence, Alignment Method, etc.
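
A rough sketch of that revised shape, for illustration only. The comment above does not say which Observation element points at the Doc Ref. Using `focus` here is an assumption, as are the label strings "Sequence Data" and "Sequencing file details" and the function name. Only the overall structure (Task output referencing a DocumentReference, and an Observation carrying the descriptive components) comes from the description.

```python
def build_revised_resources(doc_ref_id, details):
    """Return (task_output, observation) for one sequencing file.

    doc_ref_id: id of the DocumentReference that represents the file.
    details:    mapping of label -> value, e.g. {"Alignment Method": "bwa-0.7.15"}.
    """
    doc_ref = {"reference": f"DocumentReference/{doc_ref_id}"}

    # The Task's output array now holds only the reference to the document.
    task_output = [{
        "type": {"text": "Sequence Data"},  # assumed label
        "valueReference": doc_ref,
    }]

    # Descriptive attributes move into components of an Observation
    # that is attached to the document.
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Sequencing file details"},  # assumed label
        "focus": [doc_ref],  # assumed linking element
        "component": [
            {"code": {"text": label}, "valueString": value}
            for label, value in details.items()
        ],
    }
    return task_output, observation
```

The components are still plain strings here, in line with the earlier note that they can be switched to codes once a terminology is settled.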

[Image: seq-data-graphic]
