Epic: tasks related to integration of Stager with G4RD ecosystem (Aim 1) #1021

Open
3 of 7 tasks
delvinso opened this issue Jan 19, 2022 · 0 comments
delvinso commented Jan 19, 2022

Note: These tasks do not cover everything required for full integration of Stager with the G4RD ecosystem. Not all tasks are tied to Stager's code base, but they are tracked here for convenience. Scripts are currently stored in this repo. Items 1 and 2 are similar in that reports must first be split by participant, but the pre-processing may diverge because the Stager variant database and the PhenoTips variant store have different schemas. Items 2 and 3 have draft pull requests but are still largely in the planning and testing phase.

G4RD

  1. Create script to post reports to PhenoTips variant store

The script should take a report CSV path as input and do the following:

  • transform the report into long format and split it into a separate 'report' for each participant

  • fetch an API token from PT using a client key and secret

  • post each participant-level report to the PT variant store endpoint

  • As brought up in sprint planning: what should happen if, at POST time, not all participants in an analysis have consented (and are therefore not in PT)?

  • Process and upload reports to as many patients as possible (may need to break out as separate issue)

    • For initial list of consented patients
      • Identify and fix error for ~3% (33) patient reports that failed upload
    • TBD
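
The splitting and upload steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes participant-specific columns are named `<field>.<participant>` and that no shared column name contains a `.`, which may not match the real report layout. `split_report_by_participant`, `upload_all`, and the `post_report` callable are hypothetical names; the actual PT token and endpoint calls are deliberately left out and would be supplied through `post_report`.

```python
import csv
import io
from collections import defaultdict


def split_report_by_participant(handle, sep="."):
    """Return {participant: [row, ...]} from a wide report.

    Assumption: columns named '<field><sep><participant>' are
    participant-specific; every other column is copied into each row.
    """
    reports = defaultdict(list)
    for row in csv.DictReader(handle):
        shared = {col: val for col, val in row.items() if sep not in col}
        per_participant = defaultdict(dict)
        for col, val in row.items():
            if sep in col:
                field, participant = col.split(sep, 1)
                per_participant[participant][field] = val
        for participant, fields in per_participant.items():
            reports[participant].append({**shared, **fields})
    return dict(reports)


def upload_all(reports, post_report):
    """Try every participant and record failures instead of stopping early."""
    uploaded, failed = [], {}
    for participant, rows in reports.items():
        try:
            post_report(participant, rows)  # hypothetical callable wrapping the POST
            uploaded.append(participant)
        except Exception as exc:
            failed[participant] = str(exc)
    return uploaded, failed


# Made-up two-participant report, for illustration only.
sample = io.StringIO(
    "position,ref,alt,zygosity.P1,zygosity.P2\n"
    "100,A,T,het,hom\n"
)
reports = split_report_by_participant(sample)
```

Collecting failures per participant, rather than aborting, is one way to upload to as many patients as possible while keeping a list of non-consented or failed uploads for follow-up.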

Stager

  1. Create script to post reports to Stager
  The script should take a report CSV path and an analysis ID (?) as input and do the following:
  • transform the report into long format and split it into a separate 'report' for each participant
  • break up each participant-level 'report' into JSON that corresponds to the appropriate Stager data models (i.e., separate data that belongs in the variant table from data that belongs in the genotype table)
  • fetch an API token from Stager using a client key and secret (?)
  • post the data to Stager (we might want to write an endpoint like api/analysis/{id}/report that wraps both inserts in a transaction)
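
As a rough sketch of the variant/genotype split described above, assuming participant-level rows are already available as dictionaries: the field names in `VARIANT_FIELDS` and `GENOTYPE_FIELDS`, the `build_report_payload` helper, and the payload shape are all hypothetical and would need to be aligned with the actual Stager models.

```python
import json

# Hypothetical column-to-table mapping; the real Stager models may differ.
VARIANT_FIELDS = ("chromosome", "position", "reference_allele", "alt_allele")
GENOTYPE_FIELDS = ("zygosity", "coverage")


def build_report_payload(analysis_id, participant, rows):
    """Split each row into variant-level and genotype-level parts."""
    entries = []
    for row in rows:
        entries.append({
            "variant": {f: row[f] for f in VARIANT_FIELDS if f in row},
            "genotype": {f: row[f] for f in GENOTYPE_FIELDS if f in row},
        })
    return {
        "analysis_id": analysis_id,
        "participant": participant,
        "entries": entries,
    }


# Made-up row; columns outside both field lists are dropped.
rows = [{
    "chromosome": "1", "position": "100", "reference_allele": "A",
    "alt_allele": "T", "zygosity": "het", "coverage": "32", "note": "x",
}]
payload = build_report_payload(42, "P1", rows)
body = json.dumps(payload)  # request body for a single POST
```

Keeping each variant paired with its genotype in one payload would let a single endpoint insert both within one transaction, as suggested above.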

Misc

  1. Flask: post analysis metadata to PhenoTips

We'd like to store analysis metadata in PhenoTips and associate it with a patient record. The data will be posted via the PT API, with the participant's external id (<familyCodename>_<participantCodename>) used to fetch the internal id. The first step might be a click command that posts the following for a given participant (the actual endpoint is not ready yet, so this is mostly a planning task). We need to determine what file metadata is available, because I think there might be interest in BAMs and CRAMs as well as VCFs, CSVs, etc. Some intermediate steps might be needed to get this data into Stager if it doesn't exist.

  • analysis_type
  • analysis_date
  • analysis_tissue_type
  • analysis_files[]
    • name
    • type
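
A minimal sketch of how the payload for these fields might be assembled. The key names come from the list above and the external id format from the description; the helper names `external_id` and `analysis_metadata`, the ISO date format, and the example values are assumptions, since the endpoint does not exist yet.

```python
from datetime import date


def external_id(family_codename, participant_codename):
    """Build the PT external id: <familyCodename>_<participantCodename>."""
    return f"{family_codename}_{participant_codename}"


def analysis_metadata(analysis_type, analysis_date, tissue_type, files):
    """Assemble the metadata dict; `files` is an iterable of (name, type) pairs."""
    return {
        "analysis_type": analysis_type,
        "analysis_date": analysis_date.isoformat(),  # assumed date format
        "analysis_tissue_type": tissue_type,
        "analysis_files": [{"name": n, "type": t} for n, t in files],
    }


# Made-up values, for illustration only.
eid = external_id("FAM1", "P1")
metadata = analysis_metadata(
    "exome", date(2022, 1, 19), "blood",
    [("FAM1_P1.cram", "cram"), ("FAM1.report.csv", "csv")],
)
```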
@kevinlul kevinlul changed the title Flask/Misc: Tasks related to integration of Stager with G4RD ecosystem Epic: tasks related to integration of Stager with G4RD ecosystem (Aim 1) Jan 19, 2022