This repository has been archived by the owner on Sep 20, 2024. It is now read-only.

UC 12 - (Xihong) Whole Genome Sequencing Association Analysis pipeline #12

Open
NoopDog opened this issue Jun 29, 2021 · 4 comments
Labels
inactive (This use case is not being worked on.) · SYS INTEROP (System interoperability use case)

Comments

@NoopDog
Collaborator

NoopDog commented Jun 29, 2021

Interop Contact:
Active in 2021: Active
Researchers: Xihong Lin (Harvard T.H. Chan School of Public Health)

Analysis Question:

Large-scale Whole Genome Sequencing (WGS) studies and biobanks have been rapidly generating up to millions of whole genomes. Examples of large-scale WGS studies include the NHGRI Genome Sequencing Program (GSP), which has sequenced 140,000+ multi-ethnic whole genomes and 220,000 whole exomes, and the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, which has sequenced 190,000+ multi-ethnic whole genomes.

Analysis of WGS data is challenged by the massive number of coding and non-coding rare variants (RVs) and by the need to functionally annotate these variants. We recently developed a whole-genome variant functional annotation database and portal, FAVOR, that assembles rich functional annotations from a variety of data sources to describe the functional landscape and regulatory characteristics of variants from large-scale WGS data. We also developed a novel RV association test, STAAR, that empowers RV association analysis by effectively incorporating the multi-faceted functional annotations provided by FAVOR.

This project aims to develop a comprehensive cloud-based open-source rare variant analysis toolset to perform powerful, scalable, and resource-efficient functional annotations and phenotype-genotype rare variant association studies.

First, we will develop an open-source pipeline, FAVORannotator, for functionally annotating and efficiently storing the genotype and variant functional annotation data of a WGS/biobank study in an all-in-one file format to facilitate downstream RV association analysis.
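As a rough illustration of the annotation step described above, the sketch below attaches annotations to variants by joining on a chromosome/position/ref/alt key. It is not FAVORannotator code: the `VariantKey` type, the `annotate_variants` function, and the `example_score` field are all hypothetical names chosen for this example, and the in-memory dictionary only stands in for an annotation database.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class VariantKey(NamedTuple):
    """Hypothetical variant identifier: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical annotation record: annotation name -> numeric score.
Annotation = Dict[str, float]


def annotate_variants(
    variants: List[VariantKey],
    annotation_db: Dict[VariantKey, Annotation],
) -> List[Tuple[VariantKey, Optional[Annotation]]]:
    """Pair each variant with its annotation record, or None if it has none."""
    return [(v, annotation_db.get(v)) for v in variants]


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    db = {VariantKey("1", 12345, "A", "G"): {"example_score": 0.5}}
    study = [VariantKey("1", 12345, "A", "G"), VariantKey("2", 67890, "C", "T")]
    for variant, annotation in annotate_variants(study, db):
        print(variant, annotation)
```

Keeping unannotated variants in the output (paired with `None`) rather than dropping them is a design choice made for this sketch, so that the genotype and annotation records stay aligned one to one.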

Second, we will provide an all-in-one, open-source, cloud-based pipeline, STAARpipeline, for comprehensive and scalable rare variant association analysis of large-scale WGS and biobank data. The pipeline will use STAAR, integrating the variant functional annotations provided by FAVORannotator, and will summarize and visualize the RV association results.

Analysis Plan:

  1. We have obtained IRB approval for the TOPMed dataset and GSP dataset.
  2. We have obtained dbGaP access to these studies.
  3. Develop the functional annotation pipeline, FAVORannotator, in BioData Catalyst and AnVIL using the Terra platform.
  4. Develop the RV association analysis pipeline, STAARpipeline, in BioData Catalyst and AnVIL using the Terra platform.
  5. Functionally annotate TOPMed Freeze 8 and GSP Freeze 2 data using FAVORannotator.
  6. Perform association analysis of TOPMed Freeze 8 and GSP Freeze 2 CAD data using STAARpipeline.
  7. Store WGS common and rare variant summary statistics of TOPMed Freeze 8 lipids and GSP Freeze 2 CAD in STAARsummary.
@NoopDog NoopDog added the Epic label Jun 29, 2021
@NoopDog NoopDog changed the title from "UC 12 - 1NHLBI BioData Catalyst + NHGRI AnVIL + TOPMed + GSP" to "UC 12 - NHLBI BioData Catalyst + NHGRI AnVIL + TOPMed + GSP" Jun 29, 2021
@linikujp linikujp changed the title from "UC 12 - NHLBI BioData Catalyst + NHGRI AnVIL + TOPMed + GSP" to "UC 12 - (Xihong) Whole Genome Sequencing Association Analysis pipeline" Jun 29, 2021
@linikujp
Member

linikujp commented Jul 1, 2021

Updates: Met with Xihong and Michael S on July 1, 2021. Identified a potential cloud cost resource for the project.
However, the interoperability use case within this research project still needs to be identified.

@jackDiGi
Collaborator

Meeting scheduled for 20 July to resolve remaining issues and finalize.

@linikujp
Member

The PI is currently working on securing funds to support the implementation of FAVORannotator and STAARpipeline in AnVIL. One possibility is to use the $300 in GCP credits for a trial run.

@NoopDog NoopDog removed the Epic label Sep 23, 2021
@linikujp linikujp added the "one pager done" label (label when use case's one pager is completed) Sep 23, 2021
@jackDiGi jackDiGi added the "SYS INTEROP" label (System interoperability use case) Nov 16, 2021
@NoopDog NoopDog moved this to On Hold in NCPI Use Case Tracker Dec 3, 2021
@NoopDog NoopDog removed the "one pager done" (label when use case's one pager is completed) and "implementation phase" labels Jan 30, 2022
@linikujp linikujp added the "inactive" label (This use case is not being worked on.) Nov 7, 2022
@linikujp
Member

linikujp commented Nov 7, 2022

Decided to mark this use case as inactive, as there is no funding to support continued development.

Projects
Status: On Hold

3 participants