UC 12 - (Xihong) Whole Genome Sequencing Association Analysis pipeline #12

NoopDog · 2021-06-29T21:54:02Z

Interop Contact:
Active in 2021: Active
Researchers: Xihong Lin (Harvard T.H. Chan School of Public Health)

Analysis Question:

Large-scale Whole Genome Sequencing (WGS) studies and biobanks have been rapidly generating up to millions of whole genomes. Examples of large-scale WGS studies include the NHGRI Genome Sequencing Program (GSP), which has sequenced 140,000+ multi-ethnic whole genomes and 220,000 whole exomes, and the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, which has sequenced 190,000+ multi-ethnic whole genomes.

Analysis of WGS data is challenged by the massive number of coding and non-coding rare variants (RVs) and the need to functionally annotate these variants. We recently developed FAVOR, a whole-genome variant functional annotation database and portal that assembles rich functional annotations from a variety of data sources to describe the functional landscape and regulatory characteristics of variants from large-scale WGS data. We also developed STAAR, a novel RV association test that increases the power of RV association analysis by effectively incorporating the multi-faceted functional annotations provided by FAVOR.

This project aims to develop a comprehensive cloud-based open-source rare variant analysis toolset to perform powerful, scalable, and resource-efficient functional annotations and phenotype-genotype rare variant association studies.

First, we will develop an open-source pipeline, FAVORannotator, for functionally annotating and efficiently storing the genotype and variant functional annotation data of a WGS/biobank study in an all-in-one file format to facilitate downstream RV association analysis.

Second, we will provide an all-in-one, open-source, cloud-based pipeline, STAARpipeline, for comprehensive and scalable rare variant association analysis of large-scale WGS and biobank data. It will use STAAR, integrating the variant functional annotations provided by FAVORannotator, and will summarize and visualize the RV association results.

Analysis Plan:

We have obtained IRB approval for the TOPMed dataset and GSP dataset.
We have obtained dbGaP access to these studies.
Develop the functional annotation pipeline, FAVORannotator, in BioData Catalyst and AnVIL using the Terra platform.
Develop the RV association analysis pipeline, STAARpipeline, in BioData Catalyst and AnVIL using the Terra platform.
Functionally annotate TOPMed Freeze 8 and GSP Freeze 2 data using FAVORannotator.
Perform association analysis of TOPMed Freeze 8 and GSP Freeze 2 CAD data using STAARpipeline.
Store WGS common and rare variant summary statistics of TOPMed Freeze 8 lipids and GSP Freeze 2 CAD in STAARsummary.

linikujp · 2021-07-01T18:09:23Z

Updates: Met with Xihong and Michael S on July 1, 2021. Identified a potential resource to cover cloud costs for the project.
However, the interoperability use case within this research project still needs to be identified.

jackDiGi · 2021-07-13T12:30:11Z

Meeting scheduled for 20 July to resolve remaining issues and finalize.

linikujp · 2021-08-16T22:11:18Z

The PI is currently working on securing funds to support the implementation of FAVORannotator and STAARpipeline in AnVIL. One possibility is to use the GCP $300 credits for a trial run.

linikujp · 2022-11-07T21:16:50Z

Decided to mark this use case as inactive, as there is no funding to support continued development.

NoopDog added the Epic label Jun 29, 2021

NoopDog changed the title ~~UC 12 - 1NHLBI BioData Catalyst + NHGRI AnVIL + TOPMed + GSP~~ UC 12 - NHLBI BioData Catalyst + NHGRI AnVIL + TOPMed + GSP Jun 29, 2021

linikujp changed the title ~~UC 12 - NHLBI BioData Catalyst + NHGRI AnVIL + TOPMed + GSP~~ UC 12 - (Xihong) Whole Genome Sequencing Association Analysis pipeline Jun 29, 2021

linikujp mentioned this issue Jul 1, 2021

Discuss interoperability with Sys. Interop. group #14

Closed

NoopDog removed the Epic label Sep 23, 2021

linikujp added the "one pager done" label (label when use case's one pager is completed) Sep 23, 2021

linikujp added the implementation phase label Oct 6, 2021

jackDiGi added the "SYS INTEROP" label (System interoperability use case) Nov 16, 2021

NoopDog added this to NCPI Use Case Tracker Dec 3, 2021

NoopDog moved this to On Hold in NCPI Use Case Tracker Dec 3, 2021

NoopDog removed the "one pager done" (label when use case's one pager is completed) and "implementation phase" labels Jan 30, 2022

linikujp added the "inactive" label (This use case is not being worked on.) Nov 7, 2022
