This repository has been archived by the owner on Sep 20, 2024. It is now read-only.
Interop Contact:
Active in 2021: Active
Researchers: Xihong Lin (Harvard T.H. Chan School of Public Health)
Analysis Question:
Large-scale Whole Genome Sequencing (WGS) studies and biobanks have been rapidly generating up to millions of whole genomes. Examples of large-scale WGS studies include the NHGRI Genome Sequencing Program (GSP), which has sequenced 140,000+ multi-ethnic whole genomes and 220,000 whole exomes, and the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, which has sequenced 190,000+ multi-ethnic whole genomes.
Analysis of WGS data is challenged by the massive number of coding and non-coding rare variants (RVs) and the need to functionally annotate these variants. We recently developed FAVOR, a whole-genome variant functional annotation database and portal that assembles rich functional annotations from a variety of data sources to describe the functional landscape and regulatory characteristics of variants from large-scale WGS data. We also developed STAAR, a novel RV association test that empowers RV association analysis by effectively incorporating the multi-faceted functional annotations provided by FAVOR.
This project aims to develop a comprehensive cloud-based open-source rare variant analysis toolset to perform powerful, scalable, and resource-efficient functional annotations and phenotype-genotype rare variant association studies.
First, we will develop an open-source pipeline, FAVORannotator, for functionally annotating and efficiently storing the genotype and variant functional annotation data of a WGS/biobank study in an all-in-one file format to facilitate downstream RV association analysis.
Second, we will provide an all-in-one, open-source, cloud-based pipeline, STAARpipeline, for comprehensive and scalable rare variant association analysis of large-scale WGS and biobank data. The pipeline uses STAAR, integrates the variant functional annotations provided by FAVORannotator, and summarizes and visualizes the RV association results.
Analysis Plan:
We have obtained IRB approval for the TOPMed dataset and GSP dataset.
We have obtained dbGaP access to these studies.
Develop the functional annotation pipeline, FAVORannotator, in BioData Catalyst and AnVIL using the Terra platform.
Develop the RV association analysis pipeline, STAARpipeline, in BioData Catalyst and AnVIL using the Terra platform.
Functionally annotate TOPMed Freeze 8 and GSP Freeze 2 data using FAVORannotator.
Perform association analysis of TOPMed Freeze 8 and GSP Freeze 2 CAD data using STAARpipeline.
Store WGS common and rare variant summary statistics of TOPMed Freeze 8 lipids and GSP Freeze 2 CAD in STAARsummary.
Updates: Met with Xihong and Michael S on July 1, 2021. Identified a potential cloud cost resource for the project.
However, the interoperability use case still needs to be identified within this research project.
The PI is currently working to secure funds to support the implementation of FAVORannotator and STAARpipeline in AnVIL. One possibility is to use the $300 in GCP credits for an initial trial.