Example code that demonstrates how to store, process, and query genomic and biological datasets using AWS HealthOmics
AWS HealthOmics helps healthcare and life sciences customers store, query, analyze, and generate insights from genomic and other biological data to improve human health.
This repository contains resources, such as code scripts and Jupyter notebooks, that demonstrate the usage of AWS HealthOmics.
The quickest setup to run example notebooks includes:
- An AWS account
- Proper IAM User and Role setup
- An Amazon SageMaker Notebook Instance
The example notebooks cover the following topics:
- Using HealthOmics Storage with genomics references and readsets: Get acquainted with HealthOmics storage by creating reference and sequence stores, importing data from FASTQ and CRAM files, and downloading readsets.
- Running WDL and Nextflow pipelines with HealthOmics Workflows: Learn how to create, run, and debug WDL-based and Nextflow-based pipelines that process data from HealthOmics Storage and Amazon S3 using HealthOmics Workflows.
- Querying annotations and variants with HealthOmics Analytics: Get started with HealthOmics Analytics by importing variant and annotation data from VCF, TSV, and GFF files, and performing genome-scale analysis queries using Amazon Athena.
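
As a rough illustration of the storage topic above, the sketch below creates a sequence store and starts a FASTQ import with the `boto3` `omics` client. It is not taken from the notebooks in this repository. The helper names (`build_fastq_source`, `import_fastq`), the store name, the role ARN, and the S3 URIs are all hypothetical placeholders, and the snippet assumes that `boto3` is installed and that the role can read the source files.

```python
from typing import List, Optional


def build_fastq_source(
    sample_id: str,
    subject_id: str,
    read1_uri: str,
    read2_uri: Optional[str] = None,
) -> dict:
    """Build one entry for the `sources` list of a read set import job.

    Hypothetical helper: it only assembles the request dictionary and
    does not call AWS.
    """
    for uri in (read1_uri, read2_uri):
        if uri is not None and not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")

    source_files = {"source1": read1_uri}
    if read2_uri is not None:
        # Paired-end data: the second FASTQ file goes in `source2`.
        source_files["source2"] = read2_uri

    return {
        "sourceFiles": source_files,
        "sourceFileType": "FASTQ",
        "subjectId": subject_id,
        "sampleId": sample_id,
        "name": sample_id,
    }


def import_fastq(store_name: str, role_arn: str, sources: List[dict]) -> str:
    """Create a sequence store and start an import job; return the job ID.

    Requires AWS credentials. boto3 is imported here so that the helper
    above can be used without it.
    """
    import boto3

    omics = boto3.client("omics")
    store = omics.create_sequence_store(name=store_name)
    job = omics.start_read_set_import_job(
        sequenceStoreId=store["id"],
        roleArn=role_arn,
        sources=sources,
    )
    return job["id"]


if __name__ == "__main__":
    # Placeholder values only; replace them before running against AWS.
    source = build_fastq_source(
        sample_id="sample-001",
        subject_id="subject-001",
        read1_uri="s3://my-example-bucket/sample-001_R1.fastq.gz",
        read2_uri="s3://my-example-bucket/sample-001_R2.fastq.gz",
    )
    print(source)
```

Running the file only prints the request dictionary. Calling `import_fastq` is left as an explicit step because it creates billable resources in your account.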
This library is licensed under the Apache 2.0 License. For more details, please take a look at the LICENSE file.
See the Security issue notifications section of our contributing guidelines for more information.
Although we're extremely excited to receive contributions from the community, we're still working on the best mechanism to take in examples from external sources. Please bear with us in the short term if pull requests take longer than expected or are closed. Please read our contributing guidelines if you'd like to open an issue or submit a pull request.