HLA typing from WES data

An example of a simple NGS-based HLA typing benchmarking study, inspired by https://github.com/nikolasthuesen/hla-typing-benchmark

This project contains a Snakemake workflow that runs a full HLA typing pipeline, in which WES samples from the 1000 Genomes project are HLA typed using Optitype, Kourami, HLA*LA and HISAT-genotype. The reference HLA typing is taken from DOI: 10.1371/journal.pone.0097282, specifically the file http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20140725_hla_genotypes/20140702_hla_diversity.txt
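Comparing a tool's calls against the reference typing can be sketched as follows. This is a minimal illustration and not the repository's evaluation code: the function names `truncate_allele` and `count_correct`, the two-field default resolution, and the assumption that alleles are written as `GENE*FIELD:FIELD...` are all hypothetical choices made for the example.

```python
from collections import Counter


def truncate_allele(allele: str, fields: int = 2) -> str:
    """Keep only the first `fields` colon-separated fields of an allele name.

    Assumes the 'GENE*FIELD:FIELD...' notation, e.g. 'A*02:01:01:01'.
    Names without a '*' are returned unchanged.
    """
    gene, sep, rest = allele.partition("*")
    if not sep:
        return allele
    return gene + "*" + ":".join(rest.split(":")[:fields])


def count_correct(predicted, reference, fields: int = 2) -> int:
    """Count predicted alleles that match the reference for one locus.

    Each input holds the alleles called for one sample at one locus.
    Order is ignored, so the comparison is done on multisets.
    """
    pred = Counter(truncate_allele(a, fields) for a in predicted)
    ref = Counter(truncate_allele(a, fields) for a in reference)
    # Multiset intersection: an allele counts once per matching copy.
    return sum((pred & ref).values())


if __name__ == "__main__":
    # Illustrative calls only, not taken from the benchmark data.
    print(count_correct(["A*02:01:01", "A*01:01"], ["A*01:01:01:01", "A*03:01"]))
```

Truncating both sides to the same number of fields before comparing avoids penalizing a tool for reporting a higher resolution than the reference provides.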

This is a simplified implementation of the original benchmarking study. This project does NOT include:

HLA typing using STC-seq
A study of the impact of the depth of coverage of the whole-exome sequencing sample
An analysis of Optitype's performance on simulated ancient DNA
A detailed gold standard dataset, where newer results from doi: 10.1371/journal.pone.0206512 and doi: 10.1093/nar/gkt481 are considered

Additionally, the pipeline is set up to HLA type and evaluate only two individuals from the 1000 Genomes dataset, whereas the original study included 829 samples. Additional samples can be included by modifying the config file (snakemake/config.yaml), for example using create_config.py.
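As a rough illustration of extending the sample list, the sketch below renders a YAML fragment from a list of sample IDs. It is not the repository's create_config.py: the function name `render_sample_config` and the top-level key `samples` are hypothetical, and the actual layout of snakemake/config.yaml should be checked before adapting it. The sample IDs shown are only examples of 1000 Genomes identifiers.

```python
def render_sample_config(sample_ids):
    """Render a YAML fragment listing the given sample IDs.

    The 'samples' key is a hypothetical layout chosen for this sketch;
    the real config file may be structured differently.
    """
    # Strip whitespace, skip empty entries, drop duplicates (order preserved).
    cleaned = (s.strip() for s in sample_ids)
    unique = list(dict.fromkeys(s for s in cleaned if s))
    lines = ["samples:"] + [f"  - {s}" for s in unique]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example IDs only; replace with the samples to be typed.
    print(render_sample_config(["HG00096", "NA12878"]), end="")
```

Generating the fragment from a plain list keeps the sample selection in one place, which helps when scaling from two samples towards a larger subset.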

Software requirements

The HLA typing tools are run through Singularity, which pulls and runs Docker images from https://hub.docker.com/. The individual HLA typing tools therefore do not need to be installed.

System requirements

At least 40 GB of memory is needed to run the most memory-heavy step, which is indexing HLA*LA's graph structure.
The standard implementation of the Snakemake workflow currently uses 16 cores. If more or fewer cores are available, the core count can be changed in the Makefile.
The WES samples are relatively large, and since the pipeline stores them as CRAM, BAM and FASTQ files, a significant amount of storage is needed. Once the results for a sample are available, its intermediate files can be deleted. Alternatively, the Snakemake script can be modified to mark large intermediate files with temp(), so that Snakemake removes them once they are no longer needed (see https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#protected-and-temporary-files)
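The memory and core figures above can be checked before starting a long run. The following is a minimal sketch, assuming a Linux host where `os.sysconf` exposes the physical memory size. The function names and warning texts are hypothetical and not part of the repository, and the check is approximate because it compares GiB against the stated GB figure.

```python
import os

REQUIRED_MEM_GB = 40  # from the text: indexing HLA*LA's graph structure
DEFAULT_CORES = 16    # from the text: cores used by the standard workflow


def resource_warnings(total_mem_gb, n_cores):
    """Return human-readable warnings for a machine with the given resources."""
    warnings = []
    if total_mem_gb < REQUIRED_MEM_GB:
        warnings.append(
            f"{total_mem_gb:.0f} GB memory found, {REQUIRED_MEM_GB} GB needed for HLA*LA indexing"
        )
    if n_cores < DEFAULT_CORES:
        warnings.append(
            f"{n_cores} cores found, workflow defaults to {DEFAULT_CORES}; adjust the Makefile"
        )
    return warnings


def detect_resources():
    """Detect total physical memory (GiB) and core count on Linux."""
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return mem_bytes / 1024**3, os.cpu_count() or 1


if __name__ == "__main__":
    for message in resource_warnings(*detect_resources()):
        print("WARNING:", message)
```

Storage is left out on purpose, since the text gives no concrete figure to check against.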

Usage

Note that running the pipeline takes several hours per sample, so it is recommended to run it in a tmux session.

Clone the repository

git clone git@github.com:nikolasthuesen/springers-hla-benchmark.git

Install the required packages

cd springers-hla-benchmark
make install
source virt/bin/activate

Run the Snakemake workflow

For the full pipeline:

make run_benchmark

For a simplified, much quicker pipeline without HLA*LA:

make run_slim_benchmark

Plots and typing results are written to the results folder, which is created when the Snakemake workflow runs.

If any job fails or if the Snakemake pipeline stalls, it is possible to clear the cache without having to re-download the containers by running:

make clear_snakemake_cache
