N.B: As of 21st of February, 2020 shapeit=v2.r837 has been removed from the bioconda channel. Hence, I have removed it from the environment file. However, shapeit=2.r904 is available from https://anaconda.org/dranew/shapeit and if you want to use it add it to the environment file with the appropriate dranew channel. Runs with shapeit4 will still work as it is still available on bioconda.

HLA and KIR Imputation from SNP

Authors

Bisrat Johnathan Debebe (@bjohnnyd)

If the above command is executed with the config.yaml settings from Quickstart only liftover and encoding will be performed. This allows inspection of the variants in the VCF by looking at the images produced in output/{project}/kirimp/01_freq_encode_snps/{project}.png. The images might help in deducing what threshold to use during SHAPEIT for filtering troublesome variants with missing genotypes.

Run SHAPEIT After Checking VCF Statistics

To run SHAPEIT and prepare data for KIR*IMP using the file produced by running snakemake -j<number of parallel jobs> --use-conda kirimp_encode run the command:

snakemake -j<number of parallel jobs> --use-conda kirimp_ready

In addition, SHAPEIT can be run on a specific VCF (not necessarily produced by the workflow) by specifying the vcf file under the shapeit directive in config.yaml:

project:
  <project name>:
    shapeit
      vcf: <Path to VCF file>

Additional and Default Parameters

The default parameters settings and there descriptions are listed below and can be modified by placing it in config.yaml with the desired values different from defaults:

BCFTOOLS_THREADS: 4
KIRIMP_PANEL_URL: http://imp.science.unimelb.edu.au/kir/static/kirimp.uk1.snp.info.csv
PLINK_THREADS: 4
SHAPEIT_GENMAP_URL: https://github.com/odelaneau/shapeit4/blob/master/maps/genetic_maps.b37.tar.gz?raw=true
SHAPEIT_THREADS: 4

project:
  <project name>:
    freq_encode_snps:
      vcf: output/{project}/liftover/04_hg19_vcf/{project}.{reference}ToHg19.vcf.gz
      additional: "--outlier-threshold 0.1" # additional parameters to be passed to the script in 'scripts/frequency_encode_snps.py'.
    liftover:
      reference: hg18 # Input PLINK bed reference can be either hg16, hg17, hg18 or hg38
    shapeit:
      vcf: output/{project}/kirimp/01_freq_encode_snps/{project}.vcf.gz
      gmap: input/meta/shapeit/kirimp.chr19.gmap.txt.gz
      pbwt: 8 # shapeit4: determines the number of conditioning neighbours, higher number should produce better accuracy but slower runtimes
      pbwt-modulo: 8 # shapeit4: determines variants for storing pbwt indexes
      regions: [19] # chromosomes for phasing default is 19 for KIR*IMP
      states: 500 # shapeit2: determines number of haplotypes for conditioning, higher number should produce better accuracy but slower runtimes
      min_missing: 0.25 # variants with missing genotype rate larger than this will be discarded before phasing with SHAPEIT. Inspect the image from kirimp_encode to determine threshold
      v2_additional: "" # Additional parameters for shapeit version 2.  Description of possible settings can be found at https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html
      v4_additional: "" # Additional parameters for shapeit version 4.  Description of possible settings can be found at https://odelaneau.github.io/shapeit4/

Workflow Running Options

Test your configuration by performing a dry-run via

snakemake --use-conda -npr

Execute the workflow locally via

snakemake --use-conda -j $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

or if DRMAA is available

snakemake --use-conda --drmaa --jobs 100

If you not only want to fix the software stack but also the underlying OS, use

snakemake --use-conda --use-singularity

in combination with any of the modes above. See the Snakemake documentation for further details.

Investigate results

After successful execution, you can create a self-contained interactive HTML report with all results via:

snakemake --report report.html

Standalone Script

The workflow includes a standalone script

scripts/frequency_encode_snps.py

which can be used to encode a VCF file to a reference panel of variants. To setup the environment for the script create the environment:

conda env create -f envs/freq_encode_snps.yml

then activate the environment with

conda activate freq_encode_snps

The parameters to run the standalone script are:

usage: frequency_encode_snps.py [-h] -v VCF_FILE -r REFERENCE_PANEL
                                [-rt {kirimp,custom}] [--separator SEPARATOR]
                                -o OUTPUT [-chr [CHROMOSOMES]]
                                [--chromosome-annotation {ucsc,ensembl,genbank}]
                                [-a] [-c] [-min MIN_AMBIGIOUS_THRESHOLD]
                                [-max MAX_AMBIGIOUS_THRESHOLD]
                                [--outlier-threshold OUTLIER_THRESHOLD]
                                [-t THREADS]

For usage with KIR*IMP the VCF has to be with hg18/b37 reference.

The script in addition to encoding the VCF file will produce a diagnosis plot (same name as output vcf file but with .png extension) as well as a TOML file containing all the metadata of the encoding. The metadata TOML can be read into R using configr

require(configr)
read.config(file = "metadata.toml")

or into Python using

import toml
toml.load("metadata.toml")

the resulting vcf will contain statistics from the encoding inside the INFO fields:

##INFO=<ID=UPD,Number=A,Type=Flag,Description="REF and ALT updated based on reference panel frequency">
##INFO=<ID=PFD,Number=A,Type=Float,Description="Alternate frequency difference to reference panel frequency">
##INFO=<ID=MISS,Number=A,Type=Float,Description="Missing Genotype Frequency">
##INFO=<ID=MAF,Number=A,Type=Float,Description="Minor Allele Frequency">

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

HLA and KIR Imputation from SNP

Authors

Table of Contents

Usage

Workflow Overview

Quickstart

Running Specific Parts or Running Workflow in Steps

Running Only Liftover

Encode VCF Without Liftover or Perform Liftover and Encoding Without Phasing With SHAPEIT

Run SHAPEIT After Checking VCF Statistics

Additional and Default Parameters

Workflow Running Options

Investigate results

Standalone Script

Files

README.md

Latest commit

History

README.md

File metadata and controls

HLA and KIR Imputation from SNP

Authors

Table of Contents

Usage

Workflow Overview

Quickstart

Running Specific Parts or Running Workflow in Steps

Running Only Liftover

Encode VCF Without Liftover or Perform Liftover and Encoding Without Phasing With SHAPEIT

Run SHAPEIT After Checking VCF Statistics

Additional and Default Parameters

Workflow Running Options

Investigate results

Standalone Script