danny-wilson/kmer_pipeline

kmer_pipeline

Pipeline for kmer (oligo)-based genome-wide association studies

Implementing methods described in

Genome-wide association studies of global Mycobacterium tuberculosis resistance to thirteen antimicrobials in 10,228 genomes. The CRyPTIC Consortium (2022) PLOS Biology 20: e3001755 (article)

Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Earle, S. G., Wu, C.-H., Charlesworth, J., Stoesser, N., Gordon, N. C., Walker, T. M., Spencer, C. C. A., Iqbal, Z., Clifton, D. A., Hopkins, K. L., Woodford, N., Smith, E. G., Ismail, N., Llewelyn, M. J., Peto, T. E., Crook, D. W., McVean, G., Walker, A. S. and D. J. Wilson (2016) Nature Microbiology 1: 16041 (preprint)

Authors

Written by Sarah G Earle and Daniel J Wilson at the University of Oxford, Big Data Institute.

Funding

This project was supported by the Wellcome Trust, the Royal Society and the Robertson Foundation.

Dependencies

The pipeline uses software including GEMMA, DSK, NCBI BLAST, MUMmer, Bowtie2, Samtools, the GNU Scientific Library and the Automatically Tuned Linear Algebra Software (ATLAS).

The container implements a Nextflow pipeline which runs on top of Apache Groovy and Java.

Original code and scripts were also written in R and C++, using the genoPlotR library and the zstr library, a C++ wrapper for the zlib library. C++ code was compiled with the GNU Compiler Collection.

The container was built using Docker and is based on the Jupyter Data Science Notebook.
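
To see which of the command-line tools named above are visible in a given environment (for example, a terminal opened inside the container), a check along the following lines may help. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the executable names (`gemma`, `dsk`, `blastn`, `nucmer`, `bowtie2`, `samtools`, `nextflow`, `java`, `R`) are assumptions about how each package is installed, not something this README states.

```shell
#!/bin/sh
# Report which of the given executables are missing from PATH.
# Prints one missing name per line; returns non-zero if any are missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return $status
}

# Assumed executable names for the software listed under Dependencies.
missing_tools gemma dsk blastn nucmer bowtie2 samtools nextflow java R ||
    echo "some tools were not found on PATH" >&2
```

An empty listing means every name was found. A name that is listed may simply be installed under a different executable name or outside PATH.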

License

The zstr headers are licensed under the MIT license. The myutils headers are licensed under the GNU Lesser General Public License Version 3. All other code is licensed under the GNU General Public License v3.0.

Installation

To download a prebuilt Docker image

docker pull dannywilson/kmer_pipeline:2022-10-26

To build a Singularity container

singularity pull -F docker://dannywilson/kmer_pipeline:2022-10-26
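
The two commands above differ only in the container runtime, so a wrapper can keep the image tag in one place. The following is a sketch only: `pull_command` and `detect_runtime` are hypothetical helper names, and the script prints the command rather than running it so that it can be checked first.

```shell
#!/bin/sh
# Image reference taken from the installation commands above.
IMAGE="dannywilson/kmer_pipeline:2022-10-26"

# Print the pull command for a given container runtime (hypothetical helper).
pull_command() {
    case "$1" in
        docker)      printf 'docker pull %s\n' "$IMAGE" ;;
        singularity) printf 'singularity pull -F docker://%s\n' "$IMAGE" ;;
        *)           printf 'unsupported runtime: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Print the first supported runtime found on PATH; fail if neither is installed.
detect_runtime() {
    for rt in docker singularity; do
        if command -v "$rt" >/dev/null 2>&1; then
            printf '%s\n' "$rt"
            return 0
        fi
    done
    return 1
}

# Show what would be run; copy the printed line to perform the pull.
if rt=$(detect_runtime); then
    pull_command "$rt"
else
    echo "neither docker nor singularity found on PATH" >&2
fi
```

Preferring Docker when both runtimes are present is an arbitrary choice in this sketch; swap the order in `detect_runtime` on systems where Singularity is the norm.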

Running the Nextflow pipeline

For instructions on running the Nextflow pipeline, including the Mycobacterium tuberculosis example, download the manual.

Running the Jupyter Data Science Notebook

To launch as a Jupyter Data Science Notebook using Docker

docker container run --name kmer -p 8888:8888 -v $MNT_DIR:/home/jovyan dannywilson/kmer_pipeline:latest

where $MNT_DIR is the local directory on the host that you wish to access inside the container at /home/jovyan. Once the container has launched, open the user interface in a web browser by following one of the URLs printed at the command line.
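
Because the launch command depends on $MNT_DIR being set to an existing directory, it can help to check the value before starting the container. The sketch below assumes a hypothetical helper, `launch_command`, and falls back to the current directory when $MNT_DIR is unset, which is a convenience of this example rather than pipeline behaviour. It prints the command instead of running it.

```shell
#!/bin/sh
# Host directory to expose inside the container.
# Defaulting to the current directory is an assumption of this sketch.
MNT_DIR="${MNT_DIR:-$PWD}"

# Print the launch command from above with the mount path quoted,
# after checking that the path is an existing directory.
launch_command() {
    dir=$1
    if [ ! -d "$dir" ]; then
        printf 'not a directory: %s\n' "$dir" >&2
        return 1
    fi
    printf 'docker container run --name kmer -p 8888:8888 -v "%s":/home/jovyan dannywilson/kmer_pipeline:latest\n' "$dir"
}

launch_command "$MNT_DIR"
```

Use an absolute path for $MNT_DIR: Docker treats a `-v` source that does not begin with `/` as a named volume rather than a host directory. Quoting the path keeps directories with spaces in their names intact.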