Pipeline for kmer (oligo)-based genome-wide association studies
Implementing methods described in
Genome-wide association studies of global Mycobacterium tuberculosis resistance to thirteen antimicrobials in 10,228 genomes. The CRyPTIC Consortium (2022) PLOS Biology 20: e3001755 (article)
Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Earle, S. G., Wu, C.-H., Charlesworth, J., Stoesser, N., Gordon, N. C., Walker, T. M., Spencer, C. C. A., Iqbal, Z., Clifton, D. A., Hopkins, K. L., Woodford, N., Smith, E. G., Ismail, N., Llewelyn, M. J., Peto, T. E., Crook, D. W., McVean, G., Walker, A. S. and D. J. Wilson (2016) Nature Microbiology 1: 16041 (preprint)
Written by Sarah G Earle and Daniel J Wilson at the University of Oxford, Big Data Institute.
This project was supported by the Wellcome Trust, the Royal Society and the Robertson Foundation.
The pipeline uses software including GEMMA, DSK, NCBI BLAST, MUMmer, Bowtie2, Samtools, the GNU Scientific Library and the Automatically Tuned Linear Algebra Software.
The container implements a Nextflow pipeline which runs on top of Apache Groovy and Java.
Original code and scripts were also written in R and C++, using the genoPlotR library and the zstr library, a C++ wrapper for the zlib library. C++ code was compiled with the GNU Compiler Collection.
The container was written using Docker and based on the Jupyter Data Science Notebook.
The zstr headers are licensed under the MIT license. The myutils headers are licensed under the GNU Lesser General Public License Version 3. All other code is licensed under the GNU General Public License v3.0.
To download a prebuilt Docker image
docker pull dannywilson/kmer_pipeline:2022-10-26
To build a Singularity container
singularity pull -F docker://dannywilson/kmer_pipeline:2022-10-26
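As a rough sketch of what to expect from the pull command above: by default, `singularity pull` writes the image to the current directory as `<name>_<tag>.sif`. The variable names `IMAGE_REF` and `SIF` below are illustrative only, and the final `singularity inspect` call is just one way to confirm that the image file was created.

```shell
# Image reference taken from the pull command above.
IMAGE_REF="docker://dannywilson/kmer_pipeline:2022-10-26"

# Derive the default output file name, <name>_<tag>.sif.
name_tag="${IMAGE_REF##*/}"                      # kmer_pipeline:2022-10-26
SIF="${name_tag%%:*}_${name_tag##*:}.sif"        # kmer_pipeline_2022-10-26.sif

if command -v singularity >/dev/null 2>&1; then
    # -F overwrites an existing image file of the same name.
    singularity pull -F "$IMAGE_REF"
    # Print the image metadata to confirm the build succeeded.
    singularity inspect "$SIF"
else
    echo "singularity not found; the pull would create $SIF"
fi
```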
For instructions on running the Nextflow pipeline, including the Mycobacterium tuberculosis example, download the manual.
To launch as a Jupyter Data Science Notebook using Docker
docker container run --name kmer -p 8888:8888 -v "$MNT_DIR":/home/jovyan dannywilson/kmer_pipeline:2022-10-26
where $MNT_DIR is the directory on the host machine that you want to make available inside the container at /home/jovyan. Once the container has started, open the user interface in a web browser by following one of the URLs printed at the command line.
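A minimal sketch of the launch step, assuming a working directory called `kmer_work` (a hypothetical name) and the dated image tag used in the pull command. The `DRY_RUN` switch is also an illustrative addition, not part of the pipeline: it prints the command by default so that it can be checked before anything is started.

```shell
# Hypothetical host directory to share with the container; Docker needs an absolute path.
MNT_DIR="${MNT_DIR:-$PWD/kmer_work}"
IMAGE="dannywilson/kmer_pipeline:2022-10-26"

# Create the directory first so that it is owned by the current user.
mkdir -p "$MNT_DIR"

# Assemble the launch command as positional parameters.
set -- docker container run --name kmer -p 8888:8888 \
    -v "$MNT_DIR:/home/jovyan" "$IMAGE"

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command that would be run.
    echo "$@"
else
    # Set DRY_RUN=0 to start the container.
    "$@"
fi
```

Running the sketch with `DRY_RUN=0` starts the container in the foreground. Because the container is given the fixed name `kmer`, Docker will refuse to create a second container with that name until the first has been removed.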