TBSeqPipe

Introduction

TBSeqPipe is a flexible, user-friendly pipeline built on the Snakemake workflow system for analyzing whole-genome sequencing (WGS) data from Mycobacterium tuberculosis complex isolates. Taking Illumina WGS data as input, the workflow performs basic analysis tasks as well as downstream higher-level analysis steps. TBSeqPipe then generates a final summary report that integrates and presents the results from all analysis modules.

Workflow

Installation

Environment

Conda

Conda is a package and environment manager; installation instructions are in the Conda documentation. If you have conda installed, make sure the bioconda and conda-forge channels are added:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
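
To confirm that the channels were registered, the configured list can be inspected with `conda config --show channels`. The helper below is a minimal sketch that checks that list for a single channel name; `has_channel` is an illustrative name, not a conda command or part of TBSeqPipe.

```shell
# Sketch: check whether a channel appears in conda's configured channel list.
# has_channel is a hypothetical helper name, not a conda command.
has_channel() {
    local channel="$1"
    # `conda config --show channels` prints one "  - <name>" line per channel.
    conda config --show channels \
        | grep -Eq "^[[:space:]]*-[[:space:]]*${channel}[[:space:]]*\$"
}

# Example use (not run automatically):
#   for c in bioconda conda-forge; do
#       has_channel "$c" || echo "channel not configured: $c"
#   done
```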

Snakemake

The Snakemake workflow management system is a tool for creating reproducible and scalable data analyses. Detailed installation instructions can be found in the Snakemake documentation. Quick installation:

Install mamba first (mamba provides a faster and more robust way to install conda packages):

conda install -n base -c conda-forge mamba

Install snakemake using mamba:

conda activate base
mamba create -c conda-forge -c bioconda -n snakemake snakemake

Clone the repository

git clone git@github.com:KevinLYW366/TBSeqPipe.git

Activate the environment

conda activate snakemake

Kraken database

The suggested reference database for TBSeqPipe is the pre-built 8 GB MiniKraken database (MiniKraken DB_8GB). It is constructed from complete bacterial, archaeal, and viral genomes in RefSeq.
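
After unpacking the database, a quick check that the directory is complete can save a failed run later. The sketch below assumes the Kraken 1 database layout (database.kdb, database.idx, and a taxonomy folder containing nodes.dmp and names.dmp); `check_kraken_db` is an illustrative helper name, not part of Kraken or TBSeqPipe.

```shell
# Sketch: verify that a directory looks like an unpacked Kraken 1 database.
# Assumption: the database consists of the four files listed in the loop.
check_kraken_db() {
    local db_dir="$1"
    local missing=0 f
    for f in database.kdb database.idx taxonomy/nodes.dmp taxonomy/names.dmp; do
        if [ ! -e "$db_dir/$f" ]; then
            echo "missing: $db_dir/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when all files are present, 1 otherwise.
    return "$missing"
}

# Example use (not run automatically):
#   check_kraken_db /path/to/minikraken_20171019_8GB && echo "database looks complete"
```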

Set up configuration

To run the complete workflow, do the following:

Create a sample list file containing the IDs of all samples you want to analyze, one ID per line.
Copy all FASTQ files of your samples into one directory.
Customize the workflow to your needs in config/configfile.yaml. Parameters in the "Required Parameters" section must be entered manually:
- sample_list: /path/to/sample_list_file
- data_dir: /path/to/fastq_files
- fastq_read_id_format, fastq_suffix_format and data_dir_format: set these according to the directory structure of your FASTQ files and the format of the FASTQ file names
- kraken_db: /path/to/minikraken_20171019_8GB
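
The preparation steps above can be partly scripted. The sketch below shows one way to derive the sample list from the FASTQ directory and to check that the required parameters have been filled in. It rests on two assumptions that may not hold for your data: read-1 files are named `<sample>_1.fastq.gz`, and the required parameters appear in the config file as plain `key: value` lines. The function names are illustrative and not part of TBSeqPipe.

```shell
# Sketch: write one sample ID per line, derived from read-1 FASTQ file names.
# Assumption: read-1 files end in _1.fastq.gz (pass another suffix as $2).
make_sample_list() {
    local data_dir="$1"
    local suffix="${2:-_1.fastq.gz}"
    local f
    for f in "$data_dir"/*"$suffix"; do
        [ -e "$f" ] || continue      # nothing matched the pattern
        basename "$f" "$suffix"      # strip directory and suffix
    done | sort -u
}

# Sketch: report required parameters that have no value in the config file.
# Assumption: each parameter is written as a "key: value" line.
check_required_params() {
    local cfg="$1" key missing=0
    for key in sample_list data_dir fastq_read_id_format \
               fastq_suffix_format data_dir_format kraken_db; do
        if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]#]" "$cfg"; then
            echo "not set: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (not run automatically):
#   make_sample_list /path/to/fastq_files > /path/to/sample_list_file
#   check_required_params config/configfile.yaml
```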

Usage

Move to the directory of TBSeqPipe.

cd /path/to/TBSeqPipe

A dry run is recommended first to check that everything is set up correctly.

snakemake -r -p -n

If no error message shows up, start the actual run (adjust "-j 40", which sets the number of CPU cores used in parallel, as needed).

snakemake --use-conda -r -p -j 40
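
The two commands above can be chained so that the full run starts only when the dry run succeeds. This is a minimal sketch using the same flags as above; `run_tbseqpipe` is an illustrative wrapper name, not something shipped with TBSeqPipe.

```shell
# Sketch: dry run first, then the real run only if the dry run succeeds.
# run_tbseqpipe is a hypothetical wrapper; call it from the TBSeqPipe directory.
run_tbseqpipe() {
    local cores="${1:-40}"   # value passed to -j; 40 mirrors the command above
    if ! snakemake -r -p -n; then
        echo "dry run failed; fix the reported errors before a full run" >&2
        return 1
    fi
    snakemake --use-conda -r -p -j "$cores"
}

# Example use (not run automatically):
#   cd /path/to/TBSeqPipe && run_tbseqpipe 8
```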

Note

Crashed and burned (Unlocking)

If the workflow was killed and Snakemake did not shut down cleanly, the working directory remains locked. Once you are sure that Snakemake is no longer running (check with ps aux | grep snake), unlock the working directory:

snakemake --unlock
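
The check and the unlock can be combined so that the lock is never removed while a run is still active. The sketch below uses `pgrep -f`, which matches against full command lines and may therefore also match unrelated processes whose command line contains "snakemake"; it also assumes the plain `snakemake --unlock` form run from the TBSeqPipe directory. `unlock_if_idle` is an illustrative name.

```shell
# Sketch: unlock the working directory only when no Snakemake process is found.
# unlock_if_idle is a hypothetical helper, not a Snakemake command.
unlock_if_idle() {
    # pgrep -f searches full command lines, so it can over-match
    # (for example, a script whose path contains "snakemake").
    if pgrep -f snakemake >/dev/null; then
        echo "a snakemake process still seems to be running; not unlocking" >&2
        return 1
    fi
    snakemake --unlock
}

# Example use (not run automatically):
#   cd /path/to/TBSeqPipe && unlock_if_idle
```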

Rerun incomplete

If Snakemake marked a file as incomplete after a crash, rerun the workflow with --ri (short for --rerun-incomplete) so that the file is deleted and produced again:

snakemake --use-conda -r -p -j 40 --ri

License

The code is available under the GNU GPLv3 license. The text and data are available under the CC-BY license.

Questions and Issues

To contact the developer or report an issue, please use the repository's Issues page.
