PrecisionCallerPipeline (PCP) automatically takes Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) FASTQ files and outputs BAM files correctly aligned to the rCRS.

The PrecisionCallerPipeline (PCP)

PCP automatically takes the FASTQ files produced by a sequencing facility with the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) and outputs BAM files fully aligned to the commonly used revised Cambridge Reference Sequence (rCRS).

Prerequisites

The workflow is based on Snakemake and runs on a Linux-based system with the following tools:

  • Awk, for SAM file editing;
  • BEDTools, for BAM to FASTQ conversion;
  • BWA-MEM, for read alignment;
  • Pycision, for amplicon delimitation and selection;
  • RtN!, for NUMT removal;
  • SAMtools, for BAM conversion, sorting, indexing, and merging;
  • Trimmomatic, for read quality control and trimming.
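
Before running the pipeline, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch, not part of PCP: the function name missing_tools is hypothetical, and the executable names in the example call (bedtools, bwa, samtools, snakemake, and java for running the Trimmomatic jar) are assumptions that may differ on your installation.

```shell
# Report which of the given commands are not found on PATH.
# Prints one missing command per line; returns 1 if any are missing.
missing_tools() {
    local tool rc
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (executable names are assumptions; adjust as needed):
# missing_tools awk bedtools bwa samtools snakemake java
```

An empty output with a zero exit status means every listed command was found.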

Installation

Install the software above and clone this repo to your directory of choice:

git clone https://github.com/filcfig/PCP.git

Add pycision.py, trimmomatic-0.39.jar, and the RtN folder to the tools folder. Before the first run, decompress and index the humans.fa file shipped with RtN (bunzip2 humans.fa.bz2 && bwa index humans.fa).
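
The decompression and indexing of humans.fa only need to happen once. A possible way to make that step safe to repeat is sketched below. The function name prepare_rtn_reference and the example path are hypothetical, and the sketch assumes that bwa index writes humans.fa.bwt next to humans.fa, which it uses as a marker that indexing is done.

```shell
# Decompress and index humans.fa in the given directory,
# skipping any step that has already been completed.
prepare_rtn_reference() {
    local dir
    dir=$1
    if [ ! -f "$dir/humans.fa" ]; then
        if [ ! -f "$dir/humans.fa.bz2" ]; then
            echo "neither humans.fa nor humans.fa.bz2 found in $dir" >&2
            return 1
        fi
        bunzip2 "$dir/humans.fa.bz2" || return 1
    fi
    # bwa index writes humans.fa.bwt (among other files); use it as a marker.
    if [ ! -f "$dir/humans.fa.bwt" ]; then
        bwa index "$dir/humans.fa" || return 1
    fi
}

# Example call (the path is an assumption about where the RtN folder sits):
# prepare_rtn_reference tools/RtN
```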

Usage

Start by adding the FASTQ files to the sequencing/selected_fastqfiles folder. Then make run_FASTQ.sh executable and run it. Snakemake must be available in the current shell; if you installed it with conda, run conda activate snakemake first.

chmod +x run_FASTQ.sh
./run_FASTQ.sh
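
If the FASTQ files live elsewhere, the staging step can be scripted. This is only a sketch: the function name stage_fastq is hypothetical, and it assumes the input files end in .fastq, which may not match the naming the Snakefile expects.

```shell
# Copy FASTQ files from a source directory into the pipeline's input folder.
# Prints the number of files copied.
stage_fastq() {
    local src dest n f
    src=$1
    dest=${2:-sequencing/selected_fastqfiles}
    n=0
    mkdir -p "$dest" || return 1
    for f in "$src"/*.fastq; do
        [ -e "$f" ] || continue   # glob matched nothing
        cp -- "$f" "$dest"/ || return 1
        n=$((n + 1))
    done
    echo "$n"
}

# Example call (the source path is hypothetical):
# stage_fastq /path/to/run_output
```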

RtN takes some time per sample and needs a good amount of RAM. To process the FASTQ files without RtN, run Snakefile_noRtN instead:

snakemake -s Snakefile_noRtN -j

The final BAM files will be available in the sequencing/merged folder.
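
A quick look at the output folder can catch failed samples early. The sketch below is not part of PCP, and the function name list_bams is hypothetical. It only flags empty files; it does not validate the BAM contents.

```shell
# Print each BAM file in a folder, flagging files that are empty.
list_bams() {
    local dir f
    dir=${1:-sequencing/merged}
    for f in "$dir"/*.bam; do
        [ -e "$f" ] || continue   # glob matched nothing
        if [ -s "$f" ]; then
            echo "ok    $f"
        else
            echo "empty $f"
        fi
    done
}

# With SAMtools on PATH, a stricter integrity check is:
# samtools quickcheck -v sequencing/merged/*.bam
```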

Data

The data generated from samples previously sequenced within the 1000 Genomes Project are openly available on Zenodo.

Citation

Our manuscript is published as:

Cortes-Figueiredo, F.; Carvalho, F.S.; Fonseca, A.C.; Paul, F.; Ferro, J.M.; Schönherr, S.; Weissensteiner, H.; Morais, V.A. From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel. Int. J. Mol. Sci. 2021, 22, 12031. https://doi.org/10.3390/ijms222112031.

License

Distributed under the MIT License. See LICENSE for more information.
