Decontamination of the Microbiome Sequence data

Decontamination is the process of removing contaminant sequences (OTUs) from the sequence data of the biological (target) samples. Potential sources of contaminants include DNA isolation and purification reagents, sample storage media, sampling tools, the laboratory environment and the researchers themselves. This module outlines steps to implement in the wet lab and in the dry lab for an efficient decontamination process. It also provides scripts that detect and remove contaminant sequences from the target microbiome sequence data.

In the Wetlab

At least three replicates of a blank (or sampling medium) spiked with a known pure bacterial isolate (preferably one not expected in your study profile) should be prepared. The spiking concentration should be comparable to the concentration in the target biological samples. If DNA concentrations differ widely between biological samples, spiked controls can also be prepared at both the higher and the lower levels. The objective is to introduce relatively similar competition between DNA from different OTUs during amplification. Run DNA isolation, library preparation and sequencing in exactly the same way as for the target biological samples.

Bioinformatics processing

Run QC, OTU picking and taxonomic annotation of the spiked controls and the biological samples separately.
Check the reproducibility of the control replicates by comparing the percentage of reads in each replicate.
If the sequence reads are comparable between replicates, calculate the average number of sequence reads for each OTU detected.
Remove the spiked OTU (in most cases the most abundant one) and retain only the contaminant / background sequences.
Search for contaminant sequences in the target samples by comparing the background sequences of the spiked control with the sequences of the target biological samples.
Remove the detected contaminant OTUs/sequences from the biological samples.
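
The replicate check, averaging and spike removal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the OTU names, read counts and the function names `relative_abundance` and `average_background` are hypothetical and are not part of the scripts in this repository.

```python
from statistics import mean


def relative_abundance(counts):
    """Convert raw read counts {otu: reads} into percentages of the replicate total."""
    total = sum(counts.values())
    return {otu: 100.0 * reads / total for otu, reads in counts.items()}


def average_background(replicates, spiked_otu):
    """Average the reads of every OTU across replicates, then drop the spiked OTU."""
    otus = set().union(*replicates)  # all OTU ids seen in any replicate
    averaged = {otu: mean(rep.get(otu, 0) for rep in replicates) for otu in otus}
    averaged.pop(spiked_otu, None)   # keep only contaminant / background OTUs
    return averaged


# Hypothetical read counts for three spiked-control replicates.
replicates = [
    {"OTU_spike": 9000, "OTU_7": 60, "OTU_12": 30},
    {"OTU_spike": 9100, "OTU_7": 66, "OTU_12": 27},
    {"OTU_spike": 8900, "OTU_7": 54, "OTU_12": 33},
]

# Percentages per replicate, to be compared by eye or with a chosen tolerance.
percentages = [relative_abundance(rep) for rep in replicates]

# Average background reads per contaminant OTU.
background = average_background(replicates, "OTU_spike")
```

How close the replicate percentages must be to count as "comparable" is left to the analyst; the sketch only computes the values to compare.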

Detection of contaminants from the target samples

After the spiked bacterial sequences are removed, the background sequences of the spiked control are aligned against the biological sample sequences to search for matching OTUs/sequences. The script detect.sh, summarized below, achieves this:

"align_seq.py -i $inDir/conta.fa -o $outDir/decont100 -t $inDir/otus_prealigned.fa -m pynast -a uclust -e 250 -p 100 "

Where:

conta.fa: The contaminant sequences from the background of the spiked control (if using the uct-cbio-16S-rDNA Nextflow pipeline, this file is found in the folder /otu_picking)
decont100: The folder to which the output will be directed
otus_prealigned.fa: Pre-aligned biological sample sequences (if using the uct-cbio-16S-rDNA Nextflow pipeline, this is the output in otu_processing/otus.align)
-e 250: Minimum sequence length to include in the alignment; 250 bp here, so that sequences are aligned over their entire length
-p 100: Minimum percent sequence identity between the contaminant and the biological sample sequences
-m pynast: Method for aligning sequences
-a uclust: Method for performing pairwise sequence alignment in PyNAST
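
For users who drive the pipeline from Python rather than from a shell script, the same command can be assembled as an argument list. The sketch below uses only the flags and file names shown above; the helper names `build_align_command` and `run_alignment` and the directory names are hypothetical, and running it assumes that `align_seq.py` is available on the PATH.

```python
import subprocess


def build_align_command(in_dir, out_dir):
    """Return the align_seq.py call from this README as an argument list."""
    return [
        "align_seq.py",
        "-i", f"{in_dir}/conta.fa",            # contaminant (background) sequences
        "-o", f"{out_dir}/decont100",          # output folder
        "-t", f"{in_dir}/otus_prealigned.fa",  # pre-aligned biological sample sequences
        "-m", "pynast",                        # alignment method
        "-a", "uclust",                        # pairwise alignment method
        "-e", "250",                           # minimum sequence length (bp)
        "-p", "100",                           # minimum percent identity
    ]


def run_alignment(in_dir, out_dir):
    """Run the alignment; raises CalledProcessError if align_seq.py fails."""
    subprocess.run(build_align_command(in_dir, out_dir), check=True)


# Build (but do not run) the command for hypothetical directories.
command = build_align_command("input", "output")
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.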

Output files

conta_aligned.fa: A FASTA file of the contaminant sequences that aligned to the biological sample sequences
conta_failures.fa: A FASTA file of the contaminant sequences that did not align to the biological sample sequences
conta_log.txt: A summary of the contaminant OTUs that aligned to the biological sample sequences
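
To list which contaminant sequences aligned, the identifiers can be read from the headers of the aligned FASTA file. The sketch below assumes standard FASTA headers, where the identifier is the first whitespace-delimited token after ">"; the function name `fasta_ids` and the example records are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence identifiers from the header lines of a FASTA file."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:                 # skip malformed, empty headers
                ids.append(fields[0])  # identifier is the first token
    return ids


# Hypothetical content standing in for conta_aligned.fa.
example = """>OTU_7 some description
ACGT--ACGT
>OTU_12
AC-GTACGT
""".splitlines()

matched = fasta_ids(example)
```

With a real file, the same function can be called on an open file handle, for example `fasta_ids(open(path))`, where `path` points to the aligned FASTA output.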

Removing contaminant sequences from biological samples

The average reads of the contaminant OTUs are subtracted from their respective mapped OTUs in the biological sample otu-table.txt (if you are using the uct-cbio-16S-rDNA Nextflow pipeline, the biological sample OTU table is found in the folder /nextflow-outdir/otu_picking/). If the number of reads in a contaminant OTU is higher than in its respective OTU in the biological sample, the entire OTU is removed; otherwise, only the equivalent number of reads is subtracted. The script decont.sh is used for this purpose.
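
The subtraction rule can be illustrated with a minimal Python sketch. It is not the implementation in decont.sh: it assumes a single sample, assumes the contaminant OTUs have already been mapped to the OTU identifiers of the biological sample, and uses hypothetical names and read counts.

```python
def subtract_contaminants(sample_counts, contaminant_avg):
    """Apply the rule above to one sample's {otu: reads} counts.

    An OTU is dropped when its average contaminant reads exceed its reads
    in the sample; otherwise the contaminant reads are subtracted.
    """
    cleaned = {}
    for otu, reads in sample_counts.items():
        background = contaminant_avg.get(otu, 0)  # 0 for OTUs that are not contaminants
        if background > reads:
            continue                              # remove the entire OTU
        cleaned[otu] = reads - background         # subtract the equivalent reads
    return cleaned


# Hypothetical counts for one biological sample and the averaged background.
sample = {"OTU_1": 500, "OTU_7": 100, "OTU_12": 20}
contaminants = {"OTU_7": 60, "OTU_12": 30}

cleaned = subtract_contaminants(sample, contaminants)
```

In this example OTU_12 is removed because its background (30 reads) exceeds its count in the sample (20 reads), while OTU_7 keeps 40 reads. An OTU whose counts are exactly equal is kept with zero reads, following the wording of the rule.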

uct-cbio/16S-rDNA-decon-pipeline

Folders and files

Latest commit

History

Repository files navigation

Decontamination of the Microbiome Sequence data

In the Wetlab

Bionformatics processing

Detection of contaminants from the target samples

Output files

Removing contaminant sequences from biological samples

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages