BCS Pipeline Documentation

Introduction

This README serves as a virtual lab notebook. It documents the scripts and the order in which they are run. The pipeline is built for use on a server that uses SLURM. I'm currently cleaning up this repo and improving the documentation, so expect things to be incomplete but improving.

The repo is being reorganized so that the scripts can be run in numeric order for simplicity. Expect some irrelevant bits that haven't been cleaned up yet.

Required software

flexbar
R 3.4.2 or later
- DADA2
- vegan
- LULU OTU curation package
- Plotly R package (properly configured for online use)
- phyloseq
- stringr
- ggplot2
- DESeq2
- parallel
- breakaway
RDP Classifier
- Java
BLCA classifier
swarm2
VSEARCH
Drive5 Python scripts (place these in a folder named d5_py within this folder)
...likely more. To be added soon.

Setup

Folder setup

Make a main project folder. Inside it, make a subdirectory for each run (e.g. BCS1, BCS2, BCS3, BCS4). The raw reads (FASTQ or FASTQ.GZ) go into these subdirectories. Unzip any FASTQ.GZ files so that all folders are consistent. In the future this step shouldn't be necessary (.GZ files are preferable), but the flexbar scripts need minor changes to accommodate compressed input.

The main project folder should also contain a tab-separated values file called key.txt that maps library number, adapter index number (the sample number from the Illumina run), and primer tag number to ML ID numbers and other sample metadata. Also in the main project folder, make a plain-text file called libraries.txt that contains the names of the run subdirectories (e.g. BCS1, BCS2, BCS3), with one name on each line.
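The layout above can be sketched in a few lines of shell. This is only an illustration, not one of the pipeline scripts: the PROJECT location, the run names, and the lowercase .fastq.gz extension are assumptions, and key.txt is not created here because its columns depend on your own metadata.

```shell
#!/usr/bin/env bash
# Sketch of the project layout described above (all names are placeholders).
set -eu

PROJECT="${PROJECT:-$(mktemp -d)/BCS_project}"   # hypothetical project location
RUNS="BCS1 BCS2 BCS3 BCS4"                       # one entry per sequencing run

mkdir -p "$PROJECT"
: > "$PROJECT/libraries.txt"                     # start with an empty list
for run in $RUNS; do
    mkdir -p "$PROJECT/$run"                     # raw reads for this run go here
    printf '%s\n' "$run" >> "$PROJECT/libraries.txt"   # one run name per line
done

# Unzip gzipped reads so every run folder holds plain FASTQ files.
# Assumes the files end in .fastq.gz; nothing runs if no files match.
find "$PROJECT" -mindepth 2 -maxdepth 2 -type f -name '*.fastq.gz' -exec gunzip {} +
```

Set PROJECT to an existing folder before running to apply this to a real project; otherwise it builds the layout in a temporary directory.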

Primer barcodes

In the main project folder, make a FASTA file (or multiple FASTA files if multiple barcoding schemes are used) containing the primer index barcodes. These scripts call it ML_barcodes.fasta.

Adapter FASTA file

You will need to make a similar FASTA file containing all of the adapters that you want flexbar to trim. Here, it is called truseq_adapters.fasta.

Script setup

This repository can be cloned into its own folder inside the main project folder (name it "scripts" or similar). Inside your scripts folder, make a subdirectory called out_err_files to hold output and error logs.
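A minimal sketch of this step, assuming the project folder from the previous section. REPO_URL is a hypothetical variable that you set to this repository's clone URL; if it is left empty, the sketch only creates the folders.

```shell
#!/usr/bin/env bash
# Sketch: put the scripts inside the project folder and create the log folder.
set -eu

PROJECT="${PROJECT:-$(mktemp -d)/BCS_project}"   # hypothetical project location
SCRIPTS="$PROJECT/scripts"                       # "scripts" is just a suggested name
REPO_URL="${REPO_URL:-}"                         # set to this repo's clone URL

mkdir -p "$PROJECT"
if [ -n "$REPO_URL" ] && [ ! -d "$SCRIPTS/.git" ]; then
    git clone "$REPO_URL" "$SCRIPTS"             # clone into PROJECT/scripts
else
    mkdir -p "$SCRIPTS"                          # no URL given: just make the folder
fi

# Log and error files from the SLURM jobs are collected here.
mkdir -p "$SCRIPTS/out_err_files"
```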

Run Order

make_file_root_list.sh
flexbar.sh
flexbar2.sh
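If each of these scripts is a standalone SLURM batch script (an assumption; check the #SBATCH headers first), the run order can be enforced with job dependencies rather than by waiting for each job by hand. The submit_chain helper below is hypothetical and not part of this repo. It relies only on the standard sbatch options --parsable and --dependency=afterok.

```shell
#!/usr/bin/env bash
# Sketch: submit scripts in order, each starting only after the previous
# job finished successfully.
set -eu

SBATCH="${SBATCH:-sbatch}"      # command used to submit; override for testing

submit_chain() {
    local prev="" script jobid
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jobid=$("$SBATCH" --parsable "$script")
        else
            jobid=$("$SBATCH" --parsable --dependency="afterok:$prev" "$script")
        fi
        jobid="${jobid%%;*}"    # --parsable may print "jobid;cluster"
        echo "submitted $script as job $jobid" >&2
        prev="$jobid"
    done
    echo "$prev"                # ID of the last job in the chain
}

# Usage, from the scripts folder:
#   submit_chain make_file_root_list.sh flexbar.sh flexbar2.sh
```

With afterok, a failed job leaves the later jobs pending and they never start, so check the queue and the logs in out_err_files if the chain stalls.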

Cleanup

If any unassigned reads remain after running flexbar2.sh, move them into a new subdirectory called unassigned. The script should now do this automatically.
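If the cleanup ever needs to be done by hand, something like the following would work. It is a sketch, not the code in flexbar2.sh: the move_unassigned helper is hypothetical, and the *unassigned* file name pattern is an assumption about how the leftover files are named, so check your own output first.

```shell
#!/usr/bin/env bash
# Sketch: move leftover unassigned reads in one run folder into "unassigned".
set -eu

move_unassigned() {
    local run_dir="$1" f
    mkdir -p "$run_dir/unassigned"
    for f in "$run_dir"/*unassigned*; do
        # Skip the "unassigned" directory itself and the literal pattern
        # that the shell leaves behind when nothing matches.
        [ -f "$f" ] || continue
        mv -- "$f" "$run_dir/unassigned/"
    done
}

# Usage, once per run folder listed in libraries.txt:
#   while IFS= read -r run; do
#       move_unassigned "/path/to/project/$run"
#   done < /path/to/project/libraries.txt
```

The function is safe to run more than once; a second pass finds nothing left to move.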

R Analysis with DADA2 and RDP

Description

The phyloseq.sh and BCS_phyloseq.R scripts run community analyses and visualization (based on the Plotly tool). You may need to make major changes to this section for your own analysis or configure Plotly if you want to use the existing visualizations. For some reason, the RDP implementation in DADA2 doesn't perform well for us, possibly due to memory or scaling issues. We run RDP outside of R/DADA2 to get around this.

Setup

R scripts can be placed in the same folder as the other scripts (that is, the scripts folder inside the main project folder that holds the sequenced library directories). This folder also holds the code for renaming the files generated by flexbar2.sh and the R code for running the DADA2 analysis.

Run Order

rename.sh
filtertrim.sh
removechim.sh
RDP_classify.sh
phyloseq.sh

