This repository contains scripts and files used to analyze stranded paired-end SNS-Seq experiments for Stanojcic et al in prep.
The scripts are made to run on a HPC under slurm (Roscoff Bioinformatics platform ABiMS (http://abims.sb-roscoff.fr)).
The first script qc_mapping.sh is for read quality control and mapping if you have already mapped reads you can directly start with the second script finding_ORIs.sh, that will used aligned reads from stranded SNS-seq as input.
The first script qc_mqpping.sh uses raw paired-end stranded sequencing reads from SNS-seq experiments and performs quality check, adapter and quality trimming and read mapping.
- quality control of the provided fastq files (fastqc 0.11.9)
- trimming of tail and adapter sequences (cutadapt 4.0)
- quality trimming with q20 threshold (trimmomatic 0.39)
- alignment against provided indexed genome (bowtie2 2.4.1)
- sorting and conversion fom sam to bam of aligned reads (samtools 1.13)
- check for insert size (picard 2.23.5)
- removal of duplicated reads (picard 2.23.5)
- bam file indexing (samtools 1.13)
- generation of Multiqc report (multiqc 1.13)
The second script finding_ORIs.sh will use paired end aligend reads (sorted bam) as input and outputs bed files of the detected ORIs
- seperation of mapped reads in minus and plus strand (samtools 1.13)
- strand-seperated peak calling (macs2 2.2.7.1)
- peak filtering (bedtools 2.2.0.0 and awk) based on three criteria:
- distance of minus and plus strand peaks
- no complete overlap between plus and minus strand peaks
- right strand orientation (minus followed by plus)
Parameters for the qc_mqpping.sh script need to be provided in the config_mapping.txt file, that has to be in the same working directory as the script :
-
output directory name. The directory will be generated by the sript in the working directory.
output_dir=alignedReads
-
list of sample names as found in the fastq files e.g. $sample\L001_R1_001.fastq.gz
input_list=("x_S1_" "y_S2_" "z_S5_")
-
path to fastq directory
fastq_directory=path/to/fastqfiles
-
path and genome prefix of bowtie2 index and genome.fasta
genome=path/to/genome/myfavorite_Genome
-
genome prefix for filenames of mapped reads
genome_prefix=myfavorite_Genome
Parameters for the finding_ORIs.sh script need to be provided in the config_finding-ORIs.txt file, that has to be in the same working directory as the script:
-
path to aligned reads
read_directory=path/to/aligned_reads
-
output directory name (generated by the sript in the working directory)
output_dir=results/myresults
-
prefix of mapped reads ($sample$prefix.bam)
file_prefix=aln-pe_genomeprefix_sorted_reheadered_dups-removed
-
sample name as found in bam files
input_list=("x_S1_" "y_S2_" "z_S5_")
-
peak overlap to be excluded in percent
overlap_list=(50)
-
window size for peak pair selection in nucleotides
window_list=(500)
The scripts are written to be run on an HPC cluster under slurm
sbatch findingORIs/qc_mapping.sh
sbatch findingORIs/finding_ORIS.sh