This repository contains the scripts and files used to analyze stranded paired-end SNS-seq experiments for Stanojcic et al. (in prep.).
The scripts are designed to run on an HPC cluster under Slurm (Roscoff Bioinformatics platform ABiMS).
The first script handles read quality control and mapping. If you already have mapped reads, you can start directly with the second script, which uses aligned reads from stranded SNS-seq as input.
The first script takes raw paired-end stranded sequencing reads from SNS-seq experiments and performs quality control, adapter and quality trimming, and read mapping:
- quality control of the provided fastq files (fastqc 0.11.9)
- trimming of tail and adapter sequences (cutadapt 4.0)
- quality trimming with q20 threshold (trimmomatic 0.39)
- alignment against provided indexed genome (bowtie2 2.4.1)
- sorting and conversion from sam to bam of aligned reads (samtools 1.13)
- check for insert size (picard 2.23.5)
- removal of duplicated reads (picard 2.23.5)
- bam file indexing (samtools 1.13)
- generation of Multiqc report (multiqc 1.13)
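The order of the steps above can be sketched as a dry run that only prints the commands. This is a minimal sketch, not the repository's script: all paths, variable names, output file names and the adapter placeholder are hypothetical, the `trimmomatic` and `picard` wrapper commands are assumed to be on the PATH (as in Bioconda installs), and `SLIDINGWINDOW:4:20` is only one possible way to apply a q20 threshold.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the mapping steps. Every path, variable name and the
# adapter sequence below is a placeholder, not a value from the repository.
DRY_RUN="${DRY_RUN:-1}"           # 1 = print the commands instead of running them
out="mapping_output"              # hypothetical output directory
fastq_dir="/path/to/fastq"        # hypothetical fastq directory
index="/path/to/index/genome"     # hypothetical bowtie2 index prefix
adapter="ADAPTER_SEQUENCE"        # placeholder adapter sequence

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Per-sample steps, in the order listed above.
map_sample() {
  local s=$1
  local r1="$fastq_dir/${s}L001_R1_001.fastq.gz"
  local r2="$fastq_dir/${s}L001_R2_001.fastq.gz"   # R2 name assumed by analogy
  run fastqc -o "$out" "$r1" "$r2"
  run cutadapt -a "$adapter" -A "$adapter" \
      -o "$out/${s}cut_R1.fastq.gz" -p "$out/${s}cut_R2.fastq.gz" "$r1" "$r2"
  run trimmomatic PE "$out/${s}cut_R1.fastq.gz" "$out/${s}cut_R2.fastq.gz" \
      "$out/${s}R1_paired.fastq.gz" "$out/${s}R1_unpaired.fastq.gz" \
      "$out/${s}R2_paired.fastq.gz" "$out/${s}R2_unpaired.fastq.gz" \
      SLIDINGWINDOW:4:20
  run bowtie2 -x "$index" -1 "$out/${s}R1_paired.fastq.gz" \
      -2 "$out/${s}R2_paired.fastq.gz" -S "$out/${s}aligned.sam"
  run samtools sort -o "$out/${s}sorted.bam" "$out/${s}aligned.sam"
  run picard CollectInsertSizeMetrics I="$out/${s}sorted.bam" \
      O="$out/${s}insert_size.txt" H="$out/${s}insert_size.pdf"
  run picard MarkDuplicates I="$out/${s}sorted.bam" O="$out/${s}dedup.bam" \
      M="$out/${s}dup_metrics.txt" REMOVE_DUPLICATES=true
  run samtools index "$out/${s}dedup.bam"
}

run mkdir -p "$out"
map_sample "x_S1_"
run multiqc "$out" -o "$out"      # one report over all samples
```

Setting `DRY_RUN=0` would execute the commands instead of printing them, provided the tools are installed and the placeholders are replaced with real values.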
The second script uses paired-end aligned reads (sorted bam) as input and outputs bed files of the detected ORIs:
- separation of mapped reads into minus and plus strand (samtools 1.13)
- strand-separated peak calling (macs2)
- peak filtering (bedtools and awk) based on three criteria:
  - distance between minus and plus strand peaks
  - no complete overlap between plus and minus strand peaks
  - correct strand orientation (minus followed by plus)
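The three filtering criteria can be illustrated with a small awk-only function. This is a sketch under stated assumptions, not the repository's bedtools-based implementation: the function name `filter_ori_pairs` is hypothetical, both inputs are assumed to be 3-column bed files, overlap is assumed to be measured as a percentage of the shorter peak, and the reported interval (minus peak start to plus peak end) is an assumption as well.

```bash
#!/usr/bin/env bash
# filter_ori_pairs MINUS_BED PLUS_BED WINDOW MAX_OVERLAP_PERCENT
# Hypothetical helper: pairs each plus-strand peak with minus-strand peaks
# on the same chromosome and prints the pairs that pass all three criteria.
filter_ori_pairs() {
  local minus_bed=$1 plus_bed=$2 window=$3 max_overlap=$4
  awk -v win="$window" -v maxov="$max_overlap" '
    BEGIN { OFS = "\t" }
    # First file: remember all minus-strand peaks.
    FILENAME == ARGV[1] { n++; mc[n] = $1; ms[n] = $2; me[n] = $3; next }
    # Second file: test each plus-strand peak against every stored minus peak.
    {
      for (i = 1; i <= n; i++) {
        if (mc[i] != $1) continue
        # Orientation: the minus peak must start and end before the plus peak.
        if (ms[i] >= $2 || me[i] >= $3) continue
        # Distance: gap between minus end and plus start, negative if overlapping.
        gap = $2 - me[i]
        if (gap > win) continue
        # Overlap as a percentage of the shorter of the two peaks.
        ov = (gap < 0) ? -gap : 0
        mlen = me[i] - ms[i]; plen = $3 - $2
        shorter = (mlen < plen) ? mlen : plen
        if (shorter > 0 && 100 * ov / shorter >= maxov) continue
        # Candidate ORI: from minus peak start to plus peak end.
        print $1, ms[i], $3
      }
    }' "$minus_bed" "$plus_bed"
}
```

For example, `filter_ori_pairs minus_peaks.bed plus_peaks.bed 500 60` would keep minus/plus pairs that are at most 500 nt apart and overlap by less than 60% of the shorter peak. The nested loop is quadratic in the number of peaks, which keeps the sketch short but would be slow on large peak sets.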
Parameters for the script must be provided in the config_mapping.txt file, which has to be in the same working directory as the script:
output directory name. The directory will be generated by the script in the working directory.
list of sample names as found in the fastq file names, e.g. ${sample}L001_R1_001.fastq.gz
input_list=("x_S1_" "y_S2_" "z_S5_")
path to fastq directory
path and genome prefix of bowtie2 index and genome.fasta
genome prefix for filenames of mapped reads
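As an illustration, a config_mapping.txt along these lines could look like the sketch below. Only `input_list` and the `L001_R1_001.fastq.gz` suffix come from the description above; every other variable name and value is a hypothetical placeholder, and the R2 file name is assumed by analogy with R1. The loop shows how the sample names would combine with the fastq directory.

```bash
# Sketch of config_mapping.txt. Variable names other than input_list are
# hypothetical; check the shipped config file for the real ones.
output_dir="mapping_output"              # created in the working directory
input_list=("x_S1_" "y_S2_" "z_S5_")     # sample names as in the fastq files
fastq_dir="/path/to/fastq"               # path to fastq directory
genome_index="/path/to/genome/prefix"    # bowtie2 index and genome.fasta prefix
genome_prefix="_genome"                  # used in file names of mapped reads

# How the sample names expand to fastq file names.
for sample in "${input_list[@]}"; do
  r1="${fastq_dir}/${sample}L001_R1_001.fastq.gz"
  r2="${fastq_dir}/${sample}L001_R2_001.fastq.gz"   # assumed by analogy with R1
  echo "${sample}: ${r1} ${r2}"
done
```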
Parameters for the script must be provided in the config_finding-ORIs.txt file, which has to be in the same working directory as the script:
path to aligned reads
output directory name (generated by the script in the working directory)
prefix of mapped reads ($sample$prefix.bam)
sample name as found in bam files
peak overlap to be excluded in percent
window size for peak pair selection in nucleotides
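A config_finding-ORIs.txt could then look like the following sketch. All variable names and values are hypothetical placeholders chosen for illustration; only the `$sample$prefix.bam` naming pattern comes from the description above.

```bash
# Sketch of config_finding-ORIs.txt. All names and values are hypothetical.
bam_dir="/path/to/aligned_reads"   # path to aligned reads
output_dir="ori_output"            # created in the working directory
prefix="_genome_sorted"            # prefix of mapped reads
sample="x_S1_"                     # sample name as found in the bam files
max_overlap=60                     # peak overlap to be excluded, in percent
window=500                         # window for peak pair selection, in nucleotides

# The input file follows the $sample$prefix.bam pattern.
bam="${bam_dir}/${sample}${prefix}.bam"
echo "input bam: ${bam}"
```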
The scripts are written to be run on an HPC cluster under Slurm:
sbatch findingORIs/
sbatch findingORIs/