Skip to content

bridlin/finding_ORIs

Repository files navigation

finding ORIs in stranded SNS-seq

This repository contains scripts and files used to analyze stranded paired-end SNS-Seq experiments for Stanojcic et al in prep.

The scripts are made to run on a HPC under slurm (Roscoff Bioinformatics platform ABiMS (http://abims.sb-roscoff.fr)).

The first script qc_mapping.sh is for read quality control and mapping if you have already mapped reads you can directly start with the second script finding_ORIs.sh, that will used aligned reads from stranded SNS-seq as input.

The first script qc_mqpping.sh uses raw paired-end stranded sequencing reads from SNS-seq experiments and performs quality check, adapter and quality trimming and read mapping.

  • quality control of the provided fastq files (fastqc 0.11.9)
  • trimming of tail and adapter sequences (cutadapt 4.0)
  • quality trimming with q20 threshold (trimmomatic 0.39)
  • alignment against provided indexed genome (bowtie2 2.4.1)
  • sorting and conversion fom sam to bam of aligned reads (samtools 1.13)
  • check for insert size (picard 2.23.5)
  • removal of duplicated reads (picard 2.23.5)
  • bam file indexing (samtools 1.13)
  • generation of Multiqc report (multiqc 1.13)

The second script finding_ORIs.sh will use paired end aligend reads (sorted bam) as input and outputs bed files of the detected ORIs

  • seperation of mapped reads in minus and plus strand (samtools 1.13)
  • strand-seperated peak calling (macs2 2.2.7.1)
  • peak filtering (bedtools 2.2.0.0 and awk) based on three criteria:
    • distance of minus and plus strand peaks
    • no complete overlap between plus and minus strand peaks
    • right strand orientation (minus followed by plus)

Parameters

Parameters for the qc_mqpping.sh script need to be provided in the config_mapping.txt file, that has to be in the same working directory as the script :

  • output directory name. The directory will be generated by the sript in the working directory.

    output_dir=alignedReads

  • list of sample names as found in the fastq files e.g. $sample\L001_R1_001.fastq.gz

    input_list=("x_S1_" "y_S2_" "z_S5_")

  • path to fastq directory

    fastq_directory=path/to/fastqfiles

  • path and genome prefix of bowtie2 index and genome.fasta

    genome=path/to/genome/myfavorite_Genome

  • genome prefix for filenames of mapped reads

    genome_prefix=myfavorite_Genome

Parameters for the finding_ORIs.sh script need to be provided in the config_finding-ORIs.txt file, that has to be in the same working directory as the script:

  • path to aligned reads

    read_directory=path/to/aligned_reads

  • output directory name (generated by the sript in the working directory)

    output_dir=results/myresults

  • prefix of mapped reads ($sample$prefix.bam)

    file_prefix=aln-pe_genomeprefix_sorted_reheadered_dups-removed

  • sample name as found in bam files

    input_list=("x_S1_" "y_S2_" "z_S5_")

  • peak overlap to be excluded in percent

    overlap_list=(50)

  • window size for peak pair selection in nucleotides

    window_list=(500)

Run

The scripts are written to be run on an HPC cluster under slurm

  sbatch findingORIs/qc_mapping.sh
  sbatch findingORIs/finding_ORIS.sh

Authors

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages