
Going from raw ChIP-seq reads to super-enhancer regions and comparing these regions between two groups can be a very tedious job. This pipeline was created to make that relatively easy. It features 7 required steps and 3 optional steps that go from raw reads to super-enhancer regions and their associated genes.


This pipeline makes use of different libraries within Python and Linux. To run the full pipeline you need the following. Python related:

- Python (>= 2.6 and < 3)
- argparse module
- os module
- sys module
- subprocess module
- time module

Linux related:

- MACS2
- samtools
- bedtools

In addition, the most recent version of R is needed (used by the DESeq2 step).

Usage [-h] {callpeaks,filterChromosomes,filterPeaks,sortGFF,ROSE,peakConsensus,compareSamples,DESeq2,filterResults,findGenes}

The pipeline has 10 subcommands to find super-enhancers: 7 required steps and 3 optional ones.

| Subcommand | Description |
| --- | --- |
| callpeaks | Call peaks with the MACS2 peak caller |
| filterChromosomes | Optional: filter out certain chromosomes, e.g. chromosomes X and Y |
| filterPeaks | Stretch peaks, remove peaks too close to a TSS, remove mitochondrial DNA and convert to GFF |
| sortGFF | Optional: sort the created GFF file on chromosome and start coordinate |
| ROSE | Find the super-enhancer regions with ROSE |
| peakConsensus | Create one consensus file containing only regions that are found in at least a given number of samples |
| compareSamples | Find for each super-enhancer how many reads are found in each sample |
| DESeq2 | Use DESeq2 to find the log2 fold change per group |
| filterResults | Optional: filter out results that aren't significant or have too low a difference between the groups |
| findGenes | Find which genes are associated with the significant regions |

Required Steps


The first step, callpeaks, makes use of the MACS2 peak calling algorithm. Peaks can be called with or without input/control files. All other settings are the MACS2 defaults.

required positional flags


A list of the files to call peaks for. Every row should contain the relative path from the pipeline to the file. If peaks should be called with a control file, the row should contain the relative path to the sample and the relative path to the control file, separated by a comma.

optional flags


The name of the output directory (passed with -o in the example below). Default = Peakcalling_Output

Example input: python callpeaks peakCall_filelist.txt -o Macs_peakcalling
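
The file list format described above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the function name, the bams/ paths and the rule for deriving the -n name are assumptions, and nothing is executed. The -t, -c, -n and --outdir options are standard macs2 callpeak options.

```python
def build_macs2_commands(filelist_lines, outdir="Peakcalling_Output"):
    """Turn file-list rows into macs2 callpeak argument lists (nothing is run)."""
    commands = []
    for line in filelist_lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        parts = [part.strip() for part in line.split(",")]
        sample = parts[0]
        # Assumed naming rule: the sample file name without its extension.
        name = sample.split("/")[-1].rsplit(".", 1)[0]
        cmd = ["macs2", "callpeak", "-t", sample, "-n", name, "--outdir", outdir]
        if len(parts) > 1 and parts[1]:
            cmd += ["-c", parts[1]]  # optional control/input file
        commands.append(cmd)
    return commands


# Hypothetical rows: one sample with a control file, one without.
rows = ["bams/PBT1.bam,bams/PBT1_input.bam", "bams/SFT1.bam"]
commands = build_macs2_commands(rows, outdir="Macs_peakcalling")
for cmd in commands:
    print(" ".join(cmd))
```

Building the argument list separately from running it makes it easy to inspect what would be passed to MACS2 before starting a long job.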


The filterPeaks step converts the narrowPeak file from the peak calling into a GFF file (needed for ROSE). Optionally, it stretches peaks that are too small, filters out peaks that are too close to a TSS and removes peaks that are on mitochondrial DNA.

required positional flags


A list of the files to convert. Every row should contain the relative path from the pipeline to the narrowPeak file.


The relative path to the RefSeq txt file that contains all the regulatory elements

optional flags


The name of the output directory. Default = filteredPeaks


If peaks are smaller than the given size, they will be stretched. Default = 2000 bp


If peaks are closer than the given distance to a TSS, they will be filtered out. Default = 5000 bp


Remove all peaks that are on mitochondrial DNA (True/False). Default = True

Example input: python filterPeaks filterPeaks_filelist.txt ncbiREFSEQ.txt -o SEpeaks_filtered -p 5000 -tr 10000 -mt False
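
The three filters described above can be sketched as follows. This is a sketch under assumptions, not the pipeline's implementation: stretching symmetrically around the peak midpoint, applying the TSS check after stretching, and the set of mitochondrial chromosome names are all guesses, and every function name is hypothetical. The GFF conversion is left out.

```python
MT_NAMES = ("chrM", "chrMT", "MT", "M")  # assumed spellings of the mitochondrial chromosome


def stretch_peak(start, end, min_size=2000):
    """Widen a peak around its midpoint if it is shorter than min_size."""
    if end - start >= min_size:
        return start, end
    mid = (start + end) // 2
    new_start = max(0, mid - min_size // 2)
    return new_start, new_start + min_size


def near_tss(start, end, tss_positions, min_distance=5000):
    """True if any TSS lies inside the peak or within min_distance of it."""
    return any(start - min_distance <= tss <= end + min_distance
               for tss in tss_positions)


def filter_peaks(peaks, tss_by_chrom, min_size=2000, tss_distance=5000,
                 remove_mt=True):
    """peaks: list of (chrom, start, end, name). Returns the kept peaks."""
    kept = []
    for chrom, start, end, name in peaks:
        if remove_mt and chrom in MT_NAMES:
            continue
        start, end = stretch_peak(start, end, min_size)
        if near_tss(start, end, tss_by_chrom.get(chrom, []), tss_distance):
            continue
        kept.append((chrom, start, end, name))
    return kept


# Hypothetical peaks and a single hypothetical TSS on chr1.
peaks = [("chr1", 10000, 10500, "p1"),
         ("chr1", 50000, 53000, "p2"),
         ("chrM", 100, 900, "p3")]
kept = filter_peaks(peaks, {"chr1": [12000]})
print(kept)
```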


The ROSE step finds super-enhancers by calling the ROSE_main() program.

required positional flags


A list of the files to use for ROSE. Each line should contain the relative path to a GFF file and the BAM file it is associated with, separated by a comma.

optional flags


The genome build that the reads were mapped against (MM8, MM9, MM10, HG18, HG19), passed with -g in the example below. Default = HG19

Example input: python ROSE ROSE_filelist.txt -g HG18


The peakConsensus step creates one consensus file containing only regions that are found in at least a given number of samples.

required positional flags


A list of the relative paths to every _SuperEnhancers.table.txt file created by ROSE.


The name of the BED file to create. _consensus will be added to it, e.g. NAME_consensus.bed


The minimum number of files that must contain a peak for it to be kept in the consensus

Example input: python peakConsensus peakConsensus_filelist.txt Analysis_1 3
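
One way to think about the consensus step is as a coverage count: keep every stretch of genome that is covered by regions from at least the requested number of samples. The sketch below shows that idea only. It assumes regions are half-open (chrom, start, end) tuples and that the regions within a single sample do not overlap each other; the pipeline's own rule for combining regions may differ, and the function name is hypothetical.

```python
from collections import defaultdict


def consensus_regions(samples, min_samples):
    """samples: one list of (chrom, start, end) per sample.

    Returns sorted (chrom, start, end) stretches covered by at least
    min_samples samples.
    """
    # Net change in coverage at every boundary position, per chromosome.
    deltas = defaultdict(lambda: defaultdict(int))
    for regions in samples:
        for chrom, start, end in regions:
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1

    result = []
    for chrom in sorted(deltas):
        depth = 0
        open_start = None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth >= min_samples and open_start is None:
                open_start = pos  # coverage reached the threshold
            elif depth < min_samples and open_start is not None:
                result.append((chrom, open_start, pos))
                open_start = None
    return result


# Three hypothetical samples.
sample_a = [("chr1", 100, 200)]
sample_b = [("chr1", 150, 300)]
sample_c = [("chr1", 200, 400), ("chr2", 10, 20)]
consensus = consensus_regions([sample_a, sample_b, sample_c], 2)
print(consensus)
```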


The compareSamples step finds, per super-enhancer, how many reads come from each separate sample.

required positional flags


A list of files. Every row should contain the original BAM file, the consensus file and the group the sample belongs to, separated by commas (e.g. PBT1.bam,PBT_SFT_consensus.bed,A_PBT)


The relative location of the consensus file

Example input: python compareSamples compareSamples.txt consensusFiles/Analysis_1_consensus.bed
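
The row format above, and one possible way to count reads for a region, can be sketched as follows. This is an illustration under assumptions: the function names, the second sample row and its group label are hypothetical, and the source does not say which tool the pipeline uses for counting. The command built here uses samtools view -c with a region, which needs an indexed BAM file, and is only constructed, not run.

```python
def parse_compare_rows(lines):
    """Parse 'bam,consensus_bed,group' rows into a list of dicts."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        bam, bed, group = [field.strip() for field in line.split(",")]
        samples.append({"bam": bam, "bed": bed, "group": group})
    return samples


def count_command(bam, chrom, start, end):
    """samtools command that counts the reads overlapping one BED region.

    BED coordinates are 0-based and half-open, samtools regions are
    1-based and inclusive, hence start + 1.
    """
    region = "%s:%d-%d" % (chrom, start + 1, end)
    return ["samtools", "view", "-c", bam, region]


rows = ["PBT1.bam,PBT_SFT_consensus.bed,A_PBT",
        "SFT1.bam,PBT_SFT_consensus.bed,B_SFT"]  # second row is hypothetical
samples = parse_compare_rows(rows)

# Collect the BAM files per group.
groups = {}
for sample in samples:
    groups.setdefault(sample["group"], []).append(sample["bam"])

print(groups)
print(" ".join(count_command("PBT1.bam", "chr1", 150, 300)))
```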


The DESeq2 step compares two sample groups and finds whether certain super-enhancers appear to have a higher activity in either one of the groups. There are no flags, but three files should be in the same folder as the pipeline and the DESeq2 R script: superEnhancer_counts.txt, superEnhancer_names.txt and results.txt. These files are generated in the compareSamples step.

Example input: python DESeq2


The findGenes step finds the genes that are associated with each super-enhancer.

Required positional flags


The relative location of the results peak file, generated by DESeq2 and possibly filtered by the filterResults step.


The relative location of the reference genome file

Optional flags


The maximal distance between a TSS and the super-enhancer for them to be associated with each other (passed with -t in the example below). Default = 50000 bp


A translation file in TSV format with the columns RefSeq mRNA ID and Gene name, tab delimited (passed with -n in the example below).

Example input: python findGenes significant_adjFoldChange_results.txt ncbiREFSEQ.txt -t 100000 -n geneTranslateFile.txt
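
The association rule can be sketched as follows. This is a minimal sketch, assuming the distance is measured from the TSS to the nearest edge of the super-enhancer (the source does not say whether an edge or the centre is used) and that the TSS is the transcript start on the + strand and the transcript end on the - strand. The dictionary keys, gene names and function names are hypothetical.

```python
def tss_of(gene):
    """TSS is the transcript start on '+' and the transcript end on '-'."""
    return gene["tx_start"] if gene["strand"] == "+" else gene["tx_end"]


def distance_to_region(pos, start, end):
    """Distance from a position to the nearest edge of a region (0 if inside)."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def genes_near(region, genes, max_distance=50000):
    """Names of the genes whose TSS lies within max_distance of the region."""
    chrom, start, end = region
    hits = []
    for gene in genes:
        if gene["chrom"] != chrom:
            continue
        if distance_to_region(tss_of(gene), start, end) <= max_distance:
            hits.append(gene["name"])
    return hits


# Hypothetical genes around one hypothetical super-enhancer.
genes = [
    {"name": "GENE_A", "chrom": "chr1", "strand": "+", "tx_start": 60000, "tx_end": 70000},
    {"name": "GENE_B", "chrom": "chr1", "strand": "-", "tx_start": 170000, "tx_end": 180000},
    {"name": "GENE_C", "chrom": "chr2", "strand": "+", "tx_start": 110000, "tx_end": 115000},
]
super_enhancer = ("chr1", 100000, 120000)
print(genes_near(super_enhancer, genes))
print(genes_near(super_enhancer, genes, max_distance=100000))
```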

Optional steps


Filter certain chromosomes out of your narrowPeak file

Required positional flags


The relative path to the directory with narrowPeak files


The chromosomes you want to keep, separated by commas

Example input: python filterChromosomes Macs_peakcalling
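
Keeping only certain chromosomes comes down to checking the first column of every narrowPeak line, since narrowPeak is a tab-separated format with the chromosome first. A minimal sketch with a hypothetical function name and made-up peak lines:

```python
def filter_chromosomes(lines, keep):
    """Keep only the narrowPeak lines whose chromosome is in keep."""
    keep = set(keep)
    return [line for line in lines
            if line.strip() and line.split("\t", 1)[0] in keep]


# Made-up narrowPeak lines (10 tab-separated columns each).
narrowpeak_lines = [
    "chr1\t100\t600\tpeak_1\t250\t.\t8.1\t30.2\t25.0\t240",
    "chrX\t900\t1400\tpeak_2\t180\t.\t6.3\t22.7\t18.1\t210",
    "chr2\t5000\t5600\tpeak_3\t310\t.\t9.4\t35.9\t30.3\t305",
    "chrY\t700\t1100\tpeak_4\t120\t.\t4.8\t15.2\t11.6\t150",
]
# The chromosomes to keep are given as a comma-separated string.
kept_lines = filter_chromosomes(narrowpeak_lines, "chr1,chr2".split(","))
for line in kept_lines:
    print(line)
```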


Sort the .gff file on chromosome and location

Required positional flags


A file list in which every row contains the relative path to a GFF file


The name of the output directory

Example input: python sortGFF sort_filelist.txt sorted_GFFs
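
Sorting on chromosome and start coordinate can be sketched as follows. The natural chromosome order used here (chr2 before chr10, named chromosomes last) is an assumption, as the pipeline may sort chromosome names as plain text. The start coordinate is read from the fourth column, as in the standard GFF layout. Function names and the example lines are hypothetical.

```python
def chrom_key(chrom):
    """Sort key that puts numbered chromosomes in numeric order, others after."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


def sort_gff_lines(lines):
    """Sort GFF lines on chromosome (column 1) and start (column 4)."""
    def key(line):
        fields = line.rstrip("\n").split("\t")
        return (chrom_key(fields[0]), int(fields[3]))

    records = [line for line in lines
               if line.strip() and not line.startswith("#")]
    return sorted(records, key=key)


# Made-up GFF lines with nine tab-separated columns.
gff_lines = [
    "chr10\tpeak_4\t.\t500\t900\t.\t.\t.\tpeak_4",
    "chr2\tpeak_2\t.\t300\t700\t.\t.\t.\tpeak_2",
    "chrX\tpeak_3\t.\t50\t400\t.\t.\t.\tpeak_3",
    "chr2\tpeak_1\t.\t100\t250\t.\t.\t.\tpeak_1",
]
sorted_lines = sort_gff_lines(gff_lines)
for line in sorted_lines:
    print(line)
```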


The filterResults step filters out DESeq2 results that aren't significant or fall below a certain log2 fold change.


Filter out the result peaks that have a higher P-value than the given value (the -s flag)


Filter out the result peaks that have a higher adjusted P-value than the given value (passed with -a in the example below). If this flag is used, the -s flag will be ignored


Filter out the result peaks that have a fold change lower than the given value (passed with -f in the example below)

Example input: python filterResults -a 0.1 -f 0.4
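
The filtering logic described above can be sketched as follows. This is a sketch, not the pipeline's code: it assumes the rows carry the standard DESeq2 result columns pvalue, padj and log2FoldChange, that the fold change threshold is applied to the absolute log2 fold change, and that rows with a missing value for the tested column are dropped. The function name and the example rows are hypothetical.

```python
def filter_results(rows, max_pvalue=None, max_padj=None, min_abs_lfc=None):
    """Keep the rows that pass the significance and fold change thresholds.

    If max_padj is given it takes precedence and max_pvalue is ignored.
    """
    kept = []
    for row in rows:
        if max_padj is not None:
            if row["padj"] is None or row["padj"] > max_padj:
                continue
        elif max_pvalue is not None:
            if row["pvalue"] is None or row["pvalue"] > max_pvalue:
                continue
        if min_abs_lfc is not None and abs(row["log2FoldChange"]) < min_abs_lfc:
            continue
        kept.append(row)
    return kept


# Made-up result rows; None stands for an NA value.
results = [
    {"id": "SE_1", "pvalue": 0.001, "padj": 0.01, "log2FoldChange": 1.2},
    {"id": "SE_2", "pvalue": 0.03, "padj": 0.2, "log2FoldChange": 2.0},
    {"id": "SE_3", "pvalue": 0.0005, "padj": 0.02, "log2FoldChange": -0.1},
    {"id": "SE_4", "pvalue": 0.04, "padj": None, "log2FoldChange": -0.9},
]
by_padj = filter_results(results, max_padj=0.1, min_abs_lfc=0.4)
by_pvalue = filter_results(results, max_pvalue=0.05, min_abs_lfc=0.4)
print([row["id"] for row in by_padj])
print([row["id"] for row in by_pvalue])
```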


A pipeline to locate super-enhancer regions





