MAnorm - identifying differential binding in ChIP-seq data using linear normalization on shared peaks
This repository contains a modified version of MAnorm that addresses some compatibility issues.
For an improved version of MAnorm that runs faster and allows for 2 vs. 2 comparisons (replicates) using edgeR, please visit the [following repository](https://github.com/ying-w/chipseq-compare/tree/master/MAnorm).
The original can be found here. Published in Genome Biology 2012.
- In MAnorm.sh, the line M <- log2((common_peak_count_read1+1)/(common_peak_count_read2+1)) produces an error due to a change in bedtools coverage behaviour. To fix this, I swapped the input order:
coverageBed -a read1.bed -b unique_peak1.bed → coverageBed -b read1.bed -a unique_peak1.bed
- Manorm.r requires the packages MASS, affy and R.basic, but the latter is deprecated and no longer available. Most of its functions have been transferred to R.utils and aroma.light, which can be installed as:
biocLite("aroma.light") ; install.packages(c("R.oo","R.utils","MASS"))
- The binomial coefficient function, nChooseK, was part of 'R.basic'. It has been replaced with the built-in function 'choose'.
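The coverageBed argument swap in the first item above only applies to newer bedtools releases. As a minimal sketch, assuming the coverage semantics changed in bedtools 2.24.0 (my understanding of the release notes, not something stated in this repository), a script could pick the argument order from the installed version. The helper name `uses_new_coverage_order` is hypothetical and not part of MAnorm.sh.

```shell
# Hypothetical helper (not part of MAnorm.sh): succeeds if the given bedtools
# version is assumed to use the newer coverage semantics (2.24.0 or later).
# Note: plain POSIX sh has no local variables, so these names are global.
uses_new_coverage_order() {
    version=${1#v}            # strip a leading "v", e.g. v2.27.1 -> 2.27.1
    major=${version%%.*}      # text before the first dot
    rest=${version#*.}        # text after the first dot
    minor=${rest%%.*}         # second version component
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 24 ]; }
}

# Usage sketch (requires bedtools; file names follow the example above).
# `bedtools --version` prints a line such as "bedtools v2.27.1".
# version=$(bedtools --version | awk '{print $2}')
# if uses_new_coverage_order "$version"; then
#     coverageBed -b read1.bed -a unique_peak1.bed
# else
#     coverageBed -a read1.bed -b unique_peak1.bed
# fi
```

Only the major and minor components are compared, which is enough for this single cutoff.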
Problems with MAnorm (from here)
- There is something wrong with how the p-values are calculated (see code in MAnorm2.R starting from line 50 for details).
- pval calculation is not optimized (very slow)
  - It is faster to use choose() and run in parallel
- Stirling approximation seems to be done incorrectly
- This calculation is consistent in the MATLAB version (more details in MATLAB MAnorm than R MAnorm)
- pval are not symmetric (calculations from x vs y do not give the same pvalues as y vs x)
- mergeBed command in MAnorm.sh does not actually work (need to sort first)
- Lots of tmp files generated by MAnorm.sh and a lot of steps could be done in parallel
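For the mergeBed item above, one possible fix is to sort the peaks by chromosome and start position before merging. The following is a sketch, not the change made in this repository; `sort_bed` is a hypothetical helper and the file names are placeholders.

```shell
# Hypothetical helper: sort a BED file by chromosome, then numerically by start.
# LC_ALL=C keeps the chromosome ordering independent of the user's locale.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Usage sketch (requires bedtools; file names are placeholders):
# sort_bed peaks.bed | mergeBed -i stdin > merged_peaks.bed
```

Piping the sorted output straight into mergeBed also avoids writing one more temporary file.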
- Bedtools installed: http://bedtools.readthedocs.io/en/latest/content/installation.html
- R packages installed: affy (Bioconductor), R.utils and MASS (CRAN)
HOWTO: enter the following lines in R to install the three packages above
biocLite("affy")
install.packages(c("R.utils","MASS"))
run command: ./MAnorm.sh sample1_peakfile[BED] sample2_peakfile[BED] sample1_readfile[BED] sample2_readfile[BED] sample1_readshift_length[INT] sample2_readshift_length[INT]
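Since the script takes six positional arguments, a small pre-flight check can catch a missing file or a mistyped shift length before any work starts. This is a sketch only: `check_manorm_args` is a hypothetical wrapper, not part of MAnorm, and it assumes the shift lengths are non-negative integers.

```shell
# Hypothetical pre-flight check (not part of MAnorm): validate the six
# arguments expected by MAnorm.sh before running it.
check_manorm_args() {
    if [ "$#" -ne 6 ]; then
        echo "expected 6 arguments, got $#" >&2
        return 1
    fi
    # The first four arguments must be readable regular files.
    for f in "$1" "$2" "$3" "$4"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "cannot read file: $f" >&2
            return 1
        fi
    done
    # Assumption: the two shift lengths are non-negative integers.
    for n in "$5" "$6"; do
        case "$n" in
            ''|*[!0-9]*) echo "not an integer: $n" >&2; return 1 ;;
        esac
    done
    return 0
}

# Usage sketch (file names and shift sizes are placeholders):
# check_manorm_args s1_peaks.bed s2_peaks.bed s1_reads.bed s2_reads.bed 100 100 \
#     && ./MAnorm.sh s1_peaks.bed s2_peaks.bed s1_reads.bed s2_reads.bed 100 100
```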
MAnorm requires two files per sample: the peaks in BED format, easily retrieved from MACS, and the reads from the original SAM file in the format chromosome, start, end, strand (+/-). To obtain the latter, we can use the following:
samtools view BAM_FILE | awk -F'\t' '{if ($2==0) {print $3,$4,($4+length($10)-1),"+"} else if ($2==16) {print $3,$4,($4+length($10)-1),"-"}}'
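The command above keeps only reads whose SAM flag is exactly 0 or 16. The variant below reads the individual flag bits instead, so it behaves differently: it also keeps paired-end and secondary alignments, skips unmapped reads, and writes tab-separated output. It is a sketch under those assumptions, not the conversion used by this repository, and `sam_to_reads` is a hypothetical name. As in the original command, the read length is taken from the sequence column and the CIGAR string is ignored.

```shell
# Hypothetical helper: read SAM records on stdin and write
# chromosome, start, end, strand as tab-separated columns.
sam_to_reads() {
    awk -F'\t' -v OFS='\t' '
        /^@/ { next }                            # skip SAM header lines
        {
            flag = $2
            if (int(flag / 4) % 2 == 1) next     # bit 0x4 set: read is unmapped
            # bit 0x10 set: read maps to the reverse strand
            strand = (int(flag / 16) % 2 == 1) ? "-" : "+"
            print $3, $4, $4 + length($10) - 1, strand
        }'
}

# Usage sketch (requires samtools; BAM_FILE and the output name are placeholders):
# samtools view BAM_FILE | sam_to_reads > sample1_reads.bed
```

The integer division and modulo stand in for bitwise tests, which not every awk implementation provides.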