Interaction with IDR (or other IDR-like) framework? #4
Comments
Hello @michaelbale, sorry for the delay in addressing this issue. We understand the importance of adding a fourth column for compatibility with IDR. However, GoPeaks doesn't test the whole peak for significance, but rather the individual bins that make up a peak. One way to get one p-value per peak (similar to the output of macs2) is to transform or combine the series of p-values in a peak into a single value. We have yet to find an approach that does this, but are open to suggestions.
Hi @gartician; thanks for the reply! So basically the method identifies the boundaries of "significant islands" through co-dependent bins? One could potentially use something like a harmonic mean p-value (HMP) or Brown's extension of the Fisher method (given that the p-values are dependent). But I'm not so sure how to check that the combined test is valid here. I've used these methods in other settings, but the data there were much better behaved.
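A minimal sketch of the two combination ideas mentioned above, using only the Python standard library. The function names are hypothetical and nothing here is part of GoPeaks. Two caveats: the raw harmonic mean is only approximately interpretable as a p-value and no calibration is applied, and the second function is the plain Fisher method, which assumes independent p-values. Brown's extension additionally needs an estimate of the covariance between bins and is not shown.

```python
import math


def harmonic_mean_p(pvalues, weights=None):
    """Weighted harmonic mean of the per-bin p-values (uncalibrated)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        # Assumption: every bin in the peak gets equal weight.
        weights = [1.0 / len(pvalues)] * len(pvalues)
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))


def fisher_combined_p(pvalues):
    """Plain Fisher method; valid only if the p-values are independent."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p); half_stat holds X / 2.
    half_stat = -sum(math.log(p) for p in pvalues)
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    acc = 1.0
    for i in range(1, k):
        term *= half_stat / i
        acc += term
    return min(1.0, math.exp(-half_stat) * acc)


# Hypothetical p-values for the bins that make up one peak.
bin_pvalues = [0.01, 0.02, 0.04]
print(harmonic_mean_p(bin_pvalues))
print(fisher_combined_p(bin_pvalues))
```

Either value could then be written out as a per-peak score, but whether it is well calibrated for adjacent, overlapping bins is exactly the open question in this thread.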
Hi @michaelbale, GoPeaks does identify the boundaries of significant bins, but I'm debating whether the bins are co-dependent. The HMP and Brown's extension of Fisher both seem very interesting, and both are meant for dependent p-values. Here is my question. Biologically speaking, the significance of a bin certainly depends on genome position (adjacent bins usually form a peak) and on the biological system, but the binomial distribution assumes independent tests among bins. I am not sure which interpretation is more correct, so I wonder whether those perspectives violate or align with the assumptions of the HMP and Brown's method?
I guess it would depend on the bin size and on how GoPeaks assigns reads to bins, i.e. whether a read spanning multiple bins is counted in every bin it spans or only the read start/end is counted. I would also expect dependence between bins, given that clusters of significant bins occur in a local neighborhood, which is the definition of a 'peak' in this instance. To be clear: the tests are independent, but the p-values are not.
In GoPeaks, if a read (technically a fragment, which is the content between R1 and R2) spans multiple bins, then it is counted in each of those bins. The idea of independent tests with dependent p-values is an interesting perspective and kinda makes sense. I can take a stab at implementing it in December, and I would also gladly accept a PR, but that's not required. Thank you for the constructive feedback!
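The counting rule described above can be sketched as follows. This is an illustration, not GoPeaks code: the fixed bin width, the half-open `[start, end)` coordinates, and the function names are all assumptions.

```python
from collections import Counter


def bins_spanned(start, end, bin_width):
    """Indices of the fixed-width bins overlapped by the fragment [start, end)."""
    if bin_width <= 0 or end <= start:
        raise ValueError("need bin_width > 0 and end > start")
    return range(start // bin_width, (end - 1) // bin_width + 1)


def count_fragments(fragments, bin_width):
    """Count each fragment once in every bin it overlaps."""
    counts = Counter()
    for start, end in fragments:
        for bin_index in bins_spanned(start, end, bin_width):
            counts[bin_index] += 1
    return counts


# Hypothetical fragments on one chromosome, with 100 bp bins.
fragments = [(90, 210), (150, 180)]
print(sorted(count_fragments(fragments, 100).items()))
```

Because the first fragment contributes to three neighboring bins, their counts move together, which is one concrete way the per-bin p-values can end up dependent even though each bin is tested separately.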
Hi @gartician; I was wondering if you or anyone else has had a chance to look into this recently! I didn't realize you had mentioned accepting a PR. As much as I'd love to try, my familiarity with the Go language is... not great.
Hi, I have been using GoPeaks for CUT&RUN/ATAC-seq data. I would like to know whether, since that time, there has been a chance to continue this conversation (testing different methods). Thanks!
The output peak file is a 3-column BED file without a score or signal.value field, which rules out using this peak caller in conjunction with something like IDR when we have replicates. Are there any plans, or is there any way, to produce an output that can be used with something like this?