Interaction with IDR (or other IDR-like) framework? #4

Open
michaelbale opened this issue Oct 3, 2022 · 7 comments

Comments

@michaelbale

The output peak file is a 3-column BED file without a score or signal.value field, which rules out using this peak caller with something like IDR when we have replicates. Are there any plans, or is there a way, to produce output that can be used with tools like this?
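For context, IDR reads peaks in the ENCODE narrowPeak layout (BED6+4) and ranks them by one of its value columns. A minimal sketch of padding a 3-column interval into that layout is below, assuming some per-peak p-value and signal value already exist; `to_narrowpeak`, both of those inputs, and the x10 scaling of the browser score are hypothetical, since GoPeaks does not report per-peak values.

```python
import math


def to_narrowpeak(chrom, start, end, name, peak_pvalue, signal_value):
    """Format one interval as a 10-column ENCODE narrowPeak line.

    peak_pvalue and signal_value are hypothetical per-peak summaries;
    GoPeaks does not currently report either.
    """
    # Guard against log10(0) for p-values that underflowed to zero.
    neg_log10_p = -math.log10(max(peak_pvalue, 1e-300))
    # Browser score must be an integer in [0, 1000]; the x10 scaling is arbitrary.
    score = max(0, min(1000, round(neg_log10_p * 10)))
    fields = [
        chrom, str(start), str(end), name, str(score),
        ".",                    # strand: none
        f"{signal_value:g}",    # signalValue
        f"{neg_log10_p:.5f}",   # pValue, stored as -log10(p)
        "-1",                   # qValue: not assigned
        "-1",                   # peak summit offset: not called
    ]
    return "\t".join(fields)
```

The open question in this thread is where `peak_pvalue` would come from; the formatting itself is the easy part.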

@gartician
Member

Hello @michaelbale,

Sorry for the delay in addressing this issue. We understand the importance of adding a fourth column for compatibility with IDR. However, GoPeaks doesn't test the whole peak for significance, but rather the individual bins that make up a peak. One way to get one p-value per peak (similar to the output of MACS2) is to transform or combine the series of p-values within a peak into a single value. We have yet to find an approach that does this, but we are open to suggestions.
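The textbook baseline for collapsing a peak's bin p-values into one number is Fisher's method. A minimal sketch follows (the helper name is illustrative). It assumes the bins are independent, which, as discussed below, probably does not hold for adjacent bins, so it would likely overstate peak significance.

```python
import math


def fisher_combine(pvalues):
    """Combine k p-values with Fisher's method (assumes independent tests).

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even df the upper tail has the closed
    form exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0  # (X/2)^0 / 0!
    tail = 1.0
    for j in range(1, k):
        term *= half_x / j  # builds (X/2)^j / j! incrementally
        tail += term
    return math.exp(-half_x) * tail
```

For a single bin this returns that bin's p-value unchanged. For peaks with many very small p-values, `exp(-half_x)` can underflow, so a production version would work in log space.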

@michaelbale
Author

Hi @gartician, thanks for the reply! So the method basically identifies the boundaries of "significant islands" through co-dependent bins? One could potentially use something like a harmonic mean p-value (HMP) or Brown's extension of Fisher's method (given that the p-values are dependent). But I'm not sure how to test the validity of the result; I've used these methods in other instances, but the data there were much better behaved.
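A minimal sketch of the raw harmonic mean p-value is below. It assumes equal weights by default, and the helper name is illustrative. The HMP is intended to stay approximately valid under dependence between tests, but the raw value shown here is only approximately calibrated, mainly for small values. The asymptotically exact version (Wilson, 2019) adds a Landau-distribution correction that this sketch omits.

```python
def harmonic_mean_pvalue(pvalues, weights=None):
    """Weighted harmonic mean p-value: sum(w_i) / sum(w_i / p_i).

    Uses equal weights 1/k by default. This is the raw HMP, without the
    Landau-distribution correction.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / k] * k
    if len(weights) != k:
        raise ValueError("weights and pvalues must have the same length")
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))
```

Unlike Fisher's statistic, the HMP is dominated by the smallest p-values in the peak, so one strongly enriched bin can carry the whole peak.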

@gartician
Member

gartician commented Nov 22, 2022

Hi @michaelbale, GoPeaks does identify the boundaries of significant bins, but I'm debating whether the bins are co-dependent. To reiterate, the HMP and Brown's extension of Fisher's method seem very interesting, and they allow the p-values to be dependent. My question is this: biologically speaking, the significance of a bin certainly depends on its genomic position (adjacent bins usually merge into a peak) and on the biological system, but the binomial model assumes the tests are independent across bins. I'm not sure which interpretation is more correct, but I wonder whether these perspectives violate or align with the assumptions of the HMP and Brown's method.
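The "independent tests" point can be made concrete with a generic one-sided binomial test applied to each bin on its own. This sketch is not GoPeaks's model. The total read count `n_total`, the per-bin success probability `p_bin`, and both helper names are hypothetical. Each p-value is computed from its own bin count only, even though neighboring counts may be correlated.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Sums the pmf from k to n in log space (via lgamma) so large n does not
    overflow; a readable sketch, not an optimized implementation.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nfact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_nfact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(1.0, total)


def per_bin_pvalues(bin_counts, n_total, p_bin):
    """Hypothetical per-bin test: each bin is scored from its own count only."""
    return [binomial_upper_tail(c, n_total, p_bin) for c in bin_counts]
```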

@michaelbale
Author

I guess it would depend on the bin size and how GoPeaks interprets the reads, i.e., whether a read spanning multiple bins is counted in each bin it spans or whether only the read start/end is counted. I would also imagine that the likelihoods are dependent, given that clusters of bins are significant in a local neighborhood; that would be the definition of a 'peak' in this instance. To be clear, I guess: the tests are independent, but the p-values are not.
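Brown's method is the "dependent p-values" extension of Fisher mentioned above. It refers Fisher's statistic to a scaled chi-square distribution whose variance includes the covariances between bins. A minimal sketch is below under loud assumptions:

- The correlation matrix `rho` between bin statistics is a hypothetical input, for example something estimated from shared fragments.
- The covariance of -2 ln p is approximated with the Kost & McDermott (2002) polynomial, which was derived for correlated normal test statistics. Applying it to binomial bin tests is itself an assumption.

The helper names are illustrative.

```python
import math


def _gammaincc(a, x):
    """Regularized upper incomplete gamma Q(a, x), Numerical Recipes style."""
    if x <= 0.0:
        return 1.0
    log_prefix = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefix))
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefix) * h


def chi2_sf(x, df):
    """Upper tail of a chi-square distribution with (possibly non-integer) df."""
    return _gammaincc(df / 2.0, x / 2.0)


def brown_combine(pvalues, rho):
    """Brown's method: Fisher's statistic referred to a scaled chi-square.

    rho[i][j] is an assumed correlation between the test statistics of
    bins i and j; only the upper triangle is read.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    mean = 2.0 * k  # E[X] under the null
    var = 4.0 * k   # Var[X] if all bins were independent
    for i in range(k):
        for j in range(i + 1, k):
            r = rho[i][j]
            # Kost & McDermott (2002) approximation to cov(-2 ln p_i, -2 ln p_j)
            var += 2.0 * (3.263 * r + 0.710 * r**2 + 0.027 * r**3)
    scale = var / (2.0 * mean)  # c in X ~ c * chi2_f
    df = 2.0 * mean**2 / var    # f, matched so mean and variance agree
    return chi2_sf(x / scale, df)
```

With every correlation set to 0 this reduces to Fisher's method. With every correlation set to 1 and identical p-values, it returns that single p-value, which matches the "one bin repeated k times" intuition.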

@gartician
Member

In GoPeaks, if a read (technically a fragment, i.e., the region between R1 and R2) spans multiple bins, it is counted in each of those bins. Independent tests with dependent p-values is an interesting perspective and kind of makes sense. I can take a stab at implementing it in December, and I will also gladly accept a PR, though that's not required. Thank you for the constructive feedback!
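The counting rule described above can be illustrated with a simplified sketch. It is not GoPeaks's Go code. Fragments are assumed to be 0-based, half-open `(start, end)` pairs as in BED, and the function name, bin size, and coordinates are hypothetical.

```python
def bin_fragment_counts(fragments, chrom_length, bin_size):
    """Count each fragment once in every bin it overlaps.

    fragments: (start, end) pairs, 0-based half-open like BED.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for start, end in fragments:
        if end <= start:
            continue  # skip empty or malformed fragments
        first_bin = start // bin_size
        last_bin = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first_bin, last_bin + 1):
            counts[b] += 1
    return counts
```

With 100 bp bins, `bin_fragment_counts([(50, 250), (180, 220), (390, 410)], 500, 100)` gives `[1, 2, 2, 1, 1]`. The first fragment alone adds to bins 0, 1, and 2, so those counts, and therefore their p-values, move together. That is the dependence that Brown's method or the HMP would need to account for.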

@michaelbale
Author

michaelbale commented Apr 15, 2023

Hi @gartician, I was wondering whether you or anyone else has had a chance to look into this recently! I didn't realize you had mentioned accepting a PR; as much as I'd love to try, my familiarity with the Go language is... not great.

@martinezvbs

Hi,

I have been using GoPeaks for CUT&RUN/ATAC-seq data. I would like to know whether, since then, there has been a chance to continue this conversation (testing different methods). Thanks!
