ABSREL p-value distribution across many genes #1834

Open
fuesseler opened this issue Apr 11, 2025 · 2 comments


@fuesseler

Dear authors,
Thank you for developing and maintaining this software!

I have been running aBSREL for many genes (around 10k) in a 6-species phylogeny, and applying a multiple-testing correction (Benjamini-Hochberg) removes any signal of positive selection. I have also tried smaller, more targeted gene subsets, but mostly run into the same issue.
When checking the p-values output by aBSREL I noticed a strange distribution that I am not sure how to interpret (whether this is the expected behavior or not). So I wanted to ask: is it abnormal to have many p-values of exactly 1? I also observe that no p-values are ever inferred between 0.5 and 1 (other than the value 1 itself). I attached a plot of the distribution.

I would be very grateful for some advice on how to interpret this (if this is an indication for problems when running ABSREL on my data or not).
Best regards,
F

[Attached image: histogram of the aBSREL p-value distribution]

@spond
Member

spond commented Apr 11, 2025

Dear @fuesseler,

I am afraid that with a 6-sequence dataset nothing will survive a 10K BH FDR correction. You may have to be a little bit more permissive with what you consider significant. Keep in mind that within a gene, aBSREL does a conservative multiple testing correction over branches already.
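To make the arithmetic behind this concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure (plain Python, not HyPhy code; the p-values and cutoffs below are illustrative assumptions, not taken from this dataset):

```python
# Benjamini-Hochberg step-up procedure (illustrative sketch).
def bh_reject(pvals, alpha=0.05):
    """Return the set of indices rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])

# With 10,000 genes, a lone "hit" needs a raw p-value at or below
# alpha * 1 / 10000 to survive -- 5e-6 at alpha = 0.05.
hits = [1e-4] + [1.0] * 9999      # one fairly small p-value among 10k tests
print(len(bh_reject(hits, alpha=0.05)))            # -> 0: 1e-4 > 5e-6
print(len(bh_reject([1e-6] + [1.0] * 9999, 0.05)))  # -> 1: 1e-6 <= 5e-6
```

This is why, with the limited power of a 6-sequence alignment, a gene can look convincingly selected in isolation and still fall well short of the BH threshold once 10k tests share the budget.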

A p-value distribution with a lot of mass at 1 is expected. If the inference procedure for a given branch fits an unconstrained model where all ω values are < 1, then constraining them to < 1 will yield exactly the same model, with the same log-likelihood and a p-value of exactly 1. In fact aBSREL doesn't even fit these models; it reports them as "test not run".

The boundary at 0.5 is also expected, because if the test IS run, then the constraint on ω results in a test statistic whose null distribution does not generate p-values > 0.5 (it's a mixture of a point mass at zero and a χ² distribution).
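The shape of that null distribution can be reproduced with a toy calculation. The 50/50 point-mass/χ²₁ mixture below is only an illustrative assumption (aBSREL's actual null averages several χ² components and applies a correction over branches), but it shows why p-values land either at exactly 1 or at or below 0.5, never in between:

```python
import math

def chi2_1_sf(x):
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(x / 2.0))

def boundary_lrt_pvalue(lrt):
    # Toy null: 50/50 mixture of a point mass at 0 and chi^2_1, as arises
    # when the constrained parameter sits on the boundary of its space.
    # A statistic of exactly 0 (constrained fit == unconstrained fit)
    # gives p = 1; any positive statistic gives p <= 0.5.
    if lrt <= 0:
        return 1.0
    return 0.5 * chi2_1_sf(lrt)

print(boundary_lrt_pvalue(0.0))    # 1.0 -- the "test not run" case
print(boundary_lrt_pvalue(1e-9))   # just under 0.5 -- nothing falls in (0.5, 1)
print(boundary_lrt_pvalue(3.84))   # ~0.025 -- half the usual chi^2_1 p-value
```

Under this mixture the p-value function jumps from 1 straight down to 0.5 as the statistic leaves zero, which matches the gap in the histogram.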

So the graph you pasted looks pretty much exactly like what I would expect.

Best,
Sergei

@fuesseler
Author

Dear Sergei,
Thanks for that explanation! That all makes sense now. So setting the FDR cutoff to be more permissive (for example 0.1 instead of 0.05) might help, if I understand correctly?
I had not realized that the size of the dataset has such a large impact on the results of aBSREL. I think I will try to extend my dataset with more distant outgroups. I know this is hard to predict, but from your experience, would you have a recommendation for a minimum dataset size when running aBSREL on thousands of orthologs?

Thank you for your help,
F
