
fastwilcoxGMTall with Goterms instead of pathways #99

Open
MaximePolicarpo opened this issue Sep 2, 2024 · 2 comments


@MaximePolicarpo

MaximePolicarpo commented Sep 2, 2024

Dear All,

I have two questions about enrichment analysis and permulations.

  1. With the function "fastwilcoxGMTall", is it correct to use a list of GO terms and their associated genes instead of pathways?

  2. I performed 1,000 permulations for a binary trait analysis using "getPermsBinary" and then computed the p-values using "permpvalcor". This results in a data.frame with two columns: "permpval" and "permstats". However, there are no Rho values; those appear only in the getPermsBinary results. I guess those Rho values are not very meaningful, since they are the correlations between the gene rates and the permulated phenotypes on the tree. If I want to report the results of RERconverge, is it correct to report the initial Rho value computed with "correlateWithBinaryPhenotype", together with the p-values and statistics from the permulations? Also, to perform an enrichment analysis after those permulations, is it correct to run "fastwilcoxGMTall" with the "permstats" values as input? (The same question holds for permulations with a continuous trait.)
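To make the quantities in question 2 concrete, here is a minimal Python sketch (RERconverge itself is an R package) of one common way to turn permulated Rho values into an empirical p-value and a signed ranking statistic. The "at least as extreme in absolute value" rule, the floor at 1/n_perms, and the sign(Rho) × −log10(p) statistic are assumptions for illustration; they are not guaranteed to match what `permpvalcor` computes internally. All function names are hypothetical.

```python
import math


def sign(x):
    """R-style sign(): returns -1, 0 or 1."""
    return (x > 0) - (x < 0)


def permulation_pvalue(observed_rho, null_rhos):
    """Empirical two-sided p-value (assumed definition): the fraction of
    permulated Rho values whose magnitude is at least |observed Rho|.
    NaN entries (e.g. failed permulations) are ignored."""
    valid = [r for r in null_rhos if not math.isnan(r)]
    if not valid:
        return float("nan")
    extreme = sum(1 for r in valid if abs(r) >= abs(observed_rho))
    return extreme / len(valid)


def signed_stat(observed_rho, pvalue, n_perms):
    """sign(observed Rho) * -log10(p). A p-value of 0 (no permulation as
    extreme) is floored at 1 / n_perms so the statistic stays finite."""
    p = max(pvalue, 1.0 / n_perms)
    return sign(observed_rho) * -math.log10(p)


# Hypothetical per-gene input: the observed Rho (e.g. from
# correlateWithBinaryPhenotype) and Rho values from permulated phenotypes.
# Only 5 permulations per gene here, for illustration.
observed = {"geneA": 0.5, "geneB": -0.8}
nulls = {
    "geneA": [0.1, -0.6, 0.2, 0.55, -0.3],
    "geneB": [0.1, -0.2, 0.3, 0.05, -0.4],
}
stats = {}
for gene, rho in observed.items():
    p = permulation_pvalue(rho, nulls[gene])
    stats[gene] = signed_stat(rho, p, len(nulls[gene]))
```

In this sketch the direction of each gene's statistic comes from the observed Rho, while its magnitude comes only from the permulation p-value, which is one way to combine the two pieces the question asks about.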

Thanks a lot for any help and guidance! :D

All the best,

Maxime

@nclark-lab
Owner

Hello Maxime,

  1. It is perfectly fine to use a list of GO terms or any gene annotations the user wants to supply.
  2. There is some disagreement in the group about ranking by permulation p-values versus the initial Rho values. We will consult and get back to you with best practices.
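Since any gene annotation can stand in for pathways, GO annotations usually only need to be inverted from gene-to-term pairs into term-to-genes sets. The Python sketch below does that and writes the sets as GMT-style lines (set name, description, then member genes, tab-separated), the layout used by common pathway gene-set files. The function names, the `min_size` filter, and the "na" placeholder description are hypothetical choices; how RERconverge then reads the file is not shown here.

```python
from collections import defaultdict


def build_gene_sets(annotations, min_size=1):
    """Invert (gene, GO term) pairs into {GO term: sorted list of genes}.
    Duplicate pairs are collapsed; sets smaller than min_size are dropped."""
    sets = defaultdict(set)
    for gene, term in annotations:
        sets[term].add(gene)
    return {
        term: sorted(genes)
        for term, genes in sets.items()
        if len(genes) >= min_size
    }


def to_gmt_lines(gene_sets, descriptions=None):
    """Format gene sets as GMT-style lines:
    name <tab> description <tab> gene1 <tab> gene2 ..."""
    descriptions = descriptions or {}
    lines = []
    for term in sorted(gene_sets):
        fields = [term, descriptions.get(term, "na")] + gene_sets[term]
        lines.append("\t".join(fields))
    return lines


# Hypothetical gene-to-GO annotation pairs.
pairs = [
    ("geneA", "GO:0001"),
    ("geneB", "GO:0001"),
    ("geneA", "GO:0002"),
    ("geneA", "GO:0001"),  # duplicate, ignored
]
gmt = to_gmt_lines(build_gene_sets(pairs), {"GO:0001": "term one"})
```

A minimum set size is a common filter because very small GO terms give unstable rank-based enrichment results.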

@MaximePolicarpo
Author

Dear Nathan,

Thanks a lot for your answer.

For the moment, here is what I do:

I compute perm_stats and perm_pvalues from 1,000 permulations. Then I build a named vector of the perm_stats values (with gene names as names) and pass this named vector to the fastwilcoxGMTall function.
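For readers unfamiliar with rank-based enrichment, here is a minimal Python sketch of a Wilcoxon rank-sum test of one gene set against a named vector of per-gene statistics (a dict here). It is not the `fastwilcoxGMTall` implementation: it uses the normal approximation without a tie correction, and the effect size U/(n1·n2) − 0.5 (positive when set genes rank high) is an assumed convention. Function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def ranksum_enrichment(stats, gene_set):
    """Compare ranks of genes inside vs outside gene_set.
    Returns (effect, two-sided p) with effect = U / (n1 * n2) - 0.5."""
    genes = list(stats)
    ranks = average_ranks([stats[g] for g in genes])
    in_ranks = [r for g, r in zip(genes, ranks) if g in gene_set]
    n1 = len(in_ranks)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        return float("nan"), float("nan")
    u = sum(in_ranks) - n1 * (n1 + 1) / 2  # Mann-Whitney U for the set
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return u / (n1 * n2) - 0.5, p


# Hypothetical perm_stats-like values keyed by gene name.
perm_stats = {"g1": 3.0, "g2": 2.5, "g3": -1.0, "g4": 0.2, "g5": -2.0, "g6": 1.0}
effect, p = ranksum_enrichment(perm_stats, {"g1", "g2"})
```

Because every tested gene contributes its rank, no significance threshold is needed, which is the main practical difference from an over-representation test.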

Separately, I also used pGLS to correlate the copy number of several gene families with my phenotype of interest. I then converted the pGLS R-squared values to R values (sqrt(R2) * sign(b1)).
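The conversion above can be sanity-checked in the simplest case. The Python sketch below uses ordinary least squares with one predictor in place of pGLS (a deliberate swap, for illustration only). For that model, sqrt(R²) × sign(b1) equals Pearson's r exactly. Whether the same identity holds for a pGLS R² depends on how that R² is defined, so treat it as an assumption to check. Function names are hypothetical.

```python
import math


def signed_r(r_squared, slope):
    """R = sqrt(R^2) * sign(b1), the conversion described above."""
    if slope == 0:
        return 0.0
    return math.copysign(math.sqrt(r_squared), slope)


def ols_summary(x, y):
    """Single-predictor OLS (not pGLS): returns slope b1, R^2, Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_res / syy
    pearson_r = sxy / math.sqrt(sxx * syy)
    return b1, r_squared, pearson_r


# Hypothetical data: copy number (x) versus a continuous phenotype (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, r2, r = ols_summary(x, y)
```

Here b1 = 0.6 and R² = 0.6, so the signed R is about 0.775, the same as Pearson's r.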
Using the functions you provide in the "Permulation Walkthrough", I also performed 1,000 permulations of my phenotype and recomputed these pGLS fits and R values. I then computed the permulation p-values and permulation statistics (simply by mimicking the RERconverge::permpvalcor function). I am now using the computed permulation statistics to perform the enrichment analysis (Wilcoxon rank-sum enrichment), but I will wait for your final answer to be sure of what I am doing :).

Also, have you assessed how the enrichment results differ between such a Wilcoxon rank-sum enrichment and a simpler over-representation analysis that uses the "significant" genes as input and all tested genes as background?
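For comparison with the rank-based approach, here is a minimal Python sketch of the simpler over-representation analysis described above: a one-sided hypergeometric test in which all tested genes form the background and genes with a permulation p-value below a threshold count as "significant". The threshold `alpha = 0.05` and the function names are hypothetical, and this sketch does not answer which approach the RERconverge authors recommend.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_in_set, n_significant, n_overlap):
    """P(X >= n_overlap) when n_significant genes are drawn without
    replacement from n_background genes, n_in_set of which are in the set."""
    total = comb(n_background, n_significant)
    upper = min(n_in_set, n_significant)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def ora_for_set(perm_pvals, gene_set, alpha=0.05):
    """Over-representation test for one gene set. All tested genes form the
    background; genes with permulation p < alpha count as significant."""
    background = set(perm_pvals)
    significant = {g for g, p in perm_pvals.items() if p < alpha}
    in_set = set(gene_set) & background  # ignore set genes that were not tested
    return hypergeom_upper_tail(
        len(background), len(in_set), len(significant), len(in_set & significant)
    )


# Hypothetical permulation p-values for five tested genes.
perm_pvals = {"a": 0.01, "b": 0.02, "c": 0.5, "d": 0.9, "e": 0.3}
p_set = ora_for_set(perm_pvals, {"a", "b", "z"})
```

Unlike the rank-sum test, this result depends on the chosen cutoff and discards the ordering of genes within the significant and non-significant groups.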

Thanks again for your valuable help,

All the best,

Maxime


2 participants