Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mamRERw result is NA #91

Open
cjl200095 opened this issue Apr 5, 2024 · 6 comments
Open

mamRERw result is NA #91

cjl200095 opened this issue Apr 5, 2024 · 6 comments

Comments

@cjl200095
Copy link

Dear RERcoverge designer,
I'm running into an error when I try to run RERconverge with my data.
When I run the getAllResiduals with the built tree, the results are all NA.
image
I don‘t know how to fix this.
Looking forward to your positive response.

@nclark-lab
Copy link
Owner

Could you provide your input trees file so we could diagnose the problem?

@cjl200095
Copy link
Author

cjl200095 commented Apr 6, 2024 via email

@LisaSkalon
Copy link

Hi @nclark-lab,

It appears that I'm encountering a similar issue to cjl200095. My dataset (phangorn2.txt) consists of 1069 genes and 30 species. While attempting to compute RER for all 30 species using getAllResiduals, only 169 genes have RER estimated (i=169, cutoff=3e-08). However, when I attempt to reduce the number of species for binary trait analysis (for example, using useSpecies=c("tuvinicus", "semicanus", "lemminus", "macrotis", "centralis", "glareolus", "rutilus"), the output table contains only NAs, and the counter i=.. is not printed. Could this be due to the branch lengths in my dataset being too small? Is there a way to resolve this and obtain a RER matrix for the reduced dataset?

Thank you in advance!

Lisa

phangorn2.txt

@sorrywm
Copy link
Collaborator

sorrywm commented Apr 26, 2024

Hi Lisa!

I did not have any difficulties estimating RER from your full trees with the default settings. I suspect your output contains RERs for all your genes as well. Here is what I ran:
ttrees <- readTrees(phangorn2.txt)
trer <- getAllResiduals(ttrees)
I then checked how many non-NA RER values there were in each row (for each gene) of the RER matrix:
hist(rowSums(!is.na(trer)))
egHistGithubIssue
Clearly there are many genes with RERs calculated for many of the branches in the tree.*

At what stage (in which function) are you reducing your species set using useSpecies? I would not recommend reducing your species set this drastically, unless you only have phenotype data for a very small number of species. If you do need to restrict your species set, you will need to change the min.sp option in getAllCor from its default of 10. If you don't change this option, you will get mostly NAs because very few of your gene trees will have at least 10 branches.

Thank you for using RERconverge!

--Wynn

*To explain, in the output of getAllResiduals, the final i does not represent the number of genes for which RER are estimated. The program only prints an i for each separate set of species represented in the gene trees; all the gene trees with that species set are used to calculate RER for all of those genes in just one step, so the program does not print an i for each gene. The functions to correlate RER with phenotypes (e.g., correlateWithBinaryPhenotype) do not print any i, regardless of the number of genes.

@LisaSkalon
Copy link

Hi Wynn @sorrywm,
thank you very much for your response!

I did not have any difficulties estimating RER from your full trees with the default settings. I suspect your output contains RERs for all your genes as well.
To explain, in the output of getAllResiduals, the final i does not represent the number of genes for which RER are estimated.

Thank you for explaining that. Initially, I was confused by the behavior of the counter i in my data, since in the example dataset it ranged from 1 to 200 (max). Now I see that all my genes have their RER computed in the non-reduced dataset.

At what stage (in which function) are you reducing your species set using useSpecies? I would not recommend reducing your species set this drastically, unless you only have phenotype data for a very small number of species. If you do need to restrict your species set, you will need to change the min.sp option in getAllCor from its default of 10.

I was trying to reduce my species set in getAllResiduals since I wanted to compare only these 7 species in further binary trait analysis. They are phylogenetically close but have drastically different lifestyles, and I am interested in genes underlying these two phenotypes. Thanks to your answer, I was able to perform the analysis on this reduced set (I changed min.op in both getAllResiduals and correlateWithBinaryPhenotype). However, you mentioned that you don't recommend doing so. Is that because statistical analysis has reduced power on datasets with less than 10 species? Do the results of this analysis still make sense, or is it better to consider a more broadly distributed trait?

I really appreciate your help. Many thanks to the team for the tool and beautiful easy-to-follow tutorials!

Lisa

@Wu-tz
Copy link

Wu-tz commented Jul 4, 2024

Dear RERcoverge designer,

I'm encountering a similar issue when I used a dataset with nine species to run getAllResiduals. Is it more likely to cause NA when fewer species are used?

Here is my dataset
all.tree.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants