Questions Regarding RERconverge Analysis and Gene Tree Estimation #105

Open
aaannaw opened this issue Dec 10, 2024 · 0 comments

aaannaw commented Dec 10, 2024

Dear professor,
I am new to RERconverge and currently running analyses to estimate the relative evolutionary rate (RER) for each gene on every branch of the tree. I have encountered a few questions during my analysis and would greatly appreciate your insights.

(1) Understanding Positive and Negative Values in RER Output:

The final RER output for each gene contains many NAs, which I have excluded. The remaining values include both positive and negative numbers. How should I interpret these? Are the signs meaningful, or could they be artifacts? My guess was that the magnitude of a value (regardless of sign) indicates the rate, with larger magnitudes suggesting faster relative evolutionary rates. Could you confirm whether this is correct?
image

(2) Warning Message During RER Calculation:

When I run the command BathyergidaeRER = getAllResiduals(toyTrees, useSpecies = useSpecies, transform = "sqrt", weighted = T, scale = T), the output includes this warning:

cutoff is set to 2e-08
i= 1    Naming rows and columns of RER matrix
Warning message:
In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) :  collapsing to unique 'x' values

According to the FullWorkThrough readme, the cutoff message means that branches shorter than the cutoff are excluded from the analysis. However, the final RER output includes all genes, including those from branches that should have been excluded. Could you help me understand why this might be happening?
image
image

(3) Master Tree and CDS Alignments:

When I run the following command to estimate branch lengths for all genes: estimatePhangornTreeAll(alndir=alignmentfn, treefile=mastertreefn, output.file=outputfn), I use a tree without branch lengths as the master tree: (Cgu,(((Tsw,Pty),Hcr),(Hgl,((Gca,Bsu),(Cho,(Fme,(Fda,(Fdm,(Fan,Fmi)))))))));. I also use alignments of coding sequences (CDS) rather than amino acid sequences.
Could you advise whether this is an appropriate approach?

Below is the complete R code I have been using:

library(RERconverge)

# Alignment directory, gene-tree output file, and master tree topology
alignmentfn = "2024812.newcactus/03.proteincoding-selected-analysis/01.getgene/02.genes"
outputfn = "2024812.newcactus/03.proteincoding-selected-analysis/01.getgene/15.Rerconverge/01.output.gene_trees.txt"
mastertreefn = "2024812.newcactus/03.proteincoding-selected-analysis/01.getgene/15.Rerconverge/02.master.tree"

# Estimate branch lengths for every gene on the master tree topology
estimatePhangornTreeAll(alndir = alignmentfn, treefile = mastertreefn, output.file = outputfn)

# Read the estimated gene trees
Bathyergidaetreefile = "01.output.gene_trees.txt"
BathyergidaeTrees = readTrees(Bathyergidaetreefile)

# Compute relative evolutionary rates for the selected species
useSpecies = c("Pty", "Hcr", "Cgu", "Tsw", "Hgl", "Bsu", "Gca", "Cho", "Fan", "Fda", "Fme", "Fmi", "Fdm")
BathyergidaeRER = getAllResiduals(BathyergidaeTrees, useSpecies = useSpecies, transform = "sqrt", weighted = TRUE, scale = TRUE)

# Write the RER matrix to a TSV file with gene IDs as the first column
gene_ids <- rownames(BathyergidaeRER)
BathyergidaeRER_with_gene_ids <- cbind(Gene_ID = gene_ids, BathyergidaeRER)
output_file <- "BathyergidaeRER_output_with_gene_ids.tsv"
write.table(BathyergidaeRER_with_gene_ids, file = output_file, sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
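
To make questions (1) and (2) easier to reproduce, here is a minimal sketch of how the NA, positive, and negative entries per gene could be tallied. It uses only base R; the matrix toyRER and its values are made up for illustration and are not my actual results. It assumes the RER output is a numeric matrix with genes as rows and branches as columns.

```r
# Toy RER-like matrix (hypothetical values): rows are genes, columns are branches
toyRER <- matrix(
  c( 0.8,  NA, -1.2,
    -0.3, 0.5,   NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("Hgl", "Fda", "Fmi"))
)

# Per-gene counts of missing, positive, and negative entries
na_count  <- rowSums(is.na(toyRER))
pos_count <- rowSums(toyRER > 0, na.rm = TRUE)
neg_count <- rowSums(toyRER < 0, na.rm = TRUE)

summary_table <- data.frame(na = na_count, positive = pos_count, negative = neg_count)
print(summary_table)
```

Running the same tally on the real matrix (for example, replacing toyRER with BathyergidaeRER) would show whether the NAs are concentrated in particular genes or branches.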

Thank you in advance for your help! I look forward to your feedback.

Best regards,
Na Wan
