
MEME output #1828


Open
elodieaudemar opened this issue Mar 24, 2025 · 6 comments


@elodieaudemar

Dear Sergei,

I'm working with the MEME method and have some questions I couldn't find answers to.
My HyPhy version is 2.5.64.

- In one of your exchanges, you said that "# branches" is the number of branches with EBF > 100; however, I can't find the EBF anywhere (neither in the output nor in the JSON), I only have the LRT.
- Is the omega in the "### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model" section the global omega for all the branches that were detected in "# branches"?
- I have a surprisingly high omega+ (1677). Should I be concerned when omega+ is high, like >100 or >1000? I would like to say yes, but I would prefer a confirmation.
- I gave an unlabelled tree; how can I know which nodes it has created?
- Most of my sequences have around 1 to 12-15% of N and gaps, but some others have a higher percentage (30 to 60% or more). Above what percentage can you no longer guarantee the results?
- Is MEME sensitive if our sequences diverge only slightly?

Thank you a lot for your help.

Best,
Elodie

@spond
Member

spond commented Mar 24, 2025

Dear @elodieaudemar,

In one of your exchanges, you said that "# branches" is the number of branches with EBF > 100; however, I can't find the EBF anywhere (neither in the output nor in the JSON), I only have the LRT.

EBF is stored in the JSON, and can be viewed using https://observablehq.com/@spond/meme

Image

If you are interested in accessing EBF for Branch/Site pairs, it can be done via Python or anything else that reads JSON files.
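As a rough illustration of reading branch/site EBF values in Python, here is a minimal sketch. It assumes that the MEME JSON stores per-branch values under `"branch attributes"` → partition index → branch name, with keys shaped like `"EBF site 12 (partition 1)"`; that layout is an assumption to check against your own file, and the function name and the example dictionary are made up for illustration.

```python
import re

# Assumed key pattern for per-site EBF entries, e.g. "EBF site 12 (partition 1)".
EBF_KEY = re.compile(r"^EBF site (\d+) \(partition (\d+)\)$")

def branch_site_ebf(meme_json, threshold=100.0):
    """Return sorted (branch, site, EBF) triples with EBF >= threshold."""
    hits = []
    # Assumed layout: "branch attributes" -> partition index -> branch -> values.
    for part_key, branches in meme_json.get("branch attributes", {}).items():
        if part_key == "attributes" or not isinstance(branches, dict):
            continue  # skip the block that only describes attribute types
        for branch, attrs in branches.items():
            if not isinstance(attrs, dict):
                continue
            for key, value in attrs.items():
                match = EBF_KEY.match(key)
                if match and isinstance(value, (int, float)) and value >= threshold:
                    hits.append((branch, int(match.group(1)), float(value)))
    return sorted(hits, key=lambda h: (h[1], h[0]))

# Tiny made-up stand-in for a parsed MEME JSON file.
example = {
    "branch attributes": {
        "0": {
            "Node1": {"EBF site 12 (partition 1)": 250.0,
                      "EBF site 13 (partition 1)": 3.0},
            "human": {"EBF site 12 (partition 1)": 120.5},
        }
    }
}
print(branch_site_ebf(example))
```

With a real run you would load the file first, e.g. `json.load(open(path))` with `path` pointing at your MEME `.json` output, and pass the result to `branch_site_ebf`.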

Is the omega in the "### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model" section the global omega for all the branches that were detected in "# branches"?

This is the global ω for all the tested branches (if you supplied --branches), otherwise it's for all branches.

I have a surprisingly high omega+ (1677). Should I be concerned when omega+ is high, like >100 or >1000? I would like to say yes, but I would prefer a confirmation.

No; this is effectively ∞. You should not trust point estimates of site-level ω; they are likely to be very noisy. The reportable outcome is the p-value for the LRT of positive selection.
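To report sites by p-value rather than by ω, something like the following sketch could be used. It assumes the per-site table sits under `"MLE"` in the MEME JSON, with `"headers"` as a list of `[name, description]` pairs and `"content"` → `"0"` holding one row per site for the first partition, and that one column is named `"p-value"`. These names are assumptions to verify against your file; the threshold is a parameter you choose, and the example data are invented.

```python
def significant_sites(meme_json, alpha=0.1):
    """Return (site, p-value) pairs with p-value <= alpha (sites are 1-based)."""
    # Assumed layout: headers are [short name, description] pairs.
    headers = [h[0] for h in meme_json["MLE"]["headers"]]
    p_col = headers.index("p-value")      # look the column up by name
    rows = meme_json["MLE"]["content"]["0"]  # assumed: first partition
    return [(i + 1, row[p_col]) for i, row in enumerate(rows)
            if row[p_col] <= alpha]

# Made-up miniature table: three sites, three columns.
example = {
    "MLE": {
        "headers": [["beta+", "rate"], ["LRT", "test statistic"],
                    ["p-value", "asymptotic p-value"]],
        "content": {"0": [[0.5, 0.0, 0.67],
                          [1677.0, 9.2, 0.004],
                          [3.1, 2.9, 0.09]]},
    }
}
print(significant_sites(example))
```

Note that the site with the extreme rate estimate is reported only because of its p-value; the ω point estimate itself plays no role in the filter.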

I gave an unlabelled tree; how can I know which nodes it has created?

You mean NodeXXX? You can see what they are in the tree viewer (check the "show internal" box)

Image

and also in the MEME JSON output, as part of the Newick tree string

Image
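If you prefer to list the generated internal node names without the viewer, a small sketch along these lines may help. It only relies on the Newick convention that an internal node's label follows a closing parenthesis. Where the tree string lives in the JSON (here assumed to be `meme_json["input"]["trees"]["0"]`) is an assumption, and the example tree is invented.

```python
import re

def internal_node_names(newick):
    """Return labels that directly follow ')' in a Newick string.

    In Newick, a label placed right after a closing parenthesis names the
    internal node that the parenthesised group descends from.
    """
    return re.findall(r"\)([^():,;\s]+)", newick)

# Made-up tree; with real output the string would come from the JSON, e.g.
# newick = meme_json["input"]["trees"]["0"]   (assumed location)
newick = "((A:0.1,B:0.2)Node1:0.3,(C:0.1,D:0.1)Node4:0.2,E:0.5)"
print(internal_node_names(newick))
```

Reading the tree this way also shows which tips each NodeXXX groups together, since the label closes the parenthesised set of its descendants.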

Most of my sequences have around 1 to 12-15% of N and gaps, but some others have a higher percentage (30 to 60% or more). Above what percentage can you no longer guarantee the results?

As the famous saying goes, "but in this world nothing can be said to be certain, except death and taxes" (https://en.wikipedia.org/wiki/Death_and_taxes_(idiom))

MEME should fail safe, i.e., if you have no data, the power to detect anything will decay. One area of concern is that alignments with many N and - can be of poor quality.
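One way to spot alignments that deserve a closer look is to compute the fraction of N and gap characters per sequence before running anything. The sketch below is a generic pre-check, not part of HyPhy; the 30% cutoff is an arbitrary illustration taken from the numbers in the question, and the function names and example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def missing_fraction(seq, missing="Nn-?"):
    """Fraction of characters in seq that are N, gap, or '?'."""
    return sum(ch in missing for ch in seq) / len(seq) if seq else 0.0

def flag_sequences(records, cutoff=0.30):
    """Return {name: fraction} for sequences at or above the cutoff."""
    return {name: round(missing_fraction(seq), 3)
            for name, seq in records.items()
            if missing_fraction(seq) >= cutoff}

# Made-up two-sequence alignment.
fasta = """>sp1
ATGNNN---ATG
>sp2
ATGATGATGATG
"""
flagged = flag_sequences(parse_fasta(fasta))
print(flagged)
```

Flagged sequences are candidates for inspecting the alignment itself; the fraction alone says nothing about whether the remaining columns are aligned correctly.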

Is MEME sensitive if our sequences diverge only slightly?

Hard to say. Generally, you have lower power for low-divergence sequences. You can try the --resample 100 option to enable parametric bootstrap (higher sensitivity); it will be much slower. How many sequences do you have?
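For batch runs, the command line can be assembled in a script. The sketch below only builds and prints the command; it does not run HyPhy. The `--resample` flag comes from the reply above, while `--alignment` and `--tree` are assumed to be the usual HyPhy arguments, and the file names are hypothetical placeholders.

```python
import shlex

def meme_command(alignment, tree=None, resample=None):
    """Build a `hyphy meme` argument list; nothing is executed here."""
    cmd = ["hyphy", "meme", "--alignment", alignment]
    if tree is not None:
        cmd += ["--tree", tree]                 # assumed flag name
    if resample is not None:
        cmd += ["--resample", str(resample)]    # parametric bootstrap replicates
    return cmd

# Hypothetical file names, for illustration only.
cmd = meme_command("gene0001.fas", tree="species.nwk", resample=100)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` inside a loop over genes once you have confirmed the flags against `hyphy meme --help` for your installed version.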

Best,
Sergei

@elodieaudemar
Author

Dear @spond,

Thank you so much for your clear and quick answers, and I didn't know about this famous saying haha.
I'm working on 3200 genes, and each gene has the same 8 species (but one is of bad quality); only the CDS changes, so I'm running loops on a cluster.
And if I understood correctly, "# branches" is not an important variable. If it says 2 at my codon 12, that just means I have 2 branches under selection for this codon. And can we find out which ones they are with the .json?

Best,
Elodie

@spond
Member

spond commented Mar 27, 2025

Dear @elodieaudemar,

For 8 species, I don't expect MEME to have great power to detect anything (you typically need 20+ sequences AND decent divergence). Typically you would either use BUSTED to detect genes (as units) under selection, or aBSREL to look for branches under selection.

Best,
Sergei

@elodieaudemar
Author

Dear @spond,

Thank you a lot for your help, because yes, MEME detected weird codons (now I'm sure they were false positives).
I use aBSREL as well (I'm writing up questions about this method too), but I wanted results per codon and not per gene, so I went with FUBAR after reading that it works well for both small and large numbers of sequences, and after running it on the same genes it does much better than MEME.

Thank you again for your help.

Best,
Elodie AUDEMAR

@spond
Member

spond commented Apr 4, 2025

Dear @elodieaudemar,

aBSREL outputs some indications of where the selection signal comes from. For example

hyphy absrel tests/hbltests/libv3/data/CD2.nex

Then load the .json into the visualization module (you can also get this information from the JSON programmatically), https://observablehq.com/@spond/absrel

Here's a heat map of individual codons contributing selection signal

Image

And here's a sorted list of specific codons with high empirical Bayes factors on one of the branches found to be under selection by aBSREL, CAT.

Image
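For the programmatic route mentioned above, a minimal sketch for pulling the selected branches out of the aBSREL JSON could look like this. It assumes each branch entry under `"branch attributes"` → `"0"` carries a `"Corrected P-value"` field; that key name, the 0.05 threshold, and the example values are assumptions for illustration, so check them against your own output.

```python
def selected_branches(absrel_json, alpha=0.05):
    """Return (branch, corrected p-value) pairs with p <= alpha, smallest first."""
    # Assumed layout: "branch attributes" -> "0" -> branch name -> values.
    branches = absrel_json["branch attributes"]["0"]
    hits = []
    for name, attrs in branches.items():
        p = attrs.get("Corrected P-value")   # assumed key name
        if p is not None and p <= alpha:
            hits.append((name, p))
    return sorted(hits, key=lambda h: h[1])

# Made-up miniature stand-in for a parsed aBSREL JSON file.
example = {
    "branch attributes": {
        "0": {
            "CAT": {"Corrected P-value": 0.001},
            "Node3": {"Corrected P-value": 0.04},
            "HUMAN": {"Corrected P-value": 1.0},
            "PIG": {},
        }
    }
}
print(selected_branches(example))
```

Branches without the field (for example, branches that were not tested) are simply skipped.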

Best,
Sergei

@elodieaudemar
Author

Hi @spond,

Thank you a lot for your help.

Best,
Elodie AUDEMAR
