question about OMArk rules and omamer result #35

CAShuangchao · 2024-10-19T17:46:03Z

Hello, I would like to ask which rules OMArk uses to determine gene conservation and consistency. Specifically, which rules are applied to the OMAmer results to derive these values? Are there any detailed instructions? Thanks.

YanNevers · 2024-10-28T15:31:45Z

Hello!

OMArk will automatically process the OMAmer results, while recovering some additional data from the OMAmer database. All details are described in OMArk's paper at https://www.nature.com/articles/s41587-024-02147-w .

However, here is a quick overview:

For completeness: OMArk queries the OMAmer database to extract all orthologous groups (HOGs) at the taxon of interest and selects the ones present in more than 80% of species: the conserved HOGs. If those HOGs are also found in the OMAmer input, it reports them as present (as single or duplicated, depending on the number of occurrences in the file).
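A minimal sketch of the completeness step, not OMArk's actual code. It assumes the HOG-to-species mapping has already been pulled from the OMAmer database and that placements are matched by exact HOG ID; the function names, the input shapes, and the `Missing` label for conserved HOGs with no hit are illustrative assumptions.

```python
from collections import Counter


def conserved_hogs(hog_to_species, n_species, min_fraction=0.8):
    """Return the HOGs present in more than `min_fraction` of the clade's species.

    `hog_to_species` is a hypothetical mapping {hog_id: set of species}.
    """
    return {
        hog
        for hog, species in hog_to_species.items()
        if len(species) / n_species > min_fraction
    }


def completeness(conserved, placed_hogs):
    """Label each conserved HOG by how often it occurs in the OMAmer placements.

    `placed_hogs` is a hypothetical list with one HOG ID per placed protein.
    """
    counts = Counter(hog for hog in placed_hogs if hog in conserved)
    labels = {}
    for hog in conserved:
        n = counts.get(hog, 0)
        if n == 0:
            labels[hog] = "Missing"  # assumed label for "not found"
        elif n == 1:
            labels[hog] = "Single"
        else:
            labels[hog] = "Duplicated"
    return labels
```

Note that the threshold is strict ("more than 80%"), so a HOG found in exactly 8 of 10 species would not count as conserved in this sketch.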

For the consistency part, it is slightly more complicated. Again, OMArk queries the OMAmer database to obtain the list of all HOGs known to exist in the clade of interest. If the HOG in which OMAmer placed a protein is in that list, the protein is classified as phylogenetically Consistent (blue); proteins with no match in the OMAmer file are classified as Unknown.
The remaining proteins are classified as Contamination if their placement corresponds to a contaminant (which OMArk assesses earlier in the process, see the paper) or as Inconsistent otherwise.
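The decision order described above can be sketched as follows. This is an illustration under stated assumptions, not OMArk's implementation: `clade_hogs` and `contaminant_hogs` are hypothetical precomputed sets (how OMArk detects contaminants is covered in the paper and not reproduced here), and a protein without an OMAmer placement is represented by `None`.

```python
def consistency_class(placed_hog, clade_hogs, contaminant_hogs):
    """Classify one protein from its OMAmer placement.

    placed_hog       -- HOG ID assigned by OMAmer, or None if there was no match
    clade_hogs       -- hypothetical set of HOGs known in the clade of interest
    contaminant_hogs -- hypothetical set of HOGs attributed to a detected contaminant
    """
    if placed_hog is None:
        return "Unknown"
    if placed_hog in clade_hogs:
        return "Consistent"
    if placed_hog in contaminant_hogs:
        return "Contamination"
    return "Inconsistent"
```

For example, `consistency_class("HOG:X", {"HOG:A"}, {"HOG:X"})` returns `"Contamination"` because the placement is outside the clade but matches the contaminant set.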

For the structural consistency (fragment/partial mapping): OMArk uses the data provided in the OMAmer output directly.
A sequence is classified as a Fragment if the query length (qseqlen in OMAmer) is less than half the median protein length in the HOG it was placed into (subfamily_medianseqlen in OMAmer).
A sequence is classified as Partial mapping if the k-mer matches are detected over only part of the sequence. In the OMAmer output, the qseq_overlap parameter corresponds to the proportion of the sequence that lies between the first k-mer in common with the HOG of interest and the last. If this value is under 0.8, OMArk will report the sequence as a partial mapping.
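As a rough sketch of these two checks (not OMArk's own code), the function below takes the three values from one OMAmer output row, with parameter names modelled on the columns mentioned above. The function name is made up for illustration. Because the description does not say which label wins when both conditions hold, the sketch returns the two flags separately instead of a single class.

```python
def structural_flags(qseqlen, subfamily_medianseqlen, qseq_overlap):
    """Evaluate the fragment and partial-mapping rules for one placed protein.

    qseqlen                -- length of the query protein
    subfamily_medianseqlen -- median protein length in the HOG it was placed into
    qseq_overlap           -- fraction of the query between the first and last shared k-mer
    """
    return {
        # Fragment: query shorter than half the HOG's median length.
        "fragment": qseqlen < 0.5 * subfamily_medianseqlen,
        # Partial mapping: k-mer matches span less than 80% of the query.
        "partial_mapping": qseq_overlap < 0.8,
    }
```

With a query of length 100 placed in a HOG whose median length is 300, the fragment rule applies because 100 is below 150.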

I hope this answers your question.

Best wishes,
Yannis

CAShuangchao · 2024-10-28T15:58:14Z

Thank you for your answer. OMArk is a great tool that has given me some inspiration in dealing with the logic of HOG attribution for target genes and dealing with some of the problems I encountered in my project.

Thanks,
huangchao
