This repository has been archived by the owner on Sep 22, 2019. It is now read-only.

Need a service that explains why a taxon isn't in the synthetic tree #98

Open
jar398 opened this issue Jun 12, 2014 · 17 comments

@jar398
Member

jar398 commented Jun 12, 2014

See https://groups.google.com/forum/#!topic/opentreeoflife-software/raWZ7hSfUpI .
The argument to the service call would be an OTT id, e.g. as returned by TNRS. Result would be - I'm not sure, we need to decide this. If the taxon was determined paraphyletic, the result could be e.g. a list of studies that conflict with the monophyly of that node. If missing for some other reason, such as the taxon being incertae sedis, that could be encoded somehow.

This service would be used by the front end to prepare a report or display that general users can understand. See OpenTreeOfLife/opentree#310
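
Since the result format is still undecided, here is one way it might be encoded, purely as an illustration. Everything in this sketch is hypothetical: the `AbsenceReport` type, the `explain_absence` function, the reason strings, and the three lookup tables it takes as input are placeholders, not part of treemachine or any existing service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AbsenceReport:
    """Hypothetical result shape for the proposed service."""
    ott_id: int
    in_synthetic_tree: bool
    reason: str = ""  # assumed vocabulary: "paraphyletic", "flagged", "unknown"
    flags: List[str] = field(default_factory=list)
    conflicting_studies: List[str] = field(default_factory=list)


def explain_absence(
    ott_id: int,
    synth_ids: Set[int],
    taxon_flags: Dict[int, Set[str]],
    conflicts: Dict[int, Set[str]],
) -> AbsenceReport:
    """Build a report from precomputed lookup tables (all assumed inputs)."""
    if ott_id in synth_ids:
        return AbsenceReport(ott_id, True)
    flags = sorted(taxon_flags.get(ott_id, ()))
    studies = sorted(conflicts.get(ott_id, ()))
    if studies:
        reason = "paraphyletic"   # some study conflicts with monophyly
    elif flags:
        reason = "flagged"        # e.g. incertae sedis
    else:
        reason = "unknown"
    return AbsenceReport(ott_id, False, reason, flags, studies)
```

The front end could then turn `reason`, `flags`, and `conflicting_studies` into a message that general users can understand.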

@blackrim
Member

Sounds good. This would be another analysis that we could run to populate those
nodes with information that could then be served. It will probably be a
little while, since we are trying to get the new synth done.


@jar398
Member Author

jar398 commented Jul 27, 2014

This issue was raised again by Laura Katz on July 9 in private email:

go to http://dev.opentreeoflife.org/opentree/
Search for Drosophila in search box... find lonely lollipop with no information on source of conflict/exclusion (this is what we find for many microbial genera as well because many genera originally defined by morphology are NOT monophyletic in molecular analyses)

Basically she wants a more helpful treatment of nonmonophyletic taxa, and such a treatment can't happen without the help from treemachine requested by this issue.

@josephwb
Member

I have a service that is about ready to go that reports properties of a node, e.g. whether it 1) is in the graph (i.e. not filtered out from treemachine), 2) is in the synthetic tree, plus 3) a bunch of other stats.
Seems like @jimallman has this well in hand using getMRCA, right?

@jimallman
Member

I think we're doing a decent job now (on devtree) by listing a taxon's flags and lineage, but we don't really connect the dots and explain why certain flag(s) caused a taxon to be omitted.

@josephwb
Member

Right. I am hazy on the exact meanings of the flags myself. Maybe @chinchliff and @jar398 could help with this?

@chinchliff
Member

This is a bit of a complex issue. Currently I think we are avoiding it by
using the catch-all error messages that @jimallman and @kcranston recently
worked up in the tree browser.

The idea is that we would use information in the graph (e.g. flags, or
relationships, or both) to report to users why a given node wasn't in the
synthetic tree. I'm not sure it's very straightforward in all cases, but
some cases might be. Do we have a critical use case for this?


@jimallman
Member

@josephwb: I had added a table of friendly explanations for each flag, but this was removed in order to reduce clutter (at least in the current version):

OpenTreeOfLife/opentree@e0fa3f3#diff-1d1b3781103cc75f284db30d76bcd6e8R1426

I wrote these messages based on the comments in taxomachine's OTTFlag.java. Consequently, they might not be entirely accurate or properly phrased.

@jar398
Member Author

jar398 commented Aug 27, 2014

I would think the compelling question is not why the node didn't go into
synthesis, which is easy to answer from the taxonomy flags, but rather why
a taxon in synthesis was judged to be paraphyletic. I imagine a user would
want to know which source tree(s) conflict with the monophyly of the given
taxon.

Jonathan


@josephwb
Member

That seems tricky (but interesting): trees that do not go through the node. But, of course, only trees that could potentially go through the node.
So, what do we want? Any source trees that contain taxa descended (in the taxonomy) from the node of interest? That could be a start.
At the current level of tree sampling, most nodes do not have multiple source trees, say, trees that go through the node and trees that bypass the node.
Hmm...

@josephwb
Member

Maybe, look for source trees that contain taxa descended (in the taxonomy) from the node of interest and connect to nodes that are deeper than the node of interest. Hmm.

The node_status service reports any source tree that passes through a node, so we can easily get supporting trees for the node, even if it does not appear in the synthetic tree. Getting the converse will need to be clever.
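
A rough sketch of what "the converse" could look like, outside the graph database and under strong simplifying assumptions: each source tree is a rooted tree written as nested tuples of tip labels, and the taxon is given as the set of tip labels descended from it in the taxonomy. The function names and the three-way labels are made up for illustration and are not part of node_status.

```python
def clade_tip_sets(tree):
    """Collect the tip set of every internal node of a nested-tuple tree."""
    clades = []

    def walk(node):
        if isinstance(node, tuple):
            tips = frozenset().union(*(walk(child) for child in node))
            clades.append(tips)
            return tips
        return frozenset([node])  # a tip label

    walk(tree)
    return clades


def classify(tree, taxon_members):
    """Label a source tree relative to a taxon.

    "supports":      the sampled members of the taxon form a clade in the tree
    "conflicts":     the tree samples 2+ members but they never form a clade
    "uninformative": fewer than 2 members sampled, or no outsiders sampled
    """
    clades = clade_tip_sets(tree)
    all_tips = frozenset().union(*clades)
    sampled = all_tips & taxon_members
    if len(sampled) < 2 or sampled == all_tips:
        return "uninformative"
    return "supports" if sampled in clades else "conflicts"
```

Note that this treats every polytomy as a hard statement about relationships, so it will overstate conflict for poorly resolved source trees.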

@kcranston
Member

Returning the trees that support the node and the trees that do not support the node would be ideal. I could see that leading to more transparency and input about ranking source trees.

@chinchliff
Member

I would think that any tree X with tips assigned to one or more nodes in at
least two descendant subtrees of some node Y would have the potential to
span Y. In the case of a bifurcating tree, Y has two children, and X must
contain at least one tip from each of them in order to be able to contain
Y. If Y has more than two children, then X must contain tips mapped to
nodes in 2 or more of Y's descendant subtrees. Is there more to it than
that?

Ideally, we would want to calculate these compatibility mappings during
some procedure like import, rather than on the fly. It does seem like a
clever solution should be possible, starting at the tips/root of a newly
imported tree, and walking backward/forward through the graph...
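
The spanning criterion above can be written down as a small check. This is only a sketch on plain Python sets rather than the graph: `can_span`, its parameters, and the `descendants` lookup table are hypothetical names, and the table is assumed to have been computed from the taxonomy beforehand.

```python
def can_span(tree_tip_ids, children_of_y, descendants):
    """Return True if source tree X has the potential to span node Y.

    tree_tip_ids:  set of taxon ids that the tips of X are mapped to
    children_of_y: iterable of the ids of Y's child nodes
    descendants:   dict mapping each child id to the set of taxon ids in
                   its subtree (the child itself included)
    """
    subtrees_hit = 0
    for child in children_of_y:
        # Does X have at least one tip inside this child's subtree?
        if tree_tip_ids & descendants[child]:
            subtrees_hit += 1
            if subtrees_hit >= 2:
                return True
    return False
```

Computing `descendants` once at import time, rather than per query, is in the spirit of doing the compatibility mapping up front instead of on the fly.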


@josephwb
Member

Service graph/node_info gets part way there. It will report whether a node is in the synthetic tree, and if it has any supporting source trees. As mentioned above, we still need to determine if there are source trees that could potentially go through the node but do not. I have some ideas about how to accomplish this. Seems like a good hackathon exercise.

@jar398
Member Author

jar398 commented Mar 25, 2016

I don't see that we can close it because we don't have such a service (to my knowledge) and haven't decided not to do it. I think there has been some discussion recently, I forget where. It will require propinquity support. Obviously not a v3 release thing. Assigning this to me so you ( @josephwb ) don't have to worry about it.

@jar398 jar398 reopened this Mar 25, 2016
@kcranston
Member

Can we resurrect this issue now that propinquity outputs a list of broken taxa in /labelled_supertree/broken_taxa.json? Not sure that treemachine is the right repo, though. Thoughts on how we should import & serve this information?

@jar398
Member Author

jar398 commented Jul 26, 2016

I think that it would be too easy to break treemachine in adding this additional information, and I'm not comfortable enough with neo4j to want to undertake it. Our choices then would be:

  1. smasher (conflict service)
  2. oti successor (ottreeindex)
  3. something new

Option 1 would not be difficult, but it has the disadvantage of increasing the amount of Java code in the system. Option 3 would be tragic, since we already have too many services/servers/code bases. So I vote for option 2. Since the same reasoning applies to many other features that we might like to add, this argues for expanding the scope of ottreeindex quite broadly.

This is just off the top of my head; perhaps I'm missing something obvious.

@jar398
Member Author

jar398 commented Jul 26, 2016

The way this issue has evolved it doesn't belong in the repo any more (as @josephwb was indicating by closing it). We could move it to germinator or elsewhere, but it doesn't seem harmful to leave it here for now. We can move it when some kind of decision is reached.
