Characterize variety of histories in the history DAG #7

Open
willdumm opened this issue Mar 11, 2022 · 2 comments
@willdumm
Collaborator

willdumm commented Mar 11, 2022

Given a history DAG with many histories, how do we know how similar these histories are? Do they vary by only a few ancestral sequences and otherwise remain the same? Are the histories found by the DAG fundamentally different from those used to seed it?

Ideas:

  • (implemented) Comparing the number of input trees with the number of expressed trees gives an idea of how much tree rearrangement is happening in the DAG. internal_avg_parents is another indicator of this.
  • (implemented, slow) count_topologies gives an idea of how many histories share a topology and differ only in their ancestral sequences.
  • How big are the clades that are being swapped?
  • Is there some natural way to cluster histories in the hDAG?
  • Is variation all in certain clades, with others really constant? A start at this could be counting how many histories each hDAG node takes part in.
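
A rough sketch of the node-counting idea from the last bullet, on a toy DAG rather than the real hDAG data structure. The dict layout, the node names, and the function `node_history_counts` are all made up for illustration and are not the library's API. The sketch assumes a history picks exactly one child per child clade, starting from the UA node, so the number of histories through a node is (ways to complete the history outside the node's subtree) × (sub-histories below the node).

```python
from math import prod

# Hypothetical toy DAG: node -> list of child clades, each clade listing the
# alternative child nodes. A history picks exactly one child per clade.
toy_dag = {
    "UA": [["r1", "r2"]],
    "r1": [["x1", "x2"], ["C"]],  # splits {A,B} | {C}; two options for {A,B}
    "r2": [["A"], ["y"]],         # splits {A} | {B,C}
    "x1": [["A"], ["B"]],
    "x2": [["A"], ["B"]],
    "y": [["B"], ["C"]],
    "A": [], "B": [], "C": [],
}


def postorder(dag, root):
    """Return nodes reachable from root, children before parents."""
    seen, order = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for clade in dag[node]:
            for child in clade:
                visit(child)
        order.append(node)

    visit(root)
    return order


def node_history_counts(dag, root="UA"):
    """Map each node to the number of histories that contain it."""
    order = postorder(dag, root)

    # below[n]: number of sub-histories rooted at n (1 for a leaf).
    below = {}
    for node in order:
        below[node] = prod(sum(below[c] for c in clade) for clade in dag[node])

    # above[n]: number of ways to choose everything outside n's subtree.
    above = dict.fromkeys(order, 0)
    above[root] = 1
    for parent in reversed(order):  # parents before children
        clade_sums = [sum(below[c] for c in clade) for clade in dag[parent]]
        for i, clade in enumerate(dag[parent]):
            # Choices made in the parent's other child clades.
            others = prod(s for j, s in enumerate(clade_sums) if j != i)
            for child in clade:
                above[child] += above[parent] * others

    return {node: above[node] * below[node] for node in order}


counts = node_history_counts(toy_dag)
total_histories = counts["UA"]
```

In this toy DAG there are 3 histories; the leaves appear in all of them, `r1` in 2, and `r2`, `x1`, `x2`, `y` in 1 each, so the variation sits entirely in the internal nodes.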
@willdumm
Collaborator Author

Node counting in #19

@willdumm
Collaborator Author

Some ideas about how we can use the number of trees each node takes part in. For the purposes of this comment, a node's uncertainty is inversely proportional to the number of trees in the DAG that it takes part in.

  • How does uncertainty vary across the DAG?
  • Is uncertainty concentrated above certain leaves? Are nodes above a subset of leaves very certain?
  • ^ Maybe an equivalent question: where in the DAG are the uncertain nodes? Close to the leaves, or higher up, near the UA node? (How big are their clade unions, i.e. the union of their child clade sets?)
  • Can we count the number of trees in the DAG that each edge takes part in (something like edge support / edge certainty)?
  • ^ If we can, then we can evaluate how frequently a given parent/child label pair shows up in trees.
  • ^^ Can we use this to characterize the path uncertainty (in label space) to each of the leaf node labels in a useful way?
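
A possible sketch of the edge-counting bullets above, again on a hypothetical toy DAG and not using the real hDAG classes. The dict layout, the `label_of` mapping, and the function names are assumptions made for illustration. It assumes a history picks one child per child clade, so the number of histories containing an edge is (ways to choose everything outside the parent's subtree) × (choices in the parent's other clades) × (sub-histories below the child). Summing edge counts over a label pair gives total occurrences across histories, which equals the number of histories only if no single history contains that pair twice.

```python
from collections import Counter
from math import prod

# Hypothetical toy DAG: node -> list of child clades (alternative children).
toy_dag = {
    "UA": [["r1", "r2"]],
    "r1": [["x1", "x2"], ["C"]],
    "r2": [["A"], ["y"]],
    "x1": [["A"], ["B"]],
    "x2": [["A"], ["B"]],
    "y": [["B"], ["C"]],
    "A": [], "B": [], "C": [],
}

# Hypothetical labels: distinct nodes may carry the same label (sequence).
label_of = {"UA": "UA", "r1": "s0", "r2": "s0", "x1": "s1", "x2": "s2",
            "y": "s1", "A": "A", "B": "B", "C": "C"}


def edge_history_counts(dag, root="UA"):
    """Return ({(parent, child): histories containing the edge}, total)."""
    seen, order = set(), []

    def visit(node):  # postorder: children before parents
        if node in seen:
            return
        seen.add(node)
        for clade in dag[node]:
            for child in clade:
                visit(child)
        order.append(node)

    visit(root)

    below = {}
    for node in order:
        below[node] = prod(sum(below[c] for c in clade) for clade in dag[node])

    above = dict.fromkeys(order, 0)
    above[root] = 1
    edge_counts = {}
    for parent in reversed(order):  # parents before children
        clade_sums = [sum(below[c] for c in clade) for clade in dag[parent]]
        for i, clade in enumerate(dag[parent]):
            others = prod(s for j, s in enumerate(clade_sums) if j != i)
            for child in clade:
                above[child] += above[parent] * others
                edge_counts[(parent, child)] = (
                    above[parent] * others * below[child]
                )
    return edge_counts, below[root]


def label_pair_counts(edge_counts, labels):
    """Sum edge counts over edges sharing a (parent label, child label)."""
    pairs = Counter()
    for (parent, child), n in edge_counts.items():
        pairs[(labels[parent], labels[child])] += n
    return pairs


edge_counts, total = edge_history_counts(toy_dag)
edge_support = {edge: n / total for edge, n in edge_counts.items()}
pair_counts = label_pair_counts(edge_counts, label_of)
```

Here the edge `("r1", "C")` is in 2 of the 3 histories, and the label pair `("UA", "s0")` occurs in all 3 even though it is spread over two different edges, which is the kind of signal the label-space question is after.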

See #20 for path DAG methods
