Post-processing the final ARG #463

Open
hyanwong opened this issue Jan 16, 2025 · 8 comments

@hyanwong
Contributor

We might want to

(a) move all mutations that sit immediately below a recombination node (i.e. on the branch to its only child) onto the two parent branches above the node

then (possibly)

(b) run tsdate to give the internal nodes a sensible time.

At the moment we are likely to do this using code in the sc2ts-paper repo, but there's an argument that it should be a core part of sc2ts instead (at least (a) anyway).
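A minimal sketch of what (a) could look like, using a toy representation of edges and mutations rather than the sc2ts or tskit API. `Edge`, `Mutation`, `move_mutations_above_recombinants` and `parent_branch` are hypothetical names, and the sketch assumes that a recombination node is any node with more than one parent and that a mutation is stored against the node below its branch, so that reassigning it to the recombination node places it on whichever parent edge covers its position.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    left: float
    right: float
    parent: int
    child: int


@dataclass
class Mutation:
    position: float
    node: int  # the mutation sits on the branch above this node


def move_mutations_above_recombinants(edges, mutations):
    """Reassign mutations from the only child of each recombination node
    to the recombination node itself. Returns the number moved."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for e in edges:
        parents[e.child].add(e.parent)
        children[e.parent].add(e.child)

    # Assumption: a recombination node has >1 parent and exactly one child.
    target = {}
    for node, node_parents in parents.items():
        if len(node_parents) > 1 and len(children[node]) == 1:
            (only_child,) = children[node]
            target[only_child] = node

    moved = 0
    for m in mutations:
        if m.node in target:
            m.node = target[m.node]
            moved += 1
    return moved


def parent_branch(edges, mutation):
    """Return the parent whose edge above mutation.node covers the position,
    or None if no such edge exists."""
    for e in edges:
        if e.child == mutation.node and e.left <= mutation.position < e.right:
            return e.parent
    return None


# Node 3 is a recombinant of node 1 (left half) and node 2 (right half),
# with node 4 as its only child.
edges = [Edge(0, 100, 1, 3), Edge(100, 200, 2, 3), Edge(0, 200, 3, 4)]
mutations = [Mutation(50, 4), Mutation(150, 4)]
move_mutations_above_recombinants(edges, mutations)
# Both mutations now sit above node 3: the one at position 50 on the
# branch to parent 1, the one at position 150 on the branch to parent 2.
```

A real implementation would also have to keep the mutation table valid afterwards (ordering, mutation parents, and any known mutation times), which this toy version ignores.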

@hyanwong
Contributor Author

Here's an example (BA.2) of why we want the mutations above the recombination nodes. The huge collection of mutations on the almost-horizontal branch almost certainly belongs on the empty vertical branch descending from #72702.

Image

@jeromekelleher
Owner

I think similar things happen with Delta. I agree this is an issue with the fine details of these recombinants.

Mind you, it'll probably be messy if some of these mutations get moved around subsequently by parsimony operations.

@hyanwong
Contributor Author

I think we simply run the routine at the very end, like we do for dating? I can see why we would want to keep them below the node, for parsimony reasons, until then.

@hyanwong
Contributor Author

By the way, I can believe that this causes weird constraints when running tsdate on the sc2ts ARGs

@jeromekelleher
Owner

> I think we simply run the routine at the very end, like we do for dating? I can see why we would want to keep them below the node, for parsimony reasons, until then.

No, as in, some of the mutations that should be directly "below" a recombinant have been moved around by parsimony operations after it was added. Maybe if they have, then it's logically OK anyway. Hopefully it's nice and simple as you say, I'm just used to things being more complicated than you might hope in this thing.

@hyanwong
Contributor Author

Yep, I think if they have been moved, that's a sign that they really are lower down.

For the moment, it's probably easiest to add the "move mutations above a recombination node" into the first part of the dating script (currently in https://github.com/jeromekelleher/sc2ts-paper/blob/main/scripts/run_nonsample_dating.py), as it should probably be a prerequisite for dating anyway.

@jeromekelleher
Owner

Other operations we want to do:

  • Add the exact matches
  • Strip node metadata back a bit
  • Move the mutations for recombinants around
  • Strip out some of the immediate reversions marked by IMR nodes (IIRC these should be removable)
  • Add in the most dominant deletions

@jeromekelleher
Owner

See #475 re point above on IMR nodes
