Diploid sampling #4059

Merged: 5 commits, Aug 25, 2023

jltsiren
Contributor

Changelog Entry

To be copied to the draft changelog by merger:

  • Diploid mode for haplotype sampling: first select N haplotypes, then choose the best pair.

Description

A quick implementation of diploid sampling in vg haplotypes. The mode is enabled with option --diploid-sampling. It first generates the specified number of candidate haplotypes with the existing greedy algorithm. Then it scores all pairs of candidates and chooses the highest-scoring pair according to a simple +1/-1 scoring scheme:

  • Present kmer: -1 / 0 / +1 for 0 / 1 / 2 copies in the haplotypes
  • Heterozygous kmer: 0 / +1 / 0 for 0 / 1 / 2 copies
  • Absent kmer: +1 / 0 / -1 for 0 / 1 / 2 copies

As usual, this process is repeated independently in each subchain.
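
The pair selection described above can be sketched in a few lines of Python. This is only an illustration of the scoring table, not the code in this PR: the names `SCORES`, `pair_score` and `best_pair` are hypothetical, haplotypes are assumed to be plain sets of kmers within one subchain, and the kmer classification is assumed to be given. The PR text does not say whether a candidate may be paired with itself; the sketch only considers pairs of distinct candidates.

```python
from itertools import combinations

# Score by kmer class, indexed by the number of copies (0, 1, 2)
# of the kmer in the two selected haplotypes.
SCORES = {
    "present": (-1, 0, +1),
    "heterozygous": (0, +1, 0),
    "absent": (+1, 0, -1),
}

def pair_score(hap_a, hap_b, kmer_classes):
    """Sum the per-kmer scores for one pair of candidate haplotypes.

    hap_a, hap_b: sets of kmers (an assumed representation).
    kmer_classes: dict mapping each kmer to "present", "heterozygous" or "absent".
    """
    total = 0
    for kmer, kmer_class in kmer_classes.items():
        copies = (kmer in hap_a) + (kmer in hap_b)
        total += SCORES[kmer_class][copies]
    return total

def best_pair(candidates, kmer_classes):
    """Return the indexes (i, j), i < j, of the highest-scoring pair.

    Ties are resolved in favor of the first pair encountered.
    """
    return max(
        combinations(range(len(candidates)), 2),
        key=lambda pair: pair_score(
            candidates[pair[0]], candidates[pair[1]], kmer_classes
        ),
    )
```

In the actual tool this selection would run once per subchain, on the candidates produced by the greedy algorithm for that subchain.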

Haplotype sampling now also tries to use reference contig names in the generated haplotypes. This requires rebuilding the haplotype information.

@jltsiren
Contributor Author

I added a check that each graph component contains only one top-level chain (see #4060). Right now, vg haplotypes refuses to generate the haplotype information if the check fails. I'm not sure which is better: telling the user to simplify the graph (by removing tips?) or to remove such components manually, maybe depending on the size of the component, or skipping such components automatically.
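
As a rough illustration of what the check amounts to (not the actual implementation), assume we already have, for each top-level chain, the identifier of the graph component that contains it. The function name and the input format are hypothetical.

```python
from collections import Counter

def components_with_multiple_chains(chain_components):
    """Return the sorted ids of components that contain more than one top-level chain.

    chain_components: list where entry i is the (assumed) component id
    of top-level chain i.
    """
    counts = Counter(chain_components)
    return sorted(comp for comp, n in counts.items() if n > 1)
```

An empty result means the check passes. With the "skip" policy, the returned components would be left out instead of causing a failure.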

jltsiren merged commit e173d0a into master on Aug 25, 2023
1 check passed
jltsiren deleted the diploid-sampling branch on August 25, 2023 00:35