Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prioritize het calls when merging clustered SVs #9058

Merged
merged 3 commits into from
Dec 3, 2024
Merged

Conversation

mwalker174
Copy link
Contributor

Makes some small adjustments to tiebreaker criteria for merging genotypes with the CanonicalSVCollapser, which is used in the SVCluster and GroupedSVCluster tools. We've empirically found, after considering ref vs non-ref and GQ, that prioritizing hets over hom-var genotypes improves variant quality in terms of hardy-weinberg equilibrium. This is also the preferred behavior for GATK-SV, as false hom-var genotypes (when truly a het) is considered one of the more egregious error modes, particularly in association studies.

A similar adjustment is made to multi-allelic CNV merging, in which copy states closer to reference are now preferred.

In addition, a tiebreaker has been added in the case of conflicting copy states that are equally far from reference, e.g. merging a variant with copy states 1 and 3 on a diploid chromosome. In this case when copy state qualities are also equal (which should be rare), deletion copy states are prioritized over duplication. That is, copy state 1 would be chosen in the example. This is justified by the higher accuracy observed in large depth-only deletions compared to duplications. Since there are no other criteria on which to assess multi-allelic CNVs, this extra comparison is needed for stability.

Copy link
Contributor

@epiercehoffman epiercehoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for the update yesterday. Left a couple of minor questions but resolve as you see fit

GenotypeBuilder.create(
"sample",
Lists.newArrayList(Allele.NO_CALL, Allele.NO_CALL),
createGenotypeTestAttributesWithCNQ(2, 2, 30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we prefer a hom ref CN for MCNVs but non-ref for biallelic sites? That seems a bit inconsistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True is is inconsistent. I think one could argue for preferring the copy states farther from neutral, which would make the copy state distribution wider. Here I've chosen to, after considering RD quality, prefer copy states closer to neutral. This is just a more conservative prior. I don't think it makes sense, however, to have both het>hom-ref and het>hom-var for mCNVs since they are common by definition such that losing variants is not as problematic as for rare variants (which constitute most of the call set).

There are other technical reasons for wanting to prefer het over hom-ref with non-depth-only calls as well (e.g. breakpoint shifting on SR signatures), whereas mCNVs are usually depth-only.

@@ -644,25 +644,7 @@ public Object[][] collapseSampleGenotypesTestData() {
createGenotypeTestAttributes(2)
)
},
// het preferred over hom ref even with lower gq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be true right? I wasn't sure why this case got dropped, unless I didn't see one it's redundant with

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I added this back in

@mwalker174 mwalker174 merged commit fc248df into master Dec 3, 2024
20 checks passed
@mwalker174 mwalker174 deleted the mw_het_comparator branch December 3, 2024 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants