Duplicated variants in a pangenome VCF #1493
You can try merging them with
I have tried to use
But there are 2 problems:
I was planning to do something like:
merge into
If you think this strategy is OK for handling duplicated variants in the MC output VCF, I can write a script to do it. Many thanks.
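As a rough sketch of that merge strategy (a hypothetical helper, not the actual script from the thread): two exact-duplicate records' phased genotypes can be combined haplotype-wise, turning two AC=1 records into one AC=2 record.

```python
# Sketch: merge phased genotypes of two duplicate VCF records
# (same CHROM/POS/REF/ALT after bcftools norm). Hypothetical helper,
# not part of the cactus/MC pipeline.

def merge_phased_gts(gt_a: str, gt_b: str) -> str:
    """Combine two phased genotypes haplotype-wise: a haplotype carries
    the ALT allele if either record says it does (e.g. 0|1 + 1|0 -> 1|1)."""
    ha, hb = gt_a.split("|"), gt_b.split("|")
    if len(ha) != len(hb):
        raise ValueError("ploidy mismatch between duplicate records")
    return "|".join("1" if a == "1" or b == "1" else "0"
                    for a, b in zip(ha, hb))

def merged_ac(gts: list) -> int:
    """Recompute AC (ALT allele count) over the merged genotypes."""
    return sum(gt.count("1") for gt in gts)
```

For example, a sample genotyped 0|1 in one record and 1|0 in its duplicate merges to 1|1, and AC is recounted from the merged genotypes rather than summed blindly.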
That's disappointing about
I just wrote a prototype Python script; you can find it here. It is currently designed for phased VCFs only:
For the VCF below:
It outputs:
I also tested the current script on HPRCv1.1 chr22:
I am still considering how to handle duplicate variants with conflicting genotypes:
For duplicate variants with 1|0 vs 1|0, do you think these are truly redundant calls from cactus, or different variants that end up with the same VCF representation due to the limitations of VCF?
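The ambiguous case above can at least be detected mechanically. A minimal sketch (hypothetical helper, assuming phased genotypes) that flags haplotypes on which both duplicate records claim the ALT allele:

```python
def conflicting_haplotypes(gt_a: str, gt_b: str) -> list:
    """Return indices of haplotypes on which BOTH duplicate records carry
    the ALT allele (e.g. 1|0 vs 1|0 -> [0]). Such overlaps are ambiguous:
    either a redundant call, or two distinct variants that left-align to
    the same VCF representation."""
    return [i for i, (a, b) in
            enumerate(zip(gt_a.split("|"), gt_b.split("|")))
            if a == "1" and b == "1"]
```

Non-overlapping duplicates (0|1 vs 1|0) return an empty list and can be merged safely; any flagged haplotype would need a policy decision (keep one copy, or keep both records).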
Wow, this looks really promising, thanks for sharing! I will take a closer look when I have a free minute (which unfortunately won't be for a few days), but I am very interested in incorporating it into cactus if possible.
Hi,
In the output VCF of the MC pipeline, there are a few duplicated variants after left-align. Below is one example from the HPRC VCF:
Before left-align (bcftools view -r chr22:15409352-15409435 hprc-v1.1-mc-grch38.vcfbub.a100k.wave.vcf.gz):

After bcftools norm -f ref.fa, these 2 insertions are exactly the same:

Does this mean these 2 are the same insertion?
As the genotypes of these 2 insertions are different, to remove duplicated records from the VCF, should I keep just one of them (e.g. bcftools norm --rm-dup exact) or merge their genotypes (i.e., merge two AC=1 records into one AC=2 record)?

Thanks a lot!