
How to manage a phased vcf input? #201

Open
ManuRodriguezCano opened this issue Nov 19, 2024 · 3 comments

Comments

@ManuRodriguezCano

I have a phased VCF file that I want to use as input to PharmCAT, but when I run it I see no difference compared with the results for the unphased file. Do I need to handle it in some way, or is there a specific parameter?

  • Please tell us about your environment:

    • PharmCAT Version: 2.15.5
    • Environment: [ Linux | Docker ]
@markwoon
Contributor

It's really hard to help you without more details.

What's the difference between the file with phases and the file without phases? A sample VCF with data on just one chromosome would be enough to help debug. Just make sure the data is de-identified.
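For preparing that kind of test case, here is a rough Python sketch that keeps only one chromosome and replaces the sample IDs with generic names. `subset_and_deidentify`, `open_text`, and the `KEEP_META` allow-list are hypothetical names chosen for illustration, not part of PharmCAT. It assumes a tab-delimited VCF (plain or gzip/bgzip-compressed) and that `chrom` matches the file's naming convention (`1` vs `chr1`). It only renames samples and drops non-essential header lines; the genotypes themselves are untouched, so check for any other identifying information before posting.

```python
import gzip

# Header lines kept in the shared copy; anything else (e.g. ##commandline
# entries that may contain file paths) is dropped. This allow-list is an
# illustrative assumption, not a PharmCAT requirement.
KEEP_META = ("##fileformat", "##FILTER", "##INFO", "##FORMAT", "##ALT", "##contig")


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def subset_and_deidentify(in_path, out_path, chrom, prefix="SAMPLE"):
    """Write a copy of the VCF with one chromosome and generic sample names."""
    with open_text(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("##"):
                # Meta-information lines: keep only allow-listed ones.
                if line.startswith(KEEP_META):
                    dst.write(line)
            elif line.startswith("#CHROM"):
                # Column header: the first 9 columns are fixed, the rest are sample IDs.
                cols = line.rstrip("\n").split("\t")
                fixed, samples = cols[:9], cols[9:]
                renamed = [f"{prefix}{i + 1}" for i in range(len(samples))]
                dst.write("\t".join(fixed + renamed) + "\n")
            elif line.split("\t", 1)[0] == chrom:
                # Data line on the requested chromosome.
                dst.write(line)
```

For example, `subset_and_deidentify("phased.vcf.gz", "chr1_shared.vcf", "1")` would write an uncompressed, single-chromosome copy with the sample column renamed to `SAMPLE1`.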

@ManuRodriguezCano
Author

Sorry for the delay!
The main difference is that I phased the file with WhatsHap, a tool for genotype phasing, i.e. assigning each allele to one of the two homologous chromosomes in a sample, using sequencing reads that span multiple heterozygous variants. WhatsHap reconstructs the haplotypes of the variants in a genome, which is particularly useful for complex variants or genes with multiple alleles. This gives a more accurate representation of the genetic variants, which is essential in pharmacogenomic analysis.
In the files themselves, the difference is that the phased file uses the | separator in the genotypes to indicate haplotype phase, while the unphased file uses /.
Phased example:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
1 12345 rs123 A G . PASS . GT:DP 0|1:35
1 67890 rs456 C T . PASS . GT:DP 1|0:40

Unphased example:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
1 12345 rs123 A G . PASS . GT:DP 0/1:35
1 67890 rs456 C T . PASS . GT:DP 1/0:40
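One quick way to confirm that a file really contains phased genotypes, as in the examples above, is to count the `|` and `/` separators in the GT field. A minimal Python sketch, assuming a single-sample VCF read as a list of lines with GT present in FORMAT; `parse_gt` and `phase_summary` are hypothetical helper names, not part of PharmCAT or WhatsHap:

```python
import re


def parse_gt(format_field, sample_field):
    """Return (alleles, is_phased) for the GT subfield of one sample column.

    In VCF, '|' separates phased alleles and '/' separates unphased ones.
    Assumes GT is listed in FORMAT. Haploid calls (no separator) are
    reported as unphased by this sketch.
    """
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    gt = values["GT"]
    alleles = re.split(r"[/|]", gt)
    # A call counts as phased only if every separator is '|'.
    is_phased = "|" in gt and "/" not in gt
    return alleles, is_phased


def phase_summary(vcf_lines):
    """Count phased vs. unphased genotypes in the first sample column."""
    counts = {"phased": 0, "unphased": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta-information lines
        # Real VCFs are tab-delimited; fall back to whitespace for pasted examples.
        cols = line.rstrip("\n").split("\t") if "\t" in line else line.split()
        _, is_phased = parse_gt(cols[8], cols[9])
        counts["phased" if is_phased else "unphased"] += 1
    return counts
```

Run over the phased example lines, `phase_summary` returns `{'phased': 2, 'unphased': 0}`; over the unphased example it returns `{'phased': 0, 'unphased': 2}`.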

@markwoon
Contributor

OK, but without seeing your actual data, it's hard to see if there actually is a problem.

If the sample is "effectively phased" (i.e. homozygous at all positions or all but 1 position), the result would be the same. Depending on your actual data, what you're seeing may be entirely reasonable.
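The reasoning behind "effectively phased" can be illustrated by enumerating every pair of haplotypes consistent with unphased calls. With no heterozygous site there is one possible diplotype; with one, swapping its two alleles just swaps the haplotype labels, so there is still one; with h ≥ 2 there are 2^(h-1), and only then does phase information change the answer. This is a sketch of the general idea, not PharmCAT's matching code; it assumes genotypes are given as `(allele1, allele2)` pairs for the positions being matched, and all function names are hypothetical.

```python
from itertools import product


def het_positions(genotypes):
    """Indices of heterozygous calls; each genotype is an (allele1, allele2) pair."""
    return [i for i, (a, b) in enumerate(genotypes) if a != b]


def compatible_diplotypes(genotypes):
    """All unordered haplotype pairs consistent with the calls when phase is unknown.

    Each heterozygous site can be oriented two ways, so every combination of
    swaps is tried. Pairs are stored as frozensets, so (h1, h2) and (h2, h1)
    collapse into a single diplotype.
    """
    hets = het_positions(genotypes)
    diplotypes = set()
    for swaps in product((False, True), repeat=len(hets)):
        hap1 = [a for a, _ in genotypes]
        hap2 = [b for _, b in genotypes]
        for pos, swap in zip(hets, swaps):
            if swap:
                hap1[pos], hap2[pos] = hap2[pos], hap1[pos]
        diplotypes.add(frozenset((tuple(hap1), tuple(hap2))))
    return diplotypes


def is_effectively_phased(genotypes):
    """True when phase adds no information: at most one heterozygous site."""
    return len(het_positions(genotypes)) <= 1
```

For the two example positions posted earlier in the thread (both heterozygous), `compatible_diplotypes([("A", "G"), ("T", "C")])` returns two diplotypes, A-T/G-C and A-C/G-T, and the phased genotypes `0|1` and `1|0` select the first. A sample that is homozygous everywhere, or heterozygous at only one of the positions considered, yields a single diplotype, so phased and unphased input would lead to the same result.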
