

We need to review the step where we convert Graphtyper VCF output files to tabular format using vcf-to-tab #76

Open
masudermann opened this issue May 31, 2024 · 1 comment
Labels
bug Something isn't working


@masudermann
Contributor

masudermann commented May 31, 2024

Description of the bug

I've mentioned this before, but I don't think vcf-to-tab from VCFtools was designed to work with Graphtyper-formatted VCF files.

It works OK for haploid organisms, where we are only filtering for homozygous SNPs and the calls are haploid, but in the conversion to tabular format the output tables don't contain any diploid calls (see the output snapshots below).

As a result, it also collapses any heterozygous SNPs.

I was reminded of this issue after Camilo used this tool with VCF files produced using GATK, and interestingly, all diploid calls were retained, as were heterozygous SNPs.

In the conversion to multi-fasta alignment, he then saw the expected ambiguity codes.
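For reference when comparing outputs, here is a minimal Python sketch of how a diploid SNP genotype can be collapsed into a single IUPAC ambiguity code for a multi-FASTA alignment. It is not the code our pipeline or any particular tool uses: the function name genotype_to_iupac is hypothetical, and it assumes the GT indices have already been translated into bases. Missing calls, indels, and anything other than one or two distinct bases are written as N.

```python
# Standard IUPAC nucleotide codes for one base or an unordered pair of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_iupac(alleles):
    """Collapse called single-base alleles (e.g. ['A', 'G']) into one IUPAC code.

    Hypothetical helper: returns 'N' for an empty call, any missing allele ('.'),
    any multi-base allele (indel), or a combination not in the table above.
    """
    bases = {a.upper() for a in alleles}
    if not bases or "." in bases or any(len(b) != 1 for b in bases):
        return "N"
    return IUPAC.get(frozenset(bases), "N")
```

With this mapping a heterozygous A/G call becomes R, while a homozygous C/C or haploid C call stays C. That difference is what disappears when heterozygous sites are collapsed before this step.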

Let's discuss this more, but I think we may need to select another tool to convert the VCF to tabular format before creating the SNP multi-FASTA alignment files that are used as inputs to poppr and for building the SNP tree.
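To make the discussion of a replacement concrete, here is a minimal Python sketch of a VCF-to-tab conversion that keeps every allele in each sample's GT field (haploid G, diploid A/G, phased or unphased). The function name vcf_to_tab_rows is hypothetical. The sketch assumes an uncompressed, well-formed VCF with a GT key in FORMAT, and it does not reproduce how vcf-to-tab works internally. One existing tool that may be worth testing instead is bcftools query, whose %TGT field prints translated genotypes (for example `bcftools query -f '%CHROM\t%POS\t%REF[\t%TGT]\n' file.vcf.gz`), though we would still need to check its output on Graphtyper files.

```python
import re


def vcf_to_tab_rows(lines):
    """Yield (chrom, pos, ref, calls) from VCF text lines, one row per record.

    Each entry in calls joins all called alleles with '/', so a haploid call
    gives 'G', a diploid call gives 'A/G', and missing alleles stay '.'.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines and blanks
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")  # GT index 0 = REF, 1.. = ALT alleles
        gt_idx = fields[8].split(":").index("GT")
        calls = []
        for sample_field in fields[9:9 + len(samples)]:
            gt = sample_field.split(":")[gt_idx]
            idxs = re.split(r"[/|]", gt)  # handles unphased '/' and phased '|'
            calls.append("/".join(alleles[int(i)] if i != "." else "." for i in idxs))
        yield chrom, int(pos), ref, calls
```

Each yielded row can be written out with "\t".join(...) to get the same #CHROM, POS, REF, per-sample layout as the snapshots below, but with both alleles kept for diploid calls.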

Finally, looking at the VCF outputs:

  1. In the current iteration of the pipeline, during variant filtering, there is a line where only homozygous SNPs are retained regardless of ploidy. We should discuss the implications of this.

  2. I think another flag needs to be added to retain only biallelic SNPs (if that is the goal).
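For points 1 and 2, a minimal Python sketch of genotype-aware checks may help frame the discussion. Under the definition in this sketch, a rule that keeps only homozygous calls drops every heterozygous site in a diploid sample by construction, while every non-missing haploid call passes. The helper names are hypothetical, and this is not the pipeline's current filter. For the biallelic point, `bcftools view -m2 -M2 -v snps` is an existing way to keep only biallelic SNPs.

```python
import re

BASES = {"A", "C", "G", "T"}


def parse_gt(gt):
    """Split a GT string into allele indices: '0/1' -> ['0', '1'], '1' -> ['1']."""
    return re.split(r"[/|]", gt)


def is_homozygous(gt):
    """True when all called alleles are identical, for haploid ('1') or diploid ('1/1') calls.

    Calls containing a missing allele ('.') are not counted as homozygous.
    """
    idxs = parse_gt(gt)
    return "." not in idxs and len(set(idxs)) == 1


def is_biallelic_snp(ref, alt):
    """True when there is exactly one ALT allele and both REF and ALT are single A/C/G/T bases."""
    alts = alt.split(",")
    return len(alts) == 1 and ref.upper() in BASES and alts[0].upper() in BASES
```

Applied per sample, is_homozygous('0/1') is False, so a homozygous-only filter on diploid data removes exactly the sites that would otherwise produce ambiguity codes in the alignment.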

Command used and terminal output

No response

Relevant files

Here is a snapshot of a bacterial output file generated using Graphtyper -> vcf-to-tab as part of a pipeline run:

#CHROM POS REF GCF_002251695_1_ralstonia_solanacearum1 GCF_002251695_1_ralstonia_solanacearum2
NZ_NCTK01000001.1 28751 T G/ G/
NZ_NCTK01000001.1 28755 C T/ T/
NZ_NCTK01000001.1 28757 G A/ A/
NZ_NCTK01000001.1 28763 C G/ G/
NZ_NCTK01000001.1 28776 A G/ G/
NZ_NCTK01000001.1 28791 A G/ G/
NZ_NCTK01000001.1 28793 G T/ T/
NZ_NCTK01000001.1 28817 C C/ C/
NZ_NCTK01000001.1 28830 A A/ A/
NZ_NCTK01000001.1 28835 T T/ C/
NZ_NCTK01000001.1 28842 A G/ A/
NZ_NCTK01000001.1 28847 C A/ A/
NZ_NCTK01000001.1 28877 C T/ T/
NZ_NCTK01000001.1 28892 T C/ C/
NZ_NCTK01000001.1 28904 GA G/ GA/
NZ_NCTK01000001.1 28983 A G/ G/
NZ_NCTK01000001.1 29026 T C/ C/
NZ_NCTK01000001.1 29049 G G/ C/


Here is an oomycete output file generated using Graphtyper -> vcf-to-tab (as part of the same run):

#CHROM POS REF GCA_001466705_2_phytophthora_palmivora2
LATX02000001.1 1873814 G A/
LATX02000001.1 2111531 C T/
LATX02000007.1 1838 T C/
LATX02000012.1 104622 C ./
LATX02000013.1 24767 C T/
LATX02000013.1 665680 T G/
LATX02000017.1 2844 G A/
LATX02000017.1 55049 G A/
LATX02000020.1 52018 A G/
LATX02000020.1 52031 T C/
LATX02000024.1 498059 T A/
LATX02000046.1 457792 A ./
LATX02000046.1 457859 G T/
LATX02000046.1 457862 T C/
LATX02000046.1 457906 A C/
LATX02000046.1 457936 C ./
LATX02000046.1 457998 T C/
LATX02000057.1 373757 G A/
LATX02000057.1 406057 G C/
LATX02000057.1 442436 T C/
LATX02000078.1 731901 T C/
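To gauge how many calls are affected in a given output, a small Python check can flag every genotype cell with an empty allele slot, such as G/ or ./ in the snapshots above. The function name incomplete_calls is hypothetical. The sketch assumes tab-separated columns in the vcf-to-tab layout (#CHROM, POS, REF, then one column per sample). The snapshots as pasted here appear space-separated.

```python
def incomplete_calls(tab_lines):
    """Return (chrom, pos, sample, cell) for each genotype cell with an empty allele, e.g. 'G/'.

    Assumes a tab-separated header starting with #CHROM, followed by POS, REF,
    and one column per sample.
    """
    header = None
    hits = []
    for line in tab_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#CHROM"):
            header = fields
            continue
        if header is None or len(fields) < 4:
            continue  # no header seen yet, or a blank/short line
        for sample, cell in zip(header[3:], fields[3:]):
            # 'G/' splits into ['G', ''], so an empty string marks a dropped allele
            if any(a == "" for a in cell.split("/")):
                hits.append((fields[0], int(fields[1]), sample, cell))
    return hits
```

Running this on the Graphtyper-derived tables and on Camilo's GATK-derived table would give a like-for-like count of affected cells.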

System information

No response

@masudermann masudermann added the bug Something isn't working label May 31, 2024
@masudermann masudermann changed the title We need to review step where we convert Graphtyper VCF outputs files to tabular format using vcf-to-tab We need to review the step where we convert Graphtyper VCF outputs files to tabular format using vcf-to-tab May 31, 2024
@cahuparo
Contributor

I agree with @masudermann! To provide more context, here is the tab file produced by vcf-to-tab from an original GATK VCF file:

[Screenshot (2024-05-31): tab file produced by vcf-to-tab from a GATK VCF file]
