You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I used samtools and bcftools as well as bwa to generate VCF file.
These tools are pretty much well established and it's strange to see that they generate VCF file that I can't read with VariantCallFormat. I made a check with pyvcf 0.6.8 and I could read the file without any problems.
It appears problem was with CHROM field which contained this NC_000014.8:106641561-106641856 while POS was 1 instead of absolute position.
After writing small function that fixes first column to be NC_000014.8 and second to be 106641561+1 I could read the file.
However, accessing record.pos gives me some range which is not even correct. While it should give the position.
I checked other VCF file which didn't have any issue and is available online, and again pos returns wrong value
v[1]
VariantCallFormat.Record:
chromosome: NC_000001.10
position: 10001
identifier: rs1570391677
reference: T
alternate: A C
quality: <missing>
filter: <missing>
information: RS=1570391677 dbSNPBuildID=154 SSR=0 PSEUDOGENEINFO=DDX11L1:100287102 VC=SNV R5 GNO FREQ=KOREAN:0.9891,0.0109,.|SGDP_PRJ:0,1,.|dbGaP_PopFreq:1,.,0 COMMON
format: <missing>
julia> v[1].pos
14:18
whereas in Python using PyVCF record.pos gives me rightfully 10001 value.
In [2]: import vcf
In [3]: vcf_reader = vcf.Reader(open('test.vcf','r'))
In [4]: r = [r for r in vcf_reader][1]
In [5]: r.POS
Out[5]: 10002
Again this is public dbSNP dataset, something very established.
This all suggests that VariantCallFormat (VariantCallFormat v0.5.5 according to Pkg.status) is broken, so as suggested I am filing a bug report.
Hope it helps to solve the problem? Do you have any tips for temporal walk-around? Or is writing custom parser the quickest ?
The text was updated successfully, but these errors were encountered:
Use VCF.pos(r) to access the position. r.POS is an internal field related to the parsing.
I'm not at a computer right now and I cannot investigate the other issue right now, but if the file is incorrectly formatted (according to the VCF specification) I think that the best option is to write a small function to fix the problem.
Files that are not formatted according to the specification is a big problem in bioinformatics. But I much prefer a strict parser where I get an error and fix the problem I'm the file manually, than one that continues silently and maybe parsed it incorrectly.
After reading the specification, I've come to the conclusion that colon should be allowed in the CHROM field. I don't have much bandwidth at the moment but I'll see if I can get a fix in.
Hi,
I used
samtools
andbcftools
as well asbwa
to generate VCF file.These tools are pretty much well established and it's strange to see that they generate VCF file that I can't read with VariantCallFormat. I made a check with pyvcf 0.6.8 and I could read the file without any problems.
It appears problem was with CHROM field which contained this
NC_000014.8:106641561-106641856
while POS was 1 instead of absolute position.After writing small function that fixes first column to be
NC_000014.8
and second to be106641561+1
I could read the file.However, accessing
record.pos
gives me some range which is not even correct. While it should give the position.I checked other VCF file which didn't have any issue and is available online, and again
pos
returns wrong valuewhereas in Python using PyVCF record.pos gives me rightfully 10001 value.
Again this is public dbSNP dataset, something very established.
This all suggests that VariantCallFormat (VariantCallFormat v0.5.5 according to Pkg.status) is broken, so as suggested I am filing a bug report.
Hope it helps to solve the problem? Do you have any tips for temporal walk-around? Or is writing custom parser the quickest ?
The text was updated successfully, but these errors were encountered: