The SAMPLE information for a VariantRecord is written out in the insertion order of the sample dict of the first sample populated for the record. This can lead to two records in the same VCF having different key orders in their FORMAT column, which is somewhat unintuitive, although I am not clear whether it's an actual bug or against the spec.
Here is an MRE:
from pysam import VariantFile, VariantHeader, VariantRecord

header = VariantHeader()
header.contigs.add('chr1')
header.formats.add("GT", "1", "String", "Genotype")
header.formats.add("DP", "1", "Integer", "Read depth in the tumor BAM")
header.formats.add("AF", "1", "Float", "Estimated allele frequency in the tumor BAM")
header.add_samples(['sample1', 'sample2'])

SAMPLE1 = {"GT": (1, 0), "DP": 10, "AF": 0.5}
SAMPLE2 = {"GT": (1, 0), "DP": 10, "AF": 0.5}

rec1 = header.new_record(
    contig='chr1',
    id='formats_in_order',
    start=0,
    stop=1,
    alleles=('T', 'A')
)
for k, v in SAMPLE1.items():
    rec1.samples['sample1'][k] = v
for k, v in SAMPLE2.items():
    rec1.samples['sample2'][k] = v

rec2 = header.new_record(
    contig='chr1',
    id='first_sample_not_in_order',
    start=0,
    stop=1,
    alleles=('T', 'A')
)
# Note how we insert the sample values in reverse order
for k, v in list(SAMPLE1.items())[::-1]:
    rec2.samples['sample1'][k] = v
for k, v in SAMPLE2.items():
    rec2.samples['sample2'][k] = v

rec3 = header.new_record(
    contig='chr1',
    id='second_sample_not_in_order',
    start=0,
    stop=1,
    alleles=('T', 'A')
)
for k, v in list(SAMPLE2.items())[::-1]:
    rec3.samples['sample2'][k] = v
for k, v in SAMPLE1.items():
    rec3.samples['sample1'][k] = v

testfile = VariantFile('-', 'w', header=header)
testfile.write(rec1)
testfile.write(rec2)
testfile.write(rec3)
testfile.close()
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth in the tumor BAM">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Estimated allele frequency in the tumor BAM">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1	sample2
chr1	1	formats_in_order	T	A	.	.	.	GT:DP:AF	1/0:10:0.5	1/0:10:0.5
chr1	1	first_sample_not_in_order	T	A	.	.	.	GT:AF:DP	1/0:0.5:10	1/0:0.5:10
chr1	1	second_sample_not_in_order	T	A	.	.	.	GT:AF:DP	1/0:0.5:10	1/0:0.5:10
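If the behavior is as described above, one possible workaround is to always insert a sample's keys in a fixed order, so that the first sample populated gives every record the same FORMAT layout. Below is a minimal sketch of that idea in plain Python. `ordered_items` and `FORMAT_ORDER` are hypothetical names, not part of pysam, and the key order is hard-coded on the assumption that it matches the order in which the FORMAT fields were added to the header. I have not verified that this yields a consistent FORMAT column in every case.

```python
# Hypothetical helper (not part of pysam): sort a sample's values by a fixed
# list of FORMAT keys before inserting them into a record.
FORMAT_ORDER = ["GT", "DP", "AF"]  # assumption: same order as declared in the header


def ordered_items(values, format_order=FORMAT_ORDER):
    """Return (key, value) pairs sorted by their position in format_order.

    Keys absent from format_order are placed last, in their original order
    (sorted() is stable, so ties keep insertion order).
    """
    rank = {key: i for i, key in enumerate(format_order)}
    return sorted(values.items(), key=lambda kv: rank.get(kv[0], len(rank)))


# Same values as SAMPLE1 in the MRE, but stored in reverse key order.
reversed_sample = {"AF": 0.5, "DP": 10, "GT": (1, 0)}
print([key for key, _ in ordered_items(reversed_sample)])  # prints ['GT', 'DP', 'AF']
```

In the MRE, the loops would then iterate over `ordered_items(SAMPLE1)` instead of `SAMPLE1.items()` (or its reversed list) before assigning into `rec.samples[...]`.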