Replace `pangenome.vcf` with a `presence-absence.vcf` as main output, but keep it to build the graph genomes
#30
Labels: enhancement (New feature or request)
Replace `pangenome.vcf` with a `presence-absence.vcf` in the `3_TSD_Search/` output folder. This new file will have one genotype column per sample, but the calls will only be 1 or 0 (i.e., identical to the `SUPP_VEC` field). We still need to output `pangenome.vcf` for compatibility with the `--graffite-vcf` option (which skips SV search and annotation and uses the provided VCF to build the graph and map reads). Alternatively, don't output `pangenome.vcf`, but keep it internally to build the graph if needed. This would require modifying the `--graffite-vcf` routines so that they strip the genotype columns and replace them with a single column in which all variants are `1|0`.
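Both transformations can be sketched in a few lines of Python. This is a minimal sketch, not GraffiTE's code: it assumes `SUPP_VEC` is stored in the INFO column as a string of 0/1 characters (one per sample) and that the caller passes the sample names in the same order. All function names and the example record are hypothetical.

```python
"""Sketch: presence-absence genotypes from SUPP_VEC, and the single-column
'1|0' form that --graffite-vcf would need (hypothetical helpers)."""
from typing import Dict, List, Union

# Fixed VCF columns preceding FORMAT and the sample columns.
VCF_FIXED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into key -> value (flags map to True)."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def presence_absence_header(samples: List[str]) -> str:
    """#CHROM header line with one column per sample."""
    return "\t".join(VCF_FIXED + ["FORMAT"] + samples)


def presence_absence_line(line: str, samples: List[str]) -> str:
    """Keep the 8 fixed columns and emit one haploid GT (1 or 0) per sample,
    copied from SUPP_VEC. Assumes SUPP_VEC order == order of `samples`."""
    cols = line.rstrip("\n").split("\t")
    supp_vec = str(parse_info(cols[7])["SUPP_VEC"])
    if len(supp_vec) != len(samples):
        raise ValueError("SUPP_VEC length does not match the number of samples")
    return "\t".join(cols[:8] + ["GT"] + list(supp_vec))


def graph_input_line(line: str) -> str:
    """Drop all genotype columns and replace them with a single '1|0' column."""
    cols = line.rstrip("\n").split("\t")
    return "\t".join(cols[:8] + ["GT", "1|0"])


if __name__ == "__main__":
    samples = ["asm1", "asm2", "asm3"]
    # Made-up merged record: an insertion supported by asm1 and asm3.
    record = "chr1\t1000\tsv1\tA\tAT\t.\tPASS\tSVTYPE=INS;SUPP_VEC=101"
    pa = presence_absence_line(record, samples)
    print(presence_absence_header(samples))
    print(pa)
    print(graph_input_line(pa))
```

Haploid `1`/`0` genotypes keep the file valid VCF while carrying exactly the information already in `SUPP_VEC`; the graph-building path then only has to apply `graph_input_line` to recover a single-column file.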
I anticipate a possible source of confusion, as "presence-absence" could be interpreted as the presence or absence of a TE rather than the presence or absence of the variant. Perhaps a solution is to output two files: one in VCF format, respecting the VCF conventions, called `GraffiTE_variants_presence-absence.vcf`; and the other a TSV table, identical to the non-header lines of the VCF but with the DEL calls reverted to match the presence/absence pattern of the TEs in each sample. We could call this file `GraffiTE_TE_presence-absence.tsv`. Of course, we will need to update the documentation accordingly.
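The DEL reversal for the TSV can be sketched as follows. A deletion of a TE that is present in the reference means that samples carrying the variant lack the TE, so their calls are flipped. This sketch assumes (hypothetically) that each record carries `SVTYPE=INS` or `SVTYPE=DEL` in INFO and that columns 10 onward hold haploid 0/1 calls, as in a presence-absence VCF; missing calls (`.`) are left untouched. The function names are illustrative only.

```python
"""Sketch: presence-absence VCF record -> TE presence/absence TSV row."""
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into key -> value (flags map to True)."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def te_presence_row(vcf_line: str) -> str:
    """Return the record unchanged for insertions; for deletions, flip
    1 <-> 0 in the sample columns so the row describes the TE itself."""
    cols = vcf_line.rstrip("\n").split("\t")
    calls = cols[9:]
    if parse_info(cols[7]).get("SVTYPE") == "DEL":
        flip = {"0": "1", "1": "0"}
        calls = [flip.get(c, c) for c in calls]  # '.' stays missing
    return "\t".join(cols[:9] + calls)


if __name__ == "__main__":
    # Made-up deletion carried by the first two samples.
    record = "chr2\t500\tsv2\tACGT\tA\t.\tPASS\tSVTYPE=DEL;SUPP_VEC=110\tGT\t1\t1\t0"
    print(te_presence_row(record))
```

In this example the two carriers of the deletion become `0` (TE absent) and the third sample becomes `1` (TE present). The INFO column is copied as-is, so `SUPP_VEC` in the TSV still describes the variant rather than the TE.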
This change has several advantages:
`vcf.txt` file from in order to know which position of the `SUPP_VEC` corresponds to which sample.
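The lookup described above, pairing each position of `SUPP_VEC` with a sample through `vcf.txt`, can be sketched in Python. It assumes (hypothetically) that `vcf.txt` is a plain list with one per-sample VCF per line, in the same order as the `SUPP_VEC` characters; the helper names are illustrative and not part of GraffiTE.

```python
"""Sketch: map SUPP_VEC positions to the entries of a vcf.txt-style list."""
from typing import Dict, List


def read_vcf_list(text: str) -> List[str]:
    """Return the non-empty, stripped lines of the list, in order."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def supp_vec_by_sample(supp_vec: str, vcf_list: List[str]) -> Dict[str, bool]:
    """Pair each SUPP_VEC character with the list entry at the same position."""
    if len(supp_vec) != len(vcf_list):
        raise ValueError("SUPP_VEC length does not match the number of listed VCFs")
    return {name: flag == "1" for name, flag in zip(vcf_list, supp_vec)}


if __name__ == "__main__":
    listing = "asm1.vcf\nasm2.vcf\nasm3.vcf\n"
    print(supp_vec_by_sample("101", read_vcf_list(listing)))
```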