The current vcfGTcount.gawk script could be extended to report not just the basic GT summaries but also translated genotypes (TGT in bcftools terminology) or IUPAC-coded genotypes (IUPACGT in bcftools). For example:
-t = count translated genotypes
-i = count IUPAC-formatted genotypes
-g = count numeric-style genotypes (default)
This would be handled by a function that is called after each genotype is extracted, with an if check to select the output style.
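function translate(gt, ref, alt, iupac)
{
    # Biallelic sites only: allele index 0 becomes REF, index 1 becomes ALT.
    gsub(/0/, ref, gt)
    gsub(/1/, alt, gt)
    # iupacdict must map translated genotypes (e.g. "A/G") to IUPAC codes.
    if (iupac == 1)
        gt = iupacdict[gt]
    return gt
}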
A single function can handle this, but two separate functions would probably be more efficient, because the if (iupac == 1) check would then run once rather than for every genotype.
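The function above relies on an iupacdict that does not exist yet. As a rough sketch of how it might be built, assuming biallelic SNP genotypes that have already been translated to bases: the names build_iupacdict and to_iupac are hypothetical, and the lookup strips the separator so that "A/G", "G/A" and "A|G" all resolve to the same code.

```awk
# Sketch only: build_iupacdict() and to_iupac() are hypothetical names.
# Covers biallelic SNP genotypes already translated to bases, e.g. "A/G".

function build_iupacdict(dict,    pairs, n, i, p)
{
    # Standard IUPAC two-base ambiguity codes.
    n = split("AG:R CT:Y CG:S AT:W GT:K AC:M", pairs, " ")
    for (i = 1; i <= n; i++) {
        split(pairs[i], p, ":")
        dict[p[1]] = p[2]
        # Store the reversed order too, so "GA" and "AG" both resolve.
        dict[substr(p[1], 2, 1) substr(p[1], 1, 1)] = p[2]
    }
    # Homozygous genotypes map to the base itself.
    n = split("A C G T", pairs, " ")
    for (i = 1; i <= n; i++)
        dict[pairs[i] pairs[i]] = pairs[i]
}

function to_iupac(tgt, dict,    key)
{
    key = tgt
    gsub(/[\/|]/, "", key)                    # drop the "/" or "|" separator
    return (key in dict) ? dict[key] : tgt    # unknown genotypes pass through
}

BEGIN {
    build_iupacdict(iupacdict)
    print to_iupac("A/G", iupacdict)    # R
    print to_iupac("T|C", iupacdict)    # Y
    print to_iupac("G/G", iupacdict)    # G
    print to_iupac("./.", iupacdict)    # ./. (left unchanged)
}
```

Missing calls, indels and multi-allelic sites are not covered by this table; here they are simply returned untranslated.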
After giving it some thought, I realized that most of the classic approaches would hurt performance even when only the current (default) output is wanted, because the extra if statements would run for every genotype.
But since the script already uses gawk features, I can solve it with indirect function calls. Basically, I'd define multiple functions and select the one to be used at run time. All I need is to define the flags as above and then assign the appropriate function.
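A minimal sketch of what that could look like, using gawk's indirect call syntax (@name(args), where name is a variable holding a function name). Everything else here is an assumption for illustration and not taken from vcfGTcount.gawk: the function names, the mode variable passed with -v instead of real flags, the counting across all samples, and the restriction to biallelic sites.

```awk
# Sketch only. Function names, the "mode" variable and the -v interface are
# assumptions, not part of vcfGTcount.gawk.
# Hypothetical usage: gawk -v mode=t -f sketch.gawk input.vcf

function gt_numeric(gt, ref, alt)
{
    return gt                      # default: keep the numeric genotype
}

function gt_translated(gt, ref, alt)
{
    # Biallelic sites only: 0 becomes REF, 1 becomes ALT.
    gsub(/0/, ref, gt)
    gsub(/1/, alt, gt)
    return gt
}

BEGIN {
    FS = "\t"
    # Choose the converter once; the per-genotype loop has no mode checks.
    convert = (mode == "t") ? "gt_translated" : "gt_numeric"
}

/^#/ { next }                      # skip VCF header lines

{
    for (i = 10; i <= NF; i++) {   # sample columns start at field 10
        split($i, fields, ":")     # assumes GT is the first FORMAT key
        key = @convert(fields[1], $4, $5)    # indirect function call
        count[key]++
    }
}

END {
    for (g in count)
        print g "\t" count[g]
}
```

An IUPAC variant would be registered the same way, as one more function name that convert can be set to. The -v interface is used here because -t, -i and -g are also options of gawk itself, so passing them to a script would need extra handling on the command line.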
One thing to consider later would be combining multiple flags to get multiple stats in the output. But that can wait.