groups in slivar
Groups allow a user to define aliases so that a single expression can be applied to many groups of samples.
A simple example: we have 3 families, each with a mom, dad, proband, and unaffected sibling. Given sample ids s1..s12 that appear in the VCF, we could create an alias file like:
#proband dad mom sibling
s1 s2 s3 s4
s5 s6 s7 s8
s9 s10 s11 s12
where the header gives the labels that we can use in a --group-expr. A --group-expr might then look like:
--group-expr "denovo:mom.alts == 0 && dad.alts == 0 && sibling.alts == 0 \ # all unaffecteds are hom-ref
&& proband.alts == 1 \ # proband is heterozygous
&& mom.AD[1] == 0 && dad.AD[1] == 0 && sibling.AD[1] == 0 \ # make sure no alternate alleles are seen in unaffecteds
&& proband.AB > 0.2 && proband.AB < 0.8 \ # make sure the allele balance is reasonable
&& INFO.gnomad_popmax_af < 1e-3" # variant must be rare in gnomad
This would add an INFO field of denovo=$proband to any variant that meets these criteria. The value in the first column, in this case proband, is used as the entry in the INFO field. Note that these labels are for human readability only; they can be whatever the user chooses. For example, the above header could instead be:
#affected mom dad unaffected
if that makes the expressions more readable.
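The mapping from alias file to INFO entry can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not slivar's implementation: `parseAliasFile`, `isDenovo`, and the `alts` table are hypothetical names with made-up data, the predicate checks only the `alts` part of the expression, and joining the matching samples with commas is an assumption.

```javascript
// Hypothetical helper (not part of slivar): one object per alias-file row,
// keyed by the header labels.
function parseAliasFile(text) {
  const lines = text.trim().split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  // The first line is the header; drop its leading '#'.
  const labels = lines[0].replace(/^#/, "").split(/\s+/);
  return lines.slice(1).map((line) => {
    const ids = line.split(/\s+/);
    const group = {};
    labels.forEach((label, i) => { group[label] = ids[i]; });
    return group;
  });
}

const aliasText = `#proband dad mom sibling
s1 s2 s3 s4
s5 s6 s7 s8
s9 s10 s11 s12`;
const groups = parseAliasFile(aliasText);

// Made-up data for one variant: number of alternate alleles per sample.
const alts = {
  s1: 1, s2: 0, s3: 0, s4: 0,    // family 1: only the proband carries the variant
  s5: 0, s6: 0, s7: 0, s8: 0,    // family 2: nobody carries it
  s9: 1, s10: 1, s11: 0, s12: 0, // family 3: inherited from dad
};

// Simplified stand-in for the denovo expression (alts checks only).
const isDenovo = (g) =>
  alts[g.mom] === 0 && alts[g.dad] === 0 && alts[g.sibling] === 0 && alts[g.proband] === 1;

// The same test runs once per row; the first column's sample is reported.
const firstLabel = Object.keys(groups[0])[0];
const hits = groups.filter(isDenovo).map((g) => g[firstLabel]);
console.log(`denovo=${hits.join(",")}`); // denovo=s1
```

Because the predicate refers only to labels, renaming the header (for example to affected/unaffected) changes how the expression reads but not which samples are tested.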
For somatic variants, the intuitive labels may be "tumor" and "normal", or for 4 patients, each with 3 tumor time-points, a file might look like:
#normal tumor1 tumor2 tumor3
s1n s1t1 s1t2 s1t3
s2n s2t1 s2t2 s2t3
s3n s3t1 s3t2 s3t3
s4n s4t1 s4t2 s4t3
Then, to find somatic variants that increase in allele frequency across the tumor time-points, we can specify an expression like:
--group-expr "increasing:normal.alts == 0 && normal.AD[1] == 0 \ # no evidence in normal
&& tumor1.AB > 0 && tumor2.AB > tumor1.AB && tumor3.AB > tumor2.AB"
This will create a new INFO field, increasing, that lists the normal (first column) samples whose group met the criteria for each variant.
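To make the logic concrete, here is a small JavaScript sketch of the same test applied to two of the patients. The per-sample values are invented for illustration, with AB taken as alternate depth divided by total depth, and the `samples` and `groups` objects stand in for data that would really come from the VCF and the alias file.

```javascript
// Made-up values for one variant. AD = [ref depth, alt depth].
const samples = {
  s1n:  { alts: 0, AD: [30, 0],  AB: 0.0 },
  s1t1: { alts: 1, AD: [27, 3],  AB: 0.1 },
  s1t2: { alts: 1, AD: [30, 10], AB: 0.25 },
  s1t3: { alts: 1, AD: [30, 20], AB: 0.4 },
  s2n:  { alts: 0, AD: [30, 0],  AB: 0.0 },
  s2t1: { alts: 1, AD: [14, 6],  AB: 0.3 },
  s2t2: { alts: 1, AD: [24, 6],  AB: 0.2 }, // drops at the second time-point
  s2t3: { alts: 1, AD: [15, 15], AB: 0.5 },
};

// Two rows of the alias file, keyed by the header labels.
const groups = [
  { normal: "s1n", tumor1: "s1t1", tumor2: "s1t2", tumor3: "s1t3" },
  { normal: "s2n", tumor1: "s2t1", tumor2: "s2t2", tumor3: "s2t3" },
];

// Same conditions as the "increasing" expression, written as a function.
function increasing(normal, tumor1, tumor2, tumor3) {
  return normal.alts == 0 && normal.AD[1] == 0
    && tumor1.AB > 0 && tumor2.AB > tumor1.AB && tumor3.AB > tumor2.AB;
}

// Report the first-column (normal) sample of every group that passes.
const hits = groups
  .filter((g) => increasing(samples[g.normal], samples[g.tumor1], samples[g.tumor2], samples[g.tumor3]))
  .map((g) => g.normal);
console.log(hits.join(",")); // s1n
```

Patient 2 is excluded because its allele balance falls between the first and second tumor time-points, so the strictly increasing chain of comparisons fails.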
For pedigrees with 3 generations, we may want to find de novos in the F1 that are transmitted to the F2. The --alias file for this might look like:
#f1 spouse gma gpa kids
s1 s2 s3 s4 s5,s6,s7
s8 s9 s10 s11 s12,s13
s14 s15 s16 s17 s18,s19,s20,s21,s22,s23,s24
Note that any column whose label ends with s will be available in the expression as a javascript array, and multiple samples can be given in that column separated by commas. So, in this case, the kids column holds multiple samples and each family has a different number of kids.
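The rule for list-valued columns can be illustrated with a short JavaScript sketch. `toGroup` is a hypothetical helper, not slivar code, and it stores sample ids where a real expression would see sample objects.

```javascript
// Hypothetical helper: build one group from the header labels and one row.
function toGroup(labels, fields) {
  const group = {};
  labels.forEach((label, i) => {
    // A label ending in "s" holds a comma-separated list and becomes an array;
    // every other label holds a single sample id.
    group[label] = label.endsWith("s") ? fields[i].split(",") : fields[i];
  });
  return group;
}

const labels = "#f1 spouse gma gpa kids".replace(/^#/, "").split(/\s+/);
const fam1 = toGroup(labels, "s1 s2 s3 s4 s5,s6,s7".split(/\s+/));
const fam2 = toGroup(labels, "s8 s9 s10 s11 s12,s13".split(/\s+/));

console.log(fam1.f1);             // s1
console.log(fam1.kids.join(",")); // s5,s6,s7
console.log(fam2.kids.length);    // 2
```

Only the trailing "s" of the label matters here, so a single-sample column such as spouse or gpa stays a plain value while kids becomes an array of any length.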
An expression for this might look like:
--group-expr "transmitted:f1.alts == 1 && f1.AB > 0.2 && f1.AB < 0.8 \
&& gma.alts == 0 && gpa.alts == 0 \ # must be absent in the parents of f1 to be denovo
&& spouse.alts == 0 \ # make sure the variant did not come from spouse.
&& proportion_kids_with_alt(kids) > 0.25"
So, here, we have specified a de novo in f1 that must appear in more than 25% of its offspring.
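The body of proportion_kids_with_alt is not shown above. One plausible definition, offered only as a sketch, counts the kids with at least one alternate allele and divides by the number of kids. It assumes each element of the array exposes an `alts` count like the other samples in the expression, and it is written in plain ES5-style JavaScript to stay portable.

```javascript
// Hypothetical definition: the text above names this function but does not
// show its implementation.
function proportion_kids_with_alt(kids) {
  if (kids.length === 0) {
    return 0; // no kids listed: avoid dividing by zero
  }
  var withAlt = 0;
  for (var i = 0; i < kids.length; i++) {
    // Count kids carrying at least one alternate allele.
    if (kids[i].alts >= 1) {
      withAlt += 1;
    }
  }
  return withAlt / kids.length;
}

// Made-up genotypes for two families.
var threeKids = [{ alts: 1 }, { alts: 0 }, { alts: 0 }];
var fourKids = [{ alts: 1 }, { alts: 0 }, { alts: 0 }, { alts: 0 }];

console.log(proportion_kids_with_alt(threeKids) > 0.25); // true  (1 of 3)
console.log(proportion_kids_with_alt(fourKids) > 0.25);  // false (exactly 0.25)
```

Because the comparison is a strict `>`, a family in which exactly one of four kids carries the variant does not pass; use `>=` if that case should count.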