tsv: creating a spreadsheet from a filtered VCF
Brent Pedersen edited this page Jun 27, 2019
slivar provides flexible filtering of VCFs, but for a final variant-by-variant analysis, clinicians and analysts usually prefer to have the data in a spreadsheet. The slivar tsv subcommand converts the filtered VCFs into that tab-separated format.
This command can also use the gene annotations from VEP or bcftools and add other annotations using the gene name. For example, we can create a gene -> pLI lookup with this command:
wget -qO - https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz \
| zcat \
| cut -f 1,21,24 | tail -n+2 \
| awk '{ printf("%s\tpLI=%.3g;oe_lof=%.5g\n", $1, $2, $3)}' > pli.lookup
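To see the line format that the awk step writes to pli.lookup, the same formatting can be run on a single made-up row. This is only a sketch: the gene name and values are hypothetical, and the three input fields stand in for what `cut -f 1,21,24` passes along (the fields the pipeline labels as gene, pLI, and oe_lof).

```shell
# One hypothetical row as it would look after `cut -f 1,21,24` (gene, pLI, oe_lof).
row=$(printf 'GENE1\t0.99876\t0.123456')

# Same awk formatting as the pipeline above: gene, a tab, then key=value pairs joined by ';'.
lookup_line=$(printf '%s\n' "$row" \
  | awk '{ printf("%s\tpLI=%.3g;oe_lof=%.5g\n", $1, $2, $3)}')

# Prints: GENE1<TAB>pLI=0.999;oe_lof=0.12346
printf '%s\n' "$lookup_line"
```

Each lookup line is a gene name, a tab, and the text to report for that gene. Note that awk formats a non-numeric field (for example a missing-value marker) as 0, so such rows may be worth checking before relying on the reported pLI.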
The slivar tsv command allows specifying many of these gene -> value lookups. For example, it's often useful to have the gene description:
wget -qO - ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/gene_condition_source_id \
| cut -f 2,5 \
| grep -v ^$'\t' > clinvar_gene_desc.txt
# -s: INFO fields that were added by previous slivar commands
# -i: any other INFO fields to add as columns
# -c: the consequence field, BCSQ here (or CSQ if VEP was used)
# -g: gene lookups; the pLI and description are looked up by gene, one column each
slivar tsv \
-s denovo \
-s x_denovo \
-s recessive \
-s x_recessive \
-i gnomad_popmax_af -i gnomad_popmax_af_filter -i gnomad_nhomalt \
-c BCSQ \
-g pli.lookup \
-g clinvar_gene_desc.txt \
-p $ped \
vcfs/$cohort.vcf > $cohort-variants.tsv
# repeat for compound-hets VCF
slivar tsv \
-s slivar_comphet \
-i gnomad_popmax_af -i gnomad_popmax_af_filter -i gnomad_nhomalt \
-c BCSQ \
-g pli.lookup \
-g clinvar_gene_desc.txt \
-p $ped \
vcfs/$cohort.ch.vcf > $cohort-compound-hets.tsv
These two files contain the same columns, so they can be concatenated as needed.
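One possible way to concatenate the two outputs is sketched below. It assumes that each file begins with exactly one header line, and it uses small made-up files (hypothetical names, columns, and contents) in place of $cohort-variants.tsv and $cohort-compound-hets.tsv so that it runs on its own.

```shell
# Toy stand-ins for the two slivar tsv outputs (hypothetical columns and rows).
tmp=$(mktemp -d)
printf 'mode\tgene\ndenovo\tGENE1\n' > "$tmp/variants.tsv"
printf 'mode\tgene\nslivar_comphet\tGENE2\n' > "$tmp/compound-hets.tsv"

# Keep the header from the first file only (NR == 1),
# then every non-header line (FNR > 1) from both files.
awk 'FNR > 1 || NR == 1' "$tmp/variants.tsv" "$tmp/compound-hets.tsv" > "$tmp/all.tsv"

cat "$tmp/all.tsv"
```

With real data, the two toy paths would be replaced by the files written above. If the outputs turn out to have no header line, plain `cat` of the two files is enough.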