-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add genotype filtering Terra workflow configs and documentation #695
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to add this documentation to the website, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we'll want everything on the website by the time we release the featured workspace. I think it makes sense to me to get the README and dashboard updated so we can update the template Terra workspace, then update the website after - does that sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing all of this up. Mostly looks good, though I flagged a few places to fix. In addition, the pipeline diagram should be updated to include these workflows.
* [JoinRawCalls](#join-raw-calls) - Merges unfiltered calls across batches | ||
* [SVConcordance](#svconcordance) - Calculates genotype concordance with raw calls | ||
* [FilterGenotypes](#filter-genotypes) - Performs genotype filtering | ||
* [AnnotateVcf](#annotate-vcf) - Functional and allele frequency annotation | ||
* [Module 09](#module09) - QC and Visualization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we delete this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I will clean out the readme with other unrelated changes when I do a full readme->website transfer later.
Computes genotype concordance metrics between all variants in the joint call set and raw calls. | ||
|
||
#### Prerequisites: | ||
* [MakeCohortVcf](#make-cohort-vcf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably update the README with the four subsections of MakeCohortVcf and change this prereq to CleanVcf. At the very least, the MakeCohortVcf section should describe the subworkflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I think this is a bit beyond the scope of this PR but I am going to create a new ticket for the website update and will mention this.
"JoinRawCalls.clustered_depth_vcfs" : "${this.clustered_depth_vcf}", | ||
"JoinRawCalls.clustered_depth_vcf_indexes" : "${this.clustered_depth_vcf_index}", | ||
|
||
"JoinRawCalls.clustered_manta_vcfs" : "${this.clustered_manta_vcf}", | ||
"JoinRawCalls.clustered_manta_vcf_indexes" : "${this.clustered_manta_vcf_index}", | ||
|
||
"JoinRawCalls.clustered_wham_vcfs" : "${this.clustered_wham_vcf}", | ||
"JoinRawCalls.clustered_wham_vcf_indexes" : "${this.clustered_wham_vcf_index}", | ||
|
||
"JoinRawCalls.clustered_melt_vcfs" : "${this.clustered_melt_vcf}", | ||
"JoinRawCalls.clustered_melt_vcf_indexes" : "${this.clustered_melt_vcf_index}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"JoinRawCalls.clustered_depth_vcfs" : "${this.clustered_depth_vcf}", | |
"JoinRawCalls.clustered_depth_vcf_indexes" : "${this.clustered_depth_vcf_index}", | |
"JoinRawCalls.clustered_manta_vcfs" : "${this.clustered_manta_vcf}", | |
"JoinRawCalls.clustered_manta_vcf_indexes" : "${this.clustered_manta_vcf_index}", | |
"JoinRawCalls.clustered_wham_vcfs" : "${this.clustered_wham_vcf}", | |
"JoinRawCalls.clustered_wham_vcf_indexes" : "${this.clustered_wham_vcf_index}", | |
"JoinRawCalls.clustered_melt_vcfs" : "${this.clustered_melt_vcf}", | |
"JoinRawCalls.clustered_melt_vcf_indexes" : "${this.clustered_melt_vcf_index}", | |
"JoinRawCalls.clustered_depth_vcfs" : "${this.sample_sets.clustered_depth_vcf}", | |
"JoinRawCalls.clustered_depth_vcf_indexes" : "${this.sample_sets.clustered_depth_vcf_index}", | |
"JoinRawCalls.clustered_manta_vcfs" : "${this.sample_sets.clustered_manta_vcf}", | |
"JoinRawCalls.clustered_manta_vcf_indexes" : "${this.sample_sets.clustered_manta_vcf_index}", | |
"JoinRawCalls.clustered_wham_vcfs" : "${this.sample_sets.clustered_wham_vcf}", | |
"JoinRawCalls.clustered_wham_vcf_indexes" : "${this.sample_sets.clustered_wham_vcf_index}", | |
"JoinRawCalls.clustered_melt_vcfs" : "${this.sample_sets.clustered_melt_vcf}", | |
"JoinRawCalls.clustered_melt_vcf_indexes" : "${this.sample_sets.clustered_melt_vcf_index}", |
Can you cross-reference these JSONs with the ones in the existing Terra workspace to avoid issues like this? Or were they all configured for a single batch? In that case AoU Phase 2 could work for checking, though I hope I got them all
Also, probably should add scramble VCF inputs and double check that the workflow runs ok when an empty array is provided for one set of caller VCFs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This looks consistent with the AoU workspace and I've changed the default to Scramble even though that won't be default behavior until #722 goes in. I think this has been running with empty Scramble inputs so that shouldn't be a problem.
inputs/templates/terra_workspaces/cohort_mode/workflow_configurations/SVConcordance.json.tmpl
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,41 @@ | |||
{ | |||
"FilterGenotypes.vcf": "${this.concordance_vcf}", | |||
"FilterGenotypes.output_prefix": "${this.sample_set_id}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"FilterGenotypes.output_prefix": "${this.sample_set_id}", | |
"FilterGenotypes.output_prefix": "${this.sample_set_set_id}", |
@@ -5,6 +5,7 @@ gatk_docker {{ dockers.gatk_docker }} | |||
gatk_docker_pesr_override {{ dockers.gatk_docker_pesr_override }} | |||
gcnv_gatk_docker {{ dockers.gatk_docker }} | |||
genomes_in_the_cloud_docker {{ dockers.genomes_in_the_cloud_docker }} | |||
gq_recalibrator_docker {{ dockers.gq_recalibrator_docker }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get this merged with the other GATK docker soon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlikely since this tool is still on a branch
inputs/templates/test/FilterGenotypes/FilterGenotypes.fixed_cutoffs.json.tmpl
Outdated
Show resolved
Hide resolved
Typo Typo 2 Add to overview Round out readme Typo 3 Add detail Minor changes Fix workspace table Update inputs Fix templates
7237b4a
to
05b8025
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments! This is looking good. We should also get these workflows added to the dockstore.yml and the template workspace - I can also do that in my upcoming ManualReview PR though.
README.md
Outdated
@@ -541,7 +541,7 @@ See the SV "Genotype Filter" section on page 34 of the [All of Us Genomic Qualit | |||
|
|||
All valid genotypes are annotated with a "scaled logit" (SL) score, which is rescaled to non-negative adjusted GQs on [1, 99]. Note that the rescaled GQs should *not* be interpreted as probabilities. Original genotype qualities are retained in the OGQ field. | |||
|
|||
A more positive SL score indicates higher probability of correctness of the given genotype. Genotypes are therefore filtered using SL thresholds that depend on SV type and size. This workflow also generates QC plots using the [MainVcfQc](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/MainVcfQc.wdl) workflow to review call set quality (see below for recommended practices). | |||
A more positive SL score indicates higher probability that the give genotype is not homozygous for the reference allele. Genotypes are therefore filtered using SL thresholds that depend on SV type and size. This workflow also generates QC plots using the [MainVcfQc](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/MainVcfQc.wdl) workflow to review call set quality (see below for recommended practices). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more positive SL score indicates higher probability that the give genotype is not homozygous for the reference allele. Genotypes are therefore filtered using SL thresholds that depend on SV type and size. This workflow also generates QC plots using the [MainVcfQc](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/MainVcfQc.wdl) workflow to review call set quality (see below for recommended practices). | |
A more positive SL score indicates higher probability that the given genotype is not homozygous for the reference allele. Genotypes are therefore filtered using SL thresholds that depend on SV type and size. This workflow also generates QC plots using the [MainVcfQc](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/MainVcfQc.wdl) workflow to review call set quality (see below for recommended practices). |
Tested on our "bwa-melt" benchmarking workspace, which had a recent cleaned vcf to work from.