broadinstitute · mwalker174 · Oct 21, 2024 · Jun 27, 2024 · Jul 2, 2024 · Sep 30, 2024
diff --git a/README.md b/README.md
@@ -164,7 +164,7 @@ The pipeline consists of a series of modules that perform the following:
 * [JoinRawCalls](#join-raw-calls): Merges unfiltered calls across batches
 * [SVConcordance](#svconcordance): Calculates genotype concordance with raw calls
 * [FilterGenotypes](#filter-genotypes): Performs genotype filtering
-* [AnnotateVcf](#annotate-vcf): Annotations, including functional annotation, allele frequency (AF) annotation and AF annotation with external population callsets;
+* [AnnotateVcf](#annotate-vcf): Annotations, including functional annotation, allele frequency (AF) annotation and AF annotation with external population callsets
 * [Module 09](#module09): Visualization, including scripts that generates IGV screenshots and rd plots.
 * Additional modules to be added: de novo and mosaic scripts 
 
@@ -483,8 +483,8 @@ Merges raw unfiltered calls across batches. Concordance between these genotypes
 * [ClusterBatch](#cluster-batch)
 
 #### Inputs:
-* Clustered Manta, Wham, Scramble, Melt, and/or depth VCF URIs ([ClusterBatch](#cluster-batch))
-* Ped file
+* Clustered Manta, Wham, depth, Scramble, and/or MELT VCF URIs ([ClusterBatch](#cluster-batch))
+* PED file
 * Reference sequence
 
 #### Outputs:
@@ -514,7 +514,7 @@ Performs genotype quality recalibration using a machine learning model based on
 The ML model uses the following features:
 
 * Genotype properties:
-  * Allele frequency (AF), no-call counts
+  * Non-reference and no-call allele counts
   * Genotype quality (GQ) 
   * Supporting evidence types (EV) and respective genotype qualities (PE_GQ, SR_GQ, RD_GQ)
   * Raw call concordance (CONC_ST)
@@ -541,7 +541,7 @@ See the SV "Genotype Filter" section on page 34 of the [All of Us Genomic Qualit
 
 All valid genotypes are annotated with a "scaled logit" (SL) score, which is rescaled to non-negative adjusted GQs on [1, 99]. Note that the rescaled GQs should *not* be interpreted as probabilities. Original genotype qualities are retained in the OGQ field. 
 
-A more positive SL score indicates higher probability of correctness of the given genotype. Genotypes are therefore filtered using SL thresholds that depend on SV type and size. This workflow also generates QC plots using the [MainVcfQc](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/MainVcfQc.wdl) workflow to review call set quality (see below for recommended practices). 
+A more positive SL score indicates higher probability that the given genotype is not homozygous for the reference allele. Genotypes are therefore filtered using SL thresholds that depend on SV type and size. This workflow also generates QC plots using the [MainVcfQc](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/MainVcfQc.wdl) workflow to review call set quality (see below for recommended practices).
 
 This workflow can be run in one of two modes:
 
@@ -586,10 +586,10 @@ These criteria can be assessed from the plots in the `main_vcf_qc_tarball` outpu
 * Either a set of SL cutoffs or truth labels
 
 #### Outputs:
-* The filtered VCF
+* Filtered VCF
 * Call set QC plots (optional)
 * Optimized SL cutoffs with filtering QC plots and data tables (if running mode [2] with truth labels)
-* A copy of the VCF with only SL annotation and GQ recalibration (before filtering)
+* VCF with only SL annotation and GQ recalibration (before filtering)
 
 ## <a name="annotate-vcf">AnnotateVcf</a>
 *Formerly Module08Annotation*

diff --git a/inputs/templates/terra_workspaces/cohort_mode/cohort_mode_workspace_dashboard.md.tmpl b/inputs/templates/terra_workspaces/cohort_mode/cohort_mode_workspace_dashboard.md.tmpl
@@ -207,9 +207,9 @@ Read the full GenotypeBatch documentation [here](https://github.com/broadinstitu
 
 #### 11-RegenotypeCNVs, 12-CombineBatches, 13-ResolveComplexVariants, 14-GenotypeComplexVariants, 15-CleanVcf, 16-JoinRawCalls, 17-SVConcordance, 18-FilterGenotypes, and 19-AnnotateVcf
 
-Read the full documentation for [RegenotypeCNVs](https://github.com/broadinstitute/gatk-sv#regenotype-cnvs), [MakeCohortVcf](https://github.com/broadinstitute/gatk-sv#make-cohort-vcf) (which includes `CombineBatches`, `ResolveComplexVariants`, `GenotypeComplexVariants`, `CleanVcf`, `MainVcfQc`), and [AnnotateVcf](https://github.com/broadinstitute/gatk-sv#annotate-vcf) on the README.
+Read the full documentation for [RegenotypeCNVs](https://github.com/broadinstitute/gatk-sv#regenotype-cnvs), [MakeCohortVcf](https://github.com/broadinstitute/gatk-sv#make-cohort-vcf) (which includes `CombineBatches`, `ResolveComplexVariants`, `GenotypeComplexVariants`, `CleanVcf`), [`JoinRawCalls`](https://github.com/broadinstitute/gatk-sv#join-raw-calls), [`SVConcordance`](https://github.com/broadinstitute/gatk-sv#svconcordance), [`FilterGenotypes`](https://github.com/broadinstitute/gatk-sv#filter-genotypes), and [AnnotateVcf](https://github.com/broadinstitute/gatk-sv#annotate-vcf) on the README.
 * Use the same cohort `sample_set_set` you created and used for `09-MergeBatchSites`.
 
 #### Downstream steps
 
-Additional downstream steps are under development. Read about some of them on the README [here](https://github.com/broadinstitute/gatk-sv#module07).
+Additional downstream steps are under development.
diff --git a/.../templates/terra_workspaces/cohort_mode/workflow_configurations/FilterGenotypes.json.tmpl b/.../templates/terra_workspaces/cohort_mode/workflow_configurations/FilterGenotypes.json.tmpl
@@ -1,6 +1,6 @@
 {
   "FilterGenotypes.vcf": "${this.concordance_vcf}",
-  "FilterGenotypes.output_prefix": "${this.sample_set_id}",
+  "FilterGenotypes.output_prefix": "${this.sample_set_set_id}",
   "FilterGenotypes.ploidy_table": "${this.ploidy_table}",
   "FilterGenotypes.gq_recalibrator_model_file": "${workspace.recalibrate_gq_model_file}",
   "FilterGenotypes.sl_filter_args": "--small-del-threshold 93 --medium-del-threshold 150 --small-dup-threshold -51 --medium-dup-threshold -4 --ins-threshold -13 --inv-threshold -19",

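The `sl_filter_args` value above packs the per-type SL cutoffs into one flag string. As a minimal sketch of how those cutoffs could be read, the Python below parses that exact string into a dictionary. The helper names `parse_sl_filter_args` and `passes_sl` are hypothetical and not part of GATK-SV. The inclusive `>=` comparison is an assumption; the README only says that a more positive SL is better and that thresholds depend on SV type and size.

```python
import shlex

# Flag string copied from FilterGenotypes.json.tmpl above.
SL_FILTER_ARGS = (
    "--small-del-threshold 93 --medium-del-threshold 150 "
    "--small-dup-threshold -51 --medium-dup-threshold -4 "
    "--ins-threshold -13 --inv-threshold -19"
)


def parse_sl_filter_args(arg_string):
    """Parse '--name value' pairs into a dict of name -> float cutoff."""
    tokens = shlex.split(arg_string)
    if len(tokens) % 2 != 0:
        raise ValueError("expected '--flag value' pairs")
    thresholds = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"unexpected token: {flag}")
        thresholds[flag[2:]] = float(value)
    return thresholds


thresholds = parse_sl_filter_args(SL_FILTER_ARGS)


def passes_sl(sl_score, threshold_name):
    """Assumption: a genotype is kept when its SL is at or above the cutoff."""
    return sl_score >= thresholds[threshold_name]
```

For example, `passes_sl(100, "small-del-threshold")` is true under this assumption, while `passes_sl(-20, "inv-threshold")` is not. The size ranges behind "small" and "medium" are not stated in this diff, so the sketch leaves them out.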
diff --git a/inputs/templates/terra_workspaces/cohort_mode/workflow_configurations/JoinRawCalls.json.tmpl b/inputs/templates/terra_workspaces/cohort_mode/workflow_configurations/JoinRawCalls.json.tmpl
@@ -3,17 +3,17 @@
   "JoinRawCalls.sv_base_mini_docker": "${workspace.sv_base_mini_docker}",
   "JoinRawCalls.sv_pipeline_docker": "${workspace.sv_pipeline_docker}",
 
-  "JoinRawCalls.clustered_depth_vcfs" : "${this.clustered_depth_vcf}",
-  "JoinRawCalls.clustered_depth_vcf_indexes" : "${this.clustered_depth_vcf_index}",
+  "JoinRawCalls.clustered_depth_vcfs" : "${this.sample_sets.clustered_depth_vcf}",
+  "JoinRawCalls.clustered_depth_vcf_indexes" : "${this.sample_sets.clustered_depth_vcf_index}",
 
-  "JoinRawCalls.clustered_manta_vcfs" : "${this.clustered_manta_vcf}",
-  "JoinRawCalls.clustered_manta_vcf_indexes" : "${this.clustered_manta_vcf_index}",
+  "JoinRawCalls.clustered_manta_vcfs" : "${this.sample_sets.clustered_manta_vcf}",
+  "JoinRawCalls.clustered_manta_vcf_indexes" : "${this.sample_sets.clustered_manta_vcf_index}",
 
-  "JoinRawCalls.clustered_wham_vcfs" : "${this.clustered_wham_vcf}",
-  "JoinRawCalls.clustered_wham_vcf_indexes" : "${this.clustered_wham_vcf_index}",
+  "JoinRawCalls.clustered_wham_vcfs" : "${this.sample_sets.clustered_wham_vcf}",
+  "JoinRawCalls.clustered_wham_vcf_indexes" : "${this.sample_sets.clustered_wham_vcf_index}",
 
-  "JoinRawCalls.clustered_melt_vcfs" : "${this.clustered_melt_vcf}",
-  "JoinRawCalls.clustered_melt_vcf_indexes" : "${this.clustered_melt_vcf_index}",
+  "JoinRawCalls.clustered_scramble_vcfs" : "${this.sample_sets.clustered_scramble_vcf}",
+  "JoinRawCalls.clustered_scramble_vcf_indexes" : "${this.sample_sets.clustered_scramble_vcf_index}",
 
   "JoinRawCalls.FormatVcfForGatk.formatter_args": "--fix-end",
 
@@ -24,5 +24,5 @@
   "JoinRawCalls.reference_fasta_fai": "${workspace.reference_index}",
   "JoinRawCalls.reference_dict": "${workspace.reference_dict}",
 
-  "JoinRawCalls.prefix": "${this.sample_set_id}"
+  "JoinRawCalls.prefix": "${this.sample_set_set_id}"
 }
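
The switch from `this.clustered_depth_vcf` to `this.sample_sets.clustered_depth_vcf` makes sense if the workflow's root entity is a cohort-level `sample_set_set`, so the per-batch VCFs are attributes of its member sample sets rather than of the root entity. The toy Python model below sketches that expansion. The entity names, bucket paths, and the `resolve` helper are hypothetical and are not Terra's API.

```python
# Hypothetical per-batch entities (sample_set rows) and their attributes.
SAMPLE_SETS = {
    "batch1": {"clustered_depth_vcf": "gs://example-bucket/batch1.depth.vcf.gz"},
    "batch2": {"clustered_depth_vcf": "gs://example-bucket/batch2.depth.vcf.gz"},
}

# Hypothetical cohort entity (sample_set_set row) listing its member batches.
SAMPLE_SET_SETS = {
    "cohort": {"sample_sets": ["batch1", "batch2"]},
}


def resolve(root_id, expression):
    """Resolve a small subset of 'this.…' expressions against the toy model."""
    parts = expression.split(".")
    if parts[0] != "this":
        raise ValueError(f"unsupported expression: {expression}")
    # The root entity's own ID, used for output prefixes.
    if parts[1:] == ["sample_set_set_id"]:
        return root_id
    # An attribute collected across member sample sets yields one value per batch.
    if len(parts) == 3 and parts[1] == "sample_sets":
        members = SAMPLE_SET_SETS[root_id]["sample_sets"]
        return [SAMPLE_SETS[member][parts[2]] for member in members]
    raise KeyError(expression)


depth_vcfs = resolve("cohort", "this.sample_sets.clustered_depth_vcf")
prefix = resolve("cohort", "this.sample_set_set_id")
```

In this model `depth_vcfs` is a list with one VCF per batch, which matches the array-typed `clustered_*_vcfs` inputs, and `prefix` is the cohort ID, which matches the `sample_set_set_id` prefix changes in this PR.
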
diff --git a/...ts/templates/terra_workspaces/cohort_mode/workflow_configurations/SVConcordance.json.tmpl b/...ts/templates/terra_workspaces/cohort_mode/workflow_configurations/SVConcordance.json.tmpl
@@ -5,7 +5,7 @@
   "SVConcordance.eval_vcf" : "${this.cleaned_vcf}",
   "SVConcordance.truth_vcf" : "${this.joined_raw_calls_vcf}",
 
-  "SVConcordance.output_prefix": "${this.sample_set_id}",
+  "SVConcordance.output_prefix": "${this.sample_set_set_id}",
 
   "SVConcordance.contig_list": "${workspace.primary_contigs_list}",
   "SVConcordance.reference_dict": "${workspace.reference_dict}"

diff --git a/inputs/templates/test/FilterGenotypes/FilterGenotypes.fixed_cutoffs.json.tmpl b/inputs/templates/test/FilterGenotypes/FilterGenotypes.fixed_cutoffs.json.tmpl
@@ -14,8 +14,8 @@
     "--min-samples-to-estimate-allele-frequency -1"
   ],
 
-  "FilterGenotypes.ped_file": "${workspace.cohort_ped_file}",
-  "FilterGenotypes.primary_contigs_fai": "${workspace.primary_contigs_fai}",
+  "FilterGenotypes.ped_file": {{ test_batch.ped_file | tojson }},
+  "FilterGenotypes.primary_contigs_fai": {{ reference_resources.primary_contigs_fai | tojson }},
   "FilterGenotypes.site_level_comparison_datasets": [
     {{ reference_resources.ccdg_abel_site_level_benchmarking_dataset | tojson }},
     {{ reference_resources.gnomad_v2_collins_site_level_benchmarking_dataset | tojson }},