VS-1549 Add VAT to integration tests #9085

Merged
merged 33 commits into from
Feb 3, 2025
Changes from 21 commits
Commits
b7e44b1
Testing framework.
gbggrant Jan 6, 2025
db98831
A little clean up
gbggrant Jan 6, 2025
d11cb2b
Add some required inputs
gbggrant Jan 7, 2025
7b9cb64
Defined the scatter count
gbggrant Jan 7, 2025
ea3c082
Minor documentation updated
gbggrant Jan 17, 2025
de4a19b
Try to run the test with a sites only VCF as an alternate input
gbggrant Jan 17, 2025
6324fe5
Adding validation of the VDS and size check on db table
gbggrant Jan 23, 2025
ad8bbfc
Fix two bugs in validation
gbggrant Jan 23, 2025
bfbcb7c
Forgot to pass that parameter
gbggrant Jan 23, 2025
9e991e0
Trying to fix call to GvsValidateVAT.wdl
gbggrant Jan 24, 2025
fba4bd7
Trying to fix call to GvsValidateVAT.wdl
gbggrant Jan 24, 2025
879599e
Added size check.
gbggrant Jan 24, 2025
ebf544e
A little more checking.
gbggrant Jan 24, 2025
e87c588
Add VAT integration test to main integration test
gbggrant Jan 27, 2025
05cdc13
Add branch to .dockstore.yml
gbggrant Jan 27, 2025
747ca56
Fix wdl syntax error
gbggrant Jan 27, 2025
9a673f8
Need to define workspace_id and submission_id
gbggrant Jan 27, 2025
57d2e77
Remove comment
gbggrant Jan 27, 2025
505e5f2
Merge remote-tracking branch 'origin/ah_var_store' into gg_VS-1549_Ad…
gbggrant Jan 27, 2025
1e7f0f9
Stuff for final work
gbggrant Jan 27, 2025
99dc3bd
Stuff for final work
gbggrant Jan 27, 2025
8f56aa8
Update scripts/variantstore/variant-annotations-table/GvsValidateVAT.wdl
gbggrant Jan 28, 2025
f9fd042
Address code review comments
gbggrant Jan 29, 2025
7a7256d
Merge remote-tracking branch 'origin/ah_var_store' into gg_VS-1549_Ad…
gbggrant Jan 29, 2025
31210ec
debugging comment!
gbggrant Jan 29, 2025
26fc0f0
debugging comment!!!
gbggrant Jan 29, 2025
53385a7
Fix disk sizing error
gbggrant Jan 30, 2025
7929086
debugging comment
gbggrant Jan 30, 2025
bb95780
debugging comment
gbggrant Jan 30, 2025
4efd1f6
debugging comment..
gbggrant Jan 30, 2025
b6cdfed
Use standard variants docker
gbggrant Jan 31, 2025
ecf4e2b
Remove debugging comment
gbggrant Jan 31, 2025
97acc25
Added a comment, removed branch from .dockstore.yml
gbggrant Feb 3, 2025
11 changes: 11 additions & 0 deletions .dockstore.yml
@@ -316,6 +316,17 @@ workflows:
branches:
- master
- ah_var_store
- gg_VS-1549_AddVATToIntegrationTests
tags:
- /.*/
- name: GvsQuickstartVATIntegration
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/test/GvsQuickstartVATIntegration.wdl
filters:
branches:
- master
- ah_var_store
- gg_VS-1549_AddVATToIntegrationTests
tags:
- /.*/
- name: GvsIngestTieout
@@ -85,7 +85,7 @@ workflow GvsCreateVATfromVDS {

# If the vat version is undefined or v1 then the vat tables would be named like filter_vat, otherwise filter_vat_v2.
String effective_vat_version = if (defined(vat_version) && select_first([vat_version]) != "v1") then "_" + select_first([vat_version]) else ""
String vat_table_name = filter_set_name + "_vat" + effective_vat_version
String effective_vat_table_name = filter_set_name + "_vat" + effective_vat_version
Contributor

Not "qualified"? What's the goal of this? Maybe we should just pass the name?

Collaborator Author

I needed vat_table_name to be an output of the wdl, and couldn't name an internal variable with the same name, so I changed it to effective_vat_table_name - using the pattern of a lot of naming nearby.
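
A minimal sketch of the naming pattern described in this reply, assuming a stripped-down workflow. The workflow name `VatTableNameSketch` is hypothetical and not part of this PR; the two declarations are copied from the diff. The internal declaration takes the `effective_` prefix so that the workflow output can use the plain name `vat_table_name`.

```wdl
version 1.0

# Hypothetical, stripped-down workflow for illustration only.
workflow VatTableNameSketch {
    input {
        String filter_set_name
        String? vat_version
    }

    # Copied from the diff: no suffix when vat_version is undefined or "v1".
    String effective_vat_version = if (defined(vat_version) && select_first([vat_version]) != "v1") then "_" + select_first([vat_version]) else ""

    # Internal declaration carries the "effective_" prefix so it does not
    # share a name with the output declared below.
    String effective_vat_table_name = filter_set_name + "_vat" + effective_vat_version

    output {
        # Callers read the plain name, e.g. filter_vat or filter_vat_v2.
        String vat_table_name = effective_vat_table_name
    }
}
```

With this shape, a calling workflow can read the `vat_table_name` output and pass it on to a later step, which appears to be the reason the output was added.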


String output_path_without_a_trailing_slash = sub(output_path, "/$", "")
String effective_output_path = if (output_path == output_path_without_a_trailing_slash) then output_path + "/" else output_path
@@ -262,7 +262,7 @@ workflow GvsCreateVATfromVDS {
project_id = project_id,
dataset_name = dataset_name,
output_path = effective_output_path,
base_vat_table_name = vat_table_name,
base_vat_table_name = effective_vat_table_name,
prep_vt_json_done = PrepVtAnnotationJson.done,
prep_genes_json_done = PrepGenesAnnotationJson.done,
cloud_sdk_docker = effective_cloud_sdk_docker,
@@ -271,7 +271,7 @@ workflow GvsCreateVATfromVDS {
call DeduplicateVatInBigQuery {
input:
input_vat_table_name = BigQueryLoadJson.vat_table,
output_vat_table_name = vat_table_name,
output_vat_table_name = effective_vat_table_name,
nirvana_schema = MakeSubpopulationFilesAndReadSchemaFiles.vat_schema_json_file,
project_id = project_id,
dataset_name = dataset_name,
@@ -294,6 +294,7 @@ workflow GvsCreateVATfromVDS {
}

output {
String vat_table_name = effective_vat_table_name
String? cluster_name = GenerateSitesOnlyVcf.cluster_name
File? dropped_sites_file = MergeTsvs.output_file
File? final_tsv_file = GvsCreateVATFilesFromBigQuery.final_tsv_file
32 changes: 18 additions & 14 deletions scripts/variantstore/variant-annotations-table/GvsValidateVAT.wdl
@@ -8,6 +8,7 @@ workflow GvsValidateVat {
String project_id
String dataset_name
String vat_table_name
Boolean? is_small_callset
String? cloud_sdk_docker
String? variants_docker
}
@@ -25,20 +26,23 @@ workflow GvsValidateVat {
String effective_cloud_sdk_docker = select_first([cloud_sdk_docker, GetToolVersions.cloud_sdk_docker])
String effective_variants_docker = select_first([variants_docker, GetToolVersions.variants_docker])

call Utils.GetBQTableLastModifiedDatetime as SampleDateTime {
input:
project_id = project_id,
fq_table = fq_vat_table,
cloud_sdk_docker = effective_cloud_sdk_docker,
}
# Defining is_small_callset allows us to run this WDL on a dataset that has not had samples loaded (for testing)
if (!defined(is_small_callset)) {
call Utils.GetBQTableLastModifiedDatetime as SampleDateTime {
input:
project_id = project_id,
fq_table = fq_sample_table,
cloud_sdk_docker = effective_cloud_sdk_docker,
}

call Utils.GetNumSamplesLoaded {
input:
fq_sample_table = fq_sample_table,
project_id = project_id,
sample_table_timestamp = SampleDateTime.last_modified_timestamp,
control_samples = false,
cloud_sdk_docker = effective_cloud_sdk_docker,
call Utils.GetNumSamplesLoaded {
input:
fq_sample_table = fq_sample_table,
project_id = project_id,
sample_table_timestamp = SampleDateTime.last_modified_timestamp,
control_samples = false,
cloud_sdk_docker = effective_cloud_sdk_docker,
}
}

call Utils.GetBQTableLastModifiedDatetime as VatDateTime {
@@ -153,7 +157,7 @@ workflow GvsValidateVat {
}

# only check certain things if the callset is larger than 10,000 samples (a guess)
Boolean callset_is_small = GetNumSamplesLoaded.num_samples < 10000
Boolean callset_is_small = select_first([is_small_callset, select_first([GetNumSamplesLoaded.num_samples, 1]) < 10000])
Contributor

what does this do?

Contributor

maybe a comment would be helpful or am I just WDL illiterate?

Collaborator Author

I'm adding a comment. Basically it just checks whether the is_small_callset flag is set; if not, it uses the previous logic (where it's considered a small callset if there are fewer than 10,000 samples).
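
A minimal sketch of the fallback described in this reply, assuming a stripped-down workflow. The workflow name `SmallCallsetSketch` and the optional input `num_samples` are hypothetical; `num_samples` stands in for `GetNumSamplesLoaded.num_samples`, which is optional in the PR because the call sits inside `if (!defined(is_small_callset))`.

```wdl
version 1.0

# Hypothetical, stripped-down workflow for illustration only.
workflow SmallCallsetSketch {
    input {
        # Explicit override; when defined, its value is used as-is.
        Boolean? is_small_callset
        # Hypothetical stand-in for GetNumSamplesLoaded.num_samples.
        Int? num_samples
    }

    # Outer select_first: take the flag if it is defined, otherwise the
    # sample-count comparison.
    # Inner select_first: substitute 1 when no sample count is available,
    # so the comparison is always well-typed.
    Boolean callset_is_small = select_first([is_small_callset, select_first([num_samples, 1]) < 10000])

    output {
        Boolean is_small = callset_is_small
    }
}
```

Under these assumptions, supplying `is_small_callset` (either `true` or `false`) decides the result directly, and leaving it unset falls back to the sample-count threshold. The inner default of 1 only matters for type-checking in the PR, since the sample count is queried whenever the flag is unset.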

if (!callset_is_small) {
call ClinvarSignificance {
input:
68 changes: 48 additions & 20 deletions scripts/variantstore/wdl/test/GvsQuickstartIntegration.wdl
@@ -2,6 +2,7 @@ version 1.0

import "GvsQuickstartVcfIntegration.wdl" as QuickstartVcfIntegration
import "GvsQuickstartHailIntegration.wdl" as QuickstartHailIntegration
import "GvsQuickstartVATIntegration.wdl" as QuickstartVATIntegration
import "../GvsJointVariantCalling.wdl" as JointVariantCalling
import "../GvsUtils.wdl" as Utils

@@ -14,6 +15,8 @@ workflow GvsQuickstartIntegration {
Boolean run_exome_integration = true
Boolean run_beta_integration = true
Boolean run_bge_integration = true
Boolean run_vat_integration = true
Boolean run_vat_integration_test_from_vds = true # If false, will use sites-only VCF
String sample_id_column_name = "sample_id"
String vcf_files_column_name = "hg38_reblocked_gvcf"
String vcf_index_files_column_name = "hg38_reblocked_gvcf_index"
@@ -25,6 +28,7 @@ workflow GvsQuickstartIntegration {
String? cloud_sdk_docker
String? cloud_sdk_slim_docker
String? variants_docker
String? variants_nirvana_docker
String? gatk_docker
String? hail_version
Boolean chr20_X_Y_only = true
@@ -52,6 +56,7 @@ workflow GvsQuickstartIntegration {
String effective_cloud_sdk_docker = select_first([cloud_sdk_docker, GetToolVersions.cloud_sdk_docker])
String effective_cloud_sdk_slim_docker = select_first([cloud_sdk_slim_docker, GetToolVersions.cloud_sdk_slim_docker])
String effective_variants_docker = select_first([variants_docker, GetToolVersions.variants_docker])
String effective_variants_nirvana_docker = select_first([variants_nirvana_docker, GetToolVersions.variants_nirvana_docker])
String effective_gatk_docker = select_first([gatk_docker, GetToolVersions.gatk_docker])
String effective_hail_version = select_first([hail_version, GetToolVersions.hail_version])

@@ -72,6 +77,10 @@ workflow GvsQuickstartIntegration {
}
}

String workspace_bucket = GetToolVersions.workspace_bucket
String workspace_id = GetToolVersions.workspace_id
String submission_id = GetToolVersions.submission_id

# Note for `GvsQuickstartIntegration` we use the git_branch_or_tag *input* and its corresponding git hash. This is not
# necessarily the same as the branch name selected in Terra for the integration `GvsQuickstartIntegration` workflow,
# though in practice likely they are the same.
@@ -98,9 +107,9 @@ workflow GvsQuickstartIntegration {
cloud_sdk_slim_docker = effective_cloud_sdk_slim_docker,
variants_docker = effective_variants_docker,
gatk_docker = effective_gatk_docker,
workspace_bucket = GetToolVersions.workspace_bucket,
workspace_id = GetToolVersions.workspace_id,
submission_id = GetToolVersions.submission_id,
workspace_bucket = workspace_bucket,
workspace_id = workspace_id,
submission_id = submission_id,
hail_version = effective_hail_version,
maximum_alternate_alleles = maximum_alternate_alleles,
}
@@ -137,9 +146,9 @@ workflow GvsQuickstartIntegration {
cloud_sdk_slim_docker = effective_cloud_sdk_slim_docker,
variants_docker = effective_variants_docker,
gatk_docker = effective_gatk_docker,
workspace_bucket = GetToolVersions.workspace_bucket,
workspace_id = GetToolVersions.workspace_id,
submission_id = GetToolVersions.submission_id,
workspace_bucket = workspace_bucket,
workspace_id = workspace_id,
submission_id = submission_id,
maximum_alternate_alleles = maximum_alternate_alleles,
}
call QuickstartVcfIntegration.GvsQuickstartVcfIntegration as QuickstartVcfVQSRIntegration {
@@ -164,9 +173,9 @@ workflow GvsQuickstartIntegration {
cloud_sdk_slim_docker = effective_cloud_sdk_slim_docker,
variants_docker = effective_variants_docker,
gatk_docker = effective_gatk_docker,
workspace_bucket = GetToolVersions.workspace_bucket,
workspace_id = GetToolVersions.workspace_id,
submission_id = GetToolVersions.submission_id,
workspace_bucket = workspace_bucket,
workspace_id = workspace_id,
submission_id = submission_id,
maximum_alternate_alleles = maximum_alternate_alleles,
}

@@ -210,9 +219,9 @@ workflow GvsQuickstartIntegration {
cloud_sdk_slim_docker = effective_cloud_sdk_slim_docker,
variants_docker = effective_variants_docker,
gatk_docker = effective_gatk_docker,
workspace_bucket = GetToolVersions.workspace_bucket,
workspace_id = GetToolVersions.workspace_id,
submission_id = GetToolVersions.submission_id,
workspace_bucket = workspace_bucket,
workspace_id = workspace_id,
submission_id = submission_id,
maximum_alternate_alleles = maximum_alternate_alleles,
target_interval_list = target_interval_list,
}
@@ -249,9 +258,9 @@ workflow GvsQuickstartIntegration {
cloud_sdk_slim_docker = effective_cloud_sdk_slim_docker,
variants_docker = effective_variants_docker,
gatk_docker = effective_gatk_docker,
workspace_bucket = GetToolVersions.workspace_bucket,
workspace_id = GetToolVersions.workspace_id,
submission_id = GetToolVersions.submission_id,
workspace_bucket = workspace_bucket,
workspace_id = workspace_id,
submission_id = submission_id,
maximum_alternate_alleles = maximum_alternate_alleles,
target_interval_list = target_interval_list,
}
@@ -268,8 +277,6 @@ workflow GvsQuickstartIntegration {
if (run_beta_integration) {
String project_id = "gvs-internal"

String workspace_bucket = GetToolVersions.workspace_bucket
String submission_id = GetToolVersions.submission_id
String extract_output_gcs_dir = "~{workspace_bucket}/output_vcfs/by_submission_id/~{submission_id}/beta"
Boolean collect_variant_calling_metrics = true

@@ -296,9 +303,9 @@ workflow GvsQuickstartIntegration {
cloud_sdk_docker = effective_cloud_sdk_docker,
variants_docker = effective_variants_docker,
gatk_docker = effective_gatk_docker,
workspace_bucket = GetToolVersions.workspace_bucket,
workspace_id = GetToolVersions.workspace_id,
submission_id = GetToolVersions.submission_id,
workspace_bucket = workspace_bucket,
workspace_id = workspace_id,
submission_id = submission_id,
maximum_alternate_alleles = maximum_alternate_alleles,
git_branch_or_tag = git_branch_or_tag,
sample_id_column_name = sample_id_column_name,
@@ -317,6 +324,27 @@ workflow GvsQuickstartIntegration {
}
}

if (run_vat_integration) {
String extract_vat_output_gcs_dir = "~{workspace_bucket}/output_vat/by_submission_id/~{submission_id}/vat"

call QuickstartVATIntegration.GvsQuickstartVATIntegration as GvsQuickstartVATIntegration {
input:
git_branch_or_tag = git_branch_or_tag,
git_hash = GetToolVersions.git_hash,
use_default_dockers = use_default_dockers,
expected_output_prefix = expected_output_prefix,
dataset_suffix = "vat",
output_path = extract_vat_output_gcs_dir,
use_vds = run_vat_integration_test_from_vds,
basic_docker = effective_basic_docker,
cloud_sdk_docker = effective_cloud_sdk_docker,
cloud_sdk_slim_docker = effective_cloud_sdk_slim_docker,
variants_docker = effective_variants_docker,
variants_nirvana_docker = effective_variants_nirvana_docker,
gatk_docker = effective_gatk_docker,
}
}

output {
String recorded_git_hash = GetToolVersions.git_hash
}