Organizational and cleanup tweaks #82

Merged
nwiltsie merged 7 commits into main from nwiltsie-refactor
Jun 28, 2024

nwiltsie
Member

Description

This PR performs a handful of re-organizational and cleanup tasks for the pipeline.

Sidecar file searching during config

I've added the following "new" required parameters to default.config and schema.yaml. They all have default values defined by other parameters, e.g. reference_fasta_fai = "${-> params.reference_fasta}.fai".

  • reference_fasta_fai
  • reference_fasta_dict
  • bundle_known_indels_vcf_gz_tbi
  • bundle_contest_hapmap_3p3_vcf_gz_tbi
  • bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi
  • bundle_v0_dbsnp138_vcf_gz_tbi

I say "new" parameters because the values are already in use across the pipeline - this change is to eagerly perform the discovery and validation of those files during the configuration phase. That's more DRY and ensures that missing files cause fast failures.
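As a rough illustration of the idea (in Python, not the pipeline's actual Nextflow config), the sketch below derives sidecar paths from their primary files and checks for them up front. The helper names and the primary-file parameter names in `TBI_SOURCES` are assumptions made for this example; only the sidecar parameter names come from the list above. The `.dict` rule (swapping the final extension instead of appending) is inferred from the paths reported by the config tests.

```python
from pathlib import Path, PurePosixPath

# Sidecar parameter -> primary parameter it is derived from.
# The primary names on the right are assumed for illustration.
TBI_SOURCES = {
    "bundle_known_indels_vcf_gz_tbi": "bundle_known_indels_vcf_gz",
    "bundle_contest_hapmap_3p3_vcf_gz_tbi": "bundle_contest_hapmap_3p3_vcf_gz",
    "bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi":
        "bundle_mills_and_1000g_gold_standard_indels_vcf_gz",
    "bundle_v0_dbsnp138_vcf_gz_tbi": "bundle_v0_dbsnp138_vcf_gz",
}


def with_sidecar_defaults(params: dict) -> dict:
    """Return a copy of params with any unset sidecar paths filled in."""
    resolved = dict(params)
    fasta = params["reference_fasta"]
    # The index is the FASTA path with ".fai" appended.
    resolved.setdefault("reference_fasta_fai", f"{fasta}.fai")
    # The dictionary replaces the final extension (genome.fasta -> genome.dict).
    resolved.setdefault(
        "reference_fasta_dict", str(PurePosixPath(fasta).with_suffix(".dict"))
    )
    for tbi_key, vcf_key in TBI_SOURCES.items():
        if vcf_key in params:
            resolved.setdefault(tbi_key, f"{params[vcf_key]}.tbi")
    return resolved


def missing_files(params: dict, keys: list) -> list:
    """List the parameter names whose files do not exist on disk."""
    return [key for key in keys if not Path(params[key]).is_file()]
```

Calling `missing_files` on the resolved parameters before any work starts is what turns a missing index into an immediate failure rather than a mid-run one. A value passed in explicitly always takes precedence over the derived default.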

Optional parameters listed in README

I've updated the README to include all of the parameters from default.config in a second table.

PipeVal in the critical path

I've tweaked the early workflow processes to take the PipeVal validated output files rather than the raw BAM/BAI files (required uclahs-cds/pipeline-Nextflow-module#44). That trades a little efficiency for simplicity - the downstream processing can't begin in parallel with validation, but in exchange we don't need to worry about cancelling processes or clawing back invalid outputs due to a late validation result.
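The ordering trade-off can be sketched in Python as follows. This is a simplified model, not the pipeline's Nextflow code: `validate` is a placeholder that only checks the file extension and does not represent what PipeVal actually checks, and `run_validated` is a hypothetical name.

```python
from typing import Callable, Iterable, List


class ValidationError(Exception):
    """Raised when an input fails the placeholder check."""


def validate(path: str) -> str:
    """Stand-in validator: return the path unchanged if it passes, else raise."""
    if not path.endswith((".bam", ".bai")):
        raise ValidationError(f"unexpected file type: {path}")
    return path


def run_validated(paths: Iterable[str], process: Callable[[str], str]) -> List[str]:
    """Feed process() only with validator output."""
    # Every input is validated before any processing begins, so a failure
    # here means process() is never called and there is nothing to undo.
    validated = [validate(p) for p in paths]
    return [process(p) for p in validated]
```

Because `process` only ever receives what `validate` returns, an invalid input stops the run before any downstream work exists. The cost is that processing waits for validation to finish.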

Input name harmonization

Many processes have proxy inputs for parameters - that is, those processes are always called with the same parameter as the same positional input. In those cases I renamed the input to match the parameter - for example, run_SplitIntervals_GATK's reference, reference_index, and reference_dict inputs are now reference_fasta, reference_fasta_index, and reference_fasta_dict.

Harmonizing the input name with the parameter name makes it easier to trace the logic and search the codebase.

Testing Results

  • NFTest
    • log: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/nwiltsie-refactor/log-nftest-20240627T155900Z.log
    • cases: default set

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.

nwiltsie requested a review from a team June 27, 2024 18:19
nwiltsie changed the title from "Nwiltsie refactor" to "Organizational and cleanup tweaks" Jun 27, 2024
Bleep bloop, I am a robot.

Alas, some of the Nextflow configuration tests failed!

test/configtest-F16.json

@ ["params","bundle_contest_hapmap_3p3_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz.tbi"
@ ["params","bundle_known_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
@ ["params","bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi"
@ ["params","bundle_v0_dbsnp138_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi"
@ ["params","reference_fasta_dict"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.dict"
@ ["params","reference_fasta_fai"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta.fai"

test/configtest-F32.json

@ ["params","bundle_contest_hapmap_3p3_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz.tbi"
@ ["params","bundle_known_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
@ ["params","bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi"
@ ["params","bundle_v0_dbsnp138_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi"
@ ["params","reference_fasta_dict"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.dict"
@ ["params","reference_fasta_fai"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta.fai"

If the above changes are surprising, stop and determine what happened.

If the above changes are expected, there are two ways to fix this:

  1. Automatically: Post a comment starting with "/fix-tests" (without the quotes) and I will update the tests for you (you must review my work afterwards).
  2. Manually: Follow these steps on Confluence.
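For readers unfamiliar with the report format above (a JSON path on an `@` line, then the added value on a `+` line), here is a minimal Python sketch that produces the same shape of output for keys added to a nested configuration. The thread does not name the tool the bot uses, so this is an illustration of the format only. The function names are hypothetical, and removed or changed values are not handled.

```python
import json
from typing import Iterator, Tuple


def added_entries(old: dict, new: dict, path: Tuple[str, ...] = ()) -> Iterator[tuple]:
    """Yield (path, value) for keys present in new but absent from old."""
    for key in sorted(new):
        here = path + (key,)
        if key not in old:
            yield here, new[key]
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # Recurse so nested additions are reported with their full path.
            yield from added_entries(old[key], new[key], here)


def render(entries) -> str:
    """Format additions as '@ <json path>' followed by '+ <json value>'."""
    lines = []
    for path, value in entries:
        lines.append("@ " + json.dumps(list(path), separators=(",", ":")))
        lines.append("+ " + json.dumps(value))
    return "\n".join(lines)
```

Comparing a stored expected configuration against a freshly computed one this way makes newly introduced parameters, such as the sidecar paths in this PR, show up as pure additions.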

@nwiltsie
Member Author

/fix-tests

Bleep bloop, I am a robot.

I have updated all of the failing tests for you with 25a0b8e. You must review my work before merging this pull request!

tyamaguchi-ucla left a comment

Amazing work! Looks good to me. @yashpatel6 for the final approval.

Collaborator

yashpatel6 left a comment

Looks good!

nwiltsie merged commit e801e93 into main Jun 28, 2024
7 checks passed
nwiltsie deleted the nwiltsie-refactor branch June 28, 2024 17:20
3 participants