Add run_validate_PipeVal_with_metadata method #44

Merged: 5 commits, Jun 14, 2024

nwiltsie
Member

This PR adds a new run_validate_PipeVal_with_metadata method. It is functionally equivalent to run_validate_PipeVal, except that it takes a tuple of a path and arbitrary metadata as input rather than a single path, and it emits that same tuple, unchanged, as validated_file:

run_validate_PipeVal:

input:
    path(file_to_validate)
output:
    path(".command.*")
    path("validation.txt"), emit: validation_result
    path(file_to_validate), emit: validated_file

run_validate_PipeVal_with_metadata:

input:
    tuple path(file_to_validate), val(metadata)
output:
    path(".command.*")
    path("validation.txt"), emit: validation_result
    tuple path(file_to_validate), val(metadata), emit: validated_file

With that change, the validated_file output channel can, with a little manipulation, be fed to downstream processes, so that those processes only run once their inputs have passed validation. As an example, take this logic from pipeline-recalibrate-BAM (abbreviated):

workflow {
    Channel.from(params.samples_to_process)
        .map{ sample -> ['index': indexFile(sample.path)] + sample }
        .set{ input_ch_samples_with_index }

    input_ch_samples_with_index
        .map{ sample -> [sample.path, sample.index] }
        .flatten()
        .set{ input_ch_validate }

    // Outputs of this process are not used anywhere
    run_validate_PipeVal(input_ch_validate)

    // This process is not gated by run_validate_PipeVal
    input_ch_samples_with_index
        .filter{ it.sample_type == 'normal' }
        .map{ it -> [sanitize_string(it.id)] }
        .join(run_GetPileupSummaries_GATK.out.pileupsummaries)
        .set{ normal_pileupsummaries }

    ...
}

input_ch_samples_with_index.dump():

['index':/hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam.bai, 'id':'S2_v1.1.5', 'path':'/hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam', 'sample_type':'tumor']

That can be changed to this (tested but not reviewed/merged):

workflow {
    Channel.from(params.samples_to_process)
        .flatMap { sample ->
            def all_metadata = sample.findAll { it.key != "path" }
            return [
                [sample.path, [all_metadata, "path"]],
                [indexFile(sample.path), [[id: sample.id], "index"]]
            ]
        } | run_validate_PipeVal_with_metadata

    run_validate_PipeVal_with_metadata.out.validated_file
        .map { filename, metadata -> [metadata[0].id, metadata[0] + [(metadata[1]): filename]] }
        .groupTuple()
        .map { it[1].inject([:]) { result, i -> result + i } }
        .set { validated_samples_with_index }

    // This process is now downstream from PipeVal
    validated_samples_with_index
        .filter{ it.sample_type == 'normal' }
        .map{ it -> [sanitize_string(it.id)] }
        .join(run_GetPileupSummaries_GATK.out.pileupsummaries)
        .set{ normal_pileupsummaries }
    
    ...
}

validated_samples_with_index.dump():

['id':'S2_v1.1.5', 'sample_type':'tumor', 'path':/scratch/174611/ee/64c24ef90214d88bcd0bfe7c5376af/S2.T-n1.bam, 'index':/scratch/174611/f5/9083727eee12f7456acaed9b5bea48/S2.T-n1.bam.bai]
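
To make the channel manipulation above easier to follow, here is a minimal plain-Groovy sketch of the same data flow, with lists standing in for channels. It is an illustration under stated assumptions, not part of this PR: the sample path is made up, indexFile is a hypothetical stand-in that simply appends .bai, collectMany and groupBy stand in for Nextflow's flatMap and groupTuple, and the validation step itself is skipped.

```groovy
// Plain-Groovy model of the channel manipulation; lists stand in for channels.
// ASSUMPTION: indexFile is a hypothetical stand-in for the pipeline's helper.
def indexFile = { String bam -> bam + '.bai' }

// ASSUMPTION: made-up sample record shaped like params.samples_to_process.
def samples = [
    [id: 'S2_v1.1.5', path: '/data/S2.T-n1.bam', sample_type: 'tumor']
]

// flatMap equivalent: emit one [file, [metadata, key]] tuple per file.
def toValidate = samples.collectMany { sample ->
    def allMetadata = sample.findAll { it.key != 'path' }
    [
        [sample.path, [allMetadata, 'path']],
        [indexFile(sample.path), [[id: sample.id], 'index']]
    ]
}

// The process would emit each tuple unchanged as validated_file
// (validation is skipped in this sketch).
def validatedFile = toValidate

// map equivalent: key each record by sample id and store the file under its role.
def keyed = validatedFile.collect { tuple ->
    def (filename, metadata) = tuple
    [metadata[0].id, metadata[0] + [(metadata[1]): filename]]
}

// groupTuple + inject equivalent: merge all records that share an id.
def validatedSamplesWithIndex = keyed
    .groupBy { it[0] }
    .values()
    .collect { records ->
        records.collect { it[1] }.inject([:]) { result, i -> result + i }
    }

println validatedSamplesWithIndex
```

Because the index record carries only the id, merging by id leaves the sample's other metadata intact and just adds the index key alongside path.
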
  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the
    metadata.yaml and the manifest block in the nextflow.config as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

Testing Results

Modified pipeline-recalibrate-BAM NFTest run: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/nwiltsie-refactor/log-nftest-20240613T202121Z.log

@nwiltsie nwiltsie requested a review from a team June 13, 2024 21:38
@nwiltsie
Member Author

@yashpatel6 I've incorporated #43, and updated the new function with the same changes.

Collaborator

@yashpatel6 yashpatel6 left a comment

Looks good!

@nwiltsie nwiltsie merged commit 537da32 into main Jun 14, 2024
5 checks passed
@nwiltsie nwiltsie deleted the nwiltsie-pipeval-with-metadata branch June 14, 2024 22:39