Add run_validate_PipeVal_with_metadata method #44

Merged 5 commits on Jun 14, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- Add options to handle compressed files with VCF indexing workflow
- Add `bgzip` to `index_VCF_tabix` module
- Add PipeVal generate-checksum module
- Add `run_validate_PipeVal_with_metadata` method

### Changed
- Use `ghcr.io/uclahs-cds` as default registry
16 changes: 14 additions & 2 deletions README.md
@@ -116,23 +116,35 @@ Outputs:

##### Description

Module for validating files and directories using PipeVal
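Module for validating files and directories using PipeVal. There are two nearly identical methods in this module: `run_validate_PipeVal` and `run_validate_PipeVal_with_metadata`. The second accepts and emits a tuple so that the validated path can be associated with arbitrary metadata.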

Tools used: `PipeVal`.

Inputs:
- `file_to_validate`: path for file or directory to validate

Inputs:
  - `run_validate_PipeVal`:
    - `file_to_validate`: path for file or directory to validate
  - `run_validate_PipeVal_with_metadata`:
    - A tuple of:
      - `file_to_validate`: path for file or directory to validate
      - `metadata`: arbitrary `val` passed through to the output

Parameters:
- `log_output_dir`: directory for storing log files
- `docker_image_version`: PipeVal docker image version within which process will run. The default is: `4.0.0-rc.2`
- `process_label`: assign Nextflow process label to process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `main_process`: Set output directory to the specified main process instead of `PipeVal-4.0.0-rc.2`

Outputs:
- `validation_result`: path of file with validation output text
- `validated_file`: `file_to_validate` for `run_validate_PipeVal`, or a tuple of (`file_to_validate`, `metadata`) for `run_validate_PipeVal_with_metadata`

##### How to use

1. Add this repository as a submodule in the pipeline of interest
2. Include the `run_validate_PipeVal` process from the module `main.nf` with a relative path
2. Include the `run_validate_PipeVal` or `run_validate_PipeVal_with_metadata` process from the module `main.nf` with a relative path
3. Use the `addParams` directive when importing to specify any params
4. Call the process with the inputs where needed
5. Aggregate and save the output validation files as needed
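
As a rough sketch of steps 2 to 5, the metadata variant could be wired up along the following lines. The include path, the `options` map passed through `addParams`, the `params.log_output_dir` pipeline parameter, the sample paths, and the metadata keys are all assumptions made for illustration; check the module's `main.nf` for how it actually reads its parameters.

```nextflow
// Hypothetical relative path: depends on where the submodule is checked out.
include { run_validate_PipeVal_with_metadata } from './external/nextflow-modules/modules/PipeVal/validate/main.nf' addParams(
    // Assumption: the module builds its settings from an `options` map.
    options: [
        log_output_dir: params.log_output_dir,
        main_process: 'my-pipeline'
    ]
)

workflow {
    // Each element is a (path, metadata) tuple. The metadata is a map here,
    // but any `val` is passed through to the output unchanged.
    files_to_validate = Channel.of(
        [file('/hypothetical/data/sampleA.bam'), [sample_id: 'sampleA']],
        [file('/hypothetical/data/sampleB.bam'), [sample_id: 'sampleB']]
    )

    run_validate_PipeVal_with_metadata(files_to_validate)

    // Step 5: merge the per-file validation outputs into a single file.
    run_validate_PipeVal_with_metadata.out.validation_result
        .collectFile(
            name: 'input_validation.txt',
            storeDir: "${params.log_output_dir}/validation"
        )
}
```

The plain `run_validate_PipeVal` process would be called the same way, with a channel of bare paths instead of tuples.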
49 changes: 49 additions & 0 deletions modules/PipeVal/validate/main.nf
@@ -47,3 +47,52 @@ process run_validate_PipeVal {
fi
"""
}

/**
 * Nextflow module for validating files and directories.
 *
 * This variant accepts and emits a tuple so that the validated path can be
 * associated with arbitrary metadata.
 *
 * @input file_to_validate path File or directory to validate
 * @input metadata val Arbitrary metadata passed through unchanged alongside the validated path
 *
 * @params log_output_dir path Directory for saving log files
 * @params docker_image_version string Version of PipeVal image for validation
 * @params main_process string (Optional) Name of main output directory
 */
process run_validate_PipeVal_with_metadata {
    container options.docker_image
    label options.process_label

    publishDir path: { options.main_process ?
            "${options.log_output_dir}/process-log/${options.main_process}" :
            "${options.log_output_dir}/process-log/PipeVal-${options.docker_image_version}"
        },
        pattern: ".command.*",
        mode: "copy",
        saveAs: { "${task.process.split(':')[-1]}/${task.process.split(':')[-1]}-${task.index}/log${file(it).getName()}" }

    // This process uses the publishDir method to save the log files
    ext capture_logs: false

    input:
        tuple path(file_to_validate), val(metadata)

    output:
        path(".command.*")
        path("validation.txt"), emit: validation_result
        tuple path(file_to_validate), val(metadata), emit: validated_file

    script:
    """
    set -euo pipefail

    if command -v pipeval &> /dev/null
    then
        pipeval validate ${file_to_validate} ${options.validate_extra_args} > 'validation.txt'
    else
        validate ${file_to_validate} ${options.validate_extra_args} > 'validation.txt'
    fi
    """
}
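
Because `validated_file` re-emits the metadata next to the path, downstream steps can key on it without re-deriving it from file names. The sub-workflow below is a minimal sketch of that idea, not part of this change: the workflow name `validate_and_group`, the include path, and the assumption that the metadata is a map with a `sample_id` key are all hypothetical.

```nextflow
// Hypothetical include path, relative to the calling script.
include { run_validate_PipeVal_with_metadata } from './modules/PipeVal/validate/main.nf'

workflow validate_and_group {
    take:
    files_with_metadata    // channel of (path, metadata) tuples

    main:
    run_validate_PipeVal_with_metadata(files_with_metadata)

    // Assumption: metadata is a map carrying a `sample_id` key.
    // Re-key each validated path by sample and group paths per sample.
    run_validate_PipeVal_with_metadata.out.validated_file
        .map { validated_path, metadata -> [metadata.sample_id, validated_path] }
        .groupTuple()
        .set { validated_by_sample }

    emit:
    validated_by_sample
}
```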