Showing 10 changed files with 285 additions and 4 deletions.
Empty file.
@@ -0,0 +1,35 @@
#!/usr/bin/env nextflow

process runClairs {

    label 'clairs'
    tag "${sample_id}"
    debug true

    publishDir(
        path: "${output_dir}/",
        mode: 'copy'
    )

    input:
        tuple val(sample_id), path(tumor_bam_file), path(tumor_bam_bai_file), path(normal_bam_file), path(normal_bam_bai_file)
        path(reference_genome_fasta_file)
        path(reference_genome_fasta_fai_file)
        val(params_clairs)
        val(output_dir)

    output:
        tuple val(sample_id), path("${sample_id}_clairs_outputs/"), emit: f

    script:
        """
        mkdir -p ${sample_id}_clairs_outputs/
        run_clairs \
            --tumor_bam $tumor_bam_file \
            --normal_bam $normal_bam_file \
            --ref_fn $reference_genome_fasta_file \
            --output_dir ${sample_id}_clairs_outputs/ \
            --threads ${task.cpus} \
            $params_clairs
        """
}
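The `script:` block of `runClairs` expands to a single `run_clairs` call in which the module fixes `--tumor_bam`, `--normal_bam`, `--ref_fn`, `--output_dir` and `--threads`, then appends the user-supplied `params_clairs` string verbatim. The Python sketch below restates that assembly for illustration only: `build_run_clairs_command` is a hypothetical helper (not part of `nexus`), and using `shlex.split` on `params_clairs` is an assumption about how the shell would tokenize that string.

```python
import shlex
from typing import List


def build_run_clairs_command(sample_id: str,
                             tumor_bam_file: str,
                             normal_bam_file: str,
                             reference_genome_fasta_file: str,
                             cpus: int,
                             params_clairs: str) -> List[str]:
    """Hypothetical helper mirroring the runClairs script block (illustration only)."""
    output_dir = f"{sample_id}_clairs_outputs/"
    # Options the module always sets itself.
    command = [
        "run_clairs",
        "--tumor_bam", tumor_bam_file,
        "--normal_bam", normal_bam_file,
        "--ref_fn", reference_genome_fasta_file,
        "--output_dir", output_dir,
        "--threads", str(cpus),
    ]
    # params_clairs is appended last; shlex.split approximates how the
    # shell would break the string into separate arguments.
    command.extend(shlex.split(params_clairs))
    return command


if __name__ == "__main__":
    cmd = build_run_clairs_command(
        sample_id="tumor",
        tumor_bam_file="tumor.bam",
        normal_bam_file="normal.bam",
        reference_genome_fasta_file="hg38.fa",
        cpus=8,
        params_clairs="--platform hifi_revio",
    )
    print(shlex.join(cmd))
```

An empty `params_clairs` simply adds nothing after `--threads`, which matches the workflow's handling of a bare `--params_clairs` flag.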
79 changes: 79 additions & 0 deletions
...xuslib/pipelines/variant_calling/long_read_dna_variant_calling_clairs/README.md
@@ -0,0 +1,79 @@
## long_read_dna_variant_calling_clairs.nf

Identifies somatic small DNA variants in paired tumor and normal long-read DNA BAM files using [ClairS](https://github.com/HKU-BAL/ClairS).

### Inputs / Outputs

| I/O    | Description                                                                              |
|:-------|:-----------------------------------------------------------------------------------------|
| Input  | Tumor and normal `bam` files (with `bam.bai` indices) for each sample.                    |
| Output | ClairS output directory (`<sample_id>_clairs_outputs/`), including `vcf` files, for each sample. |

### Dependencies

* `ClairS`

### Example

```
nexus run --nf-workflow long_read_dna_variant_calling_clairs.nf \
    -c NEXTFLOW_CONFIG_FILE \
    -w WORK_DIR \
    --samples_tsv_file SAMPLES_TSV_FILE \
    --output_dir OUTPUT_DIR \
    --reference_genome_fasta_file REFERENCE_GENOME_FASTA_FILE \
    --reference_genome_fasta_fai_file REFERENCE_GENOME_FASTA_FAI_FILE \
    --params_clairs '"--platform hifi_revio"'
```
### Usage

```
workflow:
    1. Run ClairS.
usage: nexus run --nf-workflow long_read_dna_variant_calling_clairs.nf [required] [optional] [--help]
required arguments:
    -c : Nextflow .config file.
    -w : Nextflow work directory path.
    --samples_tsv_file : TSV file with the following columns: 'sample_id', 'tumor_bam_file', 'tumor_bam_bai_file', 'normal_bam_file', 'normal_bam_bai_file'.
    --output_dir : Directory to which output files will be copied.
optional arguments:
    --reference_genome_fasta_file : Reference genome FASTA file (default: /datastore/lbcfs/collaborations/pirl/seqdata/references/hg38.fa).
    --reference_genome_fasta_fai_file : Reference genome FASTA index (.fai) file (default: /datastore/lbcfs/collaborations/pirl/seqdata/references/hg38.fa.fai).
    --params_clairs : ClairS parameters (default: '"--platform hifi_revio"').
        Note that the parameters need to be wrapped in quotes.
    --delete_work_dir : Delete work directory (default: false).
```
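The workflow reads `--samples_tsv_file` with `splitCsv(header: true, sep: '\t')` and turns each row into a `(sample_id, tumor_bam_file, tumor_bam_bai_file, normal_bam_file, normal_bam_bai_file)` tuple. A rough Python equivalent, handy for checking a samples TSV before launching the workflow, is sketched below. `read_samples_tsv` is a hypothetical helper, the `/data/...` paths are placeholders, and rejecting files with missing columns is an extra check that the Nextflow code itself does not perform.

```python
import csv
import io
from typing import List, Tuple

# Column order used by the workflow's Step 4 channel mapping.
REQUIRED_COLUMNS = [
    "sample_id",
    "tumor_bam_file",
    "tumor_bam_bai_file",
    "normal_bam_file",
    "normal_bam_bai_file",
]


def read_samples_tsv(handle) -> List[Tuple[str, ...]]:
    """Parse a samples TSV into one tuple per row, in REQUIRED_COLUMNS order."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"samples TSV is missing columns: {missing}")
    return [tuple(row[column] for column in REQUIRED_COLUMNS) for row in reader]


if __name__ == "__main__":
    example = io.StringIO(
        "\t".join(REQUIRED_COLUMNS) + "\n"
        + "tumor\t/data/t.bam\t/data/t.bam.bai\t/data/n.bam\t/data/n.bam.bai\n"
    )
    for sample in read_samples_tsv(example):
        print(sample)
```

To check a file on disk, open it with `open(path, newline="")` and pass the handle to `read_samples_tsv`.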
### Parameters

`-c`
* Nextflow config files can be downloaded [here](https://github.com/pirl-unc/nexus/tree/main/nextflow).

`--samples_tsv_file`

| Header              | Description                        |
|---------------------|------------------------------------|
| sample_id           | Sample ID                          |
| tumor_bam_file      | Full path to tumor `bam` file      |
| tumor_bam_bai_file  | Full path to tumor `bam.bai` file  |
| normal_bam_file     | Full path to normal `bam` file     |
| normal_bam_bai_file | Full path to normal `bam.bai` file |

`--reference_genome_fasta_file`
* Reference genome FASTA files can be found in `/datastore/lbcfs/collaborations/pirl/seqdata/references/` on LBG.

`--reference_genome_fasta_fai_file`
* Reference genome FASTA index (`.fai`) files can be found in `/datastore/lbcfs/collaborations/pirl/seqdata/references/` on LBG.

`--params_clairs`
* Refer to the [ClairS documentation](https://github.com/HKU-BAL/ClairS).
* The following `run_clairs` parameters are already set by the `nexus` `clairs` module and should not be specified:
    * `--tumor_bam`
    * `--normal_bam`
    * `--ref_fn`
    * `--output_dir`
    * `--threads`
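Because the module already passes these five options, repeating any of them in `--params_clairs` would put them on the `run_clairs` command line twice. A small pre-flight check can catch this. The sketch below is hypothetical (not part of `nexus`) and assumes `--params_clairs` is a shell-style string whose options appear either as `--flag value` or `--flag=value`.

```python
import shlex
from typing import List

# Options that the nexus clairs module already passes to run_clairs.
RESERVED_OPTIONS = {"--tumor_bam", "--normal_bam", "--ref_fn", "--output_dir", "--threads"}


def find_reserved_options(params_clairs: str) -> List[str]:
    """Return reserved run_clairs options found in a --params_clairs string, in order."""
    found: List[str] = []
    for token in shlex.split(params_clairs):
        # Accept both "--threads 8" and "--threads=8" styles.
        name = token.split("=", 1)[0]
        if name in RESERVED_OPTIONS and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    print(find_reserved_options("--platform hifi_revio"))
    print(find_reserved_options("--platform hifi_revio --threads=16"))
```

The default `--platform hifi_revio` yields an empty list, so it passes the check.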
109 changes: 109 additions & 0 deletions
...iant_calling/long_read_dna_variant_calling_clairs/long_read_dna_variant_calling_clairs.nf
@@ -0,0 +1,109 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

// Step 1. Import Nextflow modules
include { runClairs } from '../../modules/clairs'

// Step 2. Input arguments
params.help = ''
// Required arguments
params.samples_tsv_file = ''
params.output_dir = ''
// Optional arguments
params.reference_genome_fasta_file = '/datastore/lbcfs/collaborations/pirl/seqdata/references/hg38.fa'
params.reference_genome_fasta_fai_file = '/datastore/lbcfs/collaborations/pirl/seqdata/references/hg38.fa.fai'
params.params_clairs = '--platform hifi_revio'
params.delete_work_dir = false

if (params.params_clairs == true) {
    params_clairs = ''
} else {
    params_clairs = params.params_clairs
}

// Step 3. Print inputs and help
log.info """\
    ==================================================================================
    Identify somatic small variants in long-read DNA sequencing BAM files using ClairS
    ==================================================================================
    """.stripIndent()
if (params.help) {
    log.info """\
    workflow:
        1. Run ClairS.
    usage: nexus run --nf-workflow long_read_dna_variant_calling_clairs.nf [required] [optional] [--help]
    required arguments:
        -c : Nextflow .config file.
        -w : Nextflow work directory path.
        --samples_tsv_file : TSV file with the following columns: 'sample_id', 'tumor_bam_file', 'tumor_bam_bai_file', 'normal_bam_file', 'normal_bam_bai_file'.
        --output_dir : Directory to which output files will be copied.
    optional arguments:
        --reference_genome_fasta_file : Reference genome FASTA file (default: /datastore/lbcfs/collaborations/pirl/seqdata/references/hg38.fa).
        --reference_genome_fasta_fai_file : Reference genome FASTA index (.fai) file (default: /datastore/lbcfs/collaborations/pirl/seqdata/references/hg38.fa.fai).
        --params_clairs : ClairS parameters (default: '"--platform hifi_revio"').
            Note that the parameters need to be wrapped in quotes.
        --delete_work_dir : Delete work directory (default: false).
    """.stripIndent()
    exit 0
} else {
    log.info """\
    samples_tsv_file : ${params.samples_tsv_file}
    output_dir : ${params.output_dir}
    reference_genome_fasta_file : ${params.reference_genome_fasta_file}
    reference_genome_fasta_fai_file : ${params.reference_genome_fasta_fai_file}
    params_clairs : ${params_clairs}
    delete_work_dir : ${params.delete_work_dir}
    """.stripIndent()
}
// Step 4. Set channels
Channel
    .fromPath( params.samples_tsv_file )
    .splitCsv( header: true, sep: '\t' )
    .map { row -> tuple(
        "${row.sample_id}",
        "${row.tumor_bam_file}",
        "${row.tumor_bam_bai_file}",
        "${row.normal_bam_file}",
        "${row.normal_bam_bai_file}") }
    .set { input_bam_files_ch }

// Step 5. Workflow
workflow LONG_READ_DNA_VARIANT_CALLING_CLAIRS {
    take:
        input_bam_files_ch // channel: [val(sample_id), path(tumor_bam_file), path(tumor_bam_bai_file), path(normal_bam_file), path(normal_bam_bai_file)]
        reference_genome_fasta_file
        reference_genome_fasta_fai_file
        params_clairs
        output_dir

    main:
        runClairs(
            input_bam_files_ch,
            reference_genome_fasta_file,
            reference_genome_fasta_fai_file,
            params_clairs,
            output_dir
        )
}

workflow {
    LONG_READ_DNA_VARIANT_CALLING_CLAIRS(
        input_bam_files_ch,
        params.reference_genome_fasta_file,
        params.reference_genome_fasta_fai_file,
        params_clairs,
        params.output_dir
    )
}

workflow.onComplete {
    if ( params.delete_work_dir == true || params.delete_work_dir == 1 ) {
        workflow.workDir.deleteDir()
    }
}
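Nextflow sets a parameter to the boolean `true` when its flag is given on the command line without a value, which is why the workflow maps `params.params_clairs == true` to an empty string. Likewise, `workflow.onComplete` deletes the work directory only when `params.delete_work_dir` is `true` or `1`. The Python sketch below restates those two rules for illustration; the function names are hypothetical and nothing here is part of `nexus`.

```python
from typing import Union

# Value types a Nextflow CLI parameter may end up as in these two checks.
ParamValue = Union[str, bool, int]


def normalize_params_clairs(value: ParamValue) -> str:
    """Mirror the workflow: a bare --params_clairs flag (True) means no extra parameters."""
    if value is True:
        return ""
    return str(value)


def should_delete_work_dir(value: ParamValue) -> bool:
    """Mirror workflow.onComplete: delete only when the parameter is true or 1."""
    return value is True or value == 1


if __name__ == "__main__":
    print(repr(normalize_params_clairs(True)))
    print(normalize_params_clairs("--platform hifi_revio"))
    print(should_delete_work_dir(False))
```

Using `is True` keeps an integer `1` from being treated as a bare flag in `normalize_params_clairs`, since `1 == True` holds in Python.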
43 changes: 43 additions & 0 deletions
test/variant_calling/test_long_read_dna_variant_calling_clairs.py
@@ -0,0 +1,43 @@
import pandas as pd
import os
from nexuslib.main import run_workflow
from ..data import get_data_path


def test_long_read_dna_variant_calling_clairs():
    nextflow_config_file = get_data_path(name='nextflow/nextflow_test_docker.config')
    tumor_dna_bam_file = get_data_path(name='bam/hg38_tp53_tumor_long_read_dna.bam')
    tumor_dna_bam_bai_file = get_data_path(name='bam/hg38_tp53_tumor_long_read_dna.bam.bai')
    normal_dna_bam_file = get_data_path(name='bam/hg38_tp53_normal_long_read_dna.bam')
    normal_dna_bam_bai_file = get_data_path(name='bam/hg38_tp53_normal_long_read_dna.bam.bai')
    reference_genome_fasta_file = get_data_path(name='fasta/hg38_chr17_1-8000000.fa')
    reference_genome_fasta_fai_file = get_data_path(name='fasta/hg38_chr17_1-8000000.fa.fai')
    temp_dir = os.getcwd() + '/tmp'
    intermediate_dir = temp_dir + '/intermediate/test_long_read_dna_variant_calling_clairs'
    work_dir = temp_dir + '/work/test_long_read_dna_variant_calling_clairs'
    output_dir = temp_dir + '/outputs/test_long_read_dna_variant_calling_clairs'
    if not os.path.exists(intermediate_dir):
        os.makedirs(intermediate_dir)
    if not os.path.exists(work_dir):
        os.makedirs(work_dir)
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    pd.DataFrame({
        'sample_id': ['tumor'],
        'tumor_bam_file': [tumor_dna_bam_file],
        'tumor_bam_bai_file': [tumor_dna_bam_bai_file],
        'normal_bam_file': [normal_dna_bam_file],
        'normal_bam_bai_file': [normal_dna_bam_bai_file]
    }).to_csv(intermediate_dir + "/samples.tsv", sep='\t', index=False)
    workflow_args = [
        '-c', nextflow_config_file,
        '-w', work_dir,
        '--samples_tsv_file', intermediate_dir + '/samples.tsv',
        '--reference_genome_fasta_file', reference_genome_fasta_file,
        '--reference_genome_fasta_fai_file', reference_genome_fasta_fai_file,
        '--params_clairs', '"--platform hifi_revio"',
        '--output_dir', output_dir,
    ]
    run_workflow(workflow='long_read_dna_variant_calling_clairs.nf',
                 nextflow='nextflow',
                 workflow_args=workflow_args)
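The test above runs the workflow but makes no assertions about its outputs. Since `runClairs` publishes `<sample_id>_clairs_outputs/` into `--output_dir` with `mode: 'copy'`, a follow-up check could confirm that this directory exists for each sample (here, `tumor_clairs_outputs`). The helper below is a hypothetical sketch, not present in the repository.

```python
import os
import tempfile
from typing import Iterable, List


def missing_clairs_outputs(output_dir: str, sample_ids: Iterable[str]) -> List[str]:
    """Return sample IDs whose <sample_id>_clairs_outputs/ directory is absent."""
    return [
        sample_id
        for sample_id in sample_ids
        if not os.path.isdir(os.path.join(output_dir, f"{sample_id}_clairs_outputs"))
    ]


if __name__ == "__main__":
    # Simulate a published output directory for one of two samples.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "tumor_clairs_outputs"))
        print(missing_clairs_outputs(tmp, ["tumor", "normal"]))
```

In the test, such a check would run after `run_workflow(...)` as `assert missing_clairs_outputs(output_dir, ['tumor']) == []`.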