generated from d3b-center/d3b-bixu-template
Merge pull request #10 from d3b-center/feature/public-release
Feature/public release
Showing 9 changed files with 226 additions and 86 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Kids First Loss of Heterozygosity (LOH) | ||
|
||
![data service logo](https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png) | ||
|
||
The LOH preprocessing workflow assesses loss of heterozygosity (LOH) in the tumor for rare germline variants. | ||
Order of operations: the workflow runs bcftools to extract data from the provided VCF and prepare a list of locations, feeds that list into bam-readcount to compute VAF, and then parses the output and merges the germline and tumor data. | ||
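The VAF step can be illustrated with a minimal sketch. It assumes VAF is defined as alt reads divided by ref plus alt reads, which may differ from how the workflow's parser treats total depth. The function name `variant_allele_frequency` and the read counts are hypothetical; `min_depth` is only modeled on the `minDepth` input.

```python
from typing import Optional


def variant_allele_frequency(
    ref_count: int, alt_count: int, min_depth: int = 1
) -> Optional[float]:
    """Return alt / (ref + alt), or None when the depth is below min_depth.

    Assumption: depth is the sum of ref and alt reads only.
    """
    depth = ref_count + alt_count
    if depth == 0 or depth < min_depth:
        return None
    return alt_count / depth


# Hypothetical counts: a heterozygous germline call (VAF near 0.5) whose
# tumor VAF moves toward 1.0 is the pattern an LOH analysis looks for.
germline_vaf = variant_allele_frequency(ref_count=20, alt_count=20)
tumor_vaf = variant_allele_frequency(ref_count=3, alt_count=27)
print(germline_vaf, tumor_vaf)
```

Returning `None` for low-depth sites keeps them distinguishable from sites with a genuine VAF of 0.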
|
||
## Inputs | ||
|
||
- BS_ID: sample ID for the proband | ||
- frequency: cutoff for rare germline variants, based on the gnomad_3_1_1_AF_popmax tag | ||
- ram_germline: RAM for the germline tool; scales with the size of the VCF file provided in sample_vcf_file | ||
- peddy_file: details about the family trio | ||
- participant_id: participant ID for the proband; used only to name the output file | ||
- bamscrams: one or more BAM/CRAM files for the proband and family trio | ||
- reference: human reference file in FASTA format | ||
- sample_vcf_file: VCF file from which to extract germline calls | ||
- minDepth: minimum read depth to consider in the tumor analysis | ||
- bamcramsampleIDs: BAM/CRAM sample IDs, in the same order as the files in bamscrams | ||
- ram_tumor: RAM for the tumor tool; scales with the size and number of CRAM files provided | ||
- minCore: number of processors. Each CRAM is split into 32 parts for multiprocessing and the results are merged back. A high number of processors is recommended. | ||
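As a rough illustration of the `frequency` cutoff, the sketch below keeps a variant when its gnomad_3_1_1_AF_popmax value is below the cutoff or is not defined. The helper `is_rare`, the strict `<` comparison, and the example records are assumptions for illustration, not the germline tool's actual code.

```python
from typing import Optional


def is_rare(popmax_af: Optional[float], frequency: float = 0.01) -> bool:
    """Keep a variant when popmax AF is undefined or below the cutoff.

    Assumption: the comparison is strict (AF < cutoff).
    """
    return popmax_af is None or popmax_af < frequency


# Hypothetical records: (variant label, gnomad_3_1_1_AF_popmax or None).
variants = [
    ("chr1:1000 A>G", 0.0004),
    ("chr2:2000 C>T", 0.2500),
    ("chr3:3000 G>A", None),
]
rare = [label for label, af in variants if is_rare(af)]
print(rare)
```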
|
||
## Output | ||
|
||
- output_file: a TSV file that maps variant data from the germline and tumor tools, including germline VAF and tumor VAF | ||
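One way to picture the merged output is a join of germline and tumor records on the variant coordinates, written as tab-separated text. The column names, the join key, and the records below are hypothetical; the real output file may use a different layout.

```python
import csv
import io

# Hypothetical per-variant records keyed by (chr, start, ref, alt).
germline = {
    ("chr1", 1000, "A", "G"): {"gene": "GENE1", "germline_vaf": 0.5},
    ("chr3", 3000, "G", "A"): {"gene": "GENE2", "germline_vaf": 0.48},
}
tumor = {
    ("chr1", 1000, "A", "G"): {"tumor_vaf": 0.9},
}

columns = ["chr", "start", "ref", "alt", "gene", "germline_vaf", "tumor_vaf"]
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
)
writer.writeheader()
for key in sorted(germline):
    chrom, start, ref, alt = key
    row = {"chr": chrom, "start": start, "ref": ref, "alt": alt}
    row.update(germline[key])
    # Variants with no tumor record keep an empty tumor_vaf cell.
    row.update(tumor.get(key, {"tumor_vaf": ""}))
    writer.writerow(row)

tsv_text = buffer.getvalue()
print(tsv_text)
```

Driving the loop from the germline records keeps every rare germline variant in the table, even when the tumor has no reads at that site.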
|
||
## Demo Proband-only Cavatica Task | ||
|
||
![LOH schematic](https://github.com/d3b-center/tumor-loh-app-dev/blob/master/docs/logo/proband_run.png) | ||
|
||
## Demo Family-trio Cavatica Task | ||
|
||
![LOH schematic](https://github.com/d3b-center/tumor-loh-app-dev/blob/master/docs/logo/proband_run.png) |
119 changes: 119 additions & 0 deletions
workflow/kf-loss-of-heterozygosity-preprocessing-wf.cwl
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
#!/usr/bin/env cwl-runner | ||
|
||
cwlVersion: v1.2 | ||
class: Workflow | ||
id: kf-loss-of-heterozygosity-preprocessing | ||
label: Kids First Loss of Heterozygosity | ||
doc: | | ||
# Kids First Loss of Heterozygosity (LOH) | ||
|
||
![data service logo](https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png) | ||
|
||
The Kids First Loss of Heterozygosity Preprocessing (aka LOH) is a CWL workflow that assesses the loss of heterozygosity in the tumor for rare germline calls, keeping calls whose gnomad_3_1_1_AF_popmax is below the cutoff (typically < 0.01) or is not defined. The preprocessing computes variant allele frequency (VAF) for multiple proband tumor samples and can also map germline VAF for family trios if a trio germline VCF file is provided. | ||
|
||
#### Basic info | ||
- Dockerfile: https://github.com/d3b-center/bixtools/tree/master/LOH/1.0.1 | ||
- tested with | ||
- Seven Bridges Cavatica Platform: https://cavatica.sbgenomics.com/ | ||
- cwltool: https://github.com/common-workflow-language/cwltool/releases/tag/3.1.20221201130942 | ||
|
||
### Description | ||
|
||
The Kids First Loss of Heterozygosity application is divided into two tools: Germline tool and Tumor tool. | ||
|
||
#### Germline Tool | ||
|
||
The germline tool filters germline annotations to retain variants whose gnomad_3_1_1_AF_popmax is below the cutoff (typically < 0.01) or is not defined. It requires a VCF file, a proband sample ID, and RAM as inputs; the peddy file is optional in general but required for family trios. It outputs variant information such as gene, chr, start, stop, ref/alt alleles, ref/alt allele depths, and variant allele frequency, plus a list of coordinates that is passed to the tumor tool. | ||
|
||
#### Tumor Tool | ||
The tumor tool searches the paired proband tumor sample for aligned reads in the regions where rare variants from the germline tool exist, extracts allele/reference counts and allele/reference depths, and calculates the variant allele frequency (VAF). It can search multiple tumor samples for the proband and, if applicable, paternal and maternal tumor samples. To extract reads from the bam/cram files, the tool uses [bam-readcount](https://github.com/genome/bam-readcount) and wraps it with a Python script that shapes the output into a tabular format. | ||
|
||
### LOH Inputs | ||
``` | ||
Germline tool | ||
# Required | ||
BS_ID: { doc: provide BS id for germline normal,type: string } | ||
frequency: { doc: provide popmax cutoff for rare germline variants, type: 'float?', default: 0.01 } | ||
# Optional | ||
ram_germline: { doc: Provide ram (in GB) based on the size of vcf,type: 'int?', default: 8} | ||
# Required for family trios otherwise not required | ||
peddy_file: { doc: provide ped file for the trio, type: 'File?' } | ||
Tumor tool | ||
# Required | ||
participant_id: { doc: provide participant id for this run, type: string } | ||
bamscrams: { doc: tumor input file in cram or bam format with their index file, type: 'File[]' , secondaryFiles: [ { pattern: ".crai", required: false }, { pattern: ".bai", required: false } ] } | ||
reference: { doc: human reference in fasta format with index file, type: File, secondaryFiles: [ .fai ], "sbg:suggestedValue": { class: File, path: 60639014357c3a53540ca7a3, name: Homo_sapiens_assembly38.fasta, secondaryFiles: [{class: File, path: 60639016357c3a53540ca7af, name: Homo_sapiens_assembly38.fasta.fai}]} } | ||
sample_vcf_file: { doc: provide germline vcf file for this sample, type: File } | ||
# Optional | ||
minDepth: { doc: provide minDepth to consider for tumor reads, type: 'int?', default: 1 } | ||
bamcramsampleIDs: { doc: provide unique identifiers (in the same order) for cram/bam files provided under bamscrams tag. Default is sample ID pulled from bam/cram files., type: 'string[]?' } | ||
ram_tumor: { doc: Provide ram (in GB) for tumor tool based on the number of cram/bam inputs, type: 'int?', default: 16} | ||
minCore: { type: 'int?', default: 16, doc: "Minimum number of cores for tumor tool based on the number of cram/bam inputs" } | ||
``` | ||
|
||
### LOH Output | ||
|
||
The LOH application outputs a tab-separated values (TSV) file containing mapped data from the germline and tumor tools. | ||
``` | ||
output_file: { type: File, doc: A tsv file with gathered data from germline and tumor tool} | ||
``` | ||
|
||
requirements: | ||
- class: StepInputExpressionRequirement | ||
|
||
inputs: | ||
BS_ID: { doc: provide BS id for germline normal,type: string } | ||
participant_id: { doc: provide participant id for this run, type: string } | ||
frequency: { doc: provide popmax cutoff for rare germline variants, type: 'float?', default: 0.01 } | ||
peddy_file: { doc: provide ped file for the trio, type: 'File?' } | ||
bamscrams: { doc: tumor input file in cram or bam format with their index file, type: 'File[]' , secondaryFiles: [ { pattern: ".crai", required: false }, { pattern: ".bai", required: false } ] } | ||
minDepth: { doc: provide minDepth to consider for tumor reads, type: 'int?', default: 1 } | ||
reference: { doc: human reference in fasta format with index file, type: File,secondaryFiles: [ .fai ] } | ||
sample_vcf_file: { doc: provide germline vcf file for this sample, type: File } | ||
bamcramsampleIDs: { doc: provide unique identifiers (in the same order) for cram/bam files provided under bamscrams tag. Default is sample ID pulled from bam/cram files., type: 'string[]?' } | ||
ram_germline: { doc: Provide ram (in GB) based on the size of vcf,type: 'int?', default: 8} | ||
ram_tumor: { doc: Provide ram (in GB) for tumor tool based on the size and number of cram/bam inputs, type: 'int?', default: 16} | ||
minCore: { type: 'int?', default: 16, doc: "Minimum number of cores for tumor tool" } | ||
outputs: | ||
output_file: { type: File, doc: output file from LOH app, outputSource: run_tumor_tool/loh_output_file_tool } | ||
|
||
steps: | ||
run_germline_tool: | ||
run: ../tools/run_gene_extract_list_prepare.cwl | ||
in: | ||
bs_id: BS_ID | ||
sample_vcf_file_tool: sample_vcf_file | ||
frequency_tool: frequency | ||
peddy_file_tool: peddy_file | ||
ram: ram_germline | ||
out: | ||
[ output_file_1_tool,output_file_2_tool,log_output] | ||
run_tumor_tool: | ||
run: ../tools/run_readcount_parser.cwl | ||
in: | ||
participant_id: participant_id | ||
germline_file: run_germline_tool/output_file_1_tool | ||
list_dir: run_germline_tool/output_file_2_tool | ||
minDepth: minDepth | ||
reference: reference | ||
patientbamcrams : bamscrams | ||
peddy: peddy_file | ||
bamcramsampleID: bamcramsampleIDs | ||
ram: ram_tumor | ||
minCore: minCore | ||
out: | ||
[ loh_output_file_tool,log_output ] | ||
$namespaces: | ||
sbg: https://sevenbridges.com | ||
"sbg:license": Apache License 2.0 | ||
"sbg:publisher": KFDRC | ||
"sbg:categories": | ||
- VAF | ||
- LOH | ||
- WGS | ||
- WXS | ||
- GVCF | ||
- TRIOS | ||
"sbg:links": | ||
- id: 'https://github.com/d3b-center/tumor-loh-app-dev/releases/tag/v1.0.2' | ||
label: github-release |