diff --git a/README.md b/README.md
index 1572193..f770dcb 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,7 @@
 
 ![data service logo](https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png)
 
-The Kids First Loss of Heterozygosity (aka LOH) is a CWL workflow that assesses the loss of heterozygosity in the tumor for rare germline calls filtered by gnomad_3_1_1_AF_popmax (typically < 0.01) or when gnomad_3_1_1_AF_popmax is not defined. This workflow is designed to analyze LOH for family trios as well as multiple proband tumor samples.
-
-### Application Description
-
-The Kids First Loss of Heterozygosity application is divided into two tools: Germline tool and Tumor tool.
+The Kids First Loss of Heterozygosity Preprocessing (aka LOH) is a CWL workflow that assesses the loss of heterozygosity in the tumor for rare germline calls filtered by gnomad_3_1_1_AF_popmax (typically < 0.01) or when gnomad_3_1_1_AF_popmax is not defined. This preprocessing is designed to compute variant allele frequency (VAF) for multiple proband tumor samples and can also map germline VAF for family trios if trio germline VCF file is provided.
 
 #### Basic info
 - Dockerfile: https://github.com/d3b-center/bixtools/tree/master/LOH/1.0.1
@@ -14,6 +10,10 @@ The Kids First Loss of Heterozygosity application is divided into two tools: Ger
   - Seven Bridges Cavatica Platform: https://cavatica.sbgenomics.com/
   - cwltool: https://github.com/common-workflow-language/cwltool/releases/tag/3.1.20221201130942
 
+### Application Description
+
+The Kids First Loss of Heterozygosity application is divided into two tools: Germline tool and Tumor tool.
+
 #### Germline Tool
 Germline tool filters germline annotations to retain variants based on gnomad_3_1_1_AF_popmax (typically < 0.01) or when gnomad_3_1_1_AF_popmax is not defined. It requires vcf file, proband sample id, ram as required inputs and peddy file as optional input which is required for family trios.
 It outputs variant information such as gene, chr, start, stop, ref/alt alleles, ref/alt allele depths, variant allele frequency and list of coordinates that will be an input to tumor tool.
@@ -57,7 +57,7 @@ output_file:{ type: File, doc: A tsv file with gathered data from germline and t
 
 #### Output headers
 
-LOH workflow will generate a tab-separated values file with following headers:
+Preprocessing LOH will generate a tab-separated values file with following headers:
 | Headers | Description |
 |:-------:|:--------:|
 | BS_ID | Sample Id for germline sample |
@@ -88,7 +88,7 @@ More information can be found [here](https://github.com/d3b-center/tumor-loh-app
 
 ### Running it locally on a laptop?
 
-It is recommended to run this workflow on a system with a high number of CPUs and memory (>=16 GB). The basic requirement is a running docker engine and CWL tools. Command line to run the LOH workflow locally is:
+It is recommended to run this CWL workflow on a system with a high number of CPUs and memory (>=16 GB). The basic requirement is a running docker engine and CWL tools. Command line to run the LOH workflow locally is:
 
 ```
 cwltool workflow/run_LOH_app.cwl sample_input.yml
diff --git a/docs/README.md b/docs/README.md
index 0dad64f..4778fa5 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -2,7 +2,7 @@
 
 ![data service logo](https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png)
 
-LOH workflow assesses the loss of heterozygosity (LOH) in the tumor for rare germline variants.
+Preprocessing LOH assesses the loss of heterozygosity (LOH) in the tumor for rare germline variants.
 Order of operations: This workflow runs bcftools to extract data and prepare list of locations from vcf provided and feed it into bam-readcount to compute VAF and later parse, merge germline and tumor data together.
 
 ## Inputs
diff --git a/tools/run_gene_extract_list_prepare.cwl b/tools/run_gene_extract_list_prepare.cwl
index ef63e57..029ca14 100644
--- a/tools/run_gene_extract_list_prepare.cwl
+++ b/tools/run_gene_extract_list_prepare.cwl
@@ -3,7 +3,7 @@
 cwlVersion: v1.2
 class: CommandLineTool
 id: run_gene_extract_list_prepare
-label: run_germline
+label: germline_tool
 doc: collects info from bcftool and add gene, prepare directory of list for bam-readcount tool
 requirements:
   - class: ShellCommandRequirement
diff --git a/tools/run_readcount_parser.cwl b/tools/run_readcount_parser.cwl
index f5fae67..b13b17a 100644
--- a/tools/run_readcount_parser.cwl
+++ b/tools/run_readcount_parser.cwl
@@ -3,6 +3,7 @@
 cwlVersion: v1.2
 class: CommandLineTool
 id: run_readcount_parser
+label: tumor_tool
 doc: run bam-readcountparser the bam-readcount output and extract inner join between data from vcf (germline) and cram/bam file (tumor)
 requirements:
   - class: ShellCommandRequirement
diff --git a/workflow/run_LOH_app.cwl b/workflow/run_LOH_app.cwl
index b4a5a93..12f7457 100644
--- a/workflow/run_LOH_app.cwl
+++ b/workflow/run_LOH_app.cwl
@@ -2,22 +2,25 @@
 
 cwlVersion: v1.2
 class: Workflow
-id: run_LOH_app
+id: kf-loss-of-heterozygosity-preprocessing
 label: Kids First Loss of Heterozygosity
 doc: |
   # Kids First Loss of Heterozygosity (LOH)
 
   ![data service logo](https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png)
 
-  The Kids First Loss of Heterozygosity (aka LOH) is a CWL workflow that assesses the loss of heterozygosity in the tumor for rare germline calls filtered by gnomad_3_1_1_AF_popmax (typically < 0.01) or when gnomad_3_1_1_AF_popmax is not defined. This workflow is designed to analyze LOH for family trios as well as multiple proband tumor samples.
+  The Kids First Loss of Heterozygosity Preprocessing (aka LOH) is a CWL workflow that assesses the loss of heterozygosity in the tumor for rare germline calls filtered by gnomad_3_1_1_AF_popmax (typically < 0.01) or when gnomad_3_1_1_AF_popmax is not defined. This preprocessing is designed to compute variant allele frequency (VAF) for multiple proband tumor samples and can also map germline VAF for family trios if trio germline VCF file is provided.
+
+  #### Basic info
+  - Dockerfile: https://github.com/d3b-center/bixtools/tree/master/LOH/1.0.1
+  - tested with
+    - Seven Bridges Cavatica Platform: https://cavatica.sbgenomics.com/
+    - cwltool: https://github.com/common-workflow-language/cwltool/releases/tag/3.1.20221201130942
 
   ### Description
 
   The Kids First Loss of Heterozygosity application is divided into two tools: Germline tool and Tumor tool.
 
-  #### Docker
-  Dockerfile: https://github.com/d3b-center/bixtools/tree/master/LOH
-
   #### Germline Tool
   Germline tool filters germline annotations to retain variants based on gnomad_3_1_1_AF_popmax (typically < 0.01) or when gnomad_3_1_1_AF_popmax is not defined. It requires vcf file, proband sample id, ram as required inputs and peddy file as optional input which is required for family trios.
   It outputs variant information such as gene, chr, start, stop, ref/alt alleles, ref/alt allele depths, variant allele frequency and list of coordinates that will be an input to tumor tool.
@@ -39,7 +42,7 @@ doc: |
   # Required
   participant_id: { doc: provide participant id for this run, type: string }
   bamscrams: { doc: tumor input file in cram or bam format with their index file, type: 'File[]' , secondaryFiles: [ { pattern: ".crai", required: false }, { pattern: ".bai", required: false } ] }
-  reference: { doc: human reference in fasta format with index file, type: File,secondaryFiles: [ .fai ],"sbg:suggestedValue": {class: File, path: 60639014357c3a53540ca7a3, name: Homo_sapiens_assembly38.fasta} }
+  reference: { doc: human reference in fasta format with index file, type: File, secondaryFiles: [ .fai ], "sbg:suggestedValue": { class: File, path: 60639014357c3a53540ca7a3, name: Homo_sapiens_assembly38.fasta, secondaryFiles: [{class: File, path: 60639016357c3a53540ca7af, name: Homo_sapiens_assembly38.fasta.fai}]} }
   sample_vcf_file: { doc: provide germline vcf file for this sample, type: File }
   # Optional
   minDepth: { doc: provide minDepth to consider for tumor reads, type: 'int?', default: 1 }
@@ -104,10 +107,12 @@ steps:
 "sbg:license": Apache License 2.0
 "sbg:publisher": KFDRC
 "sbg:categories":
-- DNA
+- VAF
+- LOH
 - WGS
 - WXS
 - GVCF
+- TRIOS
 "sbg:links":
 - id: 'https://github.com/d3b-center/tumor-loh-app-dev/releases/tag/v1.0.2'
   label: github-release
\ No newline at end of file