Merge pull request #223 from uclahs-cds/sfitz-update-readme
Sfitz update readme
sorelfitzgibbon authored Aug 19, 2023
2 parents 58e717f + 652c459 commit ae5698f
Showing 21 changed files with 415 additions and 142 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add `split_VCF_bcftools` to `Mutect2` workflow, separating SNVs, MNVs and Indels

### Changed
- Update `README.md`
- Use `set_env` from `pipeline-Nextflow-config`
- Update resource allocation to include new processes
- Reconfigure `intersect_regions` to use all contigs except `decoy`
188 changes: 89 additions & 99 deletions README.md

Large diffs are not rendered by default.

67 changes: 67 additions & 0 deletions docs/flowcharts.md
@@ -0,0 +1,67 @@
# pipeline-call-sSNV Flow Diagrams and Tool Links

- [Variant Calling](#variant-calling)
- [SomaticSniper](#somaticsniper)
- [Strelka2](#strelka2)
- [Mutect2](#mutect2)
- [MuSE](#muse)
- [Variant Intersection](#variant-intersection)
- [BCFtools, VennDiagram and vcf2maf](#variant-intersection)

---

## Variant Calling
### SomaticSniper
![Diagram](somatic-sniper.svg)
#### Tools
##### SomaticSniper
SomaticSniper source: https://github.com/genome/somatic-sniper
Version: SomaticSniper v1.0.5.0 (Released on Jul 16, 2015)
GitHub Package: ghcr.io/uclahs-cds/somaticsniper:1.0.5.0
##### bam-readcount
bam-readcount source: https://github.com/genome/bam-readcount
Version: v0.8.0 Release (Released on Oct 21, 2016)
GitHub Package: ghcr.io/uclahs-cds/bam-readcount:0.8.0

### Strelka2
![Diagram](strelka2.svg)
#### Tools
##### Manta
Manta source: https://github.com/Illumina/manta
Version: v1.6.0 (Released on Jul 9, 2019)
GitHub Package: ghcr.io/uclahs-cds/manta:1.6.0
##### Strelka2
Strelka2 source: https://github.com/Illumina/strelka
Version: v2.9.10 (Released on Nov 7, 2018)
GitHub Package: ghcr.io/uclahs-cds/strelka2:2.9.10

### Mutect2
![alt text](mutect2_chart.svg)
#### Tools
##### GATK
GATK source: https://github.com/broadinstitute/gatk
Version: 4.2.4.1 (Released on Jan 4, 2022)
Docker Image: broadinstitute/gatk:4.2.4.1

### MuSE
![alt text](muse_chart.svg?raw=true)
#### Tools
##### MuSE
MuSE source: https://github.com/wwylab/MuSE
Version: 2.0 (Released on Aug 25, 2021)
GitHub Package: https://github.com/uclahs-cds/docker-MuSE/pkgs/container/muse

## Variant Intersection
![alt text](intersect_chart.svg?raw=true)
#### Tools
##### BCFtools
BCFtools source: https://samtools.github.io/bcftools
Version: 1.17 (Released on Feb 21, 2023)
GitHub Package: ghcr.io/uclahs-cds/bcftools:1.17
##### VennDiagram
VennDiagram source: https://github.com/uclahs-cds/public-R-VennDiagram
Version: 1.7.3 (Released on Apr 12, 2022)
##### vcf2maf
vcf2maf source: https://github.com/mskcc/vcf2maf
Version: v1.6.18
GitHub Package: ghcr.io/mskcc/vcf2maf/vcf2maf
123 changes: 123 additions & 0 deletions docs/intersect_chart.svg
34 changes: 34 additions & 0 deletions docs/intersect_flowchart.puml
@@ -0,0 +1,34 @@
@startuml intersect_chart

!include pipeline_elements.iuml!string_functions
!include pipeline_elements.iuml!input_rect
!include pipeline_elements.iuml!output_rect
!include pipeline_elements.iuml!intermediate_rect
!include pipeline_elements.iuml!qc_rect
!include pipeline_elements.iuml!test_rect
!include pipeline_elements.iuml!process_legend

skinparam linetype ortho

$test_process(num_algos, '', 'Results from two or more algorithms?')
$input_process(input_vcfs, 'filtered VCFs', 'Input: VCFs for all selected algorithms')
$output_process(done, 'Exit', 'Intersect is only run if two or %newline()more algorithms were selected')

$intermediate_process(intersect_vcfs_2, "BCFtools isec -n +2", "Output: SNVs found by 2 or more algorithms: %newline()* 1 VCF per algorithm %newline() * list of SNVs (README.txt, sites.txt)")
$intermediate_process(intersect_vcfs_1, "BCFtools isec -n +1", "Output: All SNVs, including private: %newline() * list of SNVs (README.txt, sites.txt)")

$output_process(plot_venn, 'VennDiagram', 'Output: plot in TIFF format showing %newline()intersection counts for all SNVs')
$output_process(concat_vcfs, 'BCFtools concat', 'Output: Single VCF with all SNVs %newline()found by 2 or more algorithms')
$output_process(vcf2maf, 'vcf2maf', 'Output: Single MAF with all SNVs %newline()found by 2 or more algorithms: %newline()//SNV-concat.maf.gz//')

num_algos -d-> input_vcfs: true
num_algos -d-> done: false
input_vcfs -d-> intersect_vcfs_2
input_vcfs -d-> intersect_vcfs_1
intersect_vcfs_1 -d-> plot_venn
intersect_vcfs_2 -d-> concat_vcfs: // SNV-consensus-variants.vcf.gz //
concat_vcfs -d-> vcf2maf: // SNV-concat.vcf.gz //

$add_legend("right")

@enduml
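
Note on the two `BCFtools isec` boxes above: they differ only in the minimum number of algorithms that must report a site (`-n +2` versus `-n +1`). The selection rule can be sketched in plain shell, assuming each algorithm's calls have already been reduced to a text file with one site key per line. The function name `sites_in_at_least`, the file names, and the sample sites are all hypothetical; this illustrates the counting rule only and does not reproduce what BCFtools does with VCF records.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # stable sort order for the example

# Print sites present in at least MIN of the given site lists.
# Hypothetical helper: each list is one site key per line (e.g. chr1:100:A:T).
sites_in_at_least() {
    local min="$1"
    shift
    local list
    for list in "$@"; do
        sort -u "$list"          # count each caller at most once per site
    done | sort | uniq -c | awk -v min="$min" '$1 >= min { print $2 }'
}

tmp=$(mktemp -d)
printf '%s\n' chr1:100:A:T chr1:200:G:C chr2:50:T:G > "$tmp/caller_a.txt"
printf '%s\n' chr1:100:A:T chr2:50:T:G              > "$tmp/caller_b.txt"
printf '%s\n' chr1:100:A:T chr3:7:C:A               > "$tmp/caller_c.txt"

# Analogous to "-n +2": sites reported by two or more callers.
shared=$(sites_in_at_least 2 "$tmp"/caller_*.txt)
# Analogous to "-n +1": every site, including caller-private ones.
all=$(sites_in_at_least 1 "$tmp"/caller_*.txt)

printf 'shared:\n%s\n' "$shared"
printf 'all:\n%s\n' "$all"
```

In the flowchart, the "1 or more" set feeds the Venn diagram and the "2 or more" set feeds the concatenated VCF and MAF.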
21 changes: 11 additions & 10 deletions docs/muse_chart.svg
6 changes: 3 additions & 3 deletions docs/muse_flowchart.puml
@@ -15,13 +15,13 @@ $input_process(tumor_bam, 'Tumor BAM', 'Input: tumor BAM file')
$intermediate_process(muse_call_run, "MuSE call", "Output: position-specific %newline()summary statistics")
$input_process(dbsnp_file, 'dbSNP', 'Input: dbSNP VCF')
$intermediate_process(muse_sump_run, "MuSE sump", "Output: VCF with tier-based %newline()quality cutoffs")
$output_process(filter_vcf, 'Filter VCF', 'Output: VCF with top tier %newline()somatic variants')
$output_process(filter_vcf, 'Filter VCF', 'Output: VCF with top tier %newline()somatic variants: %newline()//SNV.vcf.gz//')

tumor_bam -d-> muse_call_run
normal_bam -d-> muse_call_run
muse_call_run -d-> muse_sump_run
muse_call_run -d-> muse_sump_run: // MuSE.txt //
dbsnp_file -d-> muse_sump_run
muse_sump_run -d-> filter_vcf
muse_sump_run -d-> filter_vcf: // raw.vcf //

$add_legend("left")

28 changes: 14 additions & 14 deletions docs/mutect2_chart.svg
8 changes: 4 additions & 4 deletions docs/mutect2_flowchart.puml
@@ -21,8 +21,8 @@ $intermediate_process(gatk_mergeVcfs, 'GATK MergeVcfs', 'Output: unfiltered VCF'
$intermediate_process(mutect2_mergeStats, 'GATK MergeMutectStats', 'Output: Mutect2 stats')
$intermediate_process(gatk_learnReadOrientation, 'GATK LearnReadOrientationModel', 'Output: orientation bias artifacts table')
$intermediate_process(mutect2_filterCalls, 'GATK FilterMutectCalls', 'Output: VCF with non-PASSing variants tagged %newline()Optional output: filtering stats')
$output_process(bcftools_filterVCF, 'remove non-PASS variants', 'Output: VCF including only variants that passed above filters')
$output_process(bcftools_splitVCF, 'split by variant type', 'Output: %newline()pass-snvs.vcf.gz %newline()pass-mnvs.vcf.gz %newline()pass-indels.vcf.gz')
$intermediate_process(bcftools_filterVCF, 'remove non-PASS variants', 'Output: VCF including only variants that passed above filters')
$output_process(bcftools_splitVCF, 'split by variant type', 'Output: %newline()//SNV.vcf.gz // %newline()//MNV.vcf.gz // %newline()//Indel.vcf.gz //')
$qc_process(gatk_filteringStats, 'Optional', 'Output: filteringStats.tsv')

reference_genome -d-> gatk_splitIntervals
Expand All @@ -40,14 +40,14 @@ mutect2_call_sSNV -d-> mutect2_mergeStats
mutect2_call_sSNV -d-> gatk_learnReadOrientation
mutect2_call_sSNV -d-> mutect2_mergeStats

gatk_mergeVcfs -d-> mutect2_filterCalls : // unfiltered.vcf.gz //
gatk_mergeVcfs -d-> mutect2_filterCalls : //unfiltered.vcf.gz//
gatk_learnReadOrientation -d-> mutect2_filterCalls : // f1r2.tar.gz //
mutect2_mergeStats -d-> mutect2_filterCalls : // unfiltered.vcf.stats.gz //
reference_genome -d-> mutect2_filterCalls
contamination_table -d-> mutect2_filterCalls
mutect2_filterCalls -d-> gatk_filteringStats
mutect2_filterCalls -d-> bcftools_filterVCF : // filtered.vcf.gz //
bcftools_filterVCF -d-> bcftools_splitVCF : // pass.vcf.gz //
bcftools_filterVCF -d-> bcftools_splitVCF : // all-pass.vcf.gz //

$add_legend('left')

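
Note on the "split by variant type" step: the distinction between `SNV.vcf.gz`, `MNV.vcf.gz` and `Indel.vcf.gz` comes down to the REF and ALT allele lengths. The following is a rough shell sketch of that classification, assuming biallelic records in an uncompressed, tab-separated VCF. The function name `classify_variants` and the mock records are hypothetical, and the pipeline's own `split_VCF_bcftools` process uses BCFtools rather than this awk rule.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Classify each record of a VCF by allele length (assumes one ALT allele per record).
classify_variants() {
    awk -F'\t' '
        /^#/ { next }                       # skip header lines
        {
            ref = length($4); alt = length($5)
            if (ref == 1 && alt == 1) type = "SNV"     # single-base substitution
            else if (ref == alt)      type = "MNV"     # equal-length multi-base substitution
            else                      type = "Indel"   # length change
            print $1 ":" $2, type
        }' "$1"
}

# Mock input with one record of each type (hypothetical data).
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$vcf"
printf 'chr1\t100\t.\tA\tT\n'    >> "$vcf"
printf 'chr1\t200\t.\tAC\tGT\n'  >> "$vcf"
printf 'chr1\t300\t.\tA\tATG\n'  >> "$vcf"

classified=$(classify_variants "$vcf")
printf '%s\n' "$classified"
```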
7 changes: 7 additions & 0 deletions docs/pipeline_elements.iuml
@@ -44,6 +44,13 @@ $description"
!endprocedure
@enduml

@startuml(id=test_rect)
!unquoted procedure $test_process($alias, $title, $description)
rectangle $alias #Orange as "
$description"
!endprocedure
@enduml

@startuml(id=process_legend)
!unquoted procedure $add_legend($pos="bottom right")
legend $pos
File renamed without changes
File renamed without changes
2 changes: 1 addition & 1 deletion metadata.yaml
@@ -5,4 +5,4 @@ Maintainers: ['[email protected]']
Contributors: ['Mao Tian', 'Bugh Caden', 'Helena Winata', 'Yash Patel', 'Sorel Fitz-Gibbon']
Languages: ['Docker', 'Nextflow']
Dependencies: ['Docker', 'Nextflow']
Tools: ['GATK 4.4.0.0', 'SomaticSniper v1.0.5.0', 'SAMtools v1.16.1', 'Strelka2 v2.9.10', 'Manta v1.6.0', 'MuSE v2.0.2', BCFtools v1.17]
Tools: ['GATK 4.4.0.0', 'SomaticSniper v1.0.5.0', 'SAMtools v1.16.1', 'Strelka2 v2.9.10', 'Manta v1.6.0', 'MuSE v2.0.2', 'BCFtools v1.17', 'R v4.3.1', 'VennDiagram v1.7.3', 'vcf2maf v1.6.18']
6 changes: 3 additions & 3 deletions module/intersect-processes.nf
@@ -32,8 +32,8 @@ process intersect_VCFs_BCFtools {
path intersect_regions_index

output:
path "*.vcf.gz", emit: consensus_vcf
path "*.vcf.gz.tbi", emit: consensus_idx
path "*.vcf.gz", emit: intersect_vcf
path "*.vcf.gz.tbi", emit: intersect_idx
path ".command.*"
path "isec-2-or-more/*.txt"
path "isec-1-or-more/*.txt", emit: isec
Expand All @@ -51,7 +51,7 @@ process intersect_VCFs_BCFtools {
${regions_command} \
${vcf_list}
awk '/Using the following file names:/{x=1;next} x' isec-2-or-more/README.txt \
| sed 's/.vcf.gz\$/-consensus-variants.vcf.gz/' \
| sed 's/.vcf.gz\$/-intersect.vcf.gz/' \
| while read a b c d; do
mv \$a \$d
mv \$a.tbi \$d.tbi
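
Note on the rename loop in this hunk: it reads the file mapping that `bcftools isec` writes to `README.txt`, rewrites the original input name to end in `-intersect.vcf.gz`, and moves each numbered output (and its index) to that name. Below is a standalone shell sketch of the same steps, run against a mock `README.txt`. The mock's layout (numbered path, the words `for stripped`, then the input name) is an assumption about the `isec` output, the function name `rename_isec_outputs` and the file `algoA_SNV.vcf.gz` are hypothetical, and `$` is unescaped here because this is plain bash rather than a Nextflow script block.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rename numbered isec outputs to names derived from the original input VCFs.
rename_isec_outputs() {
    local readme="$1"
    # Keep only the lines after the "Using the following file names:" marker,
    # rewrite the trailing input name, then move each output and its index.
    awk '/Using the following file names:/{x=1;next} x' "$readme" \
        | sed 's/\.vcf\.gz$/-intersect.vcf.gz/' \
        | while read -r a b c d; do
            mv "$a" "$d"
            mv "$a.tbi" "$d.tbi"
        done
}

work=$(mktemp -d)
cd "$work"
mkdir isec-2-or-more
touch isec-2-or-more/0000.vcf.gz isec-2-or-more/0000.vcf.gz.tbi

# Mock README.txt (assumed layout, hypothetical input name).
printf 'This file was produced by vcfisec.\n'                           >  isec-2-or-more/README.txt
printf 'Using the following file names:\n'                              >> isec-2-or-more/README.txt
printf 'isec-2-or-more/0000.vcf.gz\tfor stripped\talgoA_SNV.vcf.gz\n'   >> isec-2-or-more/README.txt

rename_isec_outputs isec-2-or-more/README.txt
ls
```

The sketch escapes the dots in the `sed` pattern so that only a literal `.vcf.gz` suffix matches; the pipeline's pattern leaves them unescaped, which also works for these file names.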
10 changes: 5 additions & 5 deletions module/intersect.nf
@@ -36,11 +36,11 @@ workflow intersect {
script_dir_ch,
intersect_VCFs_BCFtools.out.isec,
)
consensus_vcfs_ch = intersect_VCFs_BCFtools.out.consensus_vcf
intersect_vcfs_ch = intersect_VCFs_BCFtools.out.intersect_vcf
.map { sortVcfs(it) }
concat_VCFs_BCFtools(
consensus_vcfs_ch,
intersect_VCFs_BCFtools.out.consensus_idx
intersect_vcfs_ch,
intersect_VCFs_BCFtools.out.intersect_idx
)
convert_VCF_vcf2maf(
concat_VCFs_BCFtools.out.concat_vcf,
Expand All @@ -52,10 +52,10 @@ workflow intersect {
.map{ it -> ['SNV', it]}
)
compress_MAF_vcf2maf(convert_VCF_vcf2maf.out.concat_maf)
file_for_sha512 = intersect_VCFs_BCFtools.out.consensus_vcf
file_for_sha512 = intersect_VCFs_BCFtools.out.intersect_vcf
.flatten()
.map{ it -> ["${file(it).getName().split('_')[0]}-SNV-vcf", it]}
.mix(intersect_VCFs_BCFtools.out.consensus_idx
.mix(intersect_VCFs_BCFtools.out.intersect_idx
.flatten()
.map{ it -> ["${file(it).getName().split('_')[0]}-SNV-idx", it]}
)
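
Note on the `file_for_sha512` keys in this hunk: each key is the part of the file name before the first underscore, followed by `-SNV-vcf` or `-SNV-idx`. The same derivation in shell, as a small sketch; the function name `sha512_key` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a checksum key from a file path: text before the first "_" in the
# base name, then "-SNV-" and the given kind ("vcf" or "idx").
sha512_key() {
    local name
    name=$(basename "$1")
    printf '%s-SNV-%s\n' "${name%%_*}" "$2"
}

vcf_key=$(sha512_key /some/dir/algoA_sample1_SNV-intersect.vcf.gz vcf)
idx_key=$(sha512_key /some/dir/algoA_sample1_SNV-intersect.vcf.gz.tbi idx)
printf '%s\n%s\n' "$vcf_key" "$idx_key"
```

This relies on the algorithm name being the first underscore-delimited token of the file name, as the Groovy `split('_')[0]` in the hunk also does.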
2 changes: 1 addition & 1 deletion module/strelka2-processes.nf
@@ -77,7 +77,7 @@ process call_sSNV_Strelka2 {

output:
tuple val("SNV"), path("StrelkaSomaticWorkflow/results/variants/somatic.snvs.vcf.gz"), emit: snvs_vcf
tuple val("INDEL"), path("StrelkaSomaticWorkflow/results/variants/somatic.indels.vcf.gz"), emit: indels_vcf
tuple val("Indel"), path("StrelkaSomaticWorkflow/results/variants/somatic.indels.vcf.gz"), emit: indels_vcf
path "StrelkaSomaticWorkflow"
path ".command.*"

Expand Down
2 changes: 1 addition & 1 deletion test/config/a_mini-all-tools.config
@@ -17,7 +17,7 @@ params {
dataset_id = 'TWGSAMIN'
// setting params.exome to TRUE will add the '--exome' option when running manta and strelka2 and the -E option when running MuSE
exome = false
save_intermediate_files = false
save_intermediate_files = true

// module options
bgzip_extra_args = ''
13 changes: 13 additions & 0 deletions test/config/a_mini-muse.config
@@ -24,8 +24,21 @@ params {
bgzip_extra_args = ''
tabix_extra_args = ''

// mutect2 options
split_intervals_extra_args = ''
mutect2_extra_args = ''
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 12
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
vcf2maf_extra_args = ''

}

methods.setup()
9 changes: 8 additions & 1 deletion test/config/a_mini-mutect2.config
@@ -29,8 +29,15 @@ params {
mutect2_extra_args = ''
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 4
scatter_count = 12
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
vcf2maf_extra_args = ''
}

methods.setup()
15 changes: 15 additions & 0 deletions test/config/a_mini-somaticsniper.config
@@ -23,6 +23,21 @@ params {
// module options
bgzip_extra_args = ''
tabix_extra_args = ''

// mutect2 options
split_intervals_extra_args = ''
mutect2_extra_args = ''
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 12
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
vcf2maf_extra_args = ''
}

methods.setup()
15 changes: 15 additions & 0 deletions test/config/a_mini-strelka2.config
@@ -23,6 +23,21 @@ params {
// module options
bgzip_extra_args = ''
tabix_extra_args = ''

// mutect2 options
split_intervals_extra_args = ''
mutect2_extra_args = ''
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 12
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
vcf2maf_extra_args = ''
}

methods.setup()
