Merge pull request #1014 from jbv2/dsl2-qualimap2-3
Adding new qualimap
jfy133 authored Sep 8, 2023
2 parents 4bf74d5 + b99321c commit 6d44207
Showing 10 changed files with 274 additions and 9 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -32,7 +32,7 @@ jobs:
- "-profile test,docker --mapping_tool bwamem --run_mapdamage_rescaling --run_pmd_filtering --run_trim_bam"
- "-profile test,docker --mapping_tool bowtie2"
- "-profile test,docker --skip_preprocessing"
-  - "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd"
+  - "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'"
- "-profile test_multiref,docker" ## TODO add damage manipulation here instead once it goes multiref
steps:
- name: Check out pipeline code
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -94,6 +94,10 @@

> Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. doi: [10.1101/gr.176552.114](https://doi.org/10.1101/gr.176552.114)
- [QualiMap](https://doi.org/10.1093/bioinformatics/btv566)

> QualiMap Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 32(2), 292–294. Download: http://qualimap.bioinfo.cipf.es/
- [DamageProfiler](https://doi.org/10.1093/bioinformatics/btab190)
> DamageProfiler Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). doi: [10.1093/bioinformatics/btab190](https://doi.org/10.1093/bioinformatics/btab190). Download: https://github.com/Integrative-Transcriptomics/DamageProfiler
9 changes: 9 additions & 0 deletions conf/modules.config
@@ -819,6 +819,15 @@ process {
]
}

withName: "QUALIMAP_BAMQC" {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
publishDir = [
path: { "${params.outdir}/mapstats/qualimap/${meta.reference}/${meta.sample_id}/" },
mode: params.publish_dir_mode,
enabled: true
]
}

//
// DAMAGE CALCULATION
//
27 changes: 27 additions & 0 deletions docs/output.md
@@ -379,6 +379,33 @@ These curves will be displayed in the pipeline run's MultiQC report, however you

### Mapping Statistics

#### QualiMap

<details markdown="1">
<summary>Output files</summary>

- `qualimap/`
  - `<sample_id>/`
    - `*.html`: in-depth report including percent coverage, depth coverage, GC content, etc. of mapped reads
    - `genome_results.txt`
    - `css/`: HTML CSS styling used for the report
    - `images_qualimapReport/`: PNG version of images from the HTML report.
    - `raw_data_qualimapReport/`: The raw data used to render the HTML report as TXT files.

</details>

[QualiMap](http://qualimap.bioinfo.cipf.es/) is a tool which provides statistics on the quality of the mapping of your reads to your reference genome. It allows you to assess how well covered your reference genome is by your data, both in 'fold' depth (average number of times a given base on the reference is covered by a read) and 'percentage' (the percentage of all bases on the reference genome that is covered at a given fold depth). These outputs allow you to decide whether you have enough high-quality data for downstream applications like genotyping, and how to adjust the parameters for those tools accordingly.

> NB: Neither fold coverage nor percent coverage on its own is sufficient to assess whether you have a high quality mapping. Abnormally high fold coverage of a small region, such as highly conserved genes or reference genomes containing un-removed adapters, can artificially inflate the mean coverage, while a high percent coverage is not useful if all bases of the genome are covered at just 1x coverage.

**Note that many of the statistics from this module are displayed in the General Stats table, as they represent single values that are not plottable.**
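
To make the NB above concrete, here is a minimal Python sketch (not part of the pipeline or of QualiMap itself) that computes mean fold depth and percent coverage from a list of per-base depths. The function names `mean_depth` and `percent_covered` and the toy depth values are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def mean_depth(depths: Sequence[int]) -> float:
    """Average number of reads covering each reference position (fold depth)."""
    return sum(depths) / len(depths)


def percent_covered(depths: Sequence[int], min_depth: int = 1) -> float:
    """Percentage of reference positions covered by at least `min_depth` reads."""
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Hypothetical 10 bp reference where a 2 bp region is piled up at 50x.
piled_up = [50, 50, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical 10 bp reference covered evenly, but only at 1x.
thin = [1] * 10

print(mean_depth(piled_up), percent_covered(piled_up))       # 10.0 20.0
print(mean_depth(thin), percent_covered(thin, min_depth=5))  # 1.0 0.0
```

The first toy reference reports a seemingly healthy 10x mean depth although only 20% of its positions are covered at all, while the second is 100% covered at 1x but has no position reaching 5x. This is why both values should be read together.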

You will receive output for each sample. This means you will receive statistics on deduplicated values, with all types of libraries combined into a single value (i.e. non-UDG treated, full-UDG, paired-end, single-end all together).

> ⚠️ Warning: If your library has no reads mapping to the reference, this will result in an empty BAM file. Qualimap will therefore not produce any output even if a BAM exists!

#### Bedtools

<details markdown="1">
5 changes: 5 additions & 0 deletions modules.json
@@ -160,6 +160,11 @@
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"qualimap/bamqc": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"samtools/faidx": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
123 changes: 123 additions & 0 deletions modules/nf-core/qualimap/bamqc/main.nf

Some generated files are not rendered by default.

47 changes: 47 additions & 0 deletions modules/nf-core/qualimap/bamqc/meta.yml

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions nextflow.config
@@ -141,6 +141,10 @@ params {
skip_deduplication = false
deduplication_tool = 'markduplicates'

// Qualimap
skip_qualimap = false
snpcapture_bed = null

// Contamination estimation
run_contamination_estimation_angsd = false
contamination_estimation_angsd_chrom_name = 'X'
8 changes: 8 additions & 0 deletions nextflow_schema.json
@@ -958,6 +958,14 @@
"description": "Turns on defects mode to extrapolate without testing for defects (lc_extrap mode only).",
"help_text": "Activates defects mode of `lc_extrap`, which does the extrapolation without testing for defects.\n\n> Modifies preseq lc_extrap parameter: `-D`",
"fa_icon": "fab fa-creative-commons-sampling-plus"
},
"skip_qualimap": {
"type": "boolean",
"default": false
},
"snpcapture_bed": {
"type": "string",
"description": "Path to a SNP capture positions file in BED format. The provided file can also be gzipped."
}
},
"fa_icon": "fas fa-search"
54 changes: 46 additions & 8 deletions workflows/eager.nf
@@ -98,8 +98,10 @@ include { MTNUCRATIO } from '../modules/n
include { HOST_REMOVAL } from '../modules/local/host_removal'
include { ENDORSPY } from '../modules/nf-core/endorspy/main'
include { SAMTOOLS_FLAGSTAT as SAMTOOLS_FLAGSTATS_BAM_INPUT } from '../modules/nf-core/samtools/flagstat/main'
include { BEDTOOLS_COVERAGE as BEDTOOLS_COVERAGE_DEPTH ; BEDTOOLS_COVERAGE as BEDTOOLS_COVERAGE_BREADTH } from '../modules/nf-core/bedtools/coverage/main'
include { SAMTOOLS_VIEW_GENOME } from '../modules/local/samtools_view_genome.nf'
include { BEDTOOLS_COVERAGE as BEDTOOLS_COVERAGE_DEPTH ; BEDTOOLS_COVERAGE as BEDTOOLS_COVERAGE_BREADTH } from '../modules/nf-core/bedtools/coverage/main'
include { SAMTOOLS_VIEW_GENOME } from '../modules/local/samtools_view_genome.nf'
include { QUALIMAP_BAMQC } from '../modules/nf-core/qualimap/bamqc/main'
include { GUNZIP as GUNZIP_SNPBED } from '../modules/nf-core/gunzip/main.nf'

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -136,6 +138,26 @@ workflow EAGER {
if ( params.preprocessing_tool == 'fastp' && !adapterlist.extension.matches(".*(fa|fasta|fna|fas)") ) error "[nf-core/eager] ERROR: fastp adapter list requires a `.fasta` format and extension (or fa, fas, fna). Check input: --preprocessing_adapterlist ${params.preprocessing_adapterlist}"
}

// QualiMap
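    if ( params.snpcapture_bed ) {
        ch_snpcapture_bed_gunzip = Channel.fromPath( params.snpcapture_bed, checkIfExists: true )
            .collect()
            .map {
                file ->
                    // Use the file name (without extensions) as a minimal meta value
                    def meta = file.simpleName
                    [ meta, file ]
            }
            .branch {
                meta, bed ->
                    // Gzipped BED files are decompressed first; anything else passes through unchanged
                    forgunzip: bed[0].extension == "gz"
                    skip: true
            }
        ch_snpcapture_bed = GUNZIP_SNPBED( ch_snpcapture_bed_gunzip.forgunzip ).gunzip.mix( ch_snpcapture_bed_gunzip.skip ).map{ it[1] }

    } else {
        // No BED file supplied: pass an empty value so QualiMap runs on the whole reference
        ch_snpcapture_bed = []
    }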

// Contamination estimation
hapmap_file = file(params.contamination_estimation_angsd_hapmap, checkIfExists:true)

@@ -147,11 +169,10 @@ workflow EAGER {
file(params.input)
)
ch_versions = ch_versions.mix( INPUT_CHECK.out.versions )

// TODO: OPTIONAL, you can use nf-validation plugin to create an input channel from the samplesheet with Channel.fromSamplesheet("input")
// See the documentation https://nextflow-io.github.io/nf-validation/samplesheets/fromSamplesheet/
// ! There is currently no tooling to help you write a sample sheet schema

//
// SUBWORKFLOW: Indexing of reference files
//
@@ -270,6 +291,20 @@ workflow EAGER {
ch_dedupped_flagstat = Channel.empty()
}

    //
    // MODULE: QualiMap
    //

    if ( !params.skip_qualimap ) {
        ch_qualimap_input = ch_dedupped_bams
            .map {
                meta, bam, bai ->
                    [ meta, bam ]
            }
        QUALIMAP_BAMQC(ch_qualimap_input, ch_snpcapture_bed)
        ch_versions = ch_versions.mix( QUALIMAP_BAMQC.out.versions )
    }

//
// MODULE: remove reads mapping to the host from the raw fastq
//
@@ -395,9 +430,9 @@ workflow EAGER {


//
// MODULE: Bedtools coverage
// MODULE: Bedtools coverage
//

if ( params.run_bedtools_coverage ) {

ch_anno_for_bedtools = Channel.fromPath(params.mapstats_bedtools_featurefile, checkIfExists: true).collect()
@@ -412,15 +447,14 @@ workflow EAGER {
SAMTOOLS_VIEW_GENOME(ch_dedupped_bams)

ch_genome_for_bedtools = SAMTOOLS_VIEW_GENOME.out.genome

BEDTOOLS_COVERAGE_BREADTH(ch_dedupped_for_bedtools, ch_genome_for_bedtools)
BEDTOOLS_COVERAGE_DEPTH(ch_dedupped_for_bedtools, ch_genome_for_bedtools)

ch_versions = ch_versions.mix( SAMTOOLS_VIEW_GENOME.out.versions )
ch_versions = ch_versions.mix( BEDTOOLS_COVERAGE_BREADTH.out.versions )
ch_versions = ch_versions.mix( BEDTOOLS_COVERAGE_DEPTH.out.versions )
}


//
// SUBWORKFLOW: Calculate Damage
@@ -491,6 +525,10 @@ workflow EAGER {
ch_multiqc_files = ch_multiqc_files.mix(CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect())
//ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}.ifEmpty([])) // Replaced with custom mixing

if ( !params.skip_qualimap ) {
ch_multiqc_files = ch_multiqc_files.mix( QUALIMAP_BAMQC.out.results.collect{it[1]}.ifEmpty([]) )
}

MULTIQC (
ch_multiqc_files.collect(),
ch_multiqc_config.toList(),