diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md
index bcf25f83be..b9644cae45 100644
--- a/src/content/docs/contributing/nf_core_basic_training.md
+++ b/src/content/docs/contributing/nf_core_basic_training.md
@@ -821,7 +821,7 @@ versions (file) │File containing software versions │versions.yml
:::tip{title="Exercise 4 - Identification of available nf-core modules"}
-1. Check which versions are available for the nf-core module `salmon/quant`.
+1. Get information about the nf-core module `salmon/quant`.
solution 1
@@ -839,6 +839,8 @@ versions (file) │File containing software versions │versions.yml
nf-core modules list local
```
+ If `salmon/quant` is not listed, there is no local version installed.
+
solution 1
- ```
- ```
+ ```bash
+ nf-core modules install adapterremoval
+ ```
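+
+ The install command usually finishes by printing the `include` statement to copy into your workflow; it should look roughly like this (the exact message wording can differ between nf-core/tools versions):
+
+ ```groovy
+ include { ADAPTERREMOVAL } from '../modules/nf-core/adapterremoval/main'
+ ```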
solution 2
- ```
- ```
+ Installation added the module directory `/workspace/basic_training/nf-core-demotest/modules/nf-core/adapterremoval`:
+
+ ```
+ .
+ ├── environment.yml
+ ├── main.nf
+ ├── meta.yml
+ └── tests
+     ├── main.nf.test
+     ├── main.nf.test.snap
+     └── tags.yml
+ ```
+
+ The `tests` directory contains everything required to run the basic tests for the module and rarely needs to be changed. `main.nf` is the main workflow file that contains the module code. All input and output variables of the module are described in the `meta.yml` file, whereas the `environment.yml` file contains the dependencies of the module.
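+
+ For illustration, the `environment.yml` of an nf-core module generally follows this shape (the exact channels and version pin for `adapterremoval` may differ, so check the installed file rather than copying this sketch):
+
+ ```yml
+ # hypothetical example of a module environment.yml
+ channels:
+   - conda-forge
+   - bioconda
+ dependencies:
+   - bioconda::adapterremoval=2.3.2
+ ```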
solution 3
- ```
- ```
+ ```bash title="workflows/demotest.nf"
+ [...]
+ /*
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+ include { FASTQC } from '../modules/nf-core/fastqc/main'
+ include { MULTIQC } from '../modules/nf-core/multiqc/main'
+ include { paramsSummaryMap } from 'plugin/nf-validation'
+ include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline'
+ include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
+ include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_demotest_pipeline'
+ include { ADAPTERREMOVAL } from '../modules/nf-core/adapterremoval/main'
+
+ [...]
+
+ ```
solution 4
- ```
- ```
+ ```bash title="workflows/demotest.nf"
+ [...]
+ FASTQC (
+ ch_samplesheet
+ )
+ ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]})
+ ch_versions = ch_versions.mix(FASTQC.out.versions.first())
+
+ //
+ // MODULE: ADAPTERREMOVAL
+ //
+ ADAPTERREMOVAL(
+
+ )
+ [...]
+ ```
solution 5
- ```
- ```
+ `adapterremoval` takes three inputs: `meta`, `reads` and `adapterlist`, as outlined in the `meta.yml` of the module. `meta` and `reads` are passed together in one channel as a tuple (the meta map plus the reads), whereas `adapterlist` is its own channel that expects a path. See here:
+
+ ```bash title="adapterremoval/main.nf"
+ [...]
+ input:
+ tuple val(meta), path(reads)
+ path(adapterlist)
+ [...]
+ ```
+
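+ For illustration, a single element flowing through `ch_samplesheet` typically looks something like this (the sample name and file names are made up):
+
+ ```groovy
+ // hypothetical channel element: a meta map plus the list of read files
+ [ [ id:'SAMPLE1', single_end:false ], [ file('SAMPLE1_R1.fastq.gz'), file('SAMPLE1_R2.fastq.gz') ] ]
+ ```
+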
+ The meta map containing the metadata and the reads can be taken directly from the samplesheet, as is the case for FastQC, so we can pass the input channel `ch_samplesheet`. The `adapterlist` could either be a fixed path or a parameter given on the command line. For now, we will simply pass `params.adapterlist`, assuming the adapter list will be supplied as a parameter on the command line. With this, the new module call for `adapterremoval` looks as follows:
+
+ ```bash title="workflows/demotest.nf"
+ [...]
+ //
+ // MODULE: ADAPTERREMOVAL
+ //
+ ADAPTERREMOVAL(
+ ch_samplesheet,
+ params.adapterlist
+ )
+ [...]
+ ```
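+
+ As a side note (not required for the exercise), the adapter list can also be wrapped in an explicit value channel first; a value channel can be read by every task instead of being consumed after the first one:
+
+ ```groovy
+ // hypothetical alternative to passing params.adapterlist directly
+ ch_adapterlist = Channel.value(file(params.adapterlist))
+
+ ADAPTERREMOVAL(
+     ch_samplesheet,
+     ch_adapterlist
+ )
+ ```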
solution 6
+ In order to use `params.adapterlist`, we need to add the parameter to the `nextflow.config`.
+
- ```
- ```
+ ```bash title="nextflow.config"
+ // Global default params, used in configs
+ params {
+     adapterlist                = null  // example default; the actual file is supplied via --adapterlist
+     [...]
+ }
+ ```
solution 7
- ```
- ```
+ ```bash
+ nf-core lint
+ ```
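+
+ If the linting flags the new `adapterlist` parameter as missing from `nextflow_schema.json`, it can be added to the schema interactively (this is the same tool covered later in this training):
+
+ ```bash
+ nf-core schema build
+ ```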
solution 8
- ```
- ```
+ To run the pipeline, be aware that we now need to specify a file containing the adapters. We therefore create a new file called `adapterlist.txt` and add the adapter sequence "[WE NEED AN ADAPTER SEQUENCE HERE]" to it. Then we can run the pipeline as follows:
+
+ ```bash
+ nextflow run nf-core-demotest/ -profile test,docker --outdir test_results --adapterlist /path/to/adapterlist.txt
+ ```
solution 9
- ```
- ```
+ ```bash
+ git add .
+ git commit -m "add adapterremoval module"
+ ```
solution 1
-
- ```
-
- ```
-
-
-TL;TR
+TL;DR
1. Write your script in any language (python, bash, R, ruby). E.g. `maf2bed.py`
@@ -1121,6 +1207,7 @@ with description and type of license._
Let's create a simple custom script that converts a MAF file to a BED file called `maf2bed.py` and place it in the bin directory of our nf-core-testpipeline:
```python title="maf2bed.py"
#!/usr/bin/env python
"""
Author: Raquel Manzano - @RaqManzano
@@ -1130,37 +1217,34 @@ License: MIT
import argparse
import pandas as pd


def argparser():
    parser = argparse.ArgumentParser(description="")
    parser.add_argument("-maf", "--mafin", help="MAF input file", required=True)
    parser.add_argument("-bed", "--bedout", help="BED output file", required=True)
    parser.add_argument(
        "--extra", help="Extra columns to keep (space separated list)", nargs="+", required=False, default=[]
    )
    return parser.parse_args()


def maf2bed(maf_file, bed_file, extra):
    maf = pd.read_csv(maf_file, sep="\t", comment="#")
    bed = maf[["Chromosome", "Start_Position", "End_Position"] + extra]
    bed.to_csv(bed_file, sep="\t", index=False, header=False)


def main():
    args = argparser()
    maf2bed(maf_file=args.mafin, bed_file=args.bedout, extra=args.extra)


if __name__ == "__main__":
    main()
```

### 2. Make sure your script is in the right folder

-Now, let's move it to the correct directory:
+Now, let's move it to the correct directory and make sure it is executable:

-```
+```bash
mv maf2bed.py /path/where/pipeline/is/bin/.
chmod +x /path/where/pipeline/is/bin/maf2bed.py
```

@@ -1173,6 +1257,10 @@
are appropriate (this is optional) and directives (via conda and/or container)
for the definition of dependencies.

+
+Since `maf2bed.py` is in the `bin` directory, we can call it directly in the script block of our new module `CONVERT_MAF2BED`. You only have to be careful with how you reference variables (i.e. when to use `${variable}` vs. `$variable`):

A process may contain any of the following definition blocks: directives, inputs, outputs, when clause, and the process script. Here is how we write it:

```
process CONVERT_MAF2BED {
    // HEADER
    tag "$meta.id"
    label 'process_single'
    // DEPENDENCIES DIRECTIVES
    conda "anaconda::pandas=1.4.3"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/pandas:1.4.3' :
        'quay.io/biocontainers/pandas:1.4.3' }"
+
    // INPUT BLOCK
    input:
    tuple val(meta), path(maf)
+
    // OUTPUT BLOCK
    output:
    tuple val(meta), path('*.bed') , emit: bed
    path "versions.yml"            , emit: versions
+
    // WHEN CLAUSE
    when:
    task.ext.when == null || task.ext.when
+
    // SCRIPT BLOCK
    script: // This script is bundled with the pipeline in bin
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

    """
    maf2bed.py --mafin $maf --bedout ${prefix}.bed
    """
+}
```

More on nextflow's process components in the [docs](https://www.nextflow.io/docs/latest/process.html).

@@ -1305,32 +1409,53 @@ More on nextflow's process components in the [docs](https://www.nextflow.io/docs

In general, we will call our nextflow module `main.nf` and save it in the `modules` folder under another folder called `convert_maf2bed`. If you believe your custom script could be useful for others, and it is potentially reusable or calling a tool that is not yet present in nf-core modules, you can start the process of making it official by adding a `meta.yml` [explained above](#adding-modules-to-a-pipeline), in which you describe the inputs and outputs of the module.

The overall tree for the pipeline skeleton will look as follows:

```
pipeline/
├── bin/
│   └── maf2bed.py
├── modules/
│   ├── local/
│   │   └── convert_maf2bed/
│   │       ├── main.nf
│   │       └── meta.yml
│   └── nf-core/
├── config/
│   ├── base.config
│   └── modules.config
...
```

To use our custom module located in `./modules/local/convert_maf2bed` within our workflow, we use a module inclusion statement as follows (this has to be done before we invoke our workflow):

-```
+```bash title="workflows/demotest.nf"
include { CONVERT_MAF2BED } from './modules/local/convert_maf2bed/main'
workflow {
    input_data = [[id:123, data_type:'maf'], '/path/to/maf/example.maf']
    CONVERT_MAF2BED(input_data)
}
```

-### Other notes
+:::tip{title="Exercise 6 - Adding a custom module"}
+In the directory `exercise_6` you will find the custom script `print_hello.py`, which will be used for this and the next exercise.
+
+1. Create a local module that runs the `print_hello.py` script
+2. Add the module to your main workflow
+3. Run the pipeline
+4. Lint the pipeline
+5. Commit your changes

Some additional information that might be of interest:
@@ -1211,91 +1301,105 @@
The `conda` directive allows for the definition of the process dependencies using the [Conda package manager](https://docs.conda.io/en/latest/).

Nextflow automatically sets up an environment for the given package names listed in the conda directive. For example:

```
process foo {
    conda 'bwa=0.7.15'

    '''
    your_command --here
    '''
}
```

Multiple packages can be specified by separating them with a blank space, e.g. `bwa=0.7.15 samtools=1.15.1`. The name of the channel from which a specific package needs to be downloaded can be specified using the usual Conda notation, i.e. prefixing the package with the channel name, as shown here: `bioconda::bwa=0.7.15`

```
process foo {
    conda 'bioconda::bwa=0.7.15 bioconda::samtools=1.15.1'

    '''
    your_bwa_cmd --here
    your_samtools_cmd --here
    '''
}
```

Similarly, we can apply the `container` directive to execute the process script in a [Docker](http://docker.io/) or [Singularity](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) container. When running Docker, it requires the Docker daemon to be running on the machine where the pipeline is executed, i.e. the local machine when using the local executor or the cluster nodes when the pipeline is deployed through a grid executor.

```
process foo {
    conda 'bioconda::bwa=0.7.15 bioconda::samtools=1.15.1'
    container 'dockerbox:tag'

    '''
    your_bwa_cmd --here
    your_samtools_cmd --here
    '''
}
```

Additionally, the `container` directive allows for a more sophisticated choice of container, picking a Singularity or Docker image depending on the user's choice of container engine. This practice is quite common in official nf-core modules.

```
process foo {
    conda "bioconda::fastqc=0.11.9"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' :
        'biocontainers/fastqc:0.11.9--0' }"

    '''
    your_fastqc_command --here
    '''
}
```

+More info on labels
A `label` will
@@ -1181,11 +1269,13 @@
choice that can be used for configuring. E.g. we use the `label` 'process_single'; this looks as follows:

```
withLabel:process_single {
    cpus   = { check_max( 1    * task.attempt, 'cpus'   ) }
    memory = { check_max( 1.GB * task.attempt, 'memory' ) }
    time   = { check_max( 1.h  * task.attempt, 'time'   ) }
}
```
+
+:::
+
+### Further reading and additional notes

#### What happens if I want to use containers but there is no image created with the packages I need?

@@ -1349,7 +1474,9 @@ See more info about modules in the nextflow docs [here](https://nf-co.re/docs/co

As well as the pipeline template you can lint individual or all modules with a single command:

```
nf-core modules lint --all
```

## Nextflow Schema
@@ -1357,8 +1484,10 @@ nf-core modules lint --all

All nf-core pipelines can be run with `--help` to see usage instructions. We can try this with the demo pipeline that we just created:

```
cd ../
nextflow run nf-core-demo/ --help
```

### Working with Nextflow schema
@@ -1370,19 +1499,23 @@ Thankfully, we provide a user-friendly tool for editing this file: nf-core schem

To see this in action, let's add some new parameters to nextflow.config:

```
params {
    demo = 'param-value-default'
    foo  = null
    bar  = false
    baz  = 12
    // rest of the config file..
```

Then run nf-core schema build:

```
cd nf-core-demo/
nf-core schema build
```

The CLI tool should then prompt you to add each new parameter.
@@ -1420,3 +1553,7 @@ Here in the schema editor you can edit:

- `nf-core schema build` opens an interface to allow you to describe your pipeline parameters and set default values, and which values are valid.

:::