Please first see our installation guide to learn how to set up this pipeline.
A basic execution of the pipeline looks as follows:
a) Without a site-specific config file
nextflow run marchoeppner/gabi -profile singularity --input samples.csv \
--reference_base /path/to/references \
--run_name pipeline-test
where /path/to/references corresponds to the location in which you have installed the pipeline references (this option can be omitted to trigger an on-the-fly temporary installation, which is not recommended in production).
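The --input option points to a sample sheet in CSV format. As a purely illustrative sketch, a paired-end sample sheet might look like the following; the column names here are assumptions, so please consult the pipeline documentation for the exact required layout:

sample,fq1,fq2
sampleA/path/to/sampleA_R1.fastq.gz,/path/to/sampleA_R2.fastq.gz
sampleB,/path/to/sampleB_R1.fastq.gz,/path/to/sampleB_R2.fastq.gz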
In this example, the pipeline assumes that it runs on a single computer with the Singularity container engine available. The available options to provision software are:
-profile singularity
-profile docker
-profile podman
-profile conda
Additional software provisioning tools, as described here, may also work but have not been tested by us. Please note that Conda may not work for all packages on all platforms; if this turns out to be the case for you, please consider switching to one of the supported container engines.
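For example, to run the same command with Docker instead of Singularity, only the profile name changes:

nextflow run marchoeppner/gabi -profile docker --input samples.csv \
--reference_base /path/to/references \
--run_name pipeline-test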
b) With a site-specific config file
nextflow run marchoeppner/gabi -profile lsh --input samples.csv \
--run_name pipeline-test
In this example, both --reference_base and the choice of software provisioning are already set in the local configuration lsh and do not have to be provided as command-line arguments.
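For illustration, a site profile along these lines could be defined in a local Nextflow configuration; the paths and settings below are assumptions and need to be adapted to your environment:

profiles {
    lsh {
        // Hypothetical site defaults; adjust paths and engine to your setup
        params.reference_base  = '/data/references'
        singularity.enabled    = true
        singularity.autoMounts = true
    }
}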
If you are running this pipeline in a production setting, you will want to lock the pipeline to a specific version. This is natively supported through Nextflow with the -r argument:
nextflow run marchoeppner/gabi -profile lsh -r 1.0 <other options here>
The -r option specifies a GitHub release tag or branch, so it could also point to main for the very latest code. Please note that every major release of this pipeline (1.0, 2.0, etc.) comes with a new reference data set, which has to be installed separately.
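To pre-fetch or update a pinned revision of the pipeline before running it, you can use Nextflow's built-in pull command:

nextflow pull marchoeppner/gabi -r 1.0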
The following options can be set to control resource usage outside of a site-specific config file (an example invocation follows the descriptions below).

The maximum number of CPUs a single job can request. This is typically the maximum number of cores available on a compute node or your local (development) machine.

The maximum amount of memory a single job can request. This is typically the maximum amount of RAM available on a compute node or your local (development) machine. It is advisable to set this a little lower than the physical maximum to prevent the machine from swapping.

The maximum allowed run/wall time a single job can request. This is mostly relevant for environments where run time is restricted, such as a computing cluster with an active resource manager or possibly some cloud environments.
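Assuming the pipeline follows the common Nextflow convention of naming these limits --max_cpus, --max_memory and --max_time (please check the pipeline's help output for the exact parameter names), a capped invocation might look like this:

nextflow run marchoeppner/gabi -profile singularity --input samples.csv \
--reference_base /path/to/references \
--run_name pipeline-test \
--max_cpus 16 \
--max_memory 120.GB \
--max_time 48.h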