Review suggestions

nf-core · Sep 26, 2024 · 80cdb0c
1 parent d00104a
commit 80cdb0c
Showing 1 changed file with 20 additions and 4 deletions.
diff --git a/docs/usage.md b/docs/usage.md
@@ -118,20 +118,36 @@ nf-core/taxprofiler does not provide any databases by default, nor does it curre
 
 The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four (or five) column comma-separated sheet.
 
-The optional `db_type` column allows to use specific database/parameters against specific data types. By specifying if a database is for short-or long-reads (or even both), the samples sequenced with Illumina are combined with the short-read databases and the samples sequenced with Nanopore are combined with long-read databases. If `db_type` is not provided, it is assumed the database and parameters are applicable for both short and long read data.
+The optional `db_type` column lets you apply specific databases and parameters to specific data types. When you specify whether a database is for short reads, long reads, or both, samples sequenced with Illumina are paired with the short-read databases and samples sequenced with Nanopore are paired with the long-read databases. If `db_type` is not provided, the database and parameters are assumed to be applicable to both short- and long-read data.
 
 :::warning
 To allow user freedom, nf-core/taxprofiler does not check for mandatory or the validity of non-file database parameters for correct execution of the tool - excluding options offered via pipeline level parameters! Please validate your database parameters (cross-referencing [parameters](https://nf-co.re/taxprofiler/parameters), and the given tool documentation) before submitting the database sheet! For example, if you don't use the default read length - Bracken will require `-r <read_length>` in the `db_params` column.
 :::
 
-An example database sheet can look as follows, where 7 tools are being used, and `malt` and `kraken2` will be used against two databases each.
+An example database sheet can look as follows, where 7 tools are being used, and `malt` and `kraken2` will be used against two databases each. Because the `db_type` column is missing, the databases and parameters are assumed to be suitable for both short- and long-read data.
+
+In the second example database sheet, the `db_type` column has been provided. The valid options are `short`, `long` and `short;long`.
 
 `kraken2` will be run twice even though only having a single 'dedicated' database because specifying `bracken` implies first running `kraken2` on the `bracken` database, as required by `bracken`.
 
+```csv
+tool,db_name,db_params,db_path
+malt,malt85,-id 85,/<path>/<to>/malt/testdb-malt/
+malt,malt95,-id 90,/<path>/<to>/malt/testdb-malt.tar.gz
+bracken,db1,;-r 150,/<path>/<to>/bracken/testdb-bracken.tar.gz
+kraken2,db2,--quick,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
+krakenuniq,db3,,/<path>/<to>/krakenuniq/testdb-krakenuniq.tar.gz
+centrifuge,db1,,/<path>/<to>/centrifuge/minigut_cf.tar.gz
+metaphlan,db1,,/<path>/<to>/metaphlan/metaphlan_database/
+motus,db_mOTU,,/<path>/<to>/motus/motus_database/
+ganon,db1,,/<path>/<to>/ganon/test-db-ganon.tar.gz
+kmcp,db1,;-I 20,/<path>/<to>/kmcp/test-db-kmcp.tar.gz
+```
+
 ```csv
 tool,db_name,db_params,db_type,db_path
 malt,malt85,-id 85,short,/<path>/<to>/malt/testdb-malt/
-malt,malt95,-id 90,,/<path>/<to>/malt/testdb-malt.tar.gz
+malt,malt95,-id 90,short,/<path>/<to>/malt/testdb-malt.tar.gz
 bracken,db1,;-r 150,short,/<path>/<to>/bracken/testdb-bracken.tar.gz
 kraken2,db2,--quick,short,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
 krakenuniq,db3,,short;long,/<path>/<to>/krakenuniq/testdb-krakenuniq.tar.gz
@@ -159,7 +175,7 @@ Column specifications are as follows:
 | `tool`      | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database.                                                                                                                                                                                                                                                                                                                                                                                                              |
 | `db_name`   | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database.                                                                                                                                                                                                                                                                                                                                                                                                                    |
 | `db_params` | Any parameters of the given taxonomic classifier/profiler that you wish to specify that the taxonomic classifier/profiling tool should use when profiling against this specific database. Can be empty to use taxonomic classifier/profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. For Bracken databases, must at a minimum contain a `;` separating Kraken2 from Bracken parameters. |
-| `db_type`   | An optional column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data. Possible values: `long`, `short`, or `short;long`                                                                                                                                                                                                                                                                                                                                                              |
+| `db_type`   | An optional column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) are applicable to both short- and long-read data. Possible values: `long`, `short`, or `short;long`. If the `db_type` column is missing from the database sheet, every database takes the default value `short;long`. |
 | `db_path`   | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required].                                                                                                                                                                                                                                                                                                                                                                       |
 
 :::tip