Review of the MetaPhlAn4 #333

Merged
merged 10 commits into from
Jul 21, 2023
11 changes: 6 additions & 5 deletions CHANGELOG.md
@@ -16,7 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#315](https://github.com/nf-core/taxprofiler/pull/315) Updated to nf-core pipeline template v2.9 (added by @sofstam & @jfy133)
- [#319](https://github.com/nf-core/taxprofiler/pull/319) Added support for virus hit expansion in Kaiju (❤️ to @dnlrxn for requesting, added by @jfy133)
- [#323](https://github.com/nf-core/taxprofiler/pull/323) Add ability to skip sequencing quality control tools (❤️ to @vinisalazar for requesting, added by @jfy133)
- [#318](https://github.com/nf-core/taxprofiler/pull/318) Added the profiler MetaPhlAn4 and removed MetaPhlAn3 (added by @LilyAnderssonLee)
- [#318](https://github.com/nf-core/taxprofiler/pull/318) Added support for MetaPhlAn4 and renamed the profiler flag to `--run_metaphlan` (added by @LilyAnderssonLee)

### `Fixed`

@@ -35,10 +35,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Dependencies`

| Tool | Previous version | New version |
| -------- | ---------------- | ----------- |
| MultiQC | 1.13 | 1.14 |
| taxpasta | 0.2.3 | 0.4.1 |
| Tool | Previous version | New version |
| --------- | ---------------- | ----------- |
| MultiQC | 1.13 | 1.14 |
| taxpasta | 0.2.3 | 0.4.1 |
| MetaPhlAn | 3.0.12 | 4.0.6 |

### `Deprecated`

1 change: 0 additions & 1 deletion assets/multiqc_config.yml
@@ -29,7 +29,6 @@ run_modules:
- samtools
- kraken
- kaiju
- metaphlan
- diamond
- malt
- motus
3 changes: 1 addition & 2 deletions docs/output.md
@@ -445,7 +445,7 @@ You will only receive the `.sam` and `.megan` files if you supply `--malt_save_r

</details>

The main taxonomic profiling file from MetaPhlAn is the `*_profile.txt` file. This provides the abundance estimates from MetaPhlAn but does not include raw counts by default.
The output contains a file named `*_combined_reports.txt`, which provides an overview of the classification results for all samples. The main taxonomic profiling file from MetaPhlAn is the `*_profile.txt` file. This provides the abundance estimates from MetaPhlAn but does not include raw counts by default. The output also contains the intermediate Bowtie2 output, `.bowtie2out.txt`, which is a condensed representation of how your sequencing reads map to MetaPhlAn's marker gene sequences. Alignments are listed one per line in tab-separated columns, including the read ID and the marker gene ID.
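
As a rough illustration of the `.bowtie2out.txt` layout described above, the following sketch counts reads per marker gene. It assumes the read ID is in the first column and the marker gene ID in the second; the file name, read IDs, and marker IDs are made up for the example and do not come from a real run.

```bash
# Hypothetical example file; real files are named <sample>.bowtie2out.txt.
# Assumed layout: column 1 = read ID, column 2 = marker gene ID (tab-separated).
printf 'read1\tmarkerA\nread2\tmarkerB\nread3\tmarkerA\n' > example.bowtie2out.txt

# Count how many reads hit each marker gene, most-hit marker first.
marker_counts=$(cut -f2 example.bowtie2out.txt \
    | sort \
    | uniq -c \
    | sort -k1,1nr \
    | awk '{print $2 "\t" $1}')

printf '%s\n' "$marker_counts"
```

A per-marker tally like this can serve as a quick sanity check that reads were mapped at all before looking at the abundance estimates in `*_profile.txt`.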

### mOTUs

@@ -574,7 +574,6 @@ You can expect in the MultiQC reports either sections and/or general stats colum
- bracken
- centrifuge
- kaiju
- metaphlan
- diamond
- malt
- motus
8 changes: 4 additions & 4 deletions docs/usage.md
@@ -300,7 +300,7 @@ Krona can only be run on MALT output if path to Krona taxonomy database supplied

##### MetaPhlAn

MetaPhlAn4 is compatible with the MetaPhlAn3 database by adding the `--mpa3` parameter to the MetaPhlAn process in the config file `module.config`.
MetaPhlAn4 is compatible with the MetaPhlAn3 database if you add `--mpa3` to the `db_params` column of `database.csv`.
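
A minimal sketch of a database sheet row that passes `--mpa3` through `db_params`. The column layout (`tool,db_name,db_params,db_path`) is assumed from the nf-core/taxprofiler database sheet, and `mpa3_example` and `/path/to/metaphlan3_db/` are placeholders, not real values.

```bash
# Write a one-entry database sheet.
# db_name and db_path are hypothetical placeholders; replace them with your own.
cat > database.csv <<'EOF'
tool,db_name,db_params,db_path
metaphlan,mpa3_example,--mpa3,/path/to/metaphlan3_db/
EOF

# Print the parameters recorded for the metaphlan entry (third column).
metaphlan_params=$(awk -F',' '$1 == "metaphlan" {print $3}' database.csv)
echo "$metaphlan_params"
```

Keeping the flag in the sheet rather than in a config file ties it to the one database that needs it.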

##### mOTUs

@@ -716,22 +716,22 @@ See the [MALT manual](https://software-ab.informatik.uni-tuebingen.de/download/m

MetaPhlAn does not allow (easy) construction of custom databases. Therefore we recommend using the prebuilt database of marker genes that is provided by the developers.

To do this you need to have `MetaPhlAn` installed on your machine.
To do this, you need to have `MetaPhlAn` installed on your machine. Each version of MetaPhlAn is tied to a specific version of the database, so if you download the MetaPhlAn3 database, remember to add `--mpa3` to the parameters for that database.

```bash
metaphlan --install --bowtie2db <YOUR_DB_NAME>/
```

You can then add the `<YOUR_DB_NAME>/` path to your nf-core/taxprofiler database input sheet.

> 🛈 It is generally not recommended to modify this database yourself, thus this is currently not supported in the pipeline. However, it is possible to customise the existing database by adding your own marker genomes following the instructions [here](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.1#customizing-the-database).
> 🛈 It is generally not recommended to modify this database yourself, thus this is currently not supported in the pipeline. However, it is possible to customise the existing database by adding your own marker genomes following the instructions [here](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-4#customizing-the-database).

> 🖊️ If using your own database is relevant for you, please contact the nf-core/taxprofiler developers on the [nf-core slack](https://nf-co.re/join) and we will investigate supporting this.

<details markdown="1">
<summary>Expected files in database directory</summary>

- `metaphlan`
- `metaphlan4`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.pkl`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.fna.bz2`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.1.bt2l`
2 changes: 1 addition & 1 deletion subworkflows/local/db_check.nf
@@ -79,7 +79,7 @@ def validate_db_rows(LinkedHashMap row) {
if ( !row.keySet().containsAll(expected_headers) ) error("[nf-core/taxprofiler] ERROR: Invalid database input sheet - malformed column names. Please check input TSV. Column names should be: ${expected_headers.join(", ")}")

// valid tools specified
def expected_tools = [ "bracken", "centrifuge", "diamond", "kaiju", "kraken2", "krakenuniq", "malt", "metaphlan3", "metaphlan", "motus", "ganon", "kmcp" ]
def expected_tools = [ "bracken", "centrifuge", "diamond", "kaiju", "kraken2", "krakenuniq", "malt", "metaphlan", "motus", "ganon", "kmcp" ]
if ( !expected_tools.contains(row.tool) ) error("[nf-core/taxprofiler] ERROR: Invalid tool name. Please see documentation for all supported profilers. Error in: ${row}")

// detect quotes in params
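
The effect of the tool-name change in `subworkflows/local/db_check.nf` can be sketched outside Nextflow as follows. This is only an illustration in bash, not the pipeline's own code: the tool names are copied from the updated `expected_tools` list, while the helper `is_expected_tool` is a hypothetical name introduced for this example.

```bash
# Tool names copied from the updated expected_tools list in db_check.nf.
expected_tools="bracken centrifuge diamond kaiju kraken2 krakenuniq malt metaphlan motus ganon kmcp"

# Hypothetical helper: succeeds only if the given name is a whole entry in the list.
is_expected_tool() {
    case " $expected_tools " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# After this change, "metaphlan" is accepted and "metaphlan3" is not.
for tool in metaphlan metaphlan3; do
    if is_expected_tool "$tool"; then
        echo "$tool: accepted"
    else
        echo "$tool: rejected"
    fi
done
```

Database sheets that still use `metaphlan3` in the tool column would therefore need to be updated to `metaphlan`.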