Merged
34 commits
80f1dc8
add MetaBinner
d4straub Oct 16, 2025
213e20a
[automated] Fix code linting
nf-core-bot Oct 16, 2025
e5cbdc7
merge dev
d4straub Oct 17, 2025
6665f6e
add script to bin
d4straub Oct 17, 2025
4f5605b
adjust create_metabinner_bins.py usage
d4straub Oct 22, 2025
8230584
update metabinner output
d4straub Oct 22, 2025
baab236
refine metabinner output
d4straub Oct 22, 2025
948c3a1
adjust metabinner bin filenames
d4straub Oct 22, 2025
9e4a2cd
adjust metabinner bin file names again
d4straub Oct 22, 2025
6dc9a85
update file publishing, testing, and docs
d4straub Oct 22, 2025
471c9b9
--fix rocrate_readme_sync
d4straub Oct 22, 2025
6f98ee2
ignore metabinner error 1 and 255 and add acknowledgment
d4straub Oct 23, 2025
33eddd5
merge from dev
d4straub Oct 24, 2025
a78e110
fix rocrate_readme_sync
d4straub Oct 24, 2025
89e30e9
Update docs/output.md
d4straub Oct 24, 2025
9b0032c
add param bin_metabinner_scale
d4straub Oct 24, 2025
4332440
handle tool version appropriately
d4straub Oct 24, 2025
5d68eb1
update CHANGELOG with acknowledgment
d4straub Oct 24, 2025
41f32b7
merge in dev
d4straub Oct 24, 2025
34ac949
fix rocrate_readme_sync
d4straub Oct 24, 2025
ae187a0
split METABINNER into subworkflow
d4straub Oct 27, 2025
fbbf820
merge upstream dev
d4straub Oct 28, 2025
04c9e50
fix rocrate_readme_sync
d4straub Oct 28, 2025
981c8f8
Apply suggestion from @dialvarezs
d4straub Nov 3, 2025
3e689bc
Apply suggestion from @dialvarezs
d4straub Nov 3, 2025
17fd583
apply suggestions from code review
d4straub Nov 3, 2025
ebb2f12
merge upstream dev
d4straub Nov 3, 2025
a4943f9
fix rocrate_readme_sync
d4straub Nov 3, 2025
587cfce
[automated] Fix code linting
nf-core-bot Nov 3, 2025
dfb2e07
clean up unzipped file
d4straub Nov 4, 2025
33f87a5
merge upstream dev
d4straub Nov 4, 2025
0ab6474
fix rocrate_readme_sync
d4straub Nov 4, 2025
8f432dd
min_contig_size now via process input val instead of task.ext.
d4straub Nov 4, 2025
96da27f
Merge remote-tracking branch 'upstream/dev' into add-MetaBinner
dialvarezs Nov 4, 2025
8 changes: 5 additions & 3 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#881](https://github.com/nf-core/mag/pull/881) - Add binner MetaBinner (by @d4straub, inspired by @HeshamAlmessady & @AlphaSquad)

### `Changed`

### `Fixed`
Expand All @@ -18,9 +20,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Dependencies`

| Tool | Previous version | New version |
| ---- | ---------------- | ----------- |
| | | |
| Tool | Previous version | New version |
| ---------- | ---------------- | ----------- |
| MetaBinner | | 1.4.4-0 |

### `Deprecated`

4 changes: 4 additions & 0 deletions CITATIONS.md
Expand Up @@ -52,6 +52,10 @@

> Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., Lahti, L., Loman, N. J., Andersson, A. F., & Quince, C. (2014). Binning metagenomic contigs by coverage and composition. Nature Methods, 11(11), 1144–1146. doi: 10.1038/nmeth.3103

- [MetaBinner](https://doi.org/10.1186/s13059-022-02832-6)

> Wang Z, Huang P, You R, Sun F, Zhu S. MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities. Genome Biol. 2023 Jan 6;24(1):1. doi: 10.1186/s13059-022-02832-6. PMID: 36609515; PMCID: PMC9817263.

- [DAS Tool](https://doi.org/10.1038/s41564-018-0171-1)

> Sieber, C. M. K., et al. 2018. "Recovery of Genomes from Metagenomes via a Dereplication, Aggregation and Scoring Strategy." Nature Microbiology 3 (7): 836-43. doi: 10.1038/s41564-018-0171-1
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -47,7 +47,7 @@ The pipeline then:
- performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [Quast](http://quast.sourceforge.net/quast)
- (optionally) performs ancient DNA assembly validation using [PyDamage](https://github.com/maxibor/pydamage) and contig consensus sequence recalling with [Freebayes](https://github.com/freebayes/freebayes) and [BCFtools](http://samtools.github.io/bcftools/bcftools.html)
- predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal), and bins with [Prokka](https://github.com/tseemann/prokka) and optionally [MetaEuk](https://www.google.com/search?channel=fs&client=ubuntu-sn&q=MetaEuk)
- performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), [MaxBin2](https://sourceforge.net/projects/maxbin2/), [CONCOCT](https://github.com/BinPro/CONCOCT), and/or [COMEBin](https://github.com/ziyewang/COMEBin)
- performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), [MaxBin2](https://sourceforge.net/projects/maxbin2/), [CONCOCT](https://github.com/BinPro/CONCOCT), [COMEBin](https://github.com/ziyewang/COMEBin), and/or [MetaBinner](https://github.com/ziyewang/MetaBinner)
- checks the quality of the genome bins using [Busco](https://busco.ezlab.org/), [CheckM](https://ecogenomics.github.io/CheckM/), or [CheckM2](https://github.com/chklovski/CheckM2) and optionally [GUNC](https://grp-bork.embl-community.io/gunc/)
- Performs ancient DNA validation and repair with [pyDamage](https://github.com/maxibor/pydamage) and [freebayes](https://github.com/freebayes/freebayes)
- optionally refines bins with [DAS Tool](https://github.com/cmks/DAS_Tool)
45 changes: 45 additions & 0 deletions bin/create_metabinner_bins.py
@@ -0,0 +1,45 @@
#!/usr/bin/env python

## Originally written by Hesham Almessady (@HeshamAlmessady) and Adrian Fritz (@AlphaSquad) in https://github.com/hzi-bifo/mag and released under the MIT license.
## See git repository (https://github.com/nf-core/mag) for full license text.

import sys
import os
from Bio import SeqIO

def main():
    # Argument parsing
    if len(sys.argv) != 6:
        print("Usage: python create_metabinner_bins.py <binning_file> <fasta_file> <output_path> <prefix> <length_threshold>")
        sys.exit(1)

    binning = sys.argv[1]
    fasta = sys.argv[2]
    path = sys.argv[3]
    prefix = sys.argv[4]
    length = int(sys.argv[5])

    # Create output directory if it doesn't exist
    os.makedirs(path, exist_ok=True)

    # Load binning data into a dictionary
    Metabinner_bins = {}
    with open(binning, 'r') as b:
        for line in b:
            contig, bin = line.strip().split('\t')
            Metabinner_bins[contig] = bin

    # Process the input fasta file
    with open(fasta) as handle:
        for record in SeqIO.parse(handle, "fasta"):
            if len(record) < length:
                f = prefix + ".tooShort.fa"
            elif record.id not in Metabinner_bins:
                f = prefix + ".unbinned.fa"
            else:
                f = prefix + "." + Metabinner_bins[record.id] + ".fa"
            with open(os.path.join(path, f), 'a') as out:
                SeqIO.write(record, out, "fasta")

if __name__ == "__main__":
    main()
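The routing rule in the script above can be tried without Biopython or any input files. The following is a minimal sketch of that rule only; the helper name `route_contig`, the contig names, the lengths, and the prefix are made up for illustration and are not part of the pull request.

```python
def route_contig(contig_id, contig_len, membership, prefix, min_len):
    """Return the output file name a contig would be appended to.

    Mirrors the if/elif/else in create_metabinner_bins.py: the length
    check comes first, so a short contig is 'tooShort' even if it was binned.
    """
    if contig_len < min_len:
        return f"{prefix}.tooShort.fa"
    if contig_id not in membership:
        return f"{prefix}.unbinned.fa"
    return f"{prefix}.{membership[contig_id]}.fa"


# Hypothetical membership table (contig -> bin label) and contig lengths.
membership = {"contig_1": "1", "contig_2": "2"}
contigs = {"contig_1": 5000, "contig_2": 800, "contig_3": 2500}

routes = {
    name: route_contig(name, length, membership, "MEGAHIT-MetaBinner-sample1", 1000)
    for name, length in contigs.items()
}
for name, target in routes.items():
    print(name, "->", target)
```

Note that `contig_2` is present in the membership table but still ends up in the `tooShort` file, because the length threshold is evaluated before the bin lookup.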
3 changes: 3 additions & 0 deletions conf/base.config
Expand Up @@ -169,6 +169,9 @@ process {
    withName: COMEBIN_RUNCOMEBIN {
        errorStrategy = { task.exitStatus in [1, 255] ? 'ignore' : 'retry' }
    }
    withName: METABINNER_METABINNER {
        errorStrategy = { task.exitStatus in [1, 255] ? 'ignore' : 'retry' }
    }
    withName: DASTOOL_DASTOOL {
        errorStrategy = { task.exitStatus in ((130..145) + 104 + 175) ? 'retry' : task.exitStatus == 1 ? 'ignore' : 'finish' }
    }
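The `errorStrategy` closures in this hunk are small decision tables keyed on the task exit status. As a rough illustration, the same decisions can be written as plain Python functions; the function names are invented here, and this only mirrors the expressions shown in the hunk, not how Nextflow evaluates them internally.

```python
def metabinner_error_strategy(exit_status):
    """Exit codes 1 and 255 are ignored; anything else triggers a retry."""
    return "ignore" if exit_status in (1, 255) else "retry"


def dastool_error_strategy(exit_status):
    """Retry on 130..145 (inclusive), 104 and 175; ignore 1; otherwise finish."""
    retry_codes = set(range(130, 146)) | {104, 175}
    if exit_status in retry_codes:
        return "retry"
    return "ignore" if exit_status == 1 else "finish"


for code in (1, 2, 137, 255):
    print(code, metabinner_error_strategy(code), dastool_error_strategy(code))
```

One consequence worth noting: for METABINNER_METABINNER an out-of-memory style exit code such as 137 falls into the `retry` branch, while a plain failure with exit code 1 leaves the sample without MetaBinner bins instead of stopping the run.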
46 changes: 46 additions & 0 deletions conf/modules.config
Expand Up @@ -824,6 +824,52 @@ process {
        ext.prefix = { "${meta.assembler}-COMEBin-${meta.id}" }
    }

    withName: METABINNER_KMER {
        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
    }

    withName: METABINNER_TOOSHORT {
        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
    }

    withName: METABINNER_METABINNER {
        publishDir = [
            [
                path: { "${params.outdir}/GenomeBinning/MetaBinner/stats"},
                mode: params.publish_dir_mode,
                pattern: '*.{log,log.gz,tsv.gz}'
            ]
        ]
        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
        ext.args = { "-s ${params.bin_metabinner_scale}" }
    }

    withName: METABINNER_BINS {
        publishDir = [
            [
                path: { "${params.outdir}/GenomeBinning/MetaBinner/"},
                mode: params.publish_dir_mode,
                pattern: 'bins/*.fa.gz'
            ],
            [
                path: { "${params.outdir}/GenomeBinning/MetaBinner/discarded" },
                mode: params.publish_dir_mode,
                pattern: '*tooShort.fa.gz'
            ],
            [
                path: { "${params.outdir}/GenomeBinning/MetaBinner/discarded" },
                mode: params.publish_dir_mode,
                pattern: '*lowDepth.fa.gz'
            ],
            [
                path: { "${params.outdir}/GenomeBinning/MetaBinner/unbinned" },
                mode: params.publish_dir_mode,
                pattern: '*unbinned.fa.gz'
            ]
        ]
        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
    }

    withName: SEQKIT_STATS {
        ext.args = ""
        publishDir = [enabled: false]
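To see how the `ext.prefix` closure and the `publishDir` patterns of METABINNER_BINS interact, here is a small sketch that builds the prefix and looks up which publish directory a task output would match. It uses Python's `fnmatch`, which is only an approximation of the glob matching Nextflow performs; `OUTDIR`, the `meta` values, and the helper names are assumptions made for the example.

```python
from fnmatch import fnmatch

OUTDIR = "results"  # stands in for params.outdir (hypothetical value)

# (glob pattern, publish directory) pairs copied from the METABINNER_BINS block.
PUBLISH_RULES = [
    ("bins/*.fa.gz", f"{OUTDIR}/GenomeBinning/MetaBinner/"),
    ("*tooShort.fa.gz", f"{OUTDIR}/GenomeBinning/MetaBinner/discarded"),
    ("*lowDepth.fa.gz", f"{OUTDIR}/GenomeBinning/MetaBinner/discarded"),
    ("*unbinned.fa.gz", f"{OUTDIR}/GenomeBinning/MetaBinner/unbinned"),
]


def make_prefix(meta):
    """Same composition as ext.prefix: <assembler>-MetaBinner-<id>."""
    return f"{meta['assembler']}-MetaBinner-{meta['id']}"


def publish_targets(relative_path):
    """Return every publish directory whose pattern matches the task output."""
    return [target for pattern, target in PUBLISH_RULES if fnmatch(relative_path, pattern)]


prefix = make_prefix({"assembler": "MEGAHIT", "id": "sample1"})
for output in (f"bins/{prefix}.1.fa.gz", f"{prefix}.tooShort.fa.gz", "versions.yml"):
    print(output, "->", publish_targets(output))
```

Outputs that match no pattern, such as `versions.yml`, return an empty list in this sketch, which corresponds to not being published by these rules.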
1 change: 1 addition & 0 deletions conf/test.config
Expand Up @@ -29,6 +29,7 @@ params {
// Including (even length filtered) CONCOCT bins adds another 5 minutes, so we skip it in the default test (testing in assemblyinput)
skip_concoct = true
skip_comebin = true
skip_metabinner = true
busco_db = params.pipelines_testdata_base_path + 'mag/databases/busco/bacteria_odb10.2024-01-08.tar.gz'
busco_db_lineage = 'bacteria_odb10'
busco_clean = true
1 change: 1 addition & 0 deletions conf/test_alternatives.config
Expand Up @@ -40,6 +40,7 @@ params {
skip_maxbin2 = true
skip_concoct = true
skip_comebin = true
skip_metabinner = true
skip_metaeuk = true
megahit_fix_cpu_1 = true
}
1 change: 1 addition & 0 deletions conf/test_assembly_input.config
Expand Up @@ -44,6 +44,7 @@ params {
gtdbtk_skip_aniscreen = true
skip_concoct = false
skip_comebin = true
skip_metabinner = true

refine_bins_dastool = true
refine_bins_dastool_threshold = 0.0
1 change: 1 addition & 0 deletions conf/test_hybrid.config
Expand Up @@ -40,6 +40,7 @@ params {
gtdbtk_skip_aniscreen = true
skip_concoct = true
skip_comebin = true
skip_metabinner = true

spadeshybrid_fix_cpus = 2
}
1 change: 1 addition & 0 deletions conf/test_longreadonly.config
Expand Up @@ -39,4 +39,5 @@ params {
gtdbtk_skip_aniscreen = true
skip_concoct = true
skip_comebin = true
skip_metabinner = true
}
1 change: 1 addition & 0 deletions conf/test_longreadonly_alternatives.config
Expand Up @@ -39,5 +39,6 @@ params {
gtdbtk_skip_aniscreen = true
skip_concoct = true
skip_comebin = true
skip_metabinner = true
skip_metaeuk = true
}
1 change: 1 addition & 0 deletions conf/test_minimal.config
Expand Up @@ -41,6 +41,7 @@ params {
skip_maxbin2 = true
skip_concoct = true
skip_comebin = true
skip_metabinner = true
skip_prokka = true
skip_binqc = true
run_busco = false
1 change: 1 addition & 0 deletions conf/test_single_end.config
Expand Up @@ -40,6 +40,7 @@ params {
bcftools_view_medium_variant_quality = 0
bcftools_view_minimal_allelesupport = 3
skip_comebin = true
skip_metabinner = true
min_length_unbinned_contigs = 1000000
max_unbinned_contigs = 2
run_busco = false
24 changes: 22 additions & 2 deletions docs/output.md
Expand Up @@ -437,7 +437,8 @@ Files in these two folders contain all contigs of an assembly.
- `stats/[assembler]-[binner]-[sample/group]_*.tsv`: Coverage statistics of each sub-contig cut up by CONCOCT in an intermediate step prior to binning. Likely not useful in most cases.
- `stats/[assembler]-[binner]-[sample/group].log.txt`: CONCOCT execution log file.
- `stats/[assembler]-[binner]-[sample/group]_*.args`: List of arguments used in CONCOCT execution.
- </details>

</details>

All the files and contigs in these folders will be assessed by QUAST and BUSCO, if the parameter `--postbinning_input` is not set to `refined_bins_only`.

Expand All @@ -456,12 +457,31 @@ Note that CONCOCT does not output what it considers 'unbinned' contigs, therefor
- `stats/[assembler]-[binner]-[sample/group]/comebin_res.tsv`: TSV mapping the output clusters to contigs.
- `stats/[assembler]-[binner]-[sample/group]/covembeddings.tsv`: TSV describing the embeddings of the contigs.
- `stats/[assembler]-[binner]-[sample/group]/embeddings.tsv`: TSV describing the embeddings of the contigs.
- </details>

</details>

All the files and contigs in these folders will be assessed by QUAST and BUSCO, if the parameter `--postbinning_input` is not set to `refined_bins_only`.

Note that COMEBin does not output what it considers 'unbinned' contigs, therefore no 'discarded' contigs are produced here. You may still need to do your own manual curation of the resulting bins.

### MetaBinner

[MetaBinner](https://github.com/ziyewang/MetaBinner) is described as a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities.

<details markdown="1">
<summary>Output files</summary>

- `GenomeBinning/MetaBinner/`
  - `bins/[assembler]-[binner]-[sample/group].*.fa.gz`: Genome bins retrieved from input assembly.
  - `discarded/[assembler]-[binner]-[sample/group].tooShort.fa.gz`: Contigs that were not considered for binning because of length.
  - `unbinned/[assembler]-[binner]-[sample/group].unbinned.fa.gz`: Contigs that were not binned despite having suitable length.
  - `stats/[assembler]-[binner]-[sample/group].metabinner.log.gz`: Log file.
  - `stats/[assembler]-[binner]-[sample/group].tsv.gz`: TSV mapping the contigs to output clusters.

</details>

All the files and contigs in these folders will be assessed by QUAST and binning QC tools, if the parameter `--postbinning_input` is not set to `refined_bins_only`.
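The `stats/*.tsv.gz` file can be inspected directly, for example to see how many contigs each cluster received. The sketch below assumes the layout that `create_metabinner_bins.py` parses, namely two tab-separated columns (contig, then bin label) with no header; the function name and the demo data are made up for illustration.

```python
import gzip
import os
import tempfile
from collections import Counter


def count_contigs_per_bin(tsv_gz_path):
    """Count contigs per bin label in a gzipped two-column TSV (contig<TAB>bin)."""
    counts = Counter()
    with gzip.open(tsv_gz_path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            contig, bin_label = line.split("\t")
            counts[bin_label] += 1
    return counts


# Demo with a made-up membership table; real files are published under
# GenomeBinning/MetaBinner/stats/.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tsv.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("contig_1\t1\ncontig_2\t1\ncontig_3\t2\n")
    counts = count_contigs_per_bin(demo_path)

for bin_label, n in sorted(counts.items()):
    print(f"bin {bin_label}: {n} contigs")
```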

### DAS Tool

[DAS Tool](https://github.com/cmks/DAS_Tool) is an automated binning refinement method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly. nf-core/mag uses this tool to attempt to further improve bins based on combining the MetaBAT2 and MaxBin2 binning output, assuming sufficient quality is met for those bins.
7 changes: 7 additions & 0 deletions modules/local/metabinner_bins/environment.yml
@@ -0,0 +1,7 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::metabinner=1.4.4-0
44 changes: 44 additions & 0 deletions modules/local/metabinner_bins/main.nf
@@ -0,0 +1,44 @@
process METABINNER_BINS {
    tag "$meta.id"
    label 'process_low'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/metabinner:1.4.4--hdfd78af_0' :
        'quay.io/biocontainers/metabinner:1.4.4--hdfd78af_0' }"

    input:
    tuple val(meta), path(fasta), path(membership)
    val val_min_contig_size

    output:
    tuple val(meta), path("*.tooShort.fa.gz") , emit: tooshort
    tuple val(meta), path("*.unbinned.fa.gz") , emit: unbinned
    tuple val(meta), path("bins/*.fa.gz")     , emit: bins
    path "versions.yml"                       , emit: versions

    script:
    def prefix = task.ext.prefix ?: "${meta.id}"
    def min_contig_size = val_min_contig_size ?: "1000"
    """
    # unzip membership file
    zcat ${membership} > membership.tsv

    # collect bins & un-binned fractions
    create_metabinner_bins.py \\
        membership.tsv \\
        ${fasta} \\
        ./bins \\
        ${prefix} \\
        ${min_contig_size}

    # compress bins and contig fractions
    find ./bins/ -name "*.fa" -type f | xargs -t -n 1 bgzip -@ ${task.cpus}

    # move the tooShort and unbinned fractions out of the bins directory
    # (explicit names instead of a bracket expression, which is a character class)
    find ./bins/ \\( -name "*.tooShort.fa.gz" -o -name "*.unbinned.fa.gz" \\) -type f -exec mv {} . \\;

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        python: \$(python --version 2>&1 | sed 's/Python //g')
    END_VERSIONS
    """
}
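When selecting the `tooShort` and `unbinned` files by name, it helps to remember that a glob bracket expression is a character class, not a list of alternatives. A pattern such as `*[tooShort,unbinned].fa.gz` therefore matches any file whose last character before `.fa.gz` is one of the listed letters or a comma. The sketch below demonstrates this with Python's `fnmatch`, which follows the same bracket semantics; the file names are invented for the example.

```python
from fnmatch import fnmatch

files = [
    "sample.tooShort.fa.gz",
    "sample.unbinned.fa.gz",
    "sample.1.fa.gz",        # a numeric bin label
    "sample.binned.fa.gz",   # hypothetical non-numeric bin label
]

# Bracket expression: matches ONE character from the set {t,o,S,h,r,',',u,n,b,i,e,d}.
char_class = "*[tooShort,unbinned].fa.gz"
# Explicit alternatives: one pattern per wanted suffix.
explicit = ("*.tooShort.fa.gz", "*.unbinned.fa.gz")

loose = [f for f in files if fnmatch(f, char_class)]
strict = [f for f in files if any(fnmatch(f, p) for p in explicit)]

print("character class:", loose)
print("explicit names: ", strict)
```

With purely numeric bin labels both approaches select the same two files, so the character-class form happens to work; the explicit form stays correct if a bin label ever ends in one of those letters.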
7 changes: 7 additions & 0 deletions modules/local/metabinner_kmer/environment.yml
@@ -0,0 +1,7 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::metabinner=1.4.4-0
36 changes: 36 additions & 0 deletions modules/local/metabinner_kmer/main.nf
@@ -0,0 +1,36 @@
process METABINNER_KMER {
    tag "$meta.id"
    label 'process_low'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/metabinner:1.4.4--hdfd78af_0' :
        'quay.io/biocontainers/metabinner:1.4.4--hdfd78af_0' }"

    input:
    tuple val(meta), path(fasta)
    val val_min_contig_size

    output:
    tuple val(meta), path("*_kmer_4_f${min_contig_size}.csv.gz"), emit: composition_profile
    path "versions.yml"                                         , emit: versions

    script:
    def prefix = task.ext.prefix ?: "${meta.id}"
    // no 'def' here: min_contig_size must stay visible to the output block above
    min_contig_size = val_min_contig_size ?: "1000"
    def VERSION = '1.4.4-0' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    """
    metabinner_path=\$(dirname \$(which run_metabinner.sh))

    # create composition profile (contigs > ${min_contig_size} bp (default 1000), k = 4)
    python \${metabinner_path}/scripts/gen_kmer.py ${fasta} ${min_contig_size} 4

    gzip -cn ${fasta.baseName}_kmer_4_f${min_contig_size}.csv > ${prefix}_kmer_4_f${min_contig_size}.csv.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        MetaBinner: $VERSION
        python: \$(python --version 2>&1 | sed 's/Python //g')
    END_VERSIONS
    """
}
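For readers unfamiliar with composition profiles, the following sketch shows the general idea of a k = 4 profile: count every tetranucleotide per contig and keep only contigs above a length threshold. This is not MetaBinner's `gen_kmer.py` and may differ from it in details (for example in how reverse complements, ambiguous bases, or normalisation are handled); the function names, the inclusive threshold, and the demo sequences are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

K = 4
# All 4**4 = 256 tetranucleotides in a fixed column order.
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq, k=K):
    """Return raw k-mer counts for one sequence, ordered like KMERS."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other characters (e.g. N) are simply not reported.
    return [counts.get(kmer, 0) for kmer in KMERS]


def composition_profiles(contigs, min_contig_size=1000):
    """Profile only contigs of at least min_contig_size bp (inclusive: an assumption)."""
    return {
        name: kmer_profile(seq)
        for name, seq in contigs.items()
        if len(seq) >= min_contig_size
    }


# Made-up contigs: one of 1200 bp, one of 40 bp.
contigs = {"long_contig": "ACGT" * 300, "short_contig": "ACGT" * 10}
profiles = composition_profiles(contigs, min_contig_size=1000)

print(sorted(profiles))                                     # contigs that passed the filter
print(profiles["long_contig"][KMERS.index("ACGT")])         # count of the 4-mer ACGT
```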
7 changes: 7 additions & 0 deletions modules/local/metabinner_metabinner/environment.yml
@@ -0,0 +1,7 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::metabinner=1.4.4-0