Merged
232 commits
60450b6
some comments about possible solutions
harper357 Jul 30, 2025
985fd2d
added params.binqc_tool_extras
harper357 Aug 4, 2025
a6d6427
change to run if tools in params.binqc_tool_extras
harper357 Aug 4, 2025
89503cb
updated schema
harper357 Aug 4, 2025
fc9f07c
split params.binqc_tool_extras into a list
harper357 Aug 4, 2025
fc49911
concat outputs from binqc_tool_extras tools
harper357 Aug 5, 2025
9724454
add binqc_tool_extras check for db downloads
harper357 Aug 5, 2025
4ad5b41
remove redudant comments
harper357 Aug 5, 2025
c000974
change concat calls to .collectFile calls
harper357 Aug 5, 2025
3aeca36
Added short info for new summary tables
harper357 Aug 5, 2025
4d1c5f1
Revert "change concat calls to .collectFile calls"
harper357 Aug 6, 2025
f8a7969
added config entry to publish concatted summaries
harper357 Aug 7, 2025
8cd912a
add binqc_tool_extras check for db downloads
harper357 Aug 5, 2025
00e5f2b
remove redudant comments
harper357 Aug 5, 2025
a855932
Fix bt2l versions of bowtie2 index files not being picked up
jfy133 Aug 11, 2025
e9c1a9a
Clarify slightly that single/paried reads correspond only to short reads
jfy133 Aug 11, 2025
f0794a3
Correct PR
jfy133 Aug 11, 2025
af43aaa
Change mechanism of importing 'assets' derived reference genomes to f…
jfy133 Aug 11, 2025
1c4ef24
Update changelog
jfy133 Aug 11, 2025
958e6ed
Update bbmap/bbnorm
erikrikarddaniel Aug 12, 2025
b00ec0b
Reduce bbnorms memory to 0.8
erikrikarddaniel Aug 12, 2025
b142363
Add forgotten new module files for bbnorm
erikrikarddaniel Aug 13, 2025
529e6d6
Generate patch for bbmap/bbnorm
erikrikarddaniel Aug 13, 2025
78be40b
Update CHANGELOG.md
erikrikarddaniel Aug 13, 2025
dffc784
Prettier
erikrikarddaniel Aug 13, 2025
e40a4be
docs: add `group` parameter heading
vinisalazar Aug 12, 2025
c055741
Update docs/usage.md
jfy133 Aug 13, 2025
b4d8295
docs: editing `group` description
vinisalazar Aug 14, 2025
22eda4f
Update docs/usage.md
jfy133 Aug 15, 2025
c68b83d
Update CHANGELOG.md and README.md
vinisalazar Aug 15, 2025
f898fff
Fix RO crate
jfy133 Aug 15, 2025
888c6fa
Remove incorrectly used `.first()`
jfy133 Aug 15, 2025
1666485
Update CHANGELOG.md
jfy133 Aug 15, 2025
21ce24b
Update CHANGELOG.md
jfy133 Aug 15, 2025
ea1fecb
Test alternatives (WIP)
dialvarezs Aug 13, 2025
52f53f2
Test alternatives (WIP 2)
dialvarezs Aug 14, 2025
c882440
Test alternatives (complete)
dialvarezs Aug 16, 2025
bbbebf1
Exclude file from test
dialvarezs Aug 16, 2025
1fc5dd4
[automated] Fix code linting
nf-core-bot Aug 16, 2025
4865c13
Fix count of successful tasks
dialvarezs Aug 16, 2025
55bfda5
Indentation
dialvarezs Aug 19, 2025
1c1aa9a
Add log checks
dialvarezs Aug 19, 2025
982ee8c
Fix task counter
dialvarezs Aug 19, 2025
d78c6d5
Improve checks
dialvarezs Aug 19, 2025
40f68e4
Add fasta tests
dialvarezs Aug 21, 2025
0a12e4a
Fix tasks number
dialvarezs Aug 21, 2025
5b959eb
Improve truthy checks
dialvarezs Aug 21, 2025
42b0470
Replace local pool_ modules by cat/fastq
dialvarezs Aug 16, 2025
dc1e74f
Improve local modules structure
dialvarezs Aug 16, 2025
25fcaa8
Remove local nanolyse module
dialvarezs Aug 16, 2025
e18e0b8
Improve structure of local subworkflows
dialvarezs Aug 16, 2025
bfff38e
Formatting
dialvarezs Aug 16, 2025
f91f6cb
Update changelog
dialvarezs Aug 16, 2025
ee65a2a
Address review comments
dialvarezs Aug 19, 2025
a830852
Update changelog
dialvarezs Aug 19, 2025
52efb5c
Update modules/local/samtools/unmapped/main.nf
dialvarezs Aug 19, 2025
9596572
Flatten local modules and subworkflows
dialvarezs Aug 20, 2025
d59b55d
Add local subwf meta files
dialvarezs Aug 21, 2025
0bb74c2
Address comments
dialvarezs Aug 21, 2025
ce32ac1
Update modules, round 1
dialvarezs Aug 22, 2025
f9265e9
Update modules, round 2
dialvarezs Aug 22, 2025
ea36190
Update modules, round 3
dialvarezs Aug 22, 2025
8200ccc
Update modules, round 4
dialvarezs Aug 22, 2025
749561d
Update modules, round 5
dialvarezs Aug 22, 2025
c80f10b
Update modules, round 6
dialvarezs Aug 22, 2025
10f9b86
Update modules, round 7
dialvarezs Aug 22, 2025
cd44388
Update nf-core subworkflows
dialvarezs Aug 22, 2025
238645b
Bump BUSCO version in snapshot
dialvarezs Aug 22, 2025
423d2cf
Update metabat2
dialvarezs Aug 22, 2025
ea49e63
Fix BUSCO version in changelog
dialvarezs Aug 22, 2025
ddac1be
Fix metabat2 snaps
dialvarezs Aug 22, 2025
2fce6a9
Add new nf-test configs
dialvarezs Aug 25, 2025
0c671c9
Old config cleanup
dialvarezs Aug 25, 2025
b6934ea
Update changelog
dialvarezs Aug 25, 2025
afac435
Use large disk to prevent storage issues
dialvarezs Sep 5, 2025
ff76d4f
Add missing versions mixing and move version calls after each module …
jfy133 Aug 22, 2025
abd671f
Final version fixes and moving
jfy133 Aug 22, 2025
743e6ae
Update CHANGELOG
jfy133 Aug 22, 2025
d292e39
Update subworkflows/local/assembly_longread/main.nf
jfy133 Aug 22, 2025
32a746c
Replace sceond long ternary with proper if-else
jfy133 Aug 22, 2025
ccceab3
Remove if/else statement and revert to ternary because cannot otherwi…
jfy133 Aug 22, 2025
c476a2e
Remove `.first()` from everywhere as not necessary and can mess with …
jfy133 Aug 22, 2025
140335d
Revert "Remove `.first()` from everywhere as not necessary and can me…
jfy133 Aug 22, 2025
6298686
Correctly remove `.first()` from everywhere as not necessary and can …
jfy133 Aug 22, 2025
7e07848
Use simplified corutils version catch from @mahesh-panchal
jfy133 Aug 22, 2025
0928873
Fix wrong version placement
jfy133 Aug 22, 2025
c8f41e5
Use correct flags for flye and update some snapshots
jfy133 Aug 25, 2025
2ef408e
Typo fix for metaMDG option
jfy133 Aug 25, 2025
8aa3112
Fix autoformatting
jfy133 Aug 25, 2025
7c2aa9d
Use correc tyaml
jfy133 Aug 25, 2025
f8875f0
Update snapshots
dialvarezs Aug 26, 2025
e6c346c
Add multiqc snapshots
dialvarezs Aug 26, 2025
895962c
Update hybrid
dialvarezs Aug 26, 2025
b953b06
Update snapshots
dialvarezs Aug 26, 2025
6362a31
Update snapshot
dialvarezs Aug 26, 2025
f8d6122
Standardise header sand tags
jfy133 Aug 28, 2025
5bd4245
Use porechop_porechop in test_hbyrid profile
jfy133 Aug 28, 2025
1794ac8
update snapshot to include correct tool
jfy133 Aug 28, 2025
0ec5a2d
Add assembly input nf-test (WIP)
dialvarezs Sep 3, 2025
47377d5
Add metaeuk
dialvarezs Sep 3, 2025
70a1b30
Update fastp
dialvarezs Aug 26, 2025
ceff9d8
Update bcftools
dialvarezs Aug 26, 2025
da41a21
Update adapterremoval
dialvarezs Aug 26, 2025
66c0788
Update dastool
dialvarezs Aug 26, 2025
23ad784
Update freebayes
dialvarezs Aug 26, 2025
1b15f5f
Update gtdbtk
dialvarezs Aug 26, 2025
528ea94
Update porechop
dialvarezs Aug 26, 2025
f70c1e3
Improve bcftools usage
dialvarezs Aug 26, 2025
099bd49
Update genomad
dialvarezs Aug 26, 2025
7b7d7a2
Update snapshots
dialvarezs Aug 26, 2025
a7dcec1
Update changelog
dialvarezs Aug 26, 2025
3ed6756
Update porechop/abi to patched version to not result in duplicated reads
jfy133 Aug 25, 2025
1b3cf72
Update CHANGELOG.md
jfy133 Sep 3, 2025
d6a4183
Update nf-schema to stop all params being reported as erroring out
jfy133 Sep 5, 2025
9eee06e
Fix URL typo
jfy133 Sep 5, 2025
ae83d47
Fix typo in github handlge
jfy133 Sep 5, 2025
0ecc217
Fix typo in github handlge
jfy133 Sep 5, 2025
3e48bdc
Add HiRSE code promo badge
jfy133 Sep 5, 2025
d3d4801
Fix RO create
jfy133 Sep 5, 2025
cdcb6c9
Enable GUNC
dialvarezs Sep 5, 2025
177a304
Update assembly_input test
dialvarezs Sep 12, 2025
8175693
Update snapshots (nextflow version)
dialvarezs Sep 12, 2025
4328a79
Update metaeuk config
dialvarezs Sep 12, 2025
1e042f6
Revert to fasta input for metaeuk
dialvarezs Sep 13, 2025
c2b7793
Update changelog
dialvarezs Sep 13, 2025
a39933e
Fix nextflow version
dialvarezs Sep 13, 2025
f45ada3
Fix version pattern capture
jfy133 Sep 15, 2025
1f27cf9
Enable metauk using swissprot + refined bins only
dialvarezs Sep 16, 2025
302bdce
Update CHANGELOG.md
dialvarezs Sep 16, 2025
67d12a8
Update snapshots
dialvarezs Sep 16, 2025
08f440e
Bump version using nf-core tools
jfy133 Sep 17, 2025
2b7d4ca
Update metromap, deprecate old diagram, use metromap in README
jfy133 Sep 17, 2025
e5739e6
nicer HiRSE badge
jfy133 Sep 17, 2025
c170112
Fix ro crate REAMDE change
jfy133 Sep 17, 2025
9f9e169
[automated] Fix code linting
nf-core-bot Sep 17, 2025
f414caa
Remove short-read profiling from metormap
jfy133 Sep 17, 2025
88dcdfe
Update all snapshots to include latest pipeline version, standarding …
jfy133 Sep 17, 2025
c3e09a4
Update snapshots
dialvarezs Sep 17, 2025
64124a0
Update snapshot
dialvarezs Sep 18, 2025
4181f14
Increase shards
dialvarezs Sep 18, 2025
2dc74dd
Update GTDBK to version that works with conda
jfy133 Sep 19, 2025
9b4f24f
update CHANGELOG
jfy133 Sep 19, 2025
99175d0
Fix GTDBTK version in snapshot
jfy133 Sep 19, 2025
1d1aa60
Update docs/usage.md
jfy133 Sep 19, 2025
7d53d7a
Address comments from @erikrikarddaniel
jfy133 Sep 19, 2025
a951cea
Update docs/usage.md
jfy133 Sep 19, 2025
10e90d9
Update docs/usage.md
jfy133 Sep 19, 2025
2650a55
Update chagnelog date and metromap modifications after feedback from …
jfy133 Sep 19, 2025
4372095
update snapshot
jfy133 Sep 19, 2025
b313b6b
Improve CONDA pinning in several modules to match snapshot and ensure…
jfy133 Sep 21, 2025
8d0d0fc
bump NANOPLOT to 1.46.1 to fix kaleodio issue in conda
jfy133 Sep 21, 2025
f7b1133
Bump NANOPLOT versions in snaphost
jfy133 Sep 21, 2025
b56a595
Use correct repository
jfy133 Sep 21, 2025
14a87bb
Update CONCOCT to use latest conda build and fix version reporting
jfy133 Sep 21, 2025
48cd6d4
Update all snapshots fror hardcoded coreuttlis
jfy133 Sep 22, 2025
a41a22b
Syncronise local rename modules to match container of UNTAR module (t…
jfy133 Sep 22, 2025
4fb559e
Fix GTDBTK DB prep
jfy133 Sep 22, 2025
45807f7
And for single end
jfy133 Sep 22, 2025
32de7db
Use correct version for update coreutrils capture
jfy133 Sep 22, 2025
46f1882
Remove quotes to make snapshot
jfy133 Sep 22, 2025
258edb8
Remove quotes everywehere for the coreutils
jfy133 Sep 22, 2025
0b4055b
Standardise remaining core utils conda/containers and update snapshots
jfy133 Sep 23, 2025
67e285b
Update hybrid snapshot too
jfy133 Sep 23, 2025
840b03e
Use same container not just conda env for coreutils (to match with of…
jfy133 Sep 23, 2025
0278a1b
Remove variable MultiQC YAML files in longread only test
jfy133 Sep 23, 2025
e33a205
Use correct versions in snapshots for all TAR processes
jfy133 Sep 23, 2025
9d3b6d5
Deprecate GDTBTK's --gtdb_mash parameter as no longer suppoted by the…
jfy133 Sep 23, 2025
c9ff54e
Revert "Deprecate GDTBTK's --gtdb_mash parameter as no longer suppote…
jfy133 Sep 23, 2025
9f691c9
Re-deprecate mash_db bit retain -skip_ani_screen in GTDBTk process
jfy133 Sep 23, 2025
ab9d966
Typo fixes as noticed by @dialvarezs
jfy133 Sep 23, 2025
47f2709
Implement option to skip FastANI screen in GTDB-Tk (to replace mash_d…
jfy133 Sep 23, 2025
7e43d61
Improve parameter name for clarity and add changelog entry
jfy133 Sep 24, 2025
033c2c4
Update nextflow_schema.json
jfy133 Sep 23, 2025
2ada69d
[automated] Fix code linting
nf-core-bot Sep 23, 2025
a451aa5
Deactivate scratch on METASPADES to allow functioning with fusion
jfy133 Sep 24, 2025
e1a77e3
Use the correct Seqera (unofficial) approve from @FriederikeHanssen@e…
jfy133 Sep 24, 2025
3be1a4b
Add missing --threads parameter for metaeuk easypredict
jfy133 Sep 25, 2025
7560fa7
Make sure all custom exit code contiions are consistent, and add igno…
jfy133 Sep 29, 2025
6f0a1bc
Use latest GTDB download link
jfy133 Sep 29, 2025
1424a2c
Use correct URLs for test data samplesheet and GTDB, deactivate CAT d…
jfy133 Sep 29, 2025
634164e
Try CheckM2 instead of BUSCO
jfy133 Sep 30, 2025
550e64a
Update CHANGELOG.md
jfy133 Sep 30, 2025
a252c28
Fix linting (mismtach of schema with config, typo in module name in m…
jfy133 Sep 30, 2025
bc61826
docs(coverage): add section to usage documenting how to tune the perc…
prototaxites Sep 30, 2025
6bebd69
fix(test_full): set longread_percentidentity to 85
prototaxites Sep 30, 2025
b8cfe5f
docs: update changelog
prototaxites Sep 30, 2025
c3524ae
lint: trailing whitespace
prototaxites Sep 30, 2025
80e98d7
feat: post-release version bump
prototaxites Sep 30, 2025
c88532b
fix: bump pipeline version in all snapshots
prototaxites Sep 30, 2025
b324244
Apply suggestions from code review
prototaxites Sep 30, 2025
a09c114
added TODOs so i don't forget any
harper357 Oct 6, 2025
f49249a
added "enable_{tool)" for binqc
harper357 Oct 6, 2025
fb4caea
added enable_{tool} parameters with enable_busco as true as default
harper357 Oct 6, 2025
71d8e74
updated modules and their imports
harper357 Oct 6, 2025
554b595
fixed relative import
harper357 Oct 6, 2025
60b45a8
changed main variables
harper357 Oct 6, 2025
f72f075
support for checkm
harper357 Oct 6, 2025
6dd30ff
support for checkm2
harper357 Oct 6, 2025
c01cdba
hotfix for checkm support
harper357 Oct 6, 2025
3c9b6a3
busco support
harper357 Oct 6, 2025
9222c6d
remove TODOs
harper357 Oct 6, 2025
f9d95b9
updated TODO and changed to qc_summaries
harper357 Oct 7, 2025
e39661d
Merge branch 'dev' into multiple_bin_qc
jfy133 Oct 7, 2025
2bd856e
Rename bin QC tool parameters for consistency with other parameters
jfy133 Oct 14, 2025
1c84de1
Update binQC to export BUSCO, CheckM, and CheckM2 summaries separatel…
jfy133 Oct 14, 2025
3b8068f
update combine_tables.py to merge all files
jfy133 Oct 14, 2025
797278a
Update CHANGELOG
jfy133 Oct 14, 2025
f14f167
Add suffixes, improve changelog and update output.md to better descri…
jfy133 Oct 14, 2025
a60ab1c
Fix when not all bin qc tools are executed
jfy133 Oct 14, 2025
beac2e4
Update GTDBTK local subworkflow to filter on any/all bin QC metrics a…
jfy133 Oct 15, 2025
fbecbb7
Use correct variable name for renaming GTDBTK results
jfy133 Oct 15, 2025
833da83
Merge branch 'dev' into multiple_bin_qc
jfy133 Oct 15, 2025
8be3e60
Update snapshots for working tests
jfy133 Oct 15, 2025
5631752
Merge branch 'multiple_bin_qc' of github.com:harper357/mag into multi…
jfy133 Oct 15, 2025
0f6b11d
Debugging backup
jfy133 Oct 17, 2025
bbdc652
Simplify code
jfy133 Oct 17, 2025
02748f3
Merge branch 'dev' into multiple_bin_qc
jfy133 Oct 17, 2025
838015d
Apply suggestions from code review
jfy133 Oct 17, 2025
0cb4010
Update code to correctly append extension to bin names in GTDB-Tk sub…
jfy133 Oct 17, 2025
64328a8
Merge branch 'multiple_bin_qc' of github.com:harper357/mag into multi…
jfy133 Oct 17, 2025
a165949
Activate busco clean to remove variable intermediate files
jfy133 Oct 17, 2025
3aeaecf
Ignore variable md5sums and names only
jfy133 Oct 17, 2025
5127543
Allow duplicats because multiple QC tools per bin
jfy133 Oct 21, 2025
ca18a82
Set skip_aniscreen to all test configs in case gtdbtk tested with the…
jfy133 Oct 22, 2025
9faebcc
Retain bins based on completeness if any of the bin qc tools report h…
jfy133 Oct 22, 2025
6fa6c98
Merge branch 'dev' into multiple_bin_qc
jfy133 Oct 29, 2025
957996c
Update subworkflows/local/gtdbtk/main.nf
jfy133 Oct 29, 2025
b22cfdc
[automated] Fix code linting
nf-core-bot Oct 29, 2025
5b262fb
Update single_end test basic snapshot to include GTDBTk
jfy133 Nov 3, 2025
82064fa
update test_alternatives basic snapshot too
jfy133 Nov 3, 2025
044788b
Simplify error message
jfy133 Nov 3, 2025
1880a9c
Auto format
jfy133 Nov 3, 2025
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -26,12 +26,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### `Added`

 - [#873](https://github.com/nf-core/mag/pull/873) - Document usage of `longread_percentidentity` and `shortread_percentidentity` and set the value of `longread_percentidentity` in the `test_full` profile to 85 (by @prototaxites)
+- [#842](https://github.com/nf-core/mag/pull/842) - Add support for running multiple binQC tools in one run using dedicated `--run_busco`, `--run_checkm`, and `--run_checkm2` parameters (by @harper357, with contributions from @dialvarezs, @prototaxites and @jfy133)
 - [#875](https://github.com/nf-core/mag/pull/875) - Add binner COMEBin (by @d4straub)

 ### `Changed`

 - [#878](https://github.com/nf-core/mag/pull/878) - Refine test_full config with optimised resource usage for AWS release megatests (by @jfy133)
 - [#880](https://github.com/nf-core/mag/pull/880) - Updated to nf-core 3.4.1 `TEMPLATE` (by @jfy133)
+- [#842](https://github.com/nf-core/mag/pull/842) - Change `bin_summary.tsv` format for improved clarity and more comprehensiveness (by @harper357, with contributions from @dialvarezs, @prototaxites and @jfy133)
+  - Now will include columns from all bin QC tools executed in a given run (i.e., all/any of BUSCO, CheckM and CheckM2)
+  - Adds suffixes to all columns (`_<toolname>`) to distinguish which column comes from which tool

 ### `Fixed`

@@ -48,6 +52,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### `Deprecated`

+- [#842](https://github.com/nf-core/mag/pull/842) - Remove `--binqc_tool` (by @harper357, with contributions from @dialvarezs, @prototaxites and @jfy133)
+
 ## 5.0.0 - [2025-09-30]

 ### `Added`
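To make the `_<toolname>` column suffixes concrete, the following dependency-free sketch combines per-tool bin QC tables the way the new `bin_summary.tsv` is described: one row per bin, with each tool's columns tagged by the tool name. It assumes the summaries have already been parsed into mappings from bin name to columns; `combine_summaries`, `suffix_columns` and the `Completeness` values are hypothetical toy data, whereas `bin/combine_tables.py` does the real work with pandas `add_suffix` and outer `merge` calls.

```python
from typing import Dict

Row = Dict[str, str]


def suffix_columns(row: Row, tool: str) -> Row:
    """Rename every column to '<column>_<tool>' so columns from different tools cannot collide."""
    return {f"{col}_{tool}": value for col, value in row.items()}


def combine_summaries(per_tool: Dict[str, Dict[str, Row]]) -> Dict[str, Row]:
    """Outer-join per-tool tables on bin name.

    per_tool maps a tool name (e.g. "busco") to {bin_name: row}. A bin that one
    tool did not report simply lacks that tool's columns, as in an outer join.
    """
    combined: Dict[str, Row] = {}
    for tool, table in per_tool.items():
        for bin_name, row in table.items():
            combined.setdefault(bin_name, {"bin": bin_name}).update(suffix_columns(row, tool))
    return combined


if __name__ == "__main__":
    # Toy values for illustration only; real BUSCO/CheckM2 tables have many more columns.
    busco = {"bin.1.fa": {"Completeness": "91.2"}}
    checkm2 = {
        "bin.1.fa": {"Completeness": "88.0"},
        "bin.2.fa": {"Completeness": "40.5"},
    }
    summary = combine_summaries({"busco": busco, "checkm2": checkm2})
    for bin_name in sorted(summary):
        print(summary[bin_name])
```

Running it prints `bin.1.fa` with both `Completeness_busco` and `Completeness_checkm2`, and `bin.2.fa` with only the CheckM2 column, which is the outer-join behaviour the `how="outer"` merges rely on.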
134 changes: 101 additions & 33 deletions bin/combine_tables.py
@@ -19,13 +19,22 @@ def parse_args(args=None):
         metavar="FILE",
         help="Bin depths summary file.",
     )
-    parser.add_argument("-b", "--binqc_summary", metavar="FILE", help="BUSCO summary file.")
-    parser.add_argument("-q", "--quast_summary", metavar="FILE", help="QUAST BINS summary file.")
-    parser.add_argument("-g", "--gtdbtk_summary", metavar="FILE", help="GTDB-Tk summary file.")
-    parser.add_argument("-a", "--cat_summary", metavar="FILE", help="CAT table file.")
     parser.add_argument(
-        "-t", "--binqc_tool", help="Bin QC tool used", choices=["busco", "checkm", "checkm2"]
+        "-q", "--quast_summary", metavar="FILE", help="QUAST BINS summary file."
     )
+    parser.add_argument(
+        "-g", "--gtdbtk_summary", metavar="FILE", help="GTDB-Tk summary file."
+    )
+    parser.add_argument(
+        "-u", "--busco_summary", metavar="FILE", help="BUSCO summary file."
+    )
+    parser.add_argument(
+        "-c", "--checkm_summary", metavar="FILE", help="CheckM summary file."
+    )
+    parser.add_argument(
+        "-e", "--checkm2_summary", metavar="FILE", help="CheckM2 summary file."
+    )
+    parser.add_argument("-a", "--cat_summary", metavar="FILE", help="CAT table file.")

     parser.add_argument(
         "-o",
@@ -74,54 +83,81 @@ def parse_cat_table(cat_table):
         header=None,
         skiprows=1,
     )
-    # merge all rank columns into a single column
+    ## merge all rank columns into a single column
     df["CAT_rank"] = (
-        df.filter(regex="rank_\d+").apply(lambda x: ";".join(x.dropna()), axis=1).str.lstrip()
+        df.filter(regex="rank_\d+")
+        .apply(lambda x: ";".join(x.dropna()), axis=1)
+        .str.lstrip()
     )
-    # remove rank_* columns
+    ## remove rank_* columns
     df.drop(df.filter(regex="rank_\d+").columns, axis=1, inplace=True)
+    df = df.add_suffix("_catpack")

     return df


 def main(args=None):
     args = parse_args(args)

+    ## INPUT VALIDATION
+
     if (
-        not args.binqc_summary
+        not args.busco_summary
+        and not args.checkm_summary
+        and not args.checkm2_summary
         and not args.quast_summary
         and not args.gtdbtk_summary
     ):
         sys.exit(
             "No summary specified! "
-            "Please specify at least BUSCO, CheckM, CheckM2 or QUAST summary."
+            "Please specify at least one of BUSCO, CheckM, CheckM2 or QUAST summary."
         )

-    # GTDB-Tk can only be run in combination with BUSCO, CheckM or CheckM2
-    if args.gtdbtk_summary and not args.binqc_summary:
+    ## GTDB-Tk can only be run in combination with BUSCO, CheckM or CheckM2
+    if (
+        args.gtdbtk_summary
+        and not args.busco_summary
+        and not args.checkm_summary
+        and not args.checkm2_summary
+    ):
         sys.exit(
             "Invalid parameter combination: "
-            "GTDB-TK summary specified, but no BUSCO, CheckM or CheckM2 summary!"
+            "GTDB-TK summary specified, but no BUSCO, CheckM or CheckM2 summary provided!"
         )

-    # handle bin depths
+    ## BIN DEPTH PROCESSING
+
+    ## handle bin depths, and extract root bin names
     results = pd.read_csv(args.depths_summary, sep="\t")
-    results.columns = ["Depth " + str(col) if col != "bin" else col for col in results.columns]
+    results.columns = [
+        "Depth " + str(col) if col != "bin" else col for col in results.columns
+    ]
     bins = results["bin"].sort_values().reset_index(drop=True)

-    if args.binqc_summary and args.binqc_tool == "busco":
-        busco_results = pd.read_csv(args.binqc_summary, sep="\t")
+    ## BUSCO PROCESSING
+
+    if args.busco_summary:
+        busco_results = pd.read_csv(args.busco_summary, sep="\t")
         busco_bins = set(busco_results["Input_file"])

         if set(bins) != busco_bins and len(busco_bins.intersection(set(bins))) > 0:
-            warnings.warn("Bins in BUSCO summary do not match bins in bin depths summary")
+            warnings.warn(
+                "Bins in BUSCO summary do not match bins in bin depths summary"
+            )
         elif len(busco_bins.intersection(set(bins))) == 0:
             sys.exit("Bins in BUSCO summary do not match bins in bin depths summary!")
+        busco_results = busco_results.add_suffix("_busco")
         results = pd.merge(
-            results, busco_results, left_on="bin", right_on="Input_file", how="outer"
+            results,
+            busco_results,
+            left_on="bin",
+            right_on="Input_file_busco",
+            how="outer",
         )  # assuming depths for all bins are given

-    if args.binqc_summary and args.binqc_tool == "checkm":
+    ## CHECKM PROCESSING
+
+    if args.checkm_summary:
         use_columns = [
             "Bin Id",
             "Marker lineage",
@@ -141,16 +177,23 @@ def main(args=None):
             "4",
             "5+",
         ]
-        checkm_results = pd.read_csv(args.binqc_summary, usecols=use_columns, sep="\t")
+        checkm_results = pd.read_csv(args.checkm_summary, usecols=use_columns, sep="\t")
         checkm_results["Bin Id"] = checkm_results["Bin Id"] + ".fa"
         if not set(checkm_results["Bin Id"]).issubset(set(bins)):
             sys.exit("Bins in CheckM summary do not match bins in bin depths summary!")
+        checkm_results = checkm_results.add_suffix("_checkm")
         results = pd.merge(
-            results, checkm_results, left_on="bin", right_on="Bin Id", how="outer"
+            results,
+            checkm_results,
+            left_on="bin",
+            right_on="Bin Id_checkm",
+            how="outer",
         )  # assuming depths for all bins are given
-        results["Bin Id"] = results["Bin Id"].str.removesuffix(".fa")
+        results["Bin Id_checkm"] = results["Bin Id_checkm"].str.removesuffix(".fa")

+    ## CHECKM2 PROCESSING
+
-    if args.binqc_summary and args.binqc_tool == "checkm2":
+    if args.checkm2_summary:
         use_columns = [
             "Name",
             "Completeness",
@@ -160,40 +203,65 @@ def main(args=None):
             "Translation_Table_Used",
             "Total_Coding_Sequences",
         ]
-        checkm2_results = pd.read_csv(args.binqc_summary, usecols=use_columns, sep="\t")
+        checkm2_results = pd.read_csv(
+            args.checkm2_summary, usecols=use_columns, sep="\t"
+        )
         checkm2_results["Name"] = checkm2_results["Name"] + ".fa"
         if not set(checkm2_results["Name"]).issubset(set(bins)):
             sys.exit("Bins in CheckM2 summary do not match bins in bin depths summary!")
+        checkm2_results = checkm2_results.add_suffix("_checkm2")
         results = pd.merge(
-            results, checkm2_results, left_on="bin", right_on="Name", how="outer"
+            results,
+            checkm2_results,
+            left_on="bin",
+            right_on="Name_checkm2",
+            how="outer",
         )  # assuming depths for all bins are given
-        results["Name"] = results["Name"].str.removesuffix(".fa")
+        results["Name"] = results["Name_checkm2"].str.removesuffix(".fa")

+    ## QUAST PROCESSING
+
     if args.quast_summary:
         quast_results = pd.read_csv(args.quast_summary, sep="\t")
-        if not bins.equals(quast_results["Assembly"].sort_values().reset_index(drop=True)):
+        if not bins.equals(
+            quast_results["Assembly"].sort_values().reset_index(drop=True)
+        ):
             sys.exit("Bins in QUAST summary do not match bins in bin depths summary!")
+        quast_results = quast_results.add_suffix("_quast")
         results = pd.merge(
-            results, quast_results, left_on="bin", right_on="Assembly", how="outer"
+            results,
+            quast_results,
+            left_on="bin",
+            right_on="Assembly_quast",
+            how="outer",
         )  # assuming depths for all bins are given

+    ## GTDBTK PROCESSING
+
     if args.gtdbtk_summary:
         gtdbtk_results = pd.read_csv(args.gtdbtk_summary, sep="\t")
         if len(set(gtdbtk_results["user_genome"].to_list()).difference(set(bins))) > 0:
             sys.exit("Bins in GTDB-Tk summary do not match bins in bin depths summary!")
+        gtdbtk_results = gtdbtk_results.add_suffix("_gtdbtk")
         results = pd.merge(
-            results, gtdbtk_results, left_on="bin", right_on="user_genome", how="outer"
+            results,
+            gtdbtk_results,
+            left_on="bin",
+            right_on="user_genome_gtdbtk",
+            how="outer",
         )  # assuming depths for all bins are given

+    ## CAT_PACK PROCESSING
+
     if args.cat_summary:
         cat_results = parse_cat_table(args.cat_summary)
-        if len(set(cat_results["bin"].to_list()).difference(set(bins))) > 0:
+        if len(set(cat_results["bin_catpack"].to_list()).difference(set(bins))) > 0:
             sys.exit("Bins in CAT summary do not match bins in bin depths summary!")
         results = pd.merge(
             results,
-            cat_results[["bin", "CAT_rank"]],
+            cat_results[["bin_catpack", "CAT_rank_catpack"]],
             left_on="bin",
-            right_on="bin",
+            right_on="bin_catpack",
             how="outer",
         )

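With one optional flag per bin QC tool replacing the `-b/--binqc_summary` plus `-t/--binqc_tool` pair, any combination of BUSCO, CheckM and CheckM2 summaries can be passed in a single call. The sketch below mirrors the new options and the two checks at the top of `main()`, returning the error message instead of calling `sys.exit` so the accepted combinations are easy to test. The `--depths_summary` long name is inferred from `args.depths_summary` (its short flag and the `-o` option's long name are not visible in the diff, so both are omitted), `required=True` is an assumption, and `validation_error` is a hypothetical helper.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Combine bin summary tables (sketch).")
    # Long name inferred from `args.depths_summary`; required=True is an assumption.
    parser.add_argument("--depths_summary", metavar="FILE", required=True, help="Bin depths summary file.")
    parser.add_argument("-q", "--quast_summary", metavar="FILE", help="QUAST BINS summary file.")
    parser.add_argument("-g", "--gtdbtk_summary", metavar="FILE", help="GTDB-Tk summary file.")
    parser.add_argument("-u", "--busco_summary", metavar="FILE", help="BUSCO summary file.")
    parser.add_argument("-c", "--checkm_summary", metavar="FILE", help="CheckM summary file.")
    parser.add_argument("-e", "--checkm2_summary", metavar="FILE", help="CheckM2 summary file.")
    parser.add_argument("-a", "--cat_summary", metavar="FILE", help="CAT table file.")
    return parser.parse_args(argv)


def validation_error(args: argparse.Namespace) -> Optional[str]:
    """Return the message main() would exit with, or None when the combination is accepted."""
    binqc_given = bool(args.busco_summary or args.checkm_summary or args.checkm2_summary)
    if not (binqc_given or args.quast_summary or args.gtdbtk_summary):
        return (
            "No summary specified! "
            "Please specify at least one of BUSCO, CheckM, CheckM2 or QUAST summary."
        )
    if args.gtdbtk_summary and not binqc_given:
        return (
            "Invalid parameter combination: "
            "GTDB-TK summary specified, but no BUSCO, CheckM or CheckM2 summary provided!"
        )
    return None


if __name__ == "__main__":
    # BUSCO and CheckM2 together, which a single --binqc_tool value could not express.
    combined = parse_args(
        ["--depths_summary", "depths.tsv", "-u", "busco.tsv", "-e", "checkm2.tsv", "-g", "gtdbtk.tsv"]
    )
    print(validation_error(combined))  # None
    gtdbtk_only = parse_args(["--depths_summary", "depths.tsv", "-g", "gtdbtk.tsv"])
    print(validation_error(gtdbtk_only))
```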
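Calling `add_suffix` on each tool's table renames the join keys as well, which is why the merges now use `right_on` values such as `"Bin Id_checkm"`, `"Name_checkm2"`, `"Assembly_quast"`, `"user_genome_gtdbtk"` and `"bin_catpack"`. The stdlib sketch below reproduces two of these paths on toy rows: the CheckM `.fa` round trip and the CAT_pack rank collapsing from `parse_cat_table`. The helper names are hypothetical, the CAT rank columns are assumed to be named `rank_<n>` (as the `rank_\d+` regex implies), and missing values are modelled as `None` rather than NaN.

```python
import re
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[str]]
# DataFrame.filter(regex=...) keeps columns where re.search finds a match.
RANK_COLUMN = re.compile(r"rank_\d+")


def add_suffix(row: Row, suffix: str) -> Row:
    """Like DataFrame.add_suffix: every column is renamed, including the join key."""
    return {f"{col}{suffix}": value for col, value in row.items()}


def prepare_checkm(rows: List[Row], depth_bins: Set[str]) -> List[Row]:
    """Append '.fa' to CheckM bin IDs, check them against the depth table, then suffix."""
    prepared = []
    for row in rows:
        renamed = {**row, "Bin Id": row["Bin Id"] + ".fa"}  # CheckM IDs lack the extension
        if renamed["Bin Id"] not in depth_bins:
            raise ValueError("Bins in CheckM summary do not match bins in bin depths summary!")
        prepared.append(add_suffix(renamed, "_checkm"))
    return prepared


def collapse_cat_ranks(row: Row) -> Row:
    """Join non-missing rank_<n> values with ';', drop those columns, then suffix."""
    ranks = [v for col, v in row.items() if RANK_COLUMN.search(col) and v is not None]
    kept = {col: v for col, v in row.items() if not RANK_COLUMN.search(col)}
    kept["CAT_rank"] = ";".join(ranks).lstrip()
    return add_suffix(kept, "_catpack")


if __name__ == "__main__":
    checkm = prepare_checkm([{"Bin Id": "bin.1", "Completeness": "90"}], {"bin.1.fa"})
    key = "Bin Id_checkm"  # the join key after suffixing, hence right_on="Bin Id_checkm"
    print(checkm[0][key])  # bin.1.fa
    print(checkm[0][key].removesuffix(".fa"))  # bin.1 (str.removesuffix needs Python 3.9+)
    cat = collapse_cat_ranks({"bin": "bin.1.fa", "rank_1": "Bacteria", "rank_2": None, "rank_3": "Bacillota"})
    print(cat)  # {'bin_catpack': 'bin.1.fa', 'CAT_rank_catpack': 'Bacteria;Bacillota'}
```

In the diff itself, the CheckM block writes the stripped names back to `Bin Id_checkm`, whereas the CheckM2 block assigns them to a new `Name` column (`results["Name"] = results["Name_checkm2"].str.removesuffix(".fa")`).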
6 changes: 4 additions & 2 deletions conf/modules.config
@@ -512,8 +512,8 @@ process {
         ]
     }

-    withName: CONCAT_BINQC_TSV {
-        ext.prefix = { "${params.binqc_tool}_summary" }
+    withName: 'CONCAT_BUSCO_TSV|CONCAT_CHECKM_TSV|CONCAT_CHECKM2_TSV' {
+        ext.prefix = { "${meta.id}_summary" }
         publishDir = [
             path: { "${params.outdir}/GenomeBinning/QC" },
             mode: params.publish_dir_mode,
@@ -532,6 +532,7 @@ process {
     }

     withName: CHECKM2_PREDICT {
+        tag = { "${meta.assembler}-${meta.binner}-${meta.domain}-${meta.refinement}-${meta.id}" }
         ext.prefix = { "${meta.assembler}-${meta.binner}-${meta.domain}-${meta.refinement}-${meta.id}" }
         publishDir = [
             path: { "${params.outdir}/GenomeBinning/QC/CheckM2" },
@@ -639,6 +640,7 @@ process {
     }

     withName: GTDBTK_CLASSIFYWF {
+        tag = { "${meta.assembler}-${meta.binner}-${meta.domain}-${meta.refinement}-${meta.id}" }
         ext.args = [
             "--extension fa",
             "--min_perc_aa ${params.gtdbtk_min_perc_aa}",
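The old `CONCAT_BINQC_TSV` entry named its output after the single `params.binqc_tool`; with up to three concatenation processes, one `withName` regular-expression selector now configures all of them and the file prefix comes from each channel's `meta.id`. A Python sketch of that matching and naming, under two assumptions that are not confirmed by the diff: the pattern must match the whole process name, and `meta.id` carries the tool name (e.g. `busco`), giving `<tool>_summary` outputs. The helper names are hypothetical.

```python
import re

# Selector copied from the new withName entry in conf/modules.config.
CONCAT_SELECTOR = "CONCAT_BUSCO_TSV|CONCAT_CHECKM_TSV|CONCAT_CHECKM2_TSV"


def selector_matches(process_name: str, selector: str = CONCAT_SELECTOR) -> bool:
    """Assumption: the pattern has to match the whole process name, not just a substring."""
    return re.fullmatch(selector, process_name) is not None


def summary_prefix(meta_id: str) -> str:
    """Python rendering of ext.prefix = { "${meta.id}_summary" }."""
    return f"{meta_id}_summary"


if __name__ == "__main__":
    for name in ("CONCAT_BUSCO_TSV", "CONCAT_CHECKM2_TSV", "CONCAT_BINQC_TSV"):
        print(name, selector_matches(name))
    # Assumption: meta.id holds the tool name for these channels.
    print(summary_prefix("checkm2"))
```

The first two names match and the retired `CONCAT_BINQC_TSV` does not; `summary_prefix("checkm2")` yields `checkm2_summary`.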
1 change: 1 addition & 0 deletions conf/test.config
@@ -34,6 +34,7 @@ params {
     busco_clean = true
     // Prokka is the slowest step of the tests, so we speed up by turning off CDS/product searching
     prokka_fast_mode = true
+    // Source: https://data.ace.uq.edu.au/public/gtdb/data/releases/latest/auxillary_files/gtdbtk_package/mockup_db/
     gtdb_db = params.pipelines_testdata_base_path + 'mag/databases/gtdbtk/gtdbtk_mockup_20250422.tar.gz'
     cat_db = params.pipelines_testdata_base_path + 'mag/databases/cat/minigut_cat.tar.gz'
     cat_no_suggestive_asterisks = true
5 changes: 4 additions & 1 deletion conf/test_alternatives.config
@@ -27,13 +27,16 @@ params {
     // Input data
     input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.v4.csv'
     clip_tool = 'trimmomatic'
-    binqc_tool = 'checkm2'
+    run_busco = true
+    busco_clean = true
+    run_checkm2 = true
     bin_domain_classification = true
     skip_spades = true
     skip_quast = true
     skip_prodigal = true
     skip_prokka = true
     skip_gtdbtk = true
+    gtdbtk_skip_aniscreen = true
     skip_maxbin2 = true
     skip_concoct = true
     skip_comebin = true
1 change: 1 addition & 0 deletions conf/test_assembly_input.config
@@ -41,6 +41,7 @@ params {
     skip_prodigal = true
     skip_prokka = true
     skip_gtdbtk = true
+    gtdbtk_skip_aniscreen = true
     skip_concoct = false
     skip_comebin = true

3 changes: 2 additions & 1 deletion conf/test_full.config
@@ -40,7 +40,8 @@ params {
     // Skip CONCOCT due to timeout issues
     skip_concoct = true

-    binqc_tool = "checkm2"
+    run_checkm2 = true
+    run_busco = false

     // Set Prokka compliance mode to allow metaSPAdes bins to be annotated
     prokka_with_compliance = true
1 change: 1 addition & 0 deletions conf/test_hybrid.config
@@ -37,6 +37,7 @@ params {
     skip_flye = true
     skip_metamdbg = true
     skip_gtdbtk = true
+    gtdbtk_skip_aniscreen = true
     skip_concoct = true
     skip_comebin = true

1 change: 1 addition & 0 deletions conf/test_longreadonly.config
@@ -36,6 +36,7 @@ params {
     busco_db_lineage = 'bacteria_odb10'

     skip_gtdbtk = true
+    gtdbtk_skip_aniscreen = true
     skip_concoct = true
     skip_comebin = true
 }
1 change: 1 addition & 0 deletions conf/test_longreadonly_alternatives.config
@@ -36,6 +36,7 @@ params {
     skip_prodigal = true
     skip_prokka = true
     skip_gtdbtk = true
+    gtdbtk_skip_aniscreen = true
     skip_concoct = true
     skip_comebin = true
     skip_metaeuk = true
10 changes: 7 additions & 3 deletions conf/test_minimal.config
@@ -21,8 +21,8 @@ process {
 }

 params {
-    config_profile_name = 'Test nothing profile'
-    config_profile_description = 'Minimal test dataset to check pipeline function'
+    config_profile_name = 'Test nothing profile'
+    config_profile_description = 'Minimal test dataset to check pipeline function'

     // Input data
     input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.v4.csv'
@@ -43,8 +43,12 @@ params {
     skip_comebin = true
     skip_prokka = true
     skip_binqc = true
+    run_busco = false
+    run_checkm = false
+    run_checkm2 = false
     skip_gtdbtk = true
-    skip_ancient_damagecorrection = true
     gtdbtk_min_completeness = 0.01
+    gtdbtk_skip_aniscreen = true
     skip_metaeuk = true
+    skip_ancient_damagecorrection = true
 }
8 changes: 6 additions & 2 deletions conf/test_single_end.config
@@ -42,12 +42,16 @@ params {
skip_comebin = true
min_length_unbinned_contigs = 1000000
max_unbinned_contigs = 2
binqc_tool = 'checkm'
run_busco = false
run_checkm = true
run_virus_identification = true
genomad_splits = 7
// micro_db not compatible with current genNomad version
genomad_db = null // 'https://zenodo.org/records/11945948/files/genomad_microdb.tar.gz'
genomad_db = null
// 'https://zenodo.org/records/11945948/files/genomad_microdb.tar.gz'
gtdb_db = params.pipelines_testdata_base_path + 'mag/databases/gtdbtk/gtdbtk_mockup_20250422.tar.gz'
gtdbtk_min_completeness = 3
gtdbtk_skip_aniscreen = true
cat_db = params.pipelines_testdata_base_path + 'mag/databases/cat/minigut_cat.tar.gz'
cat_no_suggestive_asterisks = true
}
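Across the test profiles, the single `binqc_tool` value is replaced by independent `run_busco`, `run_checkm` and `run_checkm2` switches, so one profile can enable several tools at once. A small sketch of how those switches resolve to a tool list; the fallback defaults (BUSCO on, CheckM and CheckM2 off) are an assumption inferred from the commit "added enable_{tool} parameters with enable_busco as true as default", not read from the pipeline's `nextflow.config`, and `binqc_tools` is a hypothetical helper. `skip_binqc` is left out because its interaction with the `run_*` flags is not visible in this diff.

```python
from typing import Dict, List

# Assumed defaults; check nextflow.config for the authoritative values.
ASSUMED_DEFAULTS: Dict[str, bool] = {"run_busco": True, "run_checkm": False, "run_checkm2": False}


def binqc_tools(profile_params: Dict[str, bool]) -> List[str]:
    """Return the bin QC tools a profile would enable, in a fixed order."""
    params = {**ASSUMED_DEFAULTS, **profile_params}
    return [tool for tool in ("busco", "checkm", "checkm2") if params[f"run_{tool}"]]


if __name__ == "__main__":
    # run_* values copied from the test profiles in this diff.
    profiles = {
        "test_alternatives": {"run_busco": True, "run_checkm2": True},
        "test_full": {"run_checkm2": True, "run_busco": False},
        "test_single_end": {"run_busco": False, "run_checkm": True},
    }
    for name, overrides in profiles.items():
        print(name, binqc_tools(overrides))
```

For the three profiles it prints `['busco', 'checkm2']`, `['checkm2']` and `['checkm']` respectively.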