Conversation

@d4straub
Collaborator

@d4straub d4straub commented Oct 16, 2025

Add MetaBinner, related to #874
Needs #907 to function properly.

This integrates into the pipeline an altered version of the module METABINNER and the script create_metabinner_bins.py from https://github.com/hzi-bifo/mag.

Issues:

  • MetaBinner fails to produce any bins with -profile test_full (but that error is ignored)
  • running on bacass test_full data works as detailed below
  • doesn't produce any bins with any of the CI test datasets (all test profiles with --skip_comebin false --skip_metabinner false failed with those binners), therefore deactivated in all CI tests

The successful test used the bacass test_full files:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR100/028/SRR10093028/SRR10093028_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR100/029/SRR10093029/SRR10093029_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR100/028/SRR10093028/SRR10093028_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR100/029/SRR10093029/SRR10093029_2.fastq.gz
nextflow run d4straub/mag -r add-MetaBinner -profile cfc --input samplesheet_bacass_test.csv --outdir results_test_bacass -resume --skip_spades --skip_quast --skip_busco --skip_prodigal --skip_prokka --skip_gtdbtk --refine_bins_dastool --keep_phix --coassemble_group --skip_comebin

where the samplesheet contained:

sample,group,short_reads_1,short_reads_2,long_reads,short_reads_platform,long_reads_platform
SRR10093028,a,./SRR10093028_1.fastq.gz,./SRR10093028_2.fastq.gz,,ILLUMINA,
SRR10093029,a,./SRR10093029_1.fastq.gz,./SRR10093029_2.fastq.gz,,ILLUMINA,
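
For anyone reproducing the run, the samplesheet above could be written from the shell roughly as follows. This is only a sketch: the file name is taken from the --input argument of the command above, and the column-count check is an added sanity check that is not part of the pipeline.

```shell
# Write the two-sample samplesheet from the opening post.
# Read paths are relative, so the FASTQ files are assumed to sit in the launch directory.
samplesheet=samplesheet_bacass_test.csv
cat > "$samplesheet" <<'EOF'
sample,group,short_reads_1,short_reads_2,long_reads,short_reads_platform,long_reads_platform
SRR10093028,a,./SRR10093028_1.fastq.gz,./SRR10093028_2.fastq.gz,,ILLUMINA,
SRR10093029,a,./SRR10093029_1.fastq.gz,./SRR10093029_2.fastq.gz,,ILLUMINA,
EOF

# Sanity check (not part of the pipeline): count rows that do not have exactly 7 fields.
bad_rows=$(awk -F',' 'NF != 7' "$samplesheet" | wc -l | tr -d ' ')
echo "rows with unexpected column count: $bad_rows"
```

Both samples share group a, which matters here because the run uses --coassemble_group.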

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.3.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@d4straub
Collaborator Author

@nf-core-bot fix linting

@d4straub d4straub marked this pull request as ready for review October 23, 2025 13:27
@d4straub
Collaborator Author

Now ready for review. I will resolve the conflict with ro-crate-metadata.json soon, but that won't impact the functional code at all.

Member

@jfy133 jfy133 left a comment

I agree with @prototaxites that the module is too complex at the moment.

Most of the steps should be in a CONCOCT-like workflow, and the two Python scripts and MetaBinner itself should definitely be modules.

The other reason I would prefer to split this up is because it will make it easier to debug.

I'm not too happy about having to have custom commands and scripts (not your fault, just design of the tool) and I want to decouple these potentially more brittle parts from the tool itself.

@d4straub
Collaborator Author

d4straub commented Oct 27, 2025

Alright, split the module into 4 modules and added a subworkflow.
Tested as in opening post & works.
Due to the ongoing release prep for 5.1.0, I am waiting until after the release to merge upstream dev.

@d4straub
Collaborator Author

Ready for another round of reviews, maybe @jfy133 & @prototaxites again?

Contributor

@dialvarezs dialvarezs left a comment

Minor suggestions from my side.
With the subworkflow implementation, it is definitely easier to follow.

@d4straub
Collaborator Author

d4straub commented Nov 3, 2025

@nf-core-bot fix linting

@d4straub
Collaborator Author

d4straub commented Nov 3, 2025

I'll run the pipeline as usual before merging this PR to confirm I didn't break anything, but I'm over-using our cluster a little right now, so it will take some time.
Edit: Success!

Contributor

@prototaxites prototaxites left a comment

This is looking good @d4straub, a few small comments from me

]
]
ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
ext.min_contig_size = { params.min_contig_size < 1000 ? "1000" : "${params.min_contig_size}" }
Contributor

Should these be ext directives, or (if this is mandatory) should it just be an input to the module?

Collaborator Author

Those aren't mandatory, since they have defaults.

Contributor

But it would be mandatory in the sense that if there was no default, the tool wouldn't run?

Collaborator Author

Yes, the tool needs that min_contig_size. The process defaults to 1000 if ext.min_contig_size is not set.

Contributor

In which case I would suggest for clarity that it should be an optional val input to the module, and you set a default in the same way. That means that all of the mandatory inputs for the module are set in the same place.

Following the logic from Mahesh here: nf-core/proposals#69

Collaborator Author

Done!
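
The value handling discussed in this thread (use the requested minimum contig size, but never go below 1000, and fall back to 1000 when nothing is set) can be sketched in shell. The function name effective_min_contig_size is hypothetical and only mirrors the ternary from the config snippet under review; it is not code from the PR.

```shell
# Hypothetical helper mirroring the reviewed expression:
#   params.min_contig_size < 1000 ? "1000" : "${params.min_contig_size}"
# plus the process-side default of 1000 when no value is supplied.
effective_min_contig_size() {
    requested=${1:-1000}            # default when called without an argument
    if [ "$requested" -lt 1000 ]; then
        echo 1000                   # values below 1000 are raised to 1000
    else
        echo "$requested"           # larger values are passed through unchanged
    fi
}

effective_min_contig_size 500
effective_min_contig_size 2500
effective_min_contig_size
```

Keeping the default next to the other inputs, as suggested above, means a reader sees every value the tool requires in one place.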

zcat $kmer > kmer_profile.csv
# create coverage profile in Metabinner format
zcat ${depth} | awk '{if (\$2>${min_contig_size}) print \$0 }' | cut -f -1,4- > coverage_profile.tsv
Contributor

I think this is fine for now, but if we consider moving this module to the modules repo in future, we should move this step elsewhere to make the module more reusable.
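
To illustrate what the coverage-profile line under review does, here is a small sketch on toy data. The depth-table layout (contig name, contig length, total average depth, then per-sample depth and variance columns) is an assumption, as are the contig names and values; the header row is omitted for simplicity. The awk filter is written with -v instead of Nextflow's escaped \$ variables, and gzip -dc stands in for zcat.

```shell
workdir=$(mktemp -d)
min_contig_size=1000

# Toy depth table (assumed layout): name, length, total avg depth, sample depth, sample variance.
printf 'contig_1\t1500\t10.0\t10.0\t2.5\ncontig_2\t1000\t4.0\t4.0\t1.0\ncontig_3\t800\t3.0\t3.0\t0.5\n' \
    | gzip > "$workdir/depth.tsv.gz"

# Keep contigs strictly longer than the threshold, then drop columns 2 and 3
# (length and total average depth), leaving the name and per-sample columns.
gzip -dc "$workdir/depth.tsv.gz" \
    | awk -v min="$min_contig_size" '$2 > min { print $0 }' \
    | cut -f 1,4- > "$workdir/coverage_profile.tsv"

cat "$workdir/coverage_profile.tsv"
```

Because the comparison is strict, a contig of exactly 1000 bp (contig_2 here) is dropped along with the shorter one.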

Contributor

@prototaxites prototaxites left a comment

LGTM (assuming tests pass!) - but we definitely need to work on a suitable test data set!

@d4straub
Collaborator Author

d4straub commented Nov 4, 2025

Yes, suitable datasets would be great. It's not only about testing itself; implementing tools that do not run on the current test data is also troublesome.
Edit: I made sure it runs reproducibly as detailed in the opening post, but it's certainly not optimal.

@d4straub
Collaborator Author

d4straub commented Nov 4, 2025

Test Run nf-test / docker | 25.04.2 | 2/8 (pull_request) = '-profile test_alternatives' fails with:

! 85 "busco_summary.tsv:md5,f2108b2ad319f7bdbe1d42c185e34e43",
! 85 "busco_summary.tsv:md5,7a3afa3387a815b331c35a3420d8c7f4",

I'll retry; potentially it's a sorting issue.

Edit: no, that seems to be consistent, investigating...
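
One way to check whether a checksum mismatch like this is only a row-order difference is to compare the files again after sorting. The sketch below uses made-up file names and contents, not the actual busco_summary.tsv from the failing test, and compare_ignoring_order is a hypothetical helper.

```shell
workdir=$(mktemp -d)

# Toy tables with the same rows in a different order (hypothetical content).
printf 'bin_b\t90.1\nbin_a\t85.3\n' > "$workdir/run1.tsv"
printf 'bin_a\t85.3\nbin_b\t90.1\n' > "$workdir/run2.tsv"

# Report whether two files are identical, differ only in row order, or differ in content.
compare_ignoring_order() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        sort "$1" > "$workdir/a.sorted"
        sort "$2" > "$workdir/b.sorted"
        if cmp -s "$workdir/a.sorted" "$workdir/b.sorted"; then
            echo "row order only"
        else
            echo "content differs"
        fi
    fi
}

compare_ignoring_order "$workdir/run1.tsv" "$workdir/run2.tsv"
```

A "content differs" result would indicate a real change in the output rather than unstable ordering.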

@jfy133
Member

jfy133 commented Nov 4, 2025

@dialvarezs you also saw this in the version bump PR, right?

@dialvarezs
Contributor

@jfy133 Yes, the same. #912 should fix it.

@d4straub
Collaborator Author

d4straub commented Nov 4, 2025

Ah great, then I don't need to investigate :)
So I should wait until #912 is merged?

@dialvarezs
Contributor

It's ready now @d4straub!

@d4straub
Collaborator Author

d4straub commented Nov 4, 2025

Thanks everybody!!!!

@d4straub d4straub merged commit 5c4f4b8 into nf-core:dev Nov 4, 2025
21 checks passed
@dialvarezs dialvarezs mentioned this pull request Nov 5, 2025
11 tasks
5 participants