Skip to content

Conversation

@jfy133
Copy link
Member

@jfy133 jfy133 commented Oct 25, 2025

Closes #890

  • Essentially the input tuple for pooling was in the wrong format meaning only R1s were being pooled and out of order (i.e, what was meant to be a samples R2, was the second samples R1)
  • This was missed as the --coassembly_group parameter was missed out in the new config structures

TODO:

  • Run tests for all other configs to make sure nothing else changed
  • Regenerate snapshot for test_alternative now coassembly activated

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 jfy133 marked this pull request as draft October 25, 2025 06:05
@github-actions
Copy link

github-actions bot commented Oct 25, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 66b1301

+| ✅ 375 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   6 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md

✅ Tests passed:

Run details

  • nf-core/tools version 3.4.1
  • Run at 2025-10-28 20:11:54

@jfy133
Copy link
Member Author

jfy133 commented Oct 25, 2025

@nf-core-bot fix linting

// We have to merge reads together to match tuple structure of POOL_SHORT_READS/
// This MUST be in a interleaved structure (s1_r1, s1_r2, s2_r1, s2_r2, ...)
// So we merge the two list of R1 and R2s, and sort them to ensure correct order above
ch_short_reads_grouped_for_pooling = ch_short_reads_grouped.map { meta, reads1, reads2 -> [meta, [reads1 + reads2].flatten().sort()] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume that the reads files here are standardly-named such that a sort() won't break the order?

Copy link
Collaborator

@d4straub d4straub Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can assume that because all those files are renamed for ${prefix} at that point, imho. Or is it possible to skip the complete QC so that original files names come through? Not entirely sure...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's possible to basically completely skip QC...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How likely do you think it would be that people don't have a _R1 / _R2, _1 / _2, _F, _R in their FASTQ files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be wary of assuming anything about file names unless we have strictly controlled it. One way to do that would be also to force a schema like the above in the samplesheet validation, so we stop early before errors.

Otherwise we have to be careful with channel order, etc.?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typical Illumina output from the sequencing facilities & companies I know is <sample>_R1_<lane>.fastq.gz. Single-end read files might not have any of those pattern to identify direction (R1/1F/whatever).
I think that makes it already more complicated to catch? I am not an regex expert though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d4straub 's pattern the one of the most common patterns I've seen too... and other people append the adapter index sequence to the end after the lane ID too... so I really don't think this will be simply be solvable.

But for me that isnt needed, potentially we could add a comment (Warning) in the docs about the sorting issue with SPAdes & skipping all QC & file names, maybe to the co-assembly step (https://nf-co.re/mag/5.1.0/docs/usage/#the-group-column?),

I'm erring for this, but I want this to be a democracy.

@dialvarezs any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted my previous comment, as I wasn't sure it actually worked, but I think it does:

ch_a = Channel.of(["meta", ["a", "c", "b"], ["d", "b", "f"]])
ch_a.map { meta, f1, f2 ->
    def transposed_pairs = [f1, f2].transpose()
    println transposed_pairs
    
    def sorted_pairs = transposed_pairs.sort { it[0] }  
    println sorted_pairs
      
    def interleaved = sorted_pairs.flatten()

    return [meta, interleaved] 
}.view()
transposed: [[a, d], [c, b], [b, f]]
sorted: [[a, d], [b, f], [c, b]]

output: [meta, [a, d, b, f, c, b]]

So we can just sort on fasta1's name, avoiding issues with naming entirely.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that code above and it seems fine to me. It also works with e.g.
ch_a = Channel.of(["meta", ["a_s1_R1_a", "c_s3_R1_c", "b_s2_R1_b"], ["d_s1_R2_d", "b_s3_R2_b", "f_s2_R2_f"]])
that is sorted to
[meta, [a_s1_R1_a, d_s1_R2_d, b_s2_R1_b, f_s2_R2_f, c_s3_R1_c, b_s3_R2_b]]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK nice thanks for the cross-validation @d4straub ! When I am more functional I will try to implement it!

…-unequal-number-of-reads' of github.com:nf-core/mag into 890-metaspades-exit-status-21-paired-read-files-contain-unequal-number-of-reads
@jfy133 jfy133 marked this pull request as ready for review October 29, 2025 06:20
// We have to merge reads together to match tuple structure of POOL_SHORT_READS/
// This MUST be in a interleaved structure (s1_r1, s1_r2, s2_r1, s2_r2, ...)
// So we merge the two list of R1 and R2s, and sort them to ensure correct order above
ch_short_reads_grouped_for_pooling = ch_short_reads_grouped.map { meta, reads1, reads2 -> [meta, [reads1 + reads2].flatten().sort()] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_short_reads_grouped_for_pooling = ch_short_reads_grouped.map { meta, reads1, reads2 -> [meta, [reads1 + reads2].flatten().sort()] }
ch_short_reads_grouped_for_pooling = ch_short_reads_grouped.map { meta, reads1, reads2 -> [meta, [reads1, reads2].transpose().sort { it[0].getName() }.flatten()] }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants