groupTuple randomly returns sorted elements #5579
Comments
I think this is a limitation of groupTuple. If you have multiple grouped lists and sorting is enabled, each list is sorted independently of the others, so the sorted lists might no longer correspond to each other element by element. I think the fix you're referring to is to disable sorting, in which case the lists stay consistent with each other. Previously this caused issues with resume, but I think that was addressed in 24.10.
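To illustrate the point about independent sorting, here is a small plain-Python sketch (not Nextflow, and not how groupTuple is implemented internally). It assumes two parallel lists where position `i` in one list belongs with position `i` in the other; the sample and file names are hypothetical.

```python
# Illustrative sketch in plain Python (not Nextflow): sorting parallel lists
# independently vs. sorting them jointly. All names below are hypothetical.

names = ["sampleB", "sampleA", "sampleC"]
files = ["reads_01.fq", "reads_02.fq", "reads_03.fq"]  # names[i] belongs with files[i]

# Sorting each list on its own: both lists end up ordered,
# but sampleA and sampleB are now paired with the wrong files.
independent = list(zip(sorted(names), sorted(files)))

# Sorting the (name, file) pairs together keeps every name attached to its own file.
joint = sorted(zip(names, files))

print(independent)
print(joint)
```

Sorting the pairs as a unit is what keeps the correspondence; sorting each list separately only happens to preserve it when both lists share the same ordering (for example, when file names start with the sample name).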
Hi Ben, thanks for the quick reply. I tested again on the new version, and the issue is still there. How do I disable sorting? Isn't it false by default? If I set `.groupTuple(sort: false)`, I get an invalid value error for sort:
Sort is disabled by default, so it should work if you don't specify the sort option.
Even weirder:
produces
... what is going on? :D I feel like I'm in the twilight zone. Haha, and if I do
I get:
Still, the third element is a
I created this minimal example from your snippets:

```
inputs = Channel.of(
    ['rundir', 'PNET', file('gs://my-bucket/rundir/aggregated_dmps/PNET_by_case.tsv.gz')],
    ['rundir', 'LNET', file('gs://my-bucket/rundir/aggregated_dmps/LNET_by_case.tsv.gz')],
    ['rundir', 'SINET', file('gs://my-bucket/rundir/aggregated_dmps/SINET_by_case.tsv.gz')],
    ['rundir', 'NET', file('gs://my-bucket/rundir/aggregated_dmps/NET_by_case.tsv.gz')]
)

inputs
    .groupTuple()
    .view()
    .transpose()
    .view()
```

But I am not seeing any issues with the grouped lists:

```
$ NXF_VER=24.11.0-edge nextflow main.nf
[rundir, [PNET, LNET, SINET, NET], [/rundir/aggregated_dmps/PNET_by_case.tsv.gz, /rundir/aggregated_dmps/LNET_by_case.tsv.gz, /rundir/aggregated_dmps/SINET_by_case.tsv.gz, /rundir/aggregated_dmps/NET_by_case.tsv.gz]]
[rundir, PNET, /rundir/aggregated_dmps/PNET_by_case.tsv.gz]
[rundir, LNET, /rundir/aggregated_dmps/LNET_by_case.tsv.gz]
[rundir, SINET, /rundir/aggregated_dmps/SINET_by_case.tsv.gz]
[rundir, NET, /rundir/aggregated_dmps/NET_by_case.tsv.gz]
```
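For reference, the expected behaviour of the minimal example can be modelled in a few lines of plain Python. This is a simplified sketch of the groupTuple-then-transpose round trip with no sorting applied, not Nextflow's actual implementation; `group_tuple` and `transpose` here are hypothetical helpers, and the model does not cover remote storage or sub-workflow outputs.

```python
# Simplified model (plain Python, not Nextflow) of: groupTuple() followed by transpose().
# group_tuple and transpose are hypothetical helpers written for illustration only.

def group_tuple(items):
    """Group tuples by their first element, collecting the remaining fields into lists.

    No sorting is applied, so each list keeps arrival order and the lists stay aligned.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for key, *values in items:
        lists = groups.setdefault(key, [[] for _ in values])
        for lst, value in zip(lists, values):
            lst.append(value)
    return [(key, *lists) for key, lists in groups.items()]


def transpose(grouped):
    """Emit one flat tuple per position across the grouped lists."""
    return [(key, *values) for key, *lists in grouped for values in zip(*lists)]


items = [
    ("rundir", "PNET", "/rundir/aggregated_dmps/PNET_by_case.tsv.gz"),
    ("rundir", "LNET", "/rundir/aggregated_dmps/LNET_by_case.tsv.gz"),
    ("rundir", "SINET", "/rundir/aggregated_dmps/SINET_by_case.tsv.gz"),
    ("rundir", "NET", "/rundir/aggregated_dmps/NET_by_case.tsv.gz"),
]

grouped = group_tuple(items)
flat = transpose(grouped)
print(grouped)
print(flat)
```

Without sorting, the round trip returns the original tuples in their original order, which matches the output shown above.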
Yes indeed, thanks for trying to reproduce it. I tried the same thing locally: it only happens with that example, after the data has been produced by a sub-workflow. I cannot reproduce it locally either; it only happens on Google Cloud. The producing workflow looks like this:
And then I retrieve that variable in the parent workflow like this:
Bug report
This is entirely for posterity, as the error occurred in 24.09.2-edge and has been fixed in the latest version, 24.11.0-edge. However, I believe this bug is quite strange in nature, and it may not have been observed before. It cannot be reproduced outside of Google Cloud, and it only affects data coming from the output of another workflow.
For a very specific example:
Script within the workflow:
Program output
The second list is sorted in the output.
Environment