Vlad-Dembrovskyi added the P1 label (Med priority: Important for pipeline completeness and good ux or refactors a code smell) on Apr 22, 2022
Problem
When the pipeline is run from the stringtie step through the rMATS process (the second half of the pipeline), it cannot be resumed from the point where it stopped. Every resume starts from scratch at the stringtie step.
Example of resumed job:
https://cloudos.lifebit.ai/app/jobs/623d660114d5d201dc0bb012
(Main CloudOS, workspace jax-anczukow-lab, id 5ec2c818d663c5c2cd3bd991)
The job that it was (probably) resumed from: https://cloudos.lifebit.ai/app/jobs/6232367a14d5d201dc04e5e1
(Many identical jobs with the same params were submitted over the same period, so the exact parent job is uncertain.)
You can see that nothing is resumed: all 1072 initial jobs are run again. Bottom line: with exactly the same parameters, input files, resources, and disk specifications, a pipeline that should resume cleanly starts over instead.
Working real-life example pipeline with minimal data for future debugging:
https://cloudos.lifebit.ai/app/jobs/6247263714d5d201dc106612
(same workspace)
Solution
Possible causes of the pipeline not being resumable:
If we check the inputs of the rMATS process, we can see that it takes in bam files that are split and prepared through some very complicated and twisted channels. I think the reason here is the same as in the second link above: the order of items in the channel is not preserved, or something similar. Given how complicated the preparation of the `bams` channel is, I think the cause of the problem is somewhere there.
Check lines L344-L354 and L934-L960 in main.nf to see how the `bams` channel for the `rmats` process is created.
Implementation
Possibly simplifying the channel creation, or otherwise making it produce results that are more consistent in order and structure, would solve the reproducibility issue.
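To illustrate why item order could matter for resume, here is a toy Python sketch. It is not Nextflow's actual hashing code: it only assumes that a cached task is looked up by a key derived from its inputs in the order they arrive. The `task_key` helper and the bam file names are hypothetical.

```python
import hashlib


def task_key(process_name, input_items):
    """Toy cache key: hash the process name plus its inputs in the given order.

    This is an assumed, simplified model of input-based task caching,
    not the real Nextflow implementation.
    """
    h = hashlib.sha256()
    h.update(process_name.encode())
    for item in input_items:
        h.update(b"\0")  # separator so item boundaries affect the hash
        h.update(item.encode())
    return h.hexdigest()


# Hypothetical bam lists: same files, emitted in a different order on each run.
first_run = ["sampleB.bam", "sampleA.bam", "sampleC.bam"]
resumed_run = ["sampleA.bam", "sampleC.bam", "sampleB.bam"]

# Keys differ when order is left to chance, so the cached task is not found.
unordered_match = task_key("rmats", first_run) == task_key("rmats", resumed_run)

# Sorting before the process sees the list makes the key stable across runs.
sorted_match = task_key("rmats", sorted(first_run)) == task_key("rmats", sorted(resumed_run))

print(unordered_match)  # False
print(sorted_match)     # True
```

If this is indeed the cause, the equivalent fix in main.nf would probably be to impose a deterministic order on the `bams` channel, for example with the `toSortedList` operator or the `sort` option of `groupTuple`. Whether either fits the current channel structure is an assumption that still needs to be checked against the lines referenced above.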