Fix pipeline reproducibility issue with rMATS #310

Open
Vlad-Dembrovskyi opened this issue Apr 22, 2022 · 0 comments
Labels
P1 Med priority: Important for pipeline completeness and good ux or refactors a code smell

Comments

@Vlad-Dembrovskyi
Contributor

Problem

When the pipeline is run from the stringtie step through to the rMATS process (the second half of the pipeline), it cannot be resumed from the point it had reached. Every resume starts from scratch at the stringtie step.

Example of resumed job:
https://cloudos.lifebit.ai/app/jobs/623d660114d5d201dc0bb012
(Main CloudOS, workspace jax-anczukow-lab, id 5ec2c818d663c5c2cd3bd991)
The job that it was (probably) resumed from: https://cloudos.lifebit.ai/app/jobs/6232367a14d5d201dc04e5e1
(Many identical jobs with the same parameters were submitted over the same period.)
You can see that nothing is resumed: all 1072 initial jobs are run again. Bottom line: with exactly the same parameters, input files, resources and disk specifications, a pipeline that should resume perfectly starts over.

Working real-life example pipeline with minimal data for future debugging:
https://cloudos.lifebit.ai/app/jobs/6247263714d5d201dc106612
(same workspace)

Solution

Possible causes of the pipeline not being resumable:

If we check the inputs of the rMATS process, we can see that it takes in bam files that are split and prepared through some very complicated and convoluted channels. I think the reason here is the same as in the second link above: the order of items in the channel is not preserved, or something similar. Given how complicated the preparation of the bams channel is, I think the cause of the problem is somewhere there.

Check lines L344-L354 and L934-L960 in main.nf to see how the bams channel for the rMATS process is created.
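
To illustrate the hypothesis (not to confirm it), here is a toy model in Python of why item order could matter for resuming. It assumes the cache key of a task is a hash computed over its inputs in the order they arrive; `task_key` and the file names are hypothetical and this is not Nextflow's actual hashing code.

```python
import hashlib


def task_key(bam_names):
    """Toy order-sensitive cache key: hash the names in the order given."""
    h = hashlib.sha256()
    for name in bam_names:
        h.update(name.encode())
        h.update(b"\0")  # separator so adjacent names cannot merge
    return h.hexdigest()


# Same three files, but the channel emitted them in a different order.
first_run = ["s1.bam", "s2.bam", "s3.bam"]
resumed_run = ["s2.bam", "s1.bam", "s3.bam"]

# Different keys mean the cached result is not found and the task reruns.
print(task_key(first_run) == task_key(resumed_run))  # False
```

If the real cache key behaves like this, a single reordered item in the bams channel would be enough to invalidate the rMATS task and everything downstream of it.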

Implementation

Simplifying the channel creation, or otherwise making it produce a result that is more consistent in order and structure, would possibly solve the reproducibility issue.
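
A minimal sketch of what "consistent in order and structure" could mean, written in Python for illustration only. The idea is to group the bam files and sort both the groups and their members before they reach the process, so that arrival order no longer matters. `canonical_groups`, the group ids and the file names are all hypothetical; in Nextflow the analogous step would presumably use sorting operators such as `toSortedList` or the `sort` option of `groupTuple`, but whether that fits the channel in main.nf is untested.

```python
def canonical_groups(pairs):
    """Group (group_id, bam_name) pairs; sort groups and their members."""
    groups = {}
    for group_id, bam in pairs:
        groups.setdefault(group_id, []).append(bam)
    # Sorting by group id and by file name removes any dependence
    # on the order in which the items arrived.
    return [(g, sorted(groups[g])) for g in sorted(groups)]


# Two runs that emit the same items in different orders.
run_a = [("b1", "s1.bam"), ("b2", "s3.bam"), ("b1", "s2.bam")]
run_b = [("b2", "s3.bam"), ("b1", "s2.bam"), ("b1", "s1.bam")]

print(canonical_groups(run_a) == canonical_groups(run_b))  # True
```

The trade-off is that sorting has to wait for all items of a group to arrive, so it gives up some streaming in exchange for a stable input structure.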

@Vlad-Dembrovskyi Vlad-Dembrovskyi added the P1 Med priority: Important for pipeline completeness and good ux or refactors a code smell label Apr 22, 2022