Issues running pypi-bundled workflows on the cluster #162

pvandyken · 2022-05-12T15:19:21Z

pvandyken
May 12, 2022
Maintainer

Ran into this issue yesterday, and if I infer correctly, others have been experiencing it too.

I was attempting to run snakedwi yesterday using the following protocol:

Install snakedwi into a kslurm env
Load up the env on a compute node
Run the app with the now locally installed snakedwi for maximum efficiency

So far, so good. The trouble comes when I want to enable snakemake cluster integration to run the job on multiple clusters. The issue is that the entire snakemake app is installed into local scratch, which cannot be shared across jobs.

The quick, easy solution is to always install snakemake apps into a shared fs (e.g. shared scratch). This side steps the problem entirely. Unfortunately, you lose the efficiencies gained by the localscratch installation.

The best way I can come up with currently to maintain the local install looks like the following sketch:

Create a custom slurm-submit.py (based of e.g. cc-slurm or similar)
First, prepend the snakemake call with the relevant kpy code to load up the correct virtual env. To generate this code, slurm-submit.py must:
a. Check if the job is being run in a virtual env (e.g. via $VIRTUAL_ENV)
b. Check if the virtual env is within localscratch (e.g. is $VIRTUAL_ENV relative to $SLURM_TMPDIR)
c. Check if the python executable being used is a part of that $VIRTUAL_ENV)
d. Assume that, if all of the above are true, that kpy is being used for cluster virtual env management
e. Read the relevant virtual env files to deduce the name of the virtual env (this information would be dropped in the venv config file by kpy)
f. Generate the venv loading code.
Parse the submit script to replace the python executable, and possibly the snakefile path, with our own
a. The python path is fairly straightforward, just regex for the /path/to/python -m portion of the file and replace with $VIRTUAL_ENV/bin/python.
b. The snakefile path is a bit harder. Search for --snakefile /path/to/Snakefile, then, if it's relative to our current $VIRTUAL_ENV, we make it relative to the new $VIRTUAL_ENV. Otherwise, if it's relative to $SLURM_TMPDIR, we error out, because we won't have access to it. Otherwise, we assume it's on shared fs.

Obviously, step 2 could be made easier by directly specifying what kpy env to use. The best way I can think of would be to use a --default-resource provision in the call, which could then be picked up by slurm-submit.py.

Overall, the solution is fairly hacky, but I'm not sure it could be any more streamlined. Snakemake could potentially make the submit script template more granular, so that python path or snakefile path could be provided separately, but that would only eliminate a few substeps. Importantly, I can't see any world where this can be done without a purpose-built slurm-submit.py.

So I guess the question is whether it's worth implementing the above plan, or just deciding that we don't support localscratch-installed apps.

pvandyken · 2022-05-12T15:20:34Z

pvandyken
May 12, 2022
Maintainer Author

I should note that while I referred to kslurm extensively, the problem is not caused by kslurm, and would occur anytime someone tried to run a local installation. In fact, without kslurm, the above steps would be even more complicated

3 replies

akhanf May 12, 2022
Maintainer

Thanks for thinking in-depth about this one -- I do think localscratch-installed apps (via kpy/kslurm) is something we should strive to support, the current solution of using persistant /scratch to create/keep the venv is not an ideal one..

Although it may be system-specific, that is what the cluster profile is actually meant for, so I don't really have a problem with customizing the jobscript slurm-submit.py for kpy integration.

Re: your last comment - not sure what you mean - on a local system or any system where the installed package is accessible from all nodes this isn't a problem, right?

akhanf May 12, 2022
Maintainer

note this was meant to be a reply to the first post not this one

pvandyken May 12, 2022
Maintainer Author

Not sure which last comment your referring to... if the not supporting localscratch-installed apps, I mean specifically installing and running a pypi indexed snakemake app on a localscratch directory, and then running that app across multiple clusters

akhanf · 2022-05-12T15:51:51Z

akhanf
May 12, 2022
Maintainer

re: the proposed solution -- I'm not sure why we can't just make most of the modifications to the jobscript shell script instead, ie add the logic there to load and activate the venv, then don't have to much around with changing the snakemake calls -- or maybe I'm missing something..

6 replies

akhanf May 12, 2022
Maintainer

Ah I see

akhanf May 12, 2022
Maintainer

that does make it alot trickier and hackier --
An alternative solution could be to have the ability in kslurm to export+activate an already created kpy env to /scratch (or perhaps any target path). Would still have all the benefits of creating and maintaining the venv with kpy, but when you are ready to run a batch job you export it and go.. adds an extra manual step though..

pvandyken May 12, 2022
Maintainer Author

Not exactly sure what you mean here... Exporting a venv to the shared scratch folder? I'm not sure that provides much benefit, it's easy enough to install a persistent venv

akhanf May 12, 2022
Maintainer

Isn't extracting a pre-made venv tarfile to /scratch much faster though?

pvandyken May 12, 2022
Maintainer Author

Yes... but for me the reason the speedboost is nice is because /localscratch venvs self-destruct, and so you have to repeatedly install them. For a persistent venv in shared scratch, I'm not sure it provides much benefit.

The workflow (that already works) is to just create a venv using any ordinary means on the shared scratch folder, activate it, install the app (e.g. snakedwi) and then run it. It works across clusters without issue. Perhaps you're thinking of something else though?

akhanf · 2022-05-12T16:53:22Z

akhanf
May 12, 2022
Maintainer

I was just suggesting a hybrid of the two -- if say you're already using kslurm/kpy to manage your venvs, and now you want to run a job across nodes. You export, then you run. Right now the workflow would be more disruptive (ie make an entirely new venv on /scratch)

2 replies

akhanf May 12, 2022
Maintainer

oops I keep responding in the wrong window -- what happens when you're doing 10 things at once..

pvandyken May 12, 2022
Maintainer Author

Right, okay. Yeah, I don't think there's any intrinsic reason that wouldn't work

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues running pypi-bundled workflows on the cluster #162

{{title}}

Replies: 3 comments 11 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Issues running pypi-bundled workflows on the cluster #162

pvandyken May 12, 2022 Maintainer

Replies: 3 comments · 11 replies

pvandyken May 12, 2022 Maintainer Author

akhanf May 12, 2022 Maintainer

akhanf May 12, 2022 Maintainer

pvandyken May 12, 2022 Maintainer Author

akhanf May 12, 2022 Maintainer

akhanf May 12, 2022 Maintainer

akhanf May 12, 2022 Maintainer

pvandyken May 12, 2022 Maintainer Author

akhanf May 12, 2022 Maintainer

pvandyken May 12, 2022 Maintainer Author

akhanf May 12, 2022 Maintainer

akhanf May 12, 2022 Maintainer

pvandyken May 12, 2022 Maintainer Author

pvandyken
May 12, 2022
Maintainer

Replies: 3 comments 11 replies

pvandyken
May 12, 2022
Maintainer Author

akhanf May 12, 2022
Maintainer

akhanf May 12, 2022
Maintainer

pvandyken May 12, 2022
Maintainer Author

akhanf
May 12, 2022
Maintainer

akhanf May 12, 2022
Maintainer

akhanf May 12, 2022
Maintainer

pvandyken May 12, 2022
Maintainer Author

akhanf May 12, 2022
Maintainer

pvandyken May 12, 2022
Maintainer Author

akhanf
May 12, 2022
Maintainer

akhanf May 12, 2022
Maintainer

pvandyken May 12, 2022
Maintainer Author