Best practice: a short-running rule called many times #115

Open
bdelepine opened this issue Jul 19, 2024 · 0 comments

Hi all,

This is a basic question, but I would be glad to hear your thoughts on it: what is the best practice for designing a short-running rule that will spawn many jobs (using Snakemake in a SLURM context, of course)? I would define "short-running" as under 3 min, and "many jobs" as thousands of calls.

Without Snakemake, I would have used SLURM job arrays and a wrapper script to build batches that each run for about 1 h. My assumption is that it is best to give SLURM chunks big enough that we do not stress it with too many jobs (and stay below the maximum number of jobs), but small enough that the scheduler is more likely to grant us resources (and allocate them fairly among users).
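
To make the batching arithmetic concrete, here is a minimal sketch of how inputs could be chunked into ~1 h batches, one batch per SLURM array task. The sample names, the count of 5000 inputs, and the helper names `batch_size` and `make_batches` are hypothetical, chosen only for illustration; the 2 min per-call runtime and 60 min target are the figures mentioned in this issue.

```python
import math


def batch_size(target_minutes: float, job_minutes: float) -> int:
    """Number of short calls that fit serially into one batch (at least 1)."""
    return max(1, math.floor(target_minutes / job_minutes))


def make_batches(items: list, size: int) -> list:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical inputs: 5000 samples, each call taking about 2 minutes.
samples = [f"sample_{i:04d}" for i in range(5000)]

size = batch_size(target_minutes=60, job_minutes=2)
batches = make_batches(samples, size)

# One SLURM array task per batch instead of one job per sample.
print(f"{len(samples)} calls -> {len(batches)} batches of up to {size} calls")
```

With these assumed numbers, the scheduler sees a few hundred array tasks rather than thousands of individual jobs, which is the trade-off described above.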

With Snakemake and the slurm plugin, I would like to avoid writing a wrapper script, so:

  • I may write rules to split/gather the batches, much like the built-in scatter-gather feature. This works, but it is much like writing a wrapper script.
  • I may use group and group-components, as in "Bundle many small jobs into one larger job submission" (snakemake#872). This also works, but I find it cumbersome to parametrize resources. For example, if I want ~1 h batches out of a rule that typically takes ~2 min, I must first set cores to this rule's cpus_per_task so that the calls run in series, then set group-components to 30 (= 60/2). But since cores is set for all groups, it gets more complex if I must design "batches" for several rules.
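
The group-components calculation in the second option can be sketched as a small helper that derives one value per rule and prints the corresponding command-line fragment. The rule names, the 5 min runtime of the second rule, and the `<rule>_batch` group naming are hypothetical. `--groups` and `--group-components` are Snakemake command-line options, but their exact behaviour should be checked against the installed version. The sketch does not address the cores setting, which still applies to all groups.

```python
def group_components(target_minutes: float, typical_minutes: float) -> int:
    """How many jobs of a rule to bundle into one submitted group job."""
    return max(1, round(target_minutes / typical_minutes))


# Hypothetical rules and their typical runtimes in minutes.
typical_runtime = {"short_rule": 2.0, "other_rule": 5.0}
target = 60  # aim for batches of about one hour

components = {
    rule: group_components(target, minutes)
    for rule, minutes in typical_runtime.items()
}

# Assign each rule to its own group (hypothetical naming: <rule>_batch).
groups_arg = " ".join(f"{rule}={rule}_batch" for rule in typical_runtime)
components_arg = " ".join(
    f"{rule}_batch={n}" for rule, n in components.items()
)

# Only prints the arguments; nothing is submitted.
print(f"snakemake --groups {groups_arg} --group-components {components_arg}")
```

Keeping the per-rule runtimes in one dictionary at least centralizes the "60/2" style arithmetic when several rules need their own batch size.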

Are my assumptions correct? What do you usually do to deal with short-running rules called many times?

Thanks!
