Job submission with --prioritize runs all at once, then stalls #134

Open
KB1RD opened this issue Aug 13, 2024 · 4 comments

KB1RD commented Aug 13, 2024

Software Versions

$ snakemake --version
8.16.0
$ conda list | grep snakemake-executor-plugin-slurm
snakemake-executor-plugin-slurm 0.8.0              pyhdfd78af_0    bioconda
snakemake-executor-plugin-slurm-jobstep 0.2.1              pyhdfd78af_0    bioconda
$ sinfo --version
slurm 23.11.4

Describe the bug
When using the --prioritize option of Snakemake, the following happens:
1.) All prioritized jobs are submitted at once, overwhelming the cluster when there are hundreds or thousands of them.
2.) Snakemake then blocks, waiting for every one of these jobs to complete before executing any further jobs. I assume Snakemake will resume executing jobs once the prioritized ones finish, but I'm still waiting for my university cluster to churn through the massive batch that this bug caused me to submit the other day.

Minimal example
Use the -P flag as described here to prioritize a particular target (a hypothetical reproduction sketch follows).
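
(Hypothetical sketch, not from the original report: assuming the trigger is simply many sibling dependencies feeding one prioritized target, a minimal workflow of this shape should reproduce it. All rule and file names here are made up.)

# Snakefile (hypothetical)
rule all:
    input:
        "Outputs/target.csv"

# one target with many independent dependencies
rule gather:
    input:
        expand("parts/{i}.txt", i=range(500))
    output:
        "Outputs/target.csv"
    shell:
        "cat {input} > {output}"

rule part:
    output:
        "parts/{i}.txt"
    shell:
        "touch {output}"

# expected: at most 2 jobs in flight; observed: all 500 "part" jobs submitted at once
snakemake -j 2 --executor slurm -P Outputs/target.csv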

cmeesters (Member) commented

I am pretty sure this is not a bug in this executor. @johanneskoester? The executor only receives one job at a time (unless a group job is given).

KB1RD (Author) commented Aug 14, 2024

I can move the issue to Snakemake main if that's the cause.

Now that the prioritized jobs have completed, Snakemake has begun (according to the log) resubmitting jobs that had already run, two at a time instead of 64 at a time as I requested; somehow they were not actually submitted to the cluster, despite the Snakemake log indicating that they were. These jobs were dependencies of the file I prioritized: all several hundred of them ran at once on the cluster, then Snakemake tried running them again two at a time, but also failed to do so for some unknown reason. I don't know if that helps narrow down the cause.

cmeesters (Member) commented

Can you please indicate your command line (or profile), including the prioritization and the 64-job semaphore?

KB1RD (Author) commented Aug 15, 2024

Running our lab's variant of ACCDB... (the only changes are adding a dataset and using a custom build of Psi4, the software that Snakemake runs):

snakemake -j 64 --executor slurm --default-resources slurm_account=<act-name> --rerun-incomplete -P Outputs/<dataset-name>/IndValues.csv
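
For reference, the same invocation as a profile would presumably look like this (hypothetical file path, and key names assumed to mirror the long-form CLI options; -P still passed on the command line, placeholders kept as above):

# ~/.config/snakemake/<profile-name>/config.yaml (hypothetical)
executor: slurm
jobs: 64
rerun-incomplete: true
default-resources:
  slurm_account: "<act-name>"

snakemake --profile <profile-name> -P Outputs/<dataset-name>/IndValues.csv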
