PySlurm ignoring some batch job options #169
Comments
I think the bug is on L2679 (lines 2675 to 2683 in c50467c).
I think this might be a carryover from a previous version that no longer works in 19.05. If you could help me track down what it should be, we should be able to fix it. |
I don't see any obvious problems with the code snippet you posted, and the
In the meantime, I'm getting around this by translating the dictionary of job options into a command line call and invoking
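
For anyone wanting to try the same workaround, here is a minimal sketch of how a dictionary of job options might be turned into an `sbatch` call. The helper names `build_sbatch_command` and `submit` are made up for illustration, and the sketch assumes each key maps onto an `sbatch` long option once underscores are swapped for dashes (e.g. `cpus_per_task` becomes `--cpus-per-task`), which may not hold for every PySlurm option. The `gpu:1` value in the usage note is only a placeholder.

```python
import subprocess


def build_sbatch_command(options, script_path):
    """Translate a dict of job options into an sbatch argument list.

    Assumption: every key corresponds to an sbatch long option once
    underscores are replaced with dashes.
    """
    args = ["sbatch"]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            # Boolean switches such as --exclusive take no value.
            args.append(flag)
        elif value is None or value is False:
            # Unset or disabled options are skipped entirely.
            continue
        else:
            args.append(f"{flag}={value}")
    args.append(script_path)
    return args


def submit(options, script_path):
    """Run sbatch and return whatever it prints (needs Slurm on PATH)."""
    cmd = build_sbatch_command(options, script_path)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Calling `build_sbatch_command({"cpus_per_task": 8, "gres": "gpu:1"}, "job.sh")` yields an argument list that can be inspected before anything is submitted, which makes it easier to check what the scheduler will actually receive.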
|
I am also having trouble submitting a job with anything other than 1 CPU core
This results in a job that has 32,000 MB of memory but only 1 CPU core.
|
AHHH! I figured it out from @cahartsell's comment above. It should be:
Underscores work, dashes don't! |
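
If the point above is about option key names (an assumption on my part, since the snippet itself is not shown here), one way to guard against it is to normalize the keys before handing the dictionary to PySlurm. The helper `normalize_option_keys` below is hypothetical and not part of PySlurm. It only rewrites dictionary keys and leaves values untouched.

```python
def normalize_option_keys(options):
    """Return a copy of options with dashes in the keys replaced by underscores."""
    normalized = {}
    for key, value in options.items():
        new_key = key.replace("-", "_")
        if new_key in normalized:
            # e.g. both "job-name" and "job_name" were supplied
            raise ValueError(f"duplicate option after normalization: {new_key}")
        normalized[new_key] = value
    return normalized


job_options = normalize_option_keys({"cpus-per-task": 8, "job_name": "demo"})
print(job_options)  # {'cpus_per_task': 8, 'job_name': 'demo'}
```

Raising on duplicates rather than silently overwriting seems safer here, since a dropped option is exactly the kind of quiet failure this issue is about.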
Details
Issue
PySlurm seems to ignore some valid "sbatch" parameters when submitting a batch job. Example python code:
Equivalent "sbatch" command line call:
The sbatch command line call behaves as expected (allocates 8 cores and 1 Turing GPU), but the PySlurm code seems to ignore some of the parameters ("cpus_per_task" and "gres" in this example) and only allocates the standard 2 cores with no GPU. I've tested a few other parameters (e.g. "job_name", "partition") and they appear to work correctly, so this seems limited to certain arguments.
I do not see any errors or warnings in any of the slurm logs when run with either method, and no exceptions are thrown when run through PySlurm.
After looking through the pyslurm.pyx file (and the "fill_job_desc_from_opts" function in particular), it seems like the "gres" parameter may not be supported. However, "cpus_per_task" does appear to be supported, but it still is not working.
Any help with this would be greatly appreciated. Also, if I'm right that "gres" is not supported, are there any workarounds or alternative methods for allocating GPUs to batch jobs?
Thanks,
Charlie