Crash at "start job" step #2
Thank you for your feedback! You need to provide the correct partition name for your cluster. This must be specified in the …

How to find the name of the default partition

If you can connect to the cluster using a terminal you can run the command:

The suffix "*" identifies the default partition. Copy this name, without the asterisk. Alternatively, you may need to look for the default partition in the documentation of your cluster.

Unfortunately, the HPC Workflow Manager does not detect the partition that the cluster uses by default, so the name must be set by the user. You may also want or need to use a different partition depending on your needs, or use different partitions for different jobs.
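As a rough sketch of the lookup described above, assuming the cluster runs Slurm and `sinfo` is on the PATH of the login node: Slurm's `sinfo` appends "*" to the name of the default partition, so the name can be extracted by finding that entry and stripping the asterisk. The function names `default_partition` and `query_default_partition` are made up for this illustration and are not part of the HPC Workflow Manager.

```python
import subprocess
from typing import Optional


def default_partition(sinfo_output: str) -> Optional[str]:
    """Return the partition whose name ends in '*', without the asterisk.

    Works on plain `sinfo` output as well as on `sinfo --noheader --format=%P`,
    because only the first whitespace-separated field of each line is inspected.
    """
    for line in sinfo_output.splitlines():
        fields = line.split()
        if fields and fields[0].endswith("*"):
            return fields[0].rstrip("*")
    return None  # no default partition is marked in this output


def query_default_partition() -> Optional[str]:
    """Run `sinfo` and parse its output. Must be executed on the cluster."""
    result = subprocess.run(
        ["sinfo", "--noheader", "--format=%P"],
        capture_output=True,
        text=True,
        check=True,
    )
    return default_partition(result.stdout)
```

On the cluster, `print(query_default_partition())` would print the name to copy into the job settings. If it prints `None`, no partition is flagged as the default and the cluster documentation is the place to look.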
Thanks for the fast and detailed answer! If it's of any use, there is a bash script created on the cluster side that I can't link here (not supported file type). Thanks!
Hello, I'm trying to follow your guide, but it fails with an error at the start step.
My local machine runs Linux Mint 20 and I'm trying to use our University's cluster. Up to the start step everything goes according to your documentation, but when I click on start job (with the user.ijm macro), this error appears:
crash_at_job_start.txt
As far as I understand, some batch partition is invalid, so I've been trying to find where it is being set but couldn't figure it out. I've reached out to our IT department, but they think the issue is that the job script is not created correctly and fails. Since Slurm does not accept the job, they can't see anything on their side.
Could you help me figure out how I could debug the job submission?
Please let me know if I can add something more.
Thanks!
Sebastien