Rocoto 1.3.7 and develop branch produce UNAVAILABLE state #107

Open
natalie-perlin opened this issue Jul 16, 2024 · 2 comments


natalie-perlin commented Jul 16, 2024

Rocoto 1.3.7 and the latest develop branch both build successfully, but at run time they report jobs in the "UNAVAILABLE" state.
The error output also includes the message "Slurm accounting storage is disabled".

This is run on an AWS instance with Ubuntu, SLURM version 23.02.5

Any suggestions on how this could be diagnosed?

Kernel and OS details for the instance:
Linux ip-10-29-82-115 5.15.0-1044-aws #49~20.04.1-Ubuntu SMP Mon Aug 21 17:09:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

[Attached screenshot: rocoto137error]

@christopherwharrop-noaa
Collaborator

If Slurm is configured without accounting storage enabled, Rocoto cannot track the status of jobs that no longer appear in squeue output. My guess is that you have jobs that have completed and whose status is no longer available via `squeue -u $USER --federation -t all`. Once that happens, those jobs cannot be tracked unless accounting storage is enabled. The message from Slurm is most likely coming from Rocoto's attempt to call `sacct`.
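The lookup order described above can be pictured with a small sketch. This is not Rocoto's actual code (Rocoto is written in Ruby); the function name `resolve_job_state` and the dictionary inputs are hypothetical stand-ins for parsed `squeue` and `sacct` output, and the fallback string simply mirrors the state named in this issue.

```python
from typing import Mapping, Optional


def resolve_job_state(
    job_id: str,
    squeue_states: Mapping[str, str],
    sacct_states: Optional[Mapping[str, str]],
) -> str:
    """Illustrative lookup order: squeue first, then sacct, else UNAVAILABLE.

    squeue_states: job id -> state, as if parsed from squeue output.
    sacct_states:  job id -> state, as if parsed from sacct output, or None
                   when accounting storage is disabled and sacct has no data.
    """
    # Jobs that are queued, running, or only recently finished are still
    # visible to squeue.
    if job_id in squeue_states:
        return squeue_states[job_id]
    # Older jobs can only be recovered from the accounting database.
    if sacct_states is not None and job_id in sacct_states:
        return sacct_states[job_id]
    # Neither source knows the job, so its state cannot be determined.
    return "UNAVAILABLE"
```

With accounting disabled (`sacct_states=None`), any job that has aged out of `squeue` falls through to the last branch, which matches the symptom reported here.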

@christopherwharrop-noaa
Collaborator

My first recommendation is to enable Slurm accounting support. If you still have issues after doing that and have a sequence of steps to reliably reproduce this problem I can investigate more.
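Before and after changing the Slurm configuration, it may help to confirm what the cluster currently reports. The sketch below parses the text printed by `scontrol show config` and looks at the `AccountingStorageType` setting. The helper names are hypothetical, and the set of values treated as "disabled" (`accounting_storage/none`, `(null)`, or empty) is an assumption that may vary between Slurm versions.

```python
import subprocess
from typing import Optional

# Assumption: values that indicate no accounting storage backend is configured.
DISABLED_VALUES = {"", "(null)", "accounting_storage/none"}


def accounting_storage_type(config_text: str) -> Optional[str]:
    """Return the AccountingStorageType value from scontrol output, if present."""
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "AccountingStorageType":
            return value.strip()
    return None


def accounting_enabled(config_text: str) -> bool:
    """True when the reported storage type is set and not a disabled value."""
    value = accounting_storage_type(config_text)
    return value is not None and value.lower() not in DISABLED_VALUES


def read_slurm_config() -> str:
    """Run `scontrol show config` on a host with Slurm installed."""
    result = subprocess.run(
        ["scontrol", "show", "config"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a login node, `accounting_enabled(read_slurm_config())` returning `False` would be consistent with the "Slurm accounting storage is disabled" message.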
