job-manager: epilog should prevent clean event #5345
Comments
I wonder if #5321 (merged last week) helps here?
I think that's a separate issue, about a specific plugin, and not epilog behavior in general.
I did notice that the dws plugin is calling
See also: https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_21.html
As discussed with @jameshcorbett offline, currently an outstanding As noted in the issue title, an
As soon as a rabbit job is submitted, flux-coral2 plugins create some external resources for it. When the job reaches cleanup state, a flux-coral2 jobtap plugin places an epilog on the job to clean up those resources. The epilog-start initiates the cleanup; the epilog-remove indicates that the cleanup has completed. However, when the job never has resources allocated to it (because of an exception), the epilog doesn't prevent the job from moving to inactive state, and the jobtap plugin hits an error when it tries to remove the epilog.
On the exception path, the epilog isn't strictly necessary, and I could add logic to the plugins to avoid placing the epilog at all (even though the external cleanup would still be required, we could do it asynchronously). But it would be convenient to be able to handle both cases the same way.
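To make the two paths concrete, here is a toy Python model of the behavior described above. It is only a sketch, not flux-core code: `ToyJob`, `advance`, `run`, and the `epilog_blocks_clean` flag are hypothetical names, and the model assumes that today an outstanding epilog only delays the free event for a job that holds resources. The event that ends the epilog is spelled `epilog-finish` here, which is the name RFC 21 uses for what the text above calls epilog-remove.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    """Toy stand-in for a job that has just entered CLEANUP state."""
    has_resources: bool              # True if the job was ever allocated resources
    epilogs_active: int = 0          # epilog-start events not yet finished
    state: str = "CLEANUP"
    eventlog: list = field(default_factory=list)

    def post(self, name):
        # Nothing may be posted once the job is inactive; this mirrors the
        # error the plugin hits when it tries to end the epilog too late.
        if self.state == "INACTIVE":
            raise RuntimeError(f"cannot post {name!r}: job is inactive")
        self.eventlog.append(name)
        if name == "epilog-start":
            self.epilogs_active += 1
        elif name == "epilog-finish":
            self.epilogs_active -= 1
        elif name == "free":
            self.has_resources = False
        elif name == "clean":
            self.state = "INACTIVE"


def advance(job, epilog_blocks_clean):
    """Post the events the toy job manager would post next."""
    if job.state == "INACTIVE":
        return
    # Assumed current behavior: an active epilog holds back 'free'.
    if job.has_resources and job.epilogs_active == 0:
        job.post("free")
    if not job.has_resources:
        # Behavior requested in this issue: an active epilog also holds back 'clean'.
        if epilog_blocks_clean and job.epilogs_active > 0:
            return
        job.post("clean")


def run(has_resources, epilog_blocks_clean):
    """Plugin side: start an epilog in CLEANUP, then finish it later."""
    job = ToyJob(has_resources=has_resources)
    job.post("epilog-start")
    advance(job, epilog_blocks_clean)
    job.post("epilog-finish")        # raises if the job already went inactive
    advance(job, epilog_blocks_clean)
    return job.eventlog
```

In this model, `run(True, False)` completes normally because the held resources keep the job in CLEANUP, while `run(False, False)` raises at `epilog-finish` because clean was already posted. With `epilog_blocks_clean=True`, both cases go through the same plugin code path without error.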
When an exception occurs, the eventlog may look like:
whereas through the normal path it looks like