Handle HTCondor "Unable to locate local daemon" Error #12172
base: master
…ler and StatusPoller
Hi @amaltaro, I have wrapped the bossAir submit call in JobSubmitterPoller and the track call in StatusPoller in try/except blocks. Should I change the Exception type to something more specific and handle it somewhere upstream? Thanks.
@hassan11196 thank you for proposing this fix.
This is not a complete review, as I still need to look at the exception propagation more carefully, but I think every schedd = htcondor.Schedd() line in this module should also be moved under the try/except clause (according to the tracebacks reported in the original issue).
With that, I believe some of the try/except blocks you added are no longer needed.
I looked at all of [...] Let me know what you think. Thanks for looking into it.
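For illustration, here is a minimal sketch of what moving the schedd creation under a try/except could look like. It is not the code from this PR: ScheddUnavailable, get_schedd and broken_factory are hypothetical names, and the factory argument stands in for htcondor.Schedd so that the sketch runs without HTCondor installed. The exception type that HTCondor actually raises is not confirmed in this thread, so the except clause is deliberately broad.

```python
class ScheddUnavailable(Exception):
    """Hypothetical error type: the local schedd could not be located."""


def get_schedd(factory):
    """Create a schedd object, converting any failure into ScheddUnavailable.

    `factory` is a stand-in for htcondor.Schedd (an assumption made so the
    sketch is runnable without the HTCondor bindings).
    """
    try:
        return factory()
    except Exception as ex:
        # Broad on purpose: the concrete exception type is an open question.
        msg = "Failed to create schedd: %s" % str(ex)
        raise ScheddUnavailable(msg) from ex


def broken_factory():
    # Simulates the failure reported in the original issue.
    raise RuntimeError("Unable to locate local daemon")


if __name__ == "__main__":
    try:
        get_schedd(broken_factory)
    except ScheddUnavailable as ex:
        print("caught:", ex)
```

With a single helper like this, every call site gets the same error type, and the original traceback stays reachable through the chained exception.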
@hassan11196, the changes look good and reflect the discussion you had with Alan in the original issue. I have a small question though :)
myThread.transaction.rollback()
raise WMException(msg) from ex
I am not sure I understand the need for this rollback, since it seems that if you raise a WMException, the rollback already happens here:
myThread.transaction.rollback()
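The point about the redundant rollback can be sketched as follows. This is a simplified model, not WMCore code: FakeTransaction stands in for myThread.transaction, WMException is a local stand-in for the WMCore class of the same name, and submit_jobs and algorithm are hypothetical reductions of the methods discussed here. The inner function only raises; the caller is the single place that rolls back.

```python
class FakeTransaction:
    """Stand-in for myThread.transaction; it only counts rollbacks."""

    def __init__(self):
        self.rollbacks = 0

    def rollback(self):
        self.rollbacks += 1


class WMException(Exception):
    """Local stand-in for the WMCore exception of the same name."""


def submit_jobs():
    # Inner method: wraps the failure and re-raises, with no rollback here.
    try:
        raise RuntimeError("Unable to locate local daemon")
    except RuntimeError as ex:
        raise WMException("Failed to submit jobs: %s" % ex) from ex


def algorithm(transaction):
    # Caller: rolls back once when a WMException reaches it, then re-raises.
    try:
        submit_jobs()
    except WMException:
        transaction.rollback()
        raise
```

If submit_jobs also called rollback before raising, the same transaction would be rolled back twice for one failure.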
Hi @mapellidario, I agree it was redundant; I have removed the rollback from the submitJobs method.
Thank you for the review @mapellidario. Can you suggest how to handle the exception thrown at [...] that is re-raised in the algorithm method? If it is not handled, the component will crash.
I thought about this for a while. The only way out that I see is adding a new exception, let's say WMSoftException. You can start by defining the new exception in the jobsubmitterpoller module; then, in case we need it in more places, we can create a new file. Even a simple one would do for the time being:
class WMSoftException(Exception):
    pass
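As a rough sketch of how such a soft exception might be used, the caller could catch it, log it, and skip the current polling cycle instead of crashing the component. The run_cycle helper, the list-based log and failing_algorithm are hypothetical and only illustrate the idea; where the exception would actually be caught in WMCore is not settled in this thread.

```python
class WMSoftException(Exception):
    """Recoverable error: skip the current cycle, keep the component alive."""


def run_cycle(algorithm, log):
    """Run one polling cycle; return True on success, False if it was skipped.

    Hypothetical helper, not part of WMCore. Only WMSoftException is
    swallowed; any other exception still propagates to the caller.
    """
    try:
        algorithm()
        return True
    except WMSoftException as ex:
        log.append("Skipping cycle: %s" % ex)
        return False


def failing_algorithm():
    # Simulates a cycle in which the schedd could not be reached.
    raise WMSoftException("Unable to locate local daemon")


if __name__ == "__main__":
    messages = []
    run_cycle(failing_algorithm, messages)
    print(messages)
```

Keeping the handler narrow means genuine bugs still surface as crashes, while a temporarily unreachable schedd costs only one cycle.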
Fixes #9703
Status
in development
Description
This pull request adds exception handling to catch the "Unable to locate local daemon" error thrown when a schedd instance is created, i.e. schedd = htcondor.Schedd(), in SimpleCondorPlugin.py.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
No