-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement an exception handler for api client #72
Conversation
not sure this implementation avoid the connection error when the server is restarted, did you tested ? |
as the Nomad API uses OkHttpApi maybe we can evaluate how to add an interceptor and implement a retry logic |
Agreed there are multiple options in the HTTP client which we can possibly expose. Regarding testing the PR, will check on local cluster when I'm back at my desk, but I'm starting to think that we should not use that |
plugins/nf-nomad/src/main/nextflow/nomad/executor/NomadService.groovy
Outdated
Show resolved
Hide resolved
So I did the experiment with branch and this is what I experienced. Different failure compared to #71 Experimental setup:
With the current
|
Signed-off-by: Jorge Aguilera <[email protected]>
39368de
to
85540b9
Compare
I've refactored our work and now we are using FailSafe approach still need to test a little more (trying to stop the cluster and so on) but it looks nice |
I want to implement a more robust test but stopping/restarting manually the nomad process during a bactopia pipeline (because it takes more time to complete than a simple hello) seems to work:
cc @matthdsm
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jorge! To me this looks good to merge ✅
I have just added a minor comment regarding the exit codes we're relying upon.
config.jobOpts().region, | ||
config.jobOpts().namespace, | ||
null, null, null) | ||
safeExecutor.apply { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this clean design @jagedn 🤩
.build() | ||
} | ||
|
||
final private static List<Integer> RETRY_CODES = List.of(408, 429, 500, 502, 503, 504) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jagedn, is there a specific dictionary/reference you used to focus only on these exit codes?
Maybe a good idea to add that in the comments or just explain the through process behind these exit codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahaha c&p from the azure plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think so, they are common http error codes
I've added descriptions for them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, I was under the impression that nomad would have it's own set of error codes as well but judging from this issue hashicorp/nomad#17782 I think that for client/task errors it stores them in exit code
Could be related to #77 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree but maybe better handle the ext code in the #77 and let this PR handle the infra/http errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good - then let's merge and make an edge release :)
Signed-off-by: Jorge Aguilera <[email protected]>
This PR