Retry the grpc connection when there's an error #503
@mattklein123 @renuka-fernando Requesting your review on this PR. Thank you.
Signed-off-by: alekhya.kondapuram <[email protected]>
p.retryGrpcConn()
return
Shouldn't we retry the gRPC connection only for connection errors?
When the xDS server runs behind Envoy, pod shutdowns or a server-enforced max connection age cause the client to receive a RESET frame such as "rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: NO_ERROR", which is not treated as a connection failure (it is a RESET). For the same reason, Envoy's connection to the xDS control plane does not retry only on connection failures; it retries with a backoff on any error: https://github.com/envoyproxy/envoy/blob/49425f55aa9212a64b3390909160c41dc22ff349/source/extensions/config_subscription/grpc/grpc_stream.h#L50
This PR just mimics the Envoy behaviour.
Thank you for the approval @renuka-fernando
@mattklein123 can you please take a look and merge this if it looks good to you?
Please also add documentation in the README.
Signed-off-by: alekhya.kondapuram <[email protected]>
Thank you for your review @mattklein123. Updated per comments. Please take a second look when you're free. Thanks!
Signed-off-by: alekhya.kondapuram <[email protected]>
This PR aims to fix the hot-looping problem described in issue #502.
The issue was that when the xDS server closed the stream, the xDS client tried to NACK the previous response and then hot-looped trying to fetch configuration updates from the closed stream. This happens because sotw.isConnErr does not return true when the server signals the client with the following error message:
rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: NO_ERROR
Here's sotw.isConnErr() for reference.
So the idea of this PR is to retry the connection whenever fetching the config fails with any error, instead of handling only a few expected error codes.
It also adds exponential backoff between connection retry attempts.