Add inactivity probe for ovsdb connection #356
/assign @dcbw @martinkennelly @tssurya
Some initial feedback; to be continued.
/assign @jcaamano
See my question here: ovn-kubernetes/ovn-kubernetes#3578 (comment). @jcaamano or @dcbw might know.
client/options.go
// The timeout argument is used for constructing the context for sending
// each Echo and Reconnect request.
Hmm. Should WithInactivityCheck require that WithReconnect is also set? Then for the reconnect we would use the timeout value specified in WithReconnect, and for the echo timeout a reasonable hard-coded value, or optionally the value provided here.
I am not sure the reconnect timeout should be the same as the echo timeout, especially if the reconnect timeout has to account for a possible cache reconciliation.
Thoughts here, @dcbw?
@jcaamano I think inactivity check will likely always be paired with reconnect, but that's not technically required. Note that OVS' jsonrpc library (which is what ovn-controller/northd/etc all end up calling) does not tie the two together:
/* Sets the "probe interval" for 's' to 'probe_interval', in milliseconds. If
* this is zero, it disables the connection keepalive feature. Otherwise, if
* 's' is idle for 'probe_interval' milliseconds then 's' will send an echo
* request and, if no reply is received within an additional 'probe_interval'
* milliseconds, close the connection (then reconnect, if that feature is
* enabled). */
void
jsonrpc_session_set_probe_interval(struct jsonrpc_session *s,
                                   int probe_interval)
The echo timeout should be caller-controlled rather than a hardcoded value, and it should probably differ from the reconnect timeout. They time two different things: the reconnect timeout bounds the connect() call, while the echo/idle timer runs during the conversation. I think it's reasonable to set the reconnect timeout low, like 20 seconds, but the idle timer should be higher; for example, we set ovn-controller to 180 seconds.
@pperiyasamy, I'm wondering if we can make this simpler; most of the logic and variables can go into … First, though, the inactivity probe should only be triggered when there has been no traffic from the server within the given inactivity interval. You could create a …
Then whenever we get a server reply, really whenever we get a successful reply from …
Now for the second part...
This last select source would be the worker bit. It could first check the "last echo timestamp" value; if that != "", we know we've waited an inactivity interval without a reply from the server, so we can call o.Disconnect() and return. Next it would send an echo request to the server and set the arguments to something like … Then we'd need to spawn another goroutine to wait for and handle the reply, because otherwise we'd block the select{} and fail to catch the inactivity interval timeout if the server never replied. This goroutine would have another select{} that waits for … Something like: …
Anyway, this approach would keep everything pretty centralized in … What do you think?
@dcbw I think this is a better solution for inactivity probing because of …
This enhances the ovsdb client to send echo requests to the ovsdb server periodically, at the specified interval. When echo requests keep failing for 2 * interval, the client considers the connection inactive and attempts to reconnect to the server. This ensures early detection of ovsdb server connectivity issues and reconnects to the server as soon as it is operational again.
Signed-off-by: Periyasamy Palanisamy <[email protected]>
This commit changes the test server so that its echo handler can be made to return an error, and adds a unit test for the aliveness of the server connection.
Signed-off-by: Periyasamy Palanisamy <[email protected]>
This commit fixes issues in the inactivity check handling logic that surfaced when the test was enhanced to toggle the echo reply multiple times on the ovsdb server.
Signed-off-by: Periyasamy Palanisamy <[email protected]>
This commit uses consistent naming for inactivity-related variables and methods in the options and client modules. It also fixes a deadlock between rpcMutex and the inactivityCheckStopped channel when a shutdown and an echo request happen at the same time.
Signed-off-by: Periyasamy Palanisamy <[email protected]>
This reuses the reconnect code that handles the disconnect notification for echo failures as well. It makes the code more readable and avoids unnecessary additional mutexes and flags. It also avoids sending echo requests while other ovsdb traffic is flowing on the server connection, since that traffic already shows the connection is alive; this saves some CPU cycles and echo traffic on the wire.
Signed-off-by: Periyasamy Palanisamy <[email protected]>
This fixes the field type test on an ovsdb table.
Signed-off-by: Periyasamy Palanisamy <[email protected]>
LGTM