libovsdb: give reconnects more time to process than normal transactions #2754
7beca13 to 3a95bbf
/lgtm
I don't follow the sequence of events. I'm trying to understand why we would need a reconnect timeout different from the connect timeout.
@jcaamano It's mostly about the monitor() call. That takes a context passed in by the caller, and that context is provided (in the reconnect case) by It's not that we need a reconnect timeout different than connect timeout; those two can be the same. It's that we need a reconnect timeout different than our normal OVSDB transaction timeout used for
You know what, I did not know when the timeout started ticking, and I guess that's when you create the context with it. I thought all along that the timeout in the context was a standard way to pass it down the line so someone could get it and start counting. Well, that's that. But then, why not do the same thing with the timeout down in line 55, for the first connect?
3a95bbf to 6956d9f
@jcaamano good point, updated, thanks!
/lgtm the change itself I have my own OCD issues in general:
6956d9f to d61f8dc
Well, it's really 20s for the full operation of (a) connect TCP socket and (b) download and process the database, which could be quite large. @dave-tucker points out that if we are using update3 then (b) will be a lot shorter than I've seen. So maybe once ovn-org/libovsdb#283 is worked out and lands we can revert this change.
There's going to be an upper limit of how long it should take. The TCP connect part shouldn't be long, but the DB download and process might be. I saw 8-9s for a 25,000 pod cluster in some cases; maybe higher scales would need longer times. We can adjust as necessary, but really we just want to land ovn-org/libovsdb#283 soon.
At larger scale a reconnect may need to download and parse a bunch of database data. This (especially the JSON parsing in cenkalti/rpc2) can take longer than we expect a normal ovsdb transaction to take, because it's a lot more data.

If the reconnect takes too long our timeout will cancel the Monitor call and attempt to reconnect, perhaps timing out again, etc.

Signed-off-by: Dan Williams <[email protected]>
d61f8dc to da9408d
@jcaamano @tssurya @flavio-fernandes @trozet