Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADDED] Exponential backoff for server reconnects #1224

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

piotrpio
Copy link
Collaborator

No description provided.

@coveralls
Copy link

Coverage Status

Coverage: 85.14% (-0.04%) from 85.176% when pulling 20fb507 on reconnect-backoff into 7917595 on main.

@piotrpio piotrpio marked this pull request as ready for review February 28, 2023 16:27
// Base reconnect wait is 10ms, with x2 multiplier. Max wait time is 2 minutes.
// 10ms, 20ms, 40ms, 80ms...2m
// A random jitter is added to the result.
func DefaultReconnectBackoffHandler(jitter time.Duration) ReconnectDelayHandler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this feature, I am wondering if some of the other options we have on the clients should be deprecated or go away (at the very least, this should be the default)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is default with this PR (I removed having ReconnectWait as default). I'm not sure we need to deprecate ReconnectWait, it's still a valid solution if user want's to have constant wait time between reconnects.

@piotrpio
Copy link
Collaborator Author

piotrpio commented Mar 1, 2023

@aricart I am wondering about a couple of things:

  1. What should we use as default base/max/multiplier values for reconnect attempts? It should be consistent across the clients, with ReconnectWait go and javascript had 2 seconds, which seems to be a bit much. Rust already has backoff with 1ms as the first wait time, which seems a bit low :) In this PR I set base 10ms, multiplier x2 with max 2 minutes, it seems reasonable to me but I wonder what's your take on this.
  2. Should we get rid of setting MaxReconnects by default? I would assume so, to basically try endlessly with some max interval.

@@ -50,8 +50,7 @@ const (
Version = "1.24.0"
DefaultURL = "nats://127.0.0.1:4222"
DefaultPort = 4222
DefaultMaxReconnect = 60
DefaultReconnectWait = 2 * time.Second
DefaultMaxReconnect = -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to change into a span of time but not sure about changing this var as the default since has been like that from the start... maybe as an option to test out new behavior like MaxTotalReconnectTime(2*time.Minute)? We also need to be able to set max reconnect to -1 so that it means that we want the client to reconnect forever and never give up reconnecting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so on the websocket client under firefox we get huge issue, because aside from the fact that they don't honor socket requests to cancel etc, if the client doesn't somewhat line up, all sorts of crazy stuff happens - Firefox is the only one to implement that, but at any rate here's the description

https://www.rfc-editor.org/rfc/rfc6455#section-7.2.3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piotrpio's change is right in line with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with values as we have them, with possibly adding a new option "ExponentialBackoff" which would be or the start backoff - understanding that it doubles every try and then at 10 increases, it resets - The trick is that at 2s start, that means 17m interval on the 10th try which is possibly way too long. Possibly it is right to say the first retry is like 25ms or so... - That would set the upper bound near 1m if I have my maths right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants