-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to achieve a reconnect delay in the client? #918
Comments
This is even more problematic for production systems where the connection to the broker is over a low-latency LAN. You can flood the network with connection requests as the client requests a connection but the server immediately rejects them. It the case where the destination host is unreachable or does not respond, the OS itself implements a retry w/ back-off on the initial TCP SYN, but in the case of an active rejection, it's always required that user code implement a retry delay for client protocols. If a PR would be considered to implement this behavior in the library, we'd consider implementing it. Just want to have buy-in from the maintainers that this is a valid use case to handle in the library itself. The way |
That wasn't the intention of the comment. We merely wanted to convey that while connected the
Thanks for the interest, but we feel it is a better practice to implement this outside of the library and keep it clean with regards to the MQTT standards. We would love to see examples though, and are willing to accept documentation improvements if that'd be something you could help us with! |
@de-sh Can you point me where in the MQTT standard it says the client must or must not implement a reconnect delay? This is functionality that every user developing a real-world system needs to implement MQTT robustly. An option could be defaulted to 0 (no delay) if you feel that's the default best behavior. |
There might be a misunderstanding here, I was also point out the same in my message. We should implement this outside of the library, but an example would be a starting point for anyone interested to get started. |
Thanks for clarifying @de-sh, I understand what you're saying now. We'll produce an example of what it takes to implement this outside the library currently, an we can continue the discussion from there as to whether it belongs inside the library or not. |
Here is how I implemented exponential backoff in my project using the backoff library: pub async fn async_event_loop_listener(
(eventloop, incoming_event_sender): (EventLoop, Sender<Message>),
) -> Result<(), error::Error> {
let eventloop = Arc::new(Mutex::new(eventloop));
loop {
match poll(Arc::clone(&eventloop)).await {
Ok(event) => {
if let Err(err) = handle_event(event, &incoming_event_sender) {
error!("Error while handling MQTT event: {err}");
}
}
Err(err) => {
error!("Fatal MQTT connection error: {err}");
return Err(err.into());
}
};
}
}
async fn poll(eventloop: Arc<Mutex<EventLoop>>) -> Result<Event, ConnectionError> {
retry_notify(
ExponentialBackoff::default(),
|| async { eventloop.lock().await.poll().await.map_err(backoff_error) },
|err, dur: Duration| {
let dur = dur.as_secs_f32();
error!("Error while polling MQTT event loop: {err}\n -> Retrying in {dur:.1}s...");
},
)
.await
}
fn backoff_error(err: ConnectionError) -> backoff::Error<ConnectionError> {
match err {
ConnectionError::ConnectionRefused(
ConnectReturnCode::ProtocolError
| ConnectReturnCode::UnsupportedProtocolVersion
| ConnectReturnCode::ClientIdentifierNotValid
| ConnectReturnCode::BadUserNamePassword
| ConnectReturnCode::NotAuthorized
| ConnectReturnCode::Banned
| ConnectReturnCode::BadAuthenticationMethod
| ConnectReturnCode::UseAnotherServer
| ConnectReturnCode::ServerMoved,
) => backoff::Error::permanent(err),
_ => backoff::Error::transient(err),
}
} |
The client currently appears to have no delay when attempting to reconnect to a broker. This can be especially problematic when the broker is running locally, since the failure happens very quickly, so you get a very fast rate of reconnect attempts.
The eventLoop poll() method warns "Don't block this while iterating." Does that mean it's always harmful to wait before re-polling, even immediately after a disconnect event is returned?
Would it be possible for the client to implement at least a reconnect delay, if not an exponential backoff, that would work even in cases where eventLoop.poll() is always called with no delay?
The text was updated successfully, but these errors were encountered: