-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix tedge agent restart on self update #2631
Fix tedge agent restart on self update #2631
Conversation
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Codecov ReportAttention:
Additional details and impacted files
|
Robot Results
|
} | ||
|
||
// On shutdown, first close input so no new messages can be pushed | ||
self.input_receiver.close_input(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query: Shouldn't the incoming_mqtt
channel also be closed to avoid reading any more messages from the broker as well? For that, these relay_xxx
methods would need to take the mutable Connection
itself, instead of the published
or received
channels from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it would be pointless to close the incoming_mqtt
channel as the underlying TCP connection to the broker is bi-directional: as we want to publish a latest batch of messages; we have to keep the connection open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking closely, things could be improved when a message a received but there is no more recipient for it.
For that we need to acknowledge only messages that have been properly queued or even better processed.
This seems to be feasible now with the latest version of rumqttc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Didier Wenzek <[email protected]>
QA has thoroughly checked the bug and here are the results:
|
Proposed changes
As highlighted by flaky tests, there is a race condition when a self-update is detected by the agent. While a first thread is publishing the operation status over MQTT, a concurrent thread is triggering a shutdown of the agent, possibly before the status is actually published.
This fix introduces two mitigations:
Types of changes
Paste Link to the issue
#2623
Checklist
cargo fmt
as mentioned in CODING_GUIDELINEScargo clippy
as mentioned in CODING_GUIDELINESFurther comments