Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Handle messages republished by MQTT bridge following a disconnection event #2982

Closed
wants to merge 0 commits into from

Conversation

jarhodes314
Copy link
Contributor

@jarhodes314 jarhodes314 commented Jul 8, 2024

Proposed changes

The OSADL device running the built in bridge was occasionally disconnecting from cloud while sending messages and then failing to recover upon reconnection. The root cause was that unacknowledged messages were being republished by rumqttc upon reconnection, but the code was always waiting for a corresponding event from the other half of the bridge (as it implicitly assumed each message was only published once). Since there was no such corresponding event, this caused the bridge to block.

This PR ignores the duplicate publishes in the section of code that was blocking to fix this bug.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Copy link
Contributor

github-actions bot commented Jul 8, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
471 5 3 476 98.95 1h14m53.808472s

Failed Tests

Name Message ⏱️ Duration Suite
Successful shell command with output Expected operation (id=11365533) to be SUCCESSFUL, but got: EXECUTING (failureReason: ) 32.830 s Shell Operation Built-In Bridge
Check Successful shell command with literal double quotes output Expected operation (id=11365596) to be SUCCESSFUL, but got: EXECUTING (failureReason: ) 32.580 s Shell Operation Built-In Bridge
Execute multiline shell command Expected operation (id=11365658) to be SUCCESSFUL, but got: EXECUTING (failureReason: ) 32.790 s Shell Operation Built-In Bridge
Failed shell command Expected operation (id=11365753) to be FAILED, but got: EXECUTING 33.063 s Shell Operation Built-In Bridge
Send alarms to an unregistered service Expected total device services count to be greater than or equal to 1 32.783 s Thin-Edge Device Telemetry Built-In Bridge

@reubenmiller reubenmiller added the theme:mqtt Theme: mqtt and mosquitto related topics label Jul 12, 2024
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request July 17, 2024 07:22 — with GitHub Actions Inactive
Copy link

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 87.36842% with 36 lines in your changes missing coverage. Please review.

Project coverage is 78.3%. Comparing base (ec95cf0) to head (c34ab66).
Report is 34 commits behind head on main.

Additional details and impacted files
Files Coverage Δ
...edge_config/src/tedge_config_cli/models/seconds.rs 100.0% <ø> (+1.2%) ⬆️
crates/extensions/tedge_mqtt_bridge/src/health.rs 96.7% <50.0%> (+1.7%) ⬆️
crates/extensions/tedge_mqtt_bridge/src/lib.rs 88.2% <87.6%> (-1.2%) ⬇️

... and 36 files with indirect coverage changes

@jarhodes314 jarhodes314 temporarily deployed to Test Pull Request July 17, 2024 14:08 — with GitHub Actions Inactive
@jarhodes314 jarhodes314 changed the title Add debug logs for built-in bridge to run on OSADL device fix: Handle messages republished by MQTT bridge following a disconnection event Jul 17, 2024
@jarhodes314 jarhodes314 temporarily deployed to Test Pull Request July 17, 2024 16:16 — with GitHub Actions Inactive
@jarhodes314 jarhodes314 temporarily deployed to Test Pull Request July 18, 2024 08:47 — with GitHub Actions Inactive
@jarhodes314 jarhodes314 marked this pull request as ready for review July 18, 2024 09:15
Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@jarhodes314 jarhodes314 added this pull request to the merge queue Jul 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 18, 2024
@didier-wenzek
Copy link
Contributor

didier-wenzek commented Jul 18, 2024

The merge queue rejected this PR because the following relevant unit test is failing

test bridge_reconnects_successfully_after_local_connection_interrupted ... FAILED

See also: #2997 (comment)

@jarhodes314 jarhodes314 added this pull request to the merge queue Jul 23, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 23, 2024
@jarhodes314 jarhodes314 temporarily deployed to Test Pull Request July 25, 2024 15:16 — with GitHub Actions Inactive
@jarhodes314 jarhodes314 temporarily deployed to Test Pull Request July 25, 2024 15:21 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme:mqtt Theme: mqtt and mosquitto related topics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants