built-in bridge connection not sending or receiving messages from the cloud #2960
Comments
The event seems to have occurred again on the device under test; below are the tedge-mapper-c8y logs:

Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.065543723Z INFO mqtt_channel::connection: MQTT connection established
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.087623181Z INFO c8y_api::http_proxy: JWT token requested
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.103873283Z INFO c8y_api::http_proxy: JWT token received
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.104461628Z INFO mqtt_channel::connection: MQTT connection closed
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.154844303Z INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmpRAJ3RK/:etc:tedge:tedge.toml-c8y-mapper-10882415 to url: http://127.0.0.1:8001/c8y/event/events/10883500/binaries
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.528857274Z INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.530204314Z INFO mqtt_channel::connection: MQTT connection established
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.552273457Z INFO c8y_api::http_proxy: JWT token requested
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.568016047Z INFO c8y_api::http_proxy: JWT token received
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.56858441Z INFO mqtt_channel::connection: MQTT connection closed
Jul 01 13:02:37 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:37.61500343Z INFO tedge_downloader_ext::actor: Downloading from url http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10882417 to location /tmp/.tmpvpwa7N/tedge-bridge-monitor-c8y-mapper-10882417
Jul 01 13:02:37 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:37.625846911Z INFO download::download: Downloading file from url="http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10882417", len=0
Jul 01 13:02:37 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:37.646764105Z INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmpvpwa7N/tedge-bridge-monitor-c8y-mapper-10882417 to url: http://127.0.0.1:8001/c8y/event/events/10880045/binaries
Jul 04 08:28:11 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:11.625840188Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Last pingreq isn't acked
Jul 04 08:28:11 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:11.625985112Z INFO tedge_mqtt_bridge: Waiting 25.97075156s until attempting reconnection to cloud broker
Jul 04 08:28:37 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:37.72420315Z INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jul 04 08:28:37 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:37.724326852Z INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jul 04 08:37:14 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:14.163169576Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Jul 04 08:37:14 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:14.163277593Z INFO tedge_mqtt_bridge: Waiting 15.633064796s until attempting reconnection to cloud broker
Jul 04 08:37:30 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:30.041670265Z INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jul 04 08:37:30 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:30.041784134Z INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jul 04 09:00:01 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:01.684107842Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: I/O: Broken pipe (os error 32)
Jul 04 09:00:01 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:01.68473139Z INFO tedge_mqtt_bridge: Waiting 44.01264048s until attempting reconnection to cloud broker
Jul 04 09:00:45 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:45.947290995Z INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jul 04 09:00:45 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:45.94742066Z INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]

And checking the connection with:

# tedge connect c8y --test
The system config file '/etc/tedge/system.toml' doesn't exist. Using '/bin/systemctl' as the default service manager
Detected mosquitto version >= 2.0.0
Sending packets to check connection. This may take up to 2 seconds.
ERROR: Local MQTT publish has timed out.
Error: failed to test connection to Cumulocity cloud.
Caused by:
Operation timed out. Is mosquitto running?
The issue is now fixed in #2982, but I am detailing it here so it's easy to find the root cause if we come across a similar issue in the future.

In order to link messages to packet IDs, so that we can acknowledge the incoming message when we receive an acknowledgement for the forwarded message, we have a channel between the two bridge halves. When publishing a message, we send the corresponding entry over this channel:

thin-edge.io/crates/extensions/tedge_mqtt_bridge/src/lib.rs, lines 347 to 359 (at 4d780ac)
And when the other bridge half receives it from the channel:

thin-edge.io/crates/extensions/tedge_mqtt_bridge/src/lib.rs, lines 376 to 381 (at 4d780ac)
After a disconnection, however, any in-flight (i.e. unacknowledged) messages are re-published by the MQTT client. The solution to this issue is to detect these duplicate publishes and ignore them. Since the packet ID remains the same for the original and the republished message, all that is needed here is to check whether the ID is already being tracked and, if so, skip the duplicate.
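To make the root cause easier to recognise if it ever regresses, here is a minimal, self-contained sketch of that duplicate-detection idea. The names and types are hypothetical (this is not the actual tedge_mqtt_bridge code): a map keyed by the packet ID of the forwarded publish both resolves acknowledgements back to the original message and reveals when the same packet ID shows up a second time after a reconnection.

// Hypothetical sketch, not the code in crates/extensions/tedge_mqtt_bridge.
use std::collections::HashMap;

/// Whatever is needed to acknowledge the original message on the source broker.
struct OriginalMessage {
    pkid: u16,
    topic: String,
}

#[derive(Default)]
struct InFlight {
    // Keyed by the packet ID assigned to the forwarded (re-published) copy.
    by_forwarded_pkid: HashMap<u16, OriginalMessage>,
}

impl InFlight {
    /// Track a forwarded publish. Returns false if this packet ID is already
    /// tracked, i.e. the publish is a retransmission after a reconnection and
    /// must be ignored rather than processed a second time.
    fn on_forwarded(&mut self, forwarded_pkid: u16, original: OriginalMessage) -> bool {
        if self.by_forwarded_pkid.contains_key(&forwarded_pkid) {
            return false; // duplicate: same packet ID re-published after a disconnect
        }
        self.by_forwarded_pkid.insert(forwarded_pkid, original);
        true
    }

    /// Resolve an acknowledgement of the forwarded copy back to the original
    /// message, so the source-side message can now be acknowledged.
    fn on_ack(&mut self, forwarded_pkid: u16) -> Option<OriginalMessage> {
        self.by_forwarded_pkid.remove(&forwarded_pkid)
    }
}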
There has been a second incident on the osadl-rackfslot2 device, where the mapper has stopped sending measurements and left an operation in the EXECUTING state. The incident seems to be triggered by requesting a …

Below are the logs collected from the device.
bridge monitor logs:

{"tst":"2024-07-16T02:10:30.420483+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-16T02:11:01.683226+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T16:23:54.884758+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:23:55.946158+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":1,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:24:19.457445+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:24:19.502298+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T16:24:43.985536+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:24:45.549973+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T18:29:34.465586+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:29:35.526201+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":1,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:29:52.113442+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:29:52.113589+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T18:30:05.629639+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:30:05.840028+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-18T10:43:06.028564+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-18T10:43:47.660542+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-25T09:40:09.976420+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"} tedge-mapper-c8y systemd status $ systemctl status tedge-mapper-c8y
● tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format.
Loaded: loaded (/lib/systemd/system/tedge-mapper-c8y.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/tedge-mapper-c8y.service.d
└─override.conf
Active: active (running) since Wed 2024-07-17 18:30:05 CEST; 1 week 0 days ago
Process: 1591752 ExecStartPre=/usr/bin/tedge init (code=exited, status=0/SUCCESS)
Main PID: 1591753 (tedge-mapper)
Tasks: 7 (limit: 1583)
Memory: 4.3M
CPU: 17min 901ms
CGroup: /system.slice/tedge-mapper-c8y.service
└─1591753 /usr/bin/tedge-mapper c8y

The Cumulocity IoT Core's logs show the following error log entry (…)
I've found an explanation for the last reported event where the built-in bridge was marked as down... There is a problem with the handling of QoS 0 messages, where the packet identifier (pkid) is 0. This results in a number of problems.
The scenario can be reproduced by executing the following command about 10 times; after 8-9 calls the command starts to fail, and it does not recover without a restart:

tedge connect c8y --test

The following is the system test and proposed "fix" (pending Rust review):
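As a hedged illustration of the pkid=0 pitfall (my reading of the mechanism, not the change pending review): QoS 0 publishes carry no packet identifier, so the client reports them with pkid == 0 and no acknowledgement will ever arrive for them. If such messages end up in a pkid-keyed acknowledgement map, every QoS 0 message collides on key 0 and the bookkeeping drifts out of sync. One way to avoid that is to forward QoS 0 messages fire-and-forget and only track QoS 1/2 publishes, roughly:

// Hypothetical helper, not the proposed fix itself.
#[derive(Clone, Copy, PartialEq, Eq)]
enum QoS {
    AtMostOnce,  // QoS 0: pkid is always 0 and no acknowledgement is ever sent
    AtLeastOnce, // QoS 1
    ExactlyOnce, // QoS 2
}

/// Decide whether a forwarded publish should be entered into the
/// pkid-keyed acknowledgement map at all.
fn should_track(pkid: u16, qos: QoS) -> bool {
    // Tracking pkid 0 would make every QoS 0 message collide on the same key.
    qos != QoS::AtMostOnce && pkid != 0
}

fn main() {
    assert!(!should_track(0, QoS::AtMostOnce)); // forwarded, but never tracked
    assert!(should_track(42, QoS::AtLeastOnce)); // tracked until the PUBACK arrives
}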
Closing the ticket as the built-in mapper seems to have stabilized: it has been running for 1.5 months and is still functional (on the osadl2 device, with …). Below is the service status showing the runtime:

root@rackfslot2:~# systemctl status tedge-mapper-c8y
● tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format.
Loaded: loaded (/lib/systemd/system/tedge-mapper-c8y.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/tedge-mapper-c8y.service.d
└─override.conf
Active: active (running) since Fri 2024-09-20 15:26:46 CEST; 1 month 14 days ago
Main PID: 110720 (tedge-mapper)
Tasks: 7 (limit: 1583)
Memory: 7.9M
CPU: 1h 38min 37.309s
CGroup: /system.slice/tedge-mapper-c8y.service
└─110720 /usr/bin/tedge-mapper c8y
Describe the bug
The tedge-mapper-c8y process stops sending data to Cumulocity IoT when using the built-in bridge.
The device where the problem was seen includes a special MQTT subscription which looks at the tedge-mapper-bridge-c8y service status, and the log shows that the status was toggling between up and down, though it is currently showing as up even though the device is not responding to operations or sending measurements.

The tedge connect c8y --test mostly fails, but sporadically passes (e.g. it will receive a JWT every now and then); the failed connection test is shown below.
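For reference, a monitoring subscription like the one described above can be reproduced with a small subscriber. The sketch below is a hypothetical stand-in for the monitor actually running on the device; it assumes the rumqttc and tokio crates and the default local broker on 127.0.0.1:1883.

// Hypothetical bridge-health monitor; not the script used on the device.
use rumqttc::{AsyncClient, Event, MqttOptions, Packet, QoS};

#[tokio::main]
async fn main() {
    let options = MqttOptions::new("bridge-health-monitor", "127.0.0.1", 1883);
    let (client, mut eventloop) = AsyncClient::new(options, 10);

    // Health topic of the built-in bridge, as seen in the logs above.
    client
        .subscribe(
            "te/device/main/service/tedge-mapper-bridge-c8y/status/health",
            QoS::AtLeastOnce,
        )
        .await
        .expect("subscribe request failed");

    // Print every status change ({"status":"up"} / {"status":"down"}).
    loop {
        if let Ok(Event::Incoming(Packet::Publish(msg))) = eventloop.poll().await {
            println!("{}: {}", msg.topic, String::from_utf8_lossy(&msg.payload));
        }
    }
}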
Symptoms

- … c8y/# topic
- … tedge-mapper-c8y service seemed to resume normal function (e.g. operations were processed and measurements were sent)

To Reproduce
It is unclear how to reproduce the bug. This section will be expanded once more information is known.
Expected behavior
The built-in bridge should reliably reconnect to the IoT Cloud, and the status of the bridge should be faithfully represented on the associated service topic.
Screenshots
Environment (please complete the following information):
Debian GNU/Linux 12 (bookworm)
Raspberry Pi 4 Model B Rev 1.5
Linux rackfslot2 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux
tedge 1.1.2~129+gdc64912
journalctl -fu tedge-mapper-c8y