refactor: Improve c8y_mapper_ext operation handler API #2997
Conversation
```rust
// @@ -82,6 +81,7 @@ pub fn succeed_operation(
#[derive(Debug, Copy, Clone)]
pub enum CumulocitySupportedOperations {
    C8ySoftwareList,
```
`c8y_SoftwareList` is a very old operation, replaced by `c8y_SoftwareUpdate` a long time ago. If we need this, something is wrong (likely confusion with the inventory fragment name `c8y_SoftwareList`).
Ah, I didn't know that; it seemed weird that it was missing, now I know the reason :) I'll replace it with a proper usage of `c8y_SoftwareUpdate`.
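For illustration, the variant would presumably end up looking something like this, assuming the enum keeps the shape shown in the diff above:

```rust
#[derive(Debug, Copy, Clone)]
pub enum CumulocitySupportedOperations {
    // replaces the erroneously added C8ySoftwareList variant
    C8ySoftwareUpdate,
    // ...the remaining supported operations
}
```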
```rust
#[tokio::test]
async fn mapper_converts_config_metadata_to_supported_op_and_types_for_main_device() {
```
This test has been moved to `tests.rs`, while other tests have been moved out of `tests.rs` and dispatched over misc `xxx.rs` files. We should really avoid doing that in a refactoring PR. Reorganize either the code or the tests, not both simultaneously. This makes things harder to understand and check, for little benefit.
Approved. This is a really nice simplification of the operation handlers.
```rust
Ok(OperationResult::Finished {
    messages: vec![
        MqttMessage::new(topic, smartrest_set_operation),
        self.request_software_list(&target.topic_id),
```
Thanks to this refactoring PR, there is now an opportunity to remove the software list request from here, as command finalization is now done in one place.

Unfortunately, this is not enough to remove the need for an id generator clone (currently both the converter and the running operation for the software update need to generate ids). An option could be to delegate the generation of operation ids to the `OperationHandler`.
> Thanks to this refactoring PR, there is now an opportunity to remove the software list request from here, as command finalization is now done in one place
I'm not sure if removing the `software_list` publish from `software_update` is practical:

- A local `software_update` operation is published on the local MQTT broker in `c8y_mapper_ext/src/converter.rs:770`, in the function `forward_software_request`.
- This local `software_update` operation is picked up by the `OperationHandler`, which runs `c8y_mapper_ext::operations::software_update::OperationContext::publish_software_update_status`.
- `publish_software_update_status` returns a "succeed operation" message with no payload, and invokes a local `software_list` operation after the `software_update` operation has succeeded or failed.
- `publish_software_list`, when it's successful, sends the software list via the HTTP proxy or SmartREST.

What I'm seeing is that `publish_software_list` isn't an operation, and it can be invoked anytime; it's just that after `software_update` the software list changes, so it has to be updated. `software_list` isn't necessary to finalize the `software_update` operation (we send the success/failure of `software_update` and the invocation of `software_list` together), and `software_update` is the only operation that does it, so it makes little sense to extract it.
The only thing that's problematic is sharing this id generator, because generating the ids is necessary for creating local operations. Right now, creating local operations is also spread among methods in `CumulocityConverter`: `forward_software_request`, `forward_restart_request`, `forward_operation_request`, but also `convert_log_upload_request`, `convert_config_snapshot_request`, etc. in the `operations` module. They all do the same thing (converting c8y operation messages to local thin-edge operation messages), so they should be put together in a single place, either in `OperationHandler` or in some other object with a smaller scope. It would then contain the `IdGenerator` and provide methods to start new local operations by publishing on the local operation topics. `software_update` would then simply call this to create a new `software_list` operation instead of doing it manually.
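A rough sketch of the idea; `LocalOperationStarter` and its method are hypothetical names, not actual thin-edge API, and the `te/.../cmd/...` topic scheme is taken from the command topics used elsewhere in this PR:

```rust
use serde_json::Value;
use tedge_actors::{ChannelError, LoggingSender, Sender};
use tedge_api::mqtt_topics::EntityTopicId;
use tedge_mqtt_ext::{MqttMessage, Topic};

/// Hypothetical single owner of the IdGenerator, starting local thin-edge
/// operations by publishing on the local command topics.
struct LocalOperationStarter {
    id_generator: IdGenerator, // path and API of IdGenerator assumed
    mqtt_publisher: LoggingSender<MqttMessage>,
}

impl LocalOperationStarter {
    /// e.g. operation = "software_list", payload = initial command state.
    async fn start_operation(
        &mut self,
        target: &EntityTopicId,
        operation: &str,
        payload: Value,
    ) -> Result<String, ChannelError> {
        let cmd_id = self.id_generator.new_id(); // method name assumed
        let topic = Topic::new_unchecked(&format!("te/{target}/cmd/{operation}/{cmd_id}"));
        self.mqtt_publisher
            .send(MqttMessage::new(&topic, payload.to_string()))
            .await?;
        Ok(cmd_id)
    }
}
```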
> What I'm seeing is that `publish_software_list` isn't an operation, and it can be invoked anytime; it's just that after `software_update` the software list changes, so it has to be updated. `software_list` isn't necessary to finalize the `software_update` operation (we send the success/failure of `software_update` and the invocation of `software_list` together), and `software_update` is the only operation that does it, so it makes little sense to extract it.
Yes, sending a software list after a software update is not strictly required by c8y and is always done at the device's initiative. This is done in two cases: when the software update capability is added, and after each software update request.

Technically, I see no reason to have this done inside the software update handler rather than elsewhere. My concern is that there should be only one component that triggers operations on behalf of c8y.

> The only thing that's problematic is sharing this id generator, because generating the ids is necessary for creating local operations.

We are fully aligned here.
> Right now, creating local operations is also spread among methods in `CumulocityConverter`: `forward_software_request`, `forward_restart_request`, `forward_operation_request`, but also `convert_log_upload_request`, `convert_config_snapshot_request`, etc. in the `operations` module. They all do the same thing (converting c8y operation messages to local thin-edge operation messages), so they should be put together in a single place, either in `OperationHandler` or in some other object with a smaller scope. It would then contain the `IdGenerator` and provide methods to start new local operations by publishing on the local operation topics. `software_update` would then simply call this to create a new `software_list` operation instead of doing it manually.
As you say: they should be put together in a single place, either in `OperationHandler` or in some other object. I'm looking for something symmetric:

- the object creating an operation request should also be the object that clears that operation request when done
- the `OperationHandler` seems to be a nice candidate for that: when clearing a software update, this object could also possibly trigger a following software list request
- in case you wonder: I'm not requesting you to do that in this PR
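A hypothetical sketch of that symmetry, building on the starter object sketched earlier; none of these names are actual thin-edge API, and the convention of clearing a retained command by publishing an empty retained message is assumed:

```rust
use tedge_actors::{ChannelError, Sender};
use tedge_api::mqtt_topics::EntityTopicId;
use tedge_mqtt_ext::{MqttMessage, Topic};

impl LocalOperationStarter {
    /// Hypothetical counterpart to start_operation: the same object that
    /// created the command also clears it when it's done.
    async fn clear_operation(
        &mut self,
        target: &EntityTopicId,
        operation: &str,
        cmd_id: &str,
    ) -> Result<(), ChannelError> {
        let topic = Topic::new_unchecked(&format!("te/{target}/cmd/{operation}/{cmd_id}"));
        // clearing a retained command message = publishing an empty retained payload
        self.mqtt_publisher
            .send(MqttMessage::new(&topic, "").with_retain())
            .await?;
        // the policy lives in one place: clearing a software_update
        // triggers the follow-up software_list request
        if operation == "software_update" {
            self.start_operation(target, "software_list", serde_json::json!({"status": "init"}))
                .await?;
        }
        Ok(())
    }
}
```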
Observed for the second time since yesterday on the merge queue:
LGTM. I'm still not able to make up my mind on what the right way to handle channel errors emitted from these spawned tasks is. But since the current mechanism also works, approving.
```rust
mqtt_publisher
    .send(c8y_state_executing_message)
    .await
    .unwrap();
```
I believe you're doing these unwraps here because even in the case of a failure, it'll only terminate the spawned tasks and not affect the mapper itself. And these terminated tasks will be removed from `running_operations` when the next message for that operation type is handled.

I'm just wondering if we should have propagated these failures to the mapper, so that the runtime can shut down the mapper with the hope of a recovery from that broken state. It may not be such a big deal, because if the sends through this publisher are failing in these tasks, then sends directly from the mapper are also likely to fail, which will eventually shut down the mapper anyway.
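For illustration, a minimal sketch of what that propagation could look like, assuming the task body is changed to return the error instead of unwrapping (types like `LoggingSender` are assumed from context; the point is only the `?` in place of `.unwrap()`):

```rust
use tedge_actors::{ChannelError, LoggingSender, Sender};
use tedge_mqtt_ext::MqttMessage;

async fn send_executing(
    mqtt_publisher: &mut LoggingSender<MqttMessage>,
    c8y_state_executing_message: MqttMessage,
) -> Result<(), ChannelError> {
    // a send failure now surfaces as an Err from the spawned task,
    // which whoever joins the JoinHandle can act on, instead of a panic
    mqtt_publisher.send(c8y_state_executing_message).await?;
    Ok(())
}
```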
> And these terminated tasks will be removed from `running_operations` when the next message for that operation type is handled.
Not the next message for that operation type, but the next message for that particular operation, with that particular id.

But I think there's a problem: in practice, when a panic in a child task happens, `.handle()` will never get called for that operation again. The operation task itself sends the messages, so if they can't be sent over MQTT because of a panic, they won't be handled, won't be joined, so we will not see that an exception has occurred.

For now, in this PR I will add a test to verify the behaviour of the `handle` function: when does it panic and when does it actually join the tasks.

But what I realize is that `OperationHandler` needs its own background task that joins the operation tasks (i.e. polls the next future in `FuturesUnordered`), which will have to be added, or I'll have to make `OperationHandler` an actor, which I probably should. So far I've avoided it because I wanted `OperationHandler` to be a small object used and managed by the converter, not another actor, and also because our actors tend to have a lot of boilerplate, but the time for that is probably now and it's what I'll have to do in a follow-up PR.

So AFAIK right now we don't actually join the panicked tasks and don't know that a panic has occurred. This is the first thing that I should fix. Then, I'll need to make sure tasks are joined automatically so `.handle()` doesn't have to be called to know whether or not there's been a panic. I leave to you the decision of whether these changes should be made in a follow-up PR or in this one.
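A minimal sketch of what that background joiner could look like, assuming newly spawned tasks are handed over on a channel (`tokio` + `futures`; all names are illustrative, not this PR's code):

```rust
use futures::stream::{FuturesUnordered, StreamExt};
use tokio::sync::mpsc;
use tokio::task::JoinHandle;

// Joins operation tasks as they finish, so a panic is observed even if no
// further message for that operation id ever arrives.
async fn join_operation_tasks(mut new_tasks: mpsc::Receiver<JoinHandle<()>>) {
    let mut running: FuturesUnordered<JoinHandle<()>> = FuturesUnordered::new();
    loop {
        tokio::select! {
            Some(handle) = new_tasks.recv() => running.push(handle),
            Some(result) = running.next() => {
                if let Err(err) = result {
                    if err.is_panic() {
                        // surface the panic instead of silently losing the task
                        eprintln!("operation task panicked: {err}");
                    }
                }
            }
            else => break, // channel closed and no running tasks left
        }
    }
}
```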
Improved in 087e968, should clarify things for now, and could be improved in a follow-up PR (#2880 (comment)).
> I'll have to make `OperationHandler` an actor, which I probably should
Yeah, that's pretty much aligned with the model that I had in mind too, especially after the introduction of the availability actor.
> but the time for that is probably now and it's what I'll have to do in a follow-up PR.
That's fine. One step at a time.
Tests that verify the execution of the operations (i.e. where we generate an operation with a given id and verify its outputs) were moved into their own modules. Handling an operation usually consists of 2 things:

1. convert a Cumulocity SmartREST message to a local MQTT operation message and back
2. handle the local MQTT operation message

These steps are performed by different components, so the unit tests are also separated.

Signed-off-by: Marcel Guzik <[email protected]>
This `From` impl was in c8y_api, which shouldn't know anything about tedge_api and also had a bug (C8ySoftwareUpdate => ConfigSnapshot). There was only one callsite, where we should've used `OperationType` directly anyway. Signed-off-by: Marcel Guzik <[email protected]>
Signed-off-by: Marcel Guzik <[email protected]>
More precisely, the `to_response` function takes the operation result and converts it to a Cumulocity response. The handlers of different operations now don't have to concern themselves with creating a Cumulocity response; they can just return the error directly.

Signed-off-by: Marcel Guzik <[email protected]>
1. Changed error type from `ConversionError` to `OperationError`, which displays its source error in a single line
2. Added context to some operation error cases

Signed-off-by: Marcel Guzik <[email protected]>
…d uploads are ongoing Signed-off-by: Marcel Guzik <[email protected]>
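For the `to_response` commit above, a rough sketch of the shape being described; `OperationResult` and `CumulocitySupportedOperations` appear in this PR's diffs, while `fail_operation_message` is a hypothetical helper standing in for however the SmartREST failure message is actually built:

```rust
// Handlers return Result<OperationResult, OperationError>; converting a
// failure into a Cumulocity "failed" response happens here, in one place.
fn to_response(
    result: Result<OperationResult, OperationError>,
    operation: CumulocitySupportedOperations,
    smartrest_publish_topic: &Topic,
) -> OperationResult {
    match result {
        Ok(result) => result,
        Err(err) => OperationResult::Finished {
            messages: vec![fail_operation_message(
                smartrest_publish_topic,
                operation,
                // OperationError displays its source error in a single line
                &err.to_string(),
            )],
        },
    }
}
```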
```rust
#[tokio::test]
async fn mapper_processes_other_operations_while_uploads_and_downloads_are_ongoing() {
    let cfg_dir = TempTedgeDir::new();
    let TestHandle {
        mqtt,
        http,
        dl,
        ul,
        mut timer,
        ..
    } = spawn_c8y_mapper_actor(&cfg_dir, true).await;
    let mut dl = dl.with_timeout(TEST_TIMEOUT_MS);
    let mut ul = ul.with_timeout(TEST_TIMEOUT_MS);

    spawn_dummy_c8y_http_proxy(http);

    // Complete sync phase so that alarm mapping starts
    trigger_timeout(&mut timer).await;

    let mut mqtt = mqtt.with_timeout(TEST_TIMEOUT_MS);
    skip_init_messages(&mut mqtt).await;

    // simulate many successful operations that need to be handled by the mapper
    mqtt.send(MqttMessage::new(
        &Topic::new_unchecked("te/device/main///cmd/log_upload/c8y-mapper-1"),
        json!({
            "status": "successful",
            "tedgeUrl": "http://localhost:8888/tedge/file-transfer/test-device/log_upload/c8y-mapper-1",
            "type": "mosquitto",
            "dateFrom": "2023-11-28T16:33:50+01:00",
            "dateTo": "2023-11-29T16:33:50+01:00",
            "searchText": "ERROR",
            "lines": 1000
        })
        .to_string(),
    ))
    .await
    .unwrap();

    let (download_id, download_request) = dl
        .recv()
        .await
        .expect("DownloadRequest for log_upload should be sent");
    assert_eq!(download_id, "c8y-mapper-1");
    assert_eq!(
        download_request.url,
        "http://localhost:8888/tedge/file-transfer/test-device/log_upload/c8y-mapper-1"
    );

    // here it would be good to assert that upload message hasn't been sent yet, but due to the
    // behaviour of message channels it can't be easily done

    dl.send((
        "c8y-mapper-1".to_string(),
        Ok(DownloadResponse {
            url: "http://localhost:8888/tedge/file-transfer/test-device/log_upload/c8y-mapper-1"
                .to_string(),
            file_path: "whatever".into(),
        }),
    ))
    .await
    .unwrap();

    let (upload_id, _) = ul
        .recv()
        .await
        .expect("UploadRequest for log_upload should be sent");
    assert_eq!(upload_id, "c8y-mapper-1");

    // now that an upload is ongoing, check that downloads can also be triggered
    mqtt.send(MqttMessage::new(
        &Topic::new_unchecked("te/device/main///cmd/config_snapshot/c8y-mapper-2"),
        json!({
            "status": "successful",
            "tedgeUrl": "http://localhost:8888/tedge/file-transfer/test-device/config_snapshot/c8y-mapper-2",
            "type": "typeA",
        })
        .to_string(),
    ))
    .await
    .unwrap();

    let (config_snapshot_id, _) = dl
        .recv()
        .await
        .expect("DownloadRequest for config snapshot should be sent");
    assert_eq!(config_snapshot_id, "c8y-mapper-2");

    // while download and upload are ongoing, try some other operation that doesn't do download or upload
    mqtt.send(MqttMessage::new(
        &Topic::new_unchecked("te/device/main///cmd/restart/c8y-mapper-3"),
        json!({
            "status": "successful",
        })
        .to_string(),
    ))
    .await
    .unwrap();

    assert_received_contains_str(&mut mqtt, [("c8y/s/us", "503,c8y_Restart")]).await;
}
```
@albinsuresh My bad, by mistake I included the test you were proposing (#2904 (comment)) in this PR, but since it's already here, you can take a look.
Looks fine.
LGTM. Just some minor cosmetic comments.
Signed-off-by: Marcel Guzik <[email protected]>
TODO

- Stop parsing the operation message payload twice. First parse it as a generic operation, and then only parse the rest of the payload. A bit complicated to do right now; it would require `Command<Payload>` and `GenericCommandState` to be refactored, better to do it in another PR.
- `handle` function (refactor: Improve c8y_mapper_ext operation handler API #2997 (comment))

Proposed changes

- … (`ChannelError` when trying to send a message through a channel) would not …
- … `EXECUTING` and `FAILED` messages out to the main handler.

Types of changes
Paste Link to the issue

Checklist

- `cargo fmt` as mentioned in CODING_GUIDELINES
- `cargo clippy` as mentioned in CODING_GUIDELINES

Further comments
This PR's focus is mainly removing unnecessary assumptions about the MQTT API from operation functions and centralizing them in the main handler. This makes the code simpler to understand, as it provides a more rigid structure and a simpler interface, and serves as a stepping stone to the eventual extraction of clearing messages to the same place where we're initiating local operations. Once operation initialization and clearing messages are in the same place, we can move the state that keeps track of which operations are running there, and stop relying on the MQTT broker echoing command id messages back to the mapper for it to handle them.
Also, the PR contains one new test that checks some paths in the `OperationHandler::handle` method. This test does not utilize sending raw MQTT messages through the actor message boxes. I plan to add more tests like this for individual operations that more thoroughly test the possible branches.

Finally, the PR tackles the "clean up operation API" section of #2880 (comment), but doesn't fully close the task. I'd like to handle the `SUCCESSFUL` messages in another PR, to keep this one relatively small.