Move `config_update` file download from tedge-mapper-c8y to tedge-agent #2511

Bravo555 · 2023-12-08T12:58:33Z

TODO

addressing feedback for unit tests

Proposed changes

Motivation

This PR moves the file download from the cloud, which has to happen when handling a config_update operation, from the mapper to tedge-agent, in order to simplify the mapper. The complexity is not eliminated, only moved into tedge-agent, which now needs special handling for Cumulocity. This is accepted for now because it allows tedge-agent and tedge-mapper-c8y to run in different containers, and it eliminates the implicit dependency of tedge-mapper-c8y on the File Transfer Service (a part of tedge-agent).

Summary

The updated operation works like this:

1. The mapper receives the "Download configuration type with type" SmartREST message.
2. tedge-mapper-c8y converts the SmartREST message into the following MQTT message:
```
topic: te/device/device01///cmd/config_update/1234
payload: {"status": "init", "remoteUrl": "https://example.org/file", "configType": "type"}
```
i.e. the tedgeUrl property was made optional and is not set in the initial message.
3. tedge-configuration-manager (part of tedge-agent) of the given entity, if the config type is supported, marks the operation as executing and waits for the operation to be updated with tedgeUrl.
4. tedge-agent on the main device, upon seeing that a config_update operation is being executed but does not have a tedgeUrl, downloads the file from remoteUrl, saves it into the FTS, and updates the operation with a valid tedgeUrl pointing into the FTS.
5. tedge-configuration-manager of the given entity, upon seeing that tedgeUrl is now present, downloads the file from it and continues processing the operation.
6. The workflow continues unchanged from this point.
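The payload lifecycle described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the struct is a simplified, hypothetical stand-in for the real payload type, the status is kept as a plain string, and the FTS URL in the usage note is a placeholder, not the actual URL layout.

```rust
/// Hypothetical, simplified stand-in for the `config_update` command payload.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigUpdatePayload {
    pub status: String,
    pub remote_url: String,
    pub config_type: String,
    /// Optional: absent in the initial message, set once the file is in the FTS.
    pub tedge_url: Option<String>,
}

impl ConfigUpdatePayload {
    /// What the mapper publishes: `init` status and no `tedgeUrl`.
    pub fn init(remote_url: &str, config_type: &str) -> Self {
        ConfigUpdatePayload {
            status: "init".to_string(),
            remote_url: remote_url.to_string(),
            config_type: config_type.to_string(),
            tedge_url: None,
        }
    }

    /// The config manager accepts the operation and marks it as executing.
    pub fn executing(mut self) -> Self {
        self.status = "executing".to_string();
        self
    }

    /// True while the agent on the main device still has to fetch the file:
    /// the operation is executing, has a remote URL, and has no `tedgeUrl` yet.
    pub fn needs_caching(&self) -> bool {
        self.status == "executing" && !self.remote_url.is_empty() && self.tedge_url.is_none()
    }

    /// The agent has stored the file in the FTS and publishes its local URL.
    pub fn with_tedge_url(mut self, url: &str) -> Self {
        self.tedge_url = Some(url.to_string());
        self
    }
}
```

For example, `ConfigUpdatePayload::init("https://example.org/file", "type").executing()` reports `needs_caching() == true`, and after `.with_tedge_url("http://fts.example/file")` (placeholder URL) it reports `false`, which is the point at which the config manager would take over again.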

Test changes

To test the changes, configuration_with_file_transfer_https.robot was modified to start tedge-agent on a child device, and tedge-agent was removed from the main device.

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Decouple tedge-mapper-c8y and tedge-agent for config_update #2477

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

The current implementation of caching files that are later needed by child devices is very ad hoc and, as said in #2071 (comment), can be improved with a custom workflow. This is something we'll need to think more about, but for now it's enough to remove the implicit dependency on the FTS in tedge-mapper-c8y.

codecov · 2023-12-08T13:08:07Z

Codecov Report

Merging #2511 (0cce0b3) into main (04db825) will increase coverage by 0.0%.
The diff coverage is 13.9%.

Additional details and impacted files

Files	Coverage Δ
.../tedge_config/src/tedge_config_cli/tedge_config.rs	`80.8% <100.0%> (ø)`
crates/core/tedge_agent/src/lib.rs	`0.0% <ø> (ø)`
crates/core/tedge_api/src/messages.rs	`83.4% <ø> (ø)`
crates/extensions/c8y_mapper_ext/src/converter.rs	`82.4% <100.0%> (-0.1%)`	⬇️
crates/core/c8y_api/src/http_proxy.rs	`75.1% <0.0%> (ø)`
crates/extensions/c8y_auth_proxy/src/url.rs	`79.7% <0.0%> (-1.2%)`	⬇️
crates/extensions/c8y_mapper_ext/src/actor.rs	`78.1% <83.3%> (+3.1%)`	⬆️
...ons/c8y_mapper_ext/src/operations/config_update.rs	`91.3% <92.3%> (+43.7%)`	⬆️
...rates/extensions/tedge_config_manager/src/actor.rs	`63.0% <40.0%> (-0.5%)`	⬇️
crates/core/tedge_agent/src/agent.rs	`0.0% <0.0%> (ø)`
... and 1 more

... and 1 file with indirect coverage changes

github-actions · 2023-12-08T13:16:17Z

Robot Results

✅ Passed	❌ Failed	⏭️ Skipped	Total	Pass %	⏱️ Duration
373	0	3	373	100	58m33.902999999s

rina23q

The solution to the issue looks good.

The major problems are

Removing the access to symlinks in c8y-mapper was forgotten in both code and tests.
Splitting config_operations.rs into two files is nice, but it's a pain to review since you also made some changes to config_update.rs.

Also some minor comments.

crates/common/tedge_config/src/tedge_config_cli/tedge_config.rs

crates/extensions/c8y_mapper_ext/src/converter.rs

crates/extensions/tedge_config_manager/src/actor.rs

crates/extensions/c8y_mapper_ext/src/operations/mod.rs

crates/extensions/c8y_mapper_ext/src/operations/config_update.rs

crates/core/tedge_agent/src/agent.rs

crates/core/tedge_agent/src/operation_file_cache/mod.rs

didier-wenzek · 2023-12-12T14:57:03Z

Splitting config_operations.rs to two files is nice, but it's a pain to review since you made some changes on config_update.rs.

@Bravo555 Can you do the following:

Create a new PR, where you introduce only these changes:
1. Introduce this new operations module
2. git mv there log_update.rs, firmware_update.rs, config_operations.rs
3. split config_operations.rs into config_snapshot.rs and config_update.rs.
This reorg PR should be reviewed and merged quickly.
Rebase this PR on top of the reorg PR.
@rina23q rebase Get operation from JSON over MQTT instead of SmartREST #2482 on top of this same reorg PR.

This will ease reviewing this PR and merging it with Rina's work.

Bravo555 · 2023-12-13T09:07:36Z

Made a separate reorg PR:

#2517

jarhodes314

I've had a decent look at the agent side, I haven't had a particularly close look at the mapper changes yet

crates/core/tedge_agent/src/agent.rs

crates/core/tedge_agent/src/operation_file_cache/mod.rs

crates/extensions/tedge_config_manager/src/actor.rs

jarhodes314 · 2023-12-13T11:15:26Z

crates/core/tedge_agent/src/operation_file_cache/mod.rs

+            return Ok(());
+        };
+
+        if update_payload.remote_url.is_empty() || update_payload.tedge_url.is_some() {


I'm hypothesising a lot here, but it feels here like there is some logic that decides from the payload what state we're in that's distributed around this actor's methods. Can we do some sort of conversion from ConfigUpdateCmdPayload into an enum that represents what state we're currently in so we process that information in a single place?

I may be missing what you mean slightly, but IMO this is much more the case in other operation handlers, which have more states. Here we only: 1. process messages which have a remoteUrl and don't have a tedgeUrl, 2. download from the remote URL, put the file in the file-transfer directory, and put its URL in tedgeUrl. I agree that transitions between states in operation handling are hardly visible, and I was planning to improve that in all the operations later, but here I'm not sure if you mean something simpler than that.

I hadn't deeply considered a solution, it just "felt like" it could do with being more "parse, don't validate", but I'm happy to be overruled here.

My main concern is that the logic to process a config update request is scattered in so many places that it's really difficult to be sure that this is working as expected and really easy to break.

Instead of what we have today (some dumb payloads in tedge_api and processing logic in c8y_mapper_ext, tedge_config_ext and now tedge_agent), I would prefer a central place that defines the steps required to process a configuration update, reducing the role of the mapper, agent, and plugins to triggering actions.

This is how I understand @jarhodes314's call for "parse, don't validate":

    impl FileCacheActor {
        async fn process_mqtt_message(
            &mut self,
            mqtt_message: MqttMessage,
        ) -> Result<(), RuntimeError> {
            match ConfigUpdate::action_for(mqtt_message) {
                Some(DownloadRemote { cloud_url }) => { ... }
            }
        }
    }

Unfortunately, this is not the best time to fix that.
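To make the "parse, don't validate" idea a bit more concrete, here is a minimal standalone sketch. It is not the code of this PR: `ConfigUpdateAction` and the simplified `ConfigUpdateCmdPayload` below are hypothetical, the status is a plain string, and the check on `"executing"` is an assumption taken from the PR summary. The guard on `remote_url` and `tedge_url` mirrors the condition shown in the diff above.

```rust
/// Simplified, hypothetical version of the payload (the real type lives in tedge_api).
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigUpdateCmdPayload {
    pub status: String,
    pub remote_url: String,
    pub tedge_url: Option<String>,
}

/// What the file cache has to do for a given message, decided in one place.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigUpdateAction {
    /// Fetch the file from the cloud and publish a local URL for it.
    DownloadRemote { cloud_url: String },
}

impl ConfigUpdateAction {
    /// Parse the payload into an action; `None` means "nothing to do for this actor".
    pub fn action_for(payload: &ConfigUpdateCmdPayload) -> Option<Self> {
        // Assumption: the file cache only reacts while the operation is executing.
        if payload.status != "executing" {
            return None;
        }
        // Same condition as the guard in the diff: skip messages without a
        // remote URL, and messages that already carry a tedge URL.
        if payload.remote_url.is_empty() || payload.tedge_url.is_some() {
            return None;
        }
        Some(ConfigUpdateAction::DownloadRemote {
            cloud_url: payload.remote_url.clone(),
        })
    }
}
```

With something along these lines, the actor's message handler would only need to match on the returned action, and the rules for when a download is required would live in a single function that can be unit tested without an actor.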

crates/core/tedge_agent/Cargo.toml

crates/core/tedge_agent/src/agent.rs

didier-wenzek · 2023-12-14T09:20:39Z

crates/core/tedge_agent/src/operation_file_cache/mod.rs

+            Ok(download) => download,
+        };
+
+        self.create_symlink_for_config_update(


What's the purpose of this symlink?

I'm not sure if I understand the question. It should be the same as before: we download the file to the data dir cache directory, and then create the symlink to this file in the file transfer directory. After the operation is complete, we delete the symlink.

Ok thanks. I thought this had been introduced by this PR.

@rina23q do you know the answer?

The file-transfer service only allows you to share files under the /var/tedge/file-transfer directory (or at least that was the case earlier). Since the cached files are downloaded to a different location (/var/tedge/cache) which doesn't fall under the file-transfer dir, we just create symlinks under that directory pointing to the cached file. One easy solution, to avoid this symlink business, would be to create the cache directory also under the /var/tedge/file-transfer so that the same path can be shared with all clients.

What @Bravo555 said is totally correct.

Creating a symlink from the cache is more about security. Since the files under /var/tedge/file-transfer are accessible to all other thin-edge components, those components can modify their contents using HTTP requests. Keeping /var/tedge/cache out of the file transfer directory ensures the files cannot be changed by any HTTP requests.

Keeping /var/tedge/cache out of the file transfer directory ensures the files cannot be changed by any HTTP requests.

Good point @rina23q
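The symlink approach discussed in this thread could look roughly like the sketch below. It is Unix-only and uses only the standard library. The function names and the link name are hypothetical and not taken from the PR. The directories are passed in as parameters, so nothing here depends on the real /var/tedge layout.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// Expose a cached file through the file-transfer directory by creating a
/// symlink to it. The cached file itself stays outside that directory.
pub fn expose_cached_file(
    cache_file: &Path,
    file_transfer_dir: &Path,
    link_name: &str,
) -> io::Result<PathBuf> {
    fs::create_dir_all(file_transfer_dir)?;
    let link = file_transfer_dir.join(link_name);
    // Remove a stale link left over from an earlier attempt, if any.
    if link.symlink_metadata().is_ok() {
        fs::remove_file(&link)?;
    }
    symlink(cache_file, &link)?;
    Ok(link)
}

/// Remove the symlink once the operation is complete; the cached file is kept.
pub fn remove_exposed_file(link: &Path) -> io::Result<()> {
    fs::remove_file(link)
}
```

A caller would create the link after the download finishes, publish a URL that resolves to it, and call `remove_exposed_file` when the operation reaches a final state.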

albinsuresh · 2023-12-15T07:21:49Z

crates/extensions/tedge_config_manager/src/actor.rs

-        let download_request = DownloadRequest::new(&request.tedge_url, temp_path.as_std_path())
+        let Some(tedge_url) = &request.tedge_url else {
+            debug!("tedge_url not present in config update payload, ignoring");
+            return Ok(());


I guess it's too late now, but it feels like properly using the scheduled state (the agent updating the state from init to scheduled after the download, and then this config manager actor reacting only to that scheduled state) would have been clearer in terms of the operation control flow, IMO. Though it works, reacting to the executing state differently at different times looks a bit convoluted.

But arguably, downloading the config file is a normal part of execution of the config_file operation, and most of the actual time of the operation execution is spent there, so I don't think downloading in init or schedule state would be better. For other complex operations there can be multiple steps for which we don't have enough distinct states. I do agree, however, that it's a bit hard to track and I expect that perhaps custom workflows and some other refactorings I plan for operation code would at least alleviate this somewhat.

Yeah, indeed too late to fix that, but it would have been better to introduce a specific state - say downloading.

I do agree, however, that it's a bit hard to track and I expect that perhaps custom workflows and some other refactorings I plan for operation code would at least alleviate this somewhat.

I expect the same ;-)

However, the current design choice will make the integration with operation workflows harder, as the latter assume that the action to handle a command in a given state can be derived from the status alone (e.g. "downloading") and not from constraints on the request payload (e.g. "there is a remote url but no local url").
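As a rough illustration of what "derived from the status alone" could mean, the sketch below dispatches purely on the status string. The `downloading` and `downloaded` states are the hypothetical additions suggested in this thread and do not exist in this PR. The assignment of states to components is also an assumption made for the example.

```rust
/// Command statuses. `Downloading` and `Downloaded` are hypothetical additions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Init,
    Downloading,
    Downloaded,
    Executing,
    Successful,
    Failed,
}

/// Which component is expected to act on a command in a given state (assumed split).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Owner {
    ConfigManager,
    FileCache,
    Nobody,
}

/// Parse the `status` field of a command payload.
pub fn parse_status(s: &str) -> Option<Status> {
    match s {
        "init" => Some(Status::Init),
        "downloading" => Some(Status::Downloading),
        "downloaded" => Some(Status::Downloaded),
        "executing" => Some(Status::Executing),
        "successful" => Some(Status::Successful),
        "failed" => Some(Status::Failed),
        _ => None,
    }
}

/// The owner depends on the status only, never on which URLs are present.
pub fn owner_of(status: Status) -> Owner {
    match status {
        Status::Downloading => Owner::FileCache,
        Status::Init | Status::Downloaded | Status::Executing => Owner::ConfigManager,
        Status::Successful | Status::Failed => Owner::Nobody,
    }
}
```

With such a split, neither component would need to inspect whether tedgeUrl is present to decide if a message is meant for it.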

crates/core/tedge_agent/src/operation_file_cache/mod.rs

albinsuresh · 2023-12-15T07:53:11Z

crates/core/tedge_agent/src/operation_file_cache/mod.rs

+            Ok(download) => download,
+        };
+
+        self.create_symlink_for_config_update(


The file-transfer service only allows you to share files under the /var/tedge/file-transfer directory (or at least that was the case earlier). Since the cached files are downloaded to a different location (/var/tedge/cache) which doesn't fall under the file-transfer dir, we just create symlinks under that directory pointing to the cached file. One easy solution, to avoid this symlink business, would be to create the cache directory also under the /var/tedge/file-transfer so that the same path can be shared with all clients.

Bravo555 · 2023-12-15T09:02:53Z

It seems that the integration tests are very flaky, also failing on the main branch:

https://github.com/thin-edge/thin-edge.io/actions/runs/7219433666/job/19670621507

Could it be due to recent changes to registration message handling?

reubenmiller · 2023-12-15T09:45:24Z

It seems that the integration tests are very flaky, also failing on the main branch:

https://github.com/thin-edge/thin-edge.io/actions/runs/7219433666/job/19670621507

Could it be due to recent changes to registration message handling?

Quickly looking at the failed tests, I think it might be due to the workflow stuff (#2496)

didier-wenzek · 2023-12-15T10:39:58Z

It seems that the integration tests are very flaky, also failing on the main branch:
https://github.com/thin-edge/thin-edge.io/actions/runs/7219433666/job/19670621507
Could it be due to recent changes to registration message handling?

Quickly looking at the failed tests, I think it might be due to the workflow stuff (#2496)

So indeed related to checking that a restart can be triggered using a workflow.

The test is red, but for the wrong reason: the outcome is as expected, except that the message appears twice.

Matching messages is greater than maximum. wanted: 1 got: 2 messages: ['{"old-tedge-agent-pid":"MainPID=100","status":"tedge-agent-restarted","tedge-agent-pid":"MainPID=467"}', '{"old-tedge-agent-pid":"MainPID=100","status":"tedge-agent-restarted","tedge-agent-pid":"MainPID=467"}']

Here is a PR to make this test more robust : #2530

crates/extensions/c8y_mapper_ext/src/operations/config_update.rs

crates/core/tedge_agent/src/operation_file_cache/mod.rs

rina23q

LGTM

My only concern is #2511 (comment). But it's not for this PR.

didier-wenzek

Approved.

albinsuresh

LGTM

We can improve this later with explicit downloading and downloaded states as proposed in the comments.

Signed-off-by: Marcel Guzik <[email protected]>

Bravo555 had a problem deploying to Test Pull Request December 8, 2023 13:06 — with GitHub Actions Failure

Bravo555 force-pushed the improve/2477/decouple-config-update branch from db44ab4 to b0bf4c7 Compare December 8, 2023 13:09

Bravo555 temporarily deployed to Test Pull Request December 8, 2023 13:16 — with GitHub Actions Inactive

Bravo555 force-pushed the improve/2477/decouple-config-update branch from b0bf4c7 to e8223c3 Compare December 12, 2023 07:51

Bravo555 temporarily deployed to Test Pull Request December 12, 2023 07:58 — with GitHub Actions Inactive

Bravo555 marked this pull request as ready for review December 12, 2023 09:28

Bravo555 marked this pull request as draft December 12, 2023 09:29

Bravo555 force-pushed the improve/2477/decouple-config-update branch from e8223c3 to f8a4088 Compare December 12, 2023 09:57

Bravo555 marked this pull request as ready for review December 12, 2023 09:57

Bravo555 requested review from didier-wenzek, rina23q and jarhodes314 December 12, 2023 09:57

Bravo555 temporarily deployed to Test Pull Request December 12, 2023 10:03 — with GitHub Actions Inactive

rina23q reviewed Dec 12, 2023

jarhodes314 reviewed Dec 13, 2023

Bravo555 force-pushed the improve/2477/decouple-config-update branch from f8a4088 to 540a18f Compare December 13, 2023 14:53

Bravo555 had a problem deploying to Test Pull Request December 13, 2023 15:01 — with GitHub Actions Failure

Bravo555 force-pushed the improve/2477/decouple-config-update branch from 540a18f to 6276357 Compare December 14, 2023 09:35

didier-wenzek reviewed Dec 14, 2023

Bravo555 had a problem deploying to Test Pull Request December 14, 2023 09:42 — with GitHub Actions Failure

Bravo555 force-pushed the improve/2477/decouple-config-update branch from 6276357 to db98af7 Compare December 14, 2023 10:58

Bravo555 had a problem deploying to Test Pull Request December 14, 2023 11:06 — with GitHub Actions Failure

Bravo555 had a problem deploying to Test Pull Request December 14, 2023 14:08 — with GitHub Actions Failure

Bravo555 temporarily deployed to Test Pull Request December 14, 2023 15:28 — with GitHub Actions Inactive

Bravo555 requested review from rina23q and didier-wenzek December 14, 2023 16:30

Bravo555 requested a review from jarhodes314 December 14, 2023 16:30

Bravo555 had a problem deploying to Test Pull Request December 14, 2023 16:42 — with GitHub Actions Failure

albinsuresh reviewed Dec 15, 2023

Bravo555 temporarily deployed to Test Pull Request December 15, 2023 09:01 — with GitHub Actions Inactive

Bravo555 force-pushed the improve/2477/decouple-config-update branch from 595dd9b to 08944c0 Compare December 15, 2023 09:34

Bravo555 temporarily deployed to Test Pull Request December 15, 2023 09:41 — with GitHub Actions Inactive

rina23q reviewed Dec 15, 2023

crates/extensions/c8y_mapper_ext/src/operations/config_update.rs

rina23q reviewed Dec 15, 2023

crates/core/tedge_agent/src/operation_file_cache/mod.rs

rina23q approved these changes Dec 15, 2023

didier-wenzek approved these changes Dec 15, 2023

albinsuresh approved these changes Dec 15, 2023

Bravo555 added 2 commits December 15, 2023 14:04

Add operation file cache to agent

dfca332

Signed-off-by: Marcel Guzik <[email protected]>

file transfer https test use FTS on separate container

0cce0b3

Signed-off-by: Marcel Guzik <[email protected]>

Bravo555 force-pushed the improve/2477/decouple-config-update branch from 08944c0 to 0cce0b3 Compare December 15, 2023 14:05

Bravo555 temporarily deployed to Test Pull Request December 15, 2023 14:12 — with GitHub Actions Inactive

Bravo555 merged commit 2743c64 into thin-edge:main Dec 15, 2023
18 checks passed

Bravo555 deleted the improve/2477/decouple-config-update branch December 15, 2023 15:34

rina23q mentioned this pull request Dec 15, 2023

Get operation from JSON over MQTT instead of SmartREST #2482

Merged

21 tasks
