Move config_update file download from tedge-mapper-c8y to tedge-agent #2511

Merged (2 commits) on Dec 15, 2023

Conversation

Contributor

@Bravo555 Bravo555 commented Dec 8, 2023

TODO

  • addressing feedback for unit tests

Proposed changes

Motivation

This PR moves the file download from the cloud, which is needed to handle a config_update operation, from the mapper to tedge-agent in order to simplify the mapper. The complexity is not eliminated, though: it is merely moved into tedge-agent, which now needs special handling for Cumulocity. This is accepted for now because it enables tedge-agent and tedge-mapper-c8y to run in different containers and eliminates the implicit dependency of tedge-mapper-c8y on the File Transfer Service (a part of tedge-agent).

Summary

The updated operation works like this:

  1. The mapper receives the SmartREST message "Download configuration type with type"

  2. tedge-mapper-c8y converts the SmartREST message into the following MQTT message:

    topic: te/device/device01///cmd/config_update/1234
    payload: {"status": "init", "remoteUrl": "https://example.org/file", "configType": "type"}
    

    That is, the tedgeUrl property is now optional and is not set in the initial message.

  3. tedge-configuration-manager (part of tedge-agent) of the given entity, if the config type is supported, marks the operation as executing and waits for the operation to be updated with tedgeUrl

  4. tedge-agent on the main device, upon seeing that a config_update operation is executing but does not have a tedgeUrl, downloads the file from remoteUrl, saves it into the FTS, and updates the operation with a valid tedgeUrl pointing into the FTS

  5. tedge-configuration-manager of the given entity, upon seeing that tedgeUrl is now present, downloads the file from it and continues processing the operation

  6. The workflow continues unchanged from this point.
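
The steps above can be sketched as a single decision over the command payload. This is a minimal illustration, not the merged implementation: the type and function names (`ConfigUpdatePayload`, `NextStep`, `next_step`) are hypothetical, and only the JSON keys and status values come from the example message.

```rust
/// Hypothetical, simplified view of a `config_update` command payload.
/// Field names mirror the JSON keys of the example message.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigUpdatePayload {
    pub status: String,
    pub remote_url: String,
    /// Optional: absent until the agent on the main device has cached the file.
    pub tedge_url: Option<String>,
    pub config_type: String,
}

/// Who has to act next on a command (steps 3 to 5 of the summary).
#[derive(Debug, PartialEq)]
pub enum NextStep {
    /// Step 3: the configuration manager marks the command as executing.
    MarkExecuting,
    /// Step 4: the main-device agent downloads `remote_url` into the FTS.
    DownloadToFts { remote_url: String },
    /// Step 5: the configuration manager fetches the file from `tedge_url`.
    ApplyFrom { tedge_url: String },
    /// Anything else is left to the unchanged part of the workflow.
    Other,
}

pub fn next_step(payload: &ConfigUpdatePayload) -> NextStep {
    match (payload.status.as_str(), &payload.tedge_url) {
        ("init", _) => NextStep::MarkExecuting,
        // Executing without a tedgeUrl: the file still has to be cached.
        ("executing", None) if !payload.remote_url.is_empty() => NextStep::DownloadToFts {
            remote_url: payload.remote_url.clone(),
        },
        // Executing with a tedgeUrl: the file is available from the FTS.
        ("executing", Some(url)) => NextStep::ApplyFrom {
            tedge_url: url.clone(),
        },
        _ => NextStep::Other,
    }
}
```

Note that the same `executing` status leads to two different steps depending on whether `tedge_url` is set, which is the design point discussed in the review comments.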

Test changes

To test the changes, configuration_with_file_transfer_https.robot was modified to start tedge-agent on a child device, and tedge-agent was removed from the main device.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

The current implementation of caching files that are later needed by child devices is very ad hoc and, as said in #2071 (comment), can be improved by a custom workflow. This is something we'll need to think more about, but for now it's enough to remove the implicit dependency on the FTS in tedge-mapper-c8y.

codecov bot commented Dec 8, 2023

Codecov Report

Merging #2511 (0cce0b3) into main (04db825) will increase coverage by 0.0%.
The diff coverage is 13.9%.

Additional details and impacted files
Files Coverage Δ
.../tedge_config/src/tedge_config_cli/tedge_config.rs 80.8% <100.0%> (ø)
crates/core/tedge_agent/src/lib.rs 0.0% <ø> (ø)
crates/core/tedge_api/src/messages.rs 83.4% <ø> (ø)
crates/extensions/c8y_mapper_ext/src/converter.rs 82.4% <100.0%> (-0.1%) ⬇️
crates/core/c8y_api/src/http_proxy.rs 75.1% <0.0%> (ø)
crates/extensions/c8y_auth_proxy/src/url.rs 79.7% <0.0%> (-1.2%) ⬇️
crates/extensions/c8y_mapper_ext/src/actor.rs 78.1% <83.3%> (+3.1%) ⬆️
...ons/c8y_mapper_ext/src/operations/config_update.rs 91.3% <92.3%> (+43.7%) ⬆️
...rates/extensions/tedge_config_manager/src/actor.rs 63.0% <40.0%> (-0.5%) ⬇️
crates/core/tedge_agent/src/agent.rs 0.0% <0.0%> (ø)
... and 1 more

... and 1 file with indirect coverage changes

@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from db44ab4 to b0bf4c7 Compare December 8, 2023 13:09
Contributor

github-actions bot commented Dec 8, 2023

Robot Results

✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration
373 | 0 | 3 | 373 | 100 | 58m33.902999999s

@Bravo555 Bravo555 temporarily deployed to Test Pull Request December 8, 2023 13:16 — with GitHub Actions Inactive
@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from b0bf4c7 to e8223c3 Compare December 12, 2023 07:51
@Bravo555 Bravo555 temporarily deployed to Test Pull Request December 12, 2023 07:58 — with GitHub Actions Inactive
@Bravo555 Bravo555 marked this pull request as ready for review December 12, 2023 09:28
@Bravo555 Bravo555 marked this pull request as draft December 12, 2023 09:29
@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from e8223c3 to f8a4088 Compare December 12, 2023 09:57
@Bravo555 Bravo555 marked this pull request as ready for review December 12, 2023 09:57
@Bravo555 Bravo555 temporarily deployed to Test Pull Request December 12, 2023 10:03 — with GitHub Actions Inactive
Member

@rina23q rina23q left a comment

The solution to the issue looks good.

The major problems are

  • Removing the access to symlinks in c8y-mapper is forgotten in both code and tests.
  • Splitting config_operations.rs to two files is nice, but it's a pain to review since you made some changes on config_update.rs.

Also some minors.

crates/extensions/c8y_mapper_ext/src/converter.rs (outdated, resolved)
crates/extensions/tedge_config_manager/src/actor.rs (outdated, resolved)
crates/extensions/c8y_mapper_ext/src/operations/mod.rs (outdated, resolved)
crates/core/tedge_agent/src/agent.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
@didier-wenzek
Contributor

  • Splitting config_operations.rs to two files is nice, but it's a pain to review since you made some changes on config_update.rs.

@Bravo555 Can you do the following:

  1. Create a new PR, where you introduce only these changes:
    1. Introduce this new operations module
    2. git mv there log_update.rs, firmware_update.rs, config_operations.rs
    3. split config_operations.rs into config_snapshot.rs and config_update.rs.
  2. This reorg PR should be reviewed and merged quickly.
  3. Rebase this PR on top of the reorg PR.
  4. @rina23q rebase Get operation from JSON over MQTT instead of SmartREST #2482 on top of this same reorg PR.

This will ease reviewing this PR and merging it with Rina's work.

@Bravo555
Contributor Author

Made a separate reorg PR:

#2517

Contributor

@jarhodes314 jarhodes314 left a comment

I've had a decent look at the agent side, but I haven't had a particularly close look at the mapper changes yet.

crates/core/tedge_agent/src/agent.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
crates/extensions/tedge_config_manager/src/actor.rs (outdated, resolved)
return Ok(());
};

if update_payload.remote_url.is_empty() || update_payload.tedge_url.is_some() {
Contributor

I'm hypothesising a lot here, but it feels here like there is some logic that decides from the payload what state we're in that's distributed around this actor's methods. Can we do some sort of conversion from ConfigUpdateCmdPayload into an enum that represents what state we're currently in so we process that information in a single place?

Contributor Author

I may be slightly missing what you mean, but IMO this is much more the case in other operation handlers, which have more states. Here we only: 1. process messages which have a remoteUrl and don't have a tedgeUrl, and 2. download from the remote URL, put the file in the file-transfer directory, and put its URL inside tedgeUrl. I agree that the transitions between states in operation handling are hardly visible, and I was thinking of improving that in all the operations later. Here, though, I'm not sure if you mean something simpler than that.

Contributor

I hadn't deeply considered a solution, it just "felt like" it could do with being more "parse, don't validate", but I'm happy to be overruled here.

Contributor

My main concern is that the logic to process a config update request is scattered in so many places that it's really difficult to be sure that this is working as expected and really easy to break.

  • Instead of what we have today (some dumb payloads in tedge_api and processing logic in c8y_mapper_ext, tedge_config_ext and now tedge_agent)
  • I would prefer a central place that defines the various steps required to process a configuration update, reducing the role of the mapper, agent, and plugins to triggering actions.

This is how I understand @jarhodes314's call for "parse, don't validate":

impl FileCacheActor {
    async fn process_mqtt_message(
        &mut self,
        mqtt_message: MqttMessage,
    ) -> Result<(), RuntimeError> {
        // Decide what to do from the message once, in a single place
        match ConfigUpdate::action_for(mqtt_message) {
            Some(DownloadRemote { cloud_url }) => {
                ...
            }
        }
    }
}

Unfortunately, this is not the best time to fix that.
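
For illustration only, the conversion from payload to action suggested in this thread could look like the sketch below. All names are hypothetical stand-ins (this is not the real `tedge_api` payload type, and `action_for` here takes the parsed payload rather than an MQTT message); the condition itself is the early-return check from the reviewed diff, inverted.

```rust
/// Hypothetical stand-in for the command payload (not the real `tedge_api` type).
pub struct ConfigUpdateCmdPayload {
    pub remote_url: String,
    pub tedge_url: Option<String>,
}

/// Actions the file cache could take, decided once from the payload.
#[derive(Debug, PartialEq)]
pub enum ConfigUpdateAction {
    /// The file still has to be fetched from the cloud.
    DownloadRemote { cloud_url: String },
}

impl ConfigUpdateAction {
    /// Returns `None` when there is nothing for the file cache to do:
    /// either no remote URL was given or a tedgeUrl is already present.
    pub fn action_for(payload: &ConfigUpdateCmdPayload) -> Option<Self> {
        if payload.remote_url.is_empty() || payload.tedge_url.is_some() {
            return None;
        }
        Some(Self::DownloadRemote {
            cloud_url: payload.remote_url.clone(),
        })
    }
}
```

With this shape, the actor matches on the returned action and never inspects `remote_url` or `tedge_url` directly, so the state decision lives in one function.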

@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from f8a4088 to 540a18f Compare December 13, 2023 14:53
@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from 540a18f to 6276357 Compare December 14, 2023 09:35
crates/core/tedge_agent/Cargo.toml (outdated, resolved)
crates/core/tedge_agent/src/agent.rs (outdated, resolved)
Ok(download) => download,
};

self.create_symlink_for_config_update(
Contributor

What's the purpose of this symlink?

Contributor Author

I'm not sure if I understand the question. It should be the same as before: we download the file to the data dir cache directory, and then create the symlink to this file in the file transfer directory. After the operation is complete, we delete the symlink.

Contributor

@didier-wenzek didier-wenzek Dec 14, 2023

OK, thanks. I thought this had been introduced by this PR.

@rina23q do you know the answer?

Contributor

@albinsuresh albinsuresh Dec 15, 2023

The file-transfer service only allows you to share files under the /var/tedge/file-transfer directory (or at least that was the case earlier). Since the cached files are downloaded to a different location (/var/tedge/cache) which doesn't fall under the file-transfer dir, we just create symlinks under that directory pointing to the cached file. One easy solution, to avoid this symlink business, would be to create the cache directory also under the /var/tedge/file-transfer so that the same path can be shared with all clients.

Member

What @Bravo555 said is totally correct.

Creating a symlink from the cache is more about security. Since the files under /var/tedge/file-transfer are accessible to all other thin-edge components, those components can modify their contents using HTTP requests. Keeping /var/tedge/cache out of the file transfer directory ensures the files cannot be changed by any HTTP request.

Contributor

Keeping /var/tedge/cache out of the file transfer directory ensures the files cannot be changed by any HTTP request.

Good point @rina23q
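
The mechanism discussed in this thread (download into a cache directory, expose the file through a symlink in the file-transfer directory, delete the symlink when the operation completes) can be sketched with the standard library alone. The function names and the link name are hypothetical, this is not the code of `create_symlink_for_config_update`, and the sketch is Unix-only.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// Exposes a cached file through a symlink placed in the file-transfer
/// directory and returns the path of the created symlink.
pub fn expose_cached_file(
    cached_file: &Path,
    file_transfer_dir: &Path,
    link_name: &str,
) -> io::Result<PathBuf> {
    fs::create_dir_all(file_transfer_dir)?;
    let link = file_transfer_dir.join(link_name);
    // Replace a stale link left over from an earlier operation, if any.
    // `symlink_metadata` does not follow the link, so dangling links count too.
    if link.symlink_metadata().is_ok() {
        fs::remove_file(&link)?;
    }
    symlink(cached_file, &link)?;
    Ok(link)
}

/// Removes the symlink once the operation is complete; the cached file stays.
pub fn remove_exposed_file(link: &Path) -> io::Result<()> {
    fs::remove_file(link)
}
```

Reading through the returned link yields the cached content, while removing the link leaves the file in the cache directory untouched.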

@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from 6276357 to db98af7 Compare December 14, 2023 10:58
@Bravo555 Bravo555 temporarily deployed to Test Pull Request December 14, 2023 15:28 — with GitHub Actions Inactive
let download_request = DownloadRequest::new(&request.tedge_url, temp_path.as_std_path())
let Some(tedge_url) = &request.tedge_url else {
debug!("tedge_url not present in config update payload, ignoring");
return Ok(());
Contributor

I guess it's too late now, but it feels like properly using the scheduled state (the agent updating the state from init to scheduled after the download, and this config manager actor then reacting only to that scheduled state) would have made the operation control flow clearer, IMO. Though it works, reacting to the executing state differently at different times looks a bit convoluted.

Contributor Author

But arguably, downloading the config file is a normal part of executing the config_update operation, and most of the actual execution time is spent there, so I don't think downloading in the init or scheduled state would be better. For other complex operations there can be multiple steps for which we don't have enough distinct states. I do agree, however, that it's a bit hard to track and I expect that perhaps custom workflows and some other refactorings I plan for operation code would at least alleviate this somewhat.

Contributor

@didier-wenzek didier-wenzek Dec 15, 2023

Yeah, indeed too late to fix that, but it would have been better to introduce a specific state - say downloading.

I do agree, however, that it's a bit hard to track and I expect that perhaps custom workflows and some other refactorings I plan for operation code would at least alleviate this somewhat.

I expect the same ;-)

However, the current design choice will make integration with operation workflows harder, as the latter assume that the action to handle a command in a given state can be derived from the status alone (e.g. "downloading") and not from constraints on the request payload (e.g. "there is a remote URL but no local URL").
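
As a rough sketch of that alternative, with an explicit downloading state the dispatch needs nothing but the status. The `Status` and `Action` enums below are hypothetical and are not part of the merged code; they only illustrate the shape a workflow-friendly design could take.

```rust
/// Hypothetical command statuses, including the explicit `Downloading`
/// state suggested in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Init,
    Downloading,
    Executing,
    Successful,
    Failed,
}

/// Hypothetical actions, one per status that requires work.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// The configuration manager accepts the command.
    AcceptCommand,
    /// The main-device agent fetches the remote file into the FTS.
    DownloadFromCloud,
    /// The configuration manager fetches the file from the FTS and applies it.
    ApplyConfig,
    /// Terminal states: nothing left to do.
    Nothing,
}

/// The action is derived from the status alone, never from which URLs
/// happen to be present in the payload.
pub fn action_for(status: Status) -> Action {
    match status {
        Status::Init => Action::AcceptCommand,
        Status::Downloading => Action::DownloadFromCloud,
        Status::Executing => Action::ApplyConfig,
        Status::Successful | Status::Failed => Action::Nothing,
    }
}
```

Because the match is exhaustive, adding a new status forces every dispatcher to say what it does with it, which is harder to guarantee when the decision depends on optional payload fields.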

crates/core/tedge_agent/src/operation_file_cache/mod.rs (outdated, resolved)
@Bravo555 Bravo555 temporarily deployed to Test Pull Request December 15, 2023 09:01 — with GitHub Actions Inactive
@Bravo555
Contributor Author

It seems that the integration tests are very flaky, also failing on the main branch:

https://github.com/thin-edge/thin-edge.io/actions/runs/7219433666/job/19670621507

Could it be due to recent changes to registration message handling?

@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from 595dd9b to 08944c0 Compare December 15, 2023 09:34
@Bravo555 Bravo555 temporarily deployed to Test Pull Request December 15, 2023 09:41 — with GitHub Actions Inactive
@reubenmiller
Contributor

It seems that the integration tests are very flaky, also failing on the main branch:

https://github.com/thin-edge/thin-edge.io/actions/runs/7219433666/job/19670621507

Could it be due to recent changes to registration message handling?

Quickly looking at the failed tests, I think it might be due to the workflow stuff (#2496)

@didier-wenzek
Contributor

didier-wenzek commented Dec 15, 2023

It seems that the integration tests are very flaky, also failing on the main branch:
https://github.com/thin-edge/thin-edge.io/actions/runs/7219433666/job/19670621507
Could it be due to recent changes to registration message handling?

Quickly looking at the failed tests, I think it might be due to the workflow stuff (#2496)

So indeed related to checking that a restart can be triggered using a workflow.

The test is red, but for a bad reason: the outcome is as expected, except that the message is duplicated.

Matching messages is greater than maximum. wanted: 1 got: 2 messages: ['{"old-tedge-agent-pid":"MainPID=100","status":"tedge-agent-restarted","tedge-agent-pid":"MainPID=467"}', '{"old-tedge-agent-pid":"MainPID=100","status":"tedge-agent-restarted","tedge-agent-pid":"MainPID=467"}'] 

Here is a PR to make this test more robust : #2530

Member

@rina23q rina23q left a comment

LGTM

My only concern is #2511 (comment). But it's not for this PR.

Contributor

@didier-wenzek didier-wenzek left a comment

Approved.

Contributor

@albinsuresh albinsuresh left a comment

LGTM

We can improve this later with explicit downloading and downloaded states as proposed in the comments.

@Bravo555 Bravo555 force-pushed the improve/2477/decouple-config-update branch from 08944c0 to 0cce0b3 Compare December 15, 2023 14:05
@Bravo555 Bravo555 temporarily deployed to Test Pull Request December 15, 2023 14:12 — with GitHub Actions Inactive
@Bravo555 Bravo555 merged commit 2743c64 into thin-edge:main Dec 15, 2023
18 checks passed
@Bravo555 Bravo555 deleted the improve/2477/decouple-config-update branch December 15, 2023 15:34