-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move config_update
file download from tedge-mapper-c8y to tedge-agent
#2511
Move config_update
file download from tedge-mapper-c8y to tedge-agent
#2511
Conversation
Codecov Report
Additional details and impacted files
|
db44ab4
to
b0bf4c7
Compare
Robot Results
|
b0bf4c7
to
e8223c3
Compare
e8223c3
to
f8a4088
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution to the issue looks good.
The major problems are
- Removing the access to symlinks in c8y-mapper is forgotten in both code and tests.
- Splitting
config_operations.rs
to two files is nice, but it's a pain to review since you made some changes onconfig_update.rs
.
Also some minors.
crates/common/tedge_config/src/tedge_config_cli/tedge_config.rs
Outdated
Show resolved
Hide resolved
crates/extensions/c8y_mapper_ext/src/operations/config_update.rs
Outdated
Show resolved
Hide resolved
crates/extensions/c8y_mapper_ext/src/operations/config_update.rs
Outdated
Show resolved
Hide resolved
@Bravo555 Can you do the following:
This will ease reviewing this PR and merging it with Rina's work. |
Made a separate reorg PR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a decent look at the agent side, I haven't had a particularly close look at the mapper changes yet
return Ok(()); | ||
}; | ||
|
||
if update_payload.remote_url.is_empty() || update_payload.tedge_url.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hypothesising a lot here, but it feels here like there is some logic that decides from the payload what state we're in that's distributed around this actor's methods. Can we do some sort of conversion from ConfigUpdateCmdPayload
into an enum that represents what state we're currently in so we process that information in a single place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing what you mean slightly, but IMO this is much more the case in other operation handlers, which have more states, here we only: 1. process messages which have a remoteUrl
and don't have tedgeUrl
, 2. download from the remote URL, put in the file-transfer directory, and put the URL inside tedgeUrl
. I agree that transitions between states in operation handling is hardly visible, but I thought about improving that in all the operations later, but here I'm not sure if you mean something simpler than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't deeply considered a solution, it just "felt like" it could do with being more "parse, don't validate", but I'm happy to be overruled here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is that the logic to process a config update request is scattered in so many places that it's really difficult to be sure that this is working as expected and really easy to break.
- Instead of what we have today (some dumb payloads in
tedge_api
and processing logic inc8y_mapper_ext
,tedge_config_ext
and nowtedge_agent
) - I would prefer a central place where are defined the misc steps required to process a configuration update and to reduce the role of the mapper, agent, and plugins to trigger actions.
This is how I understand @jarhodes314 call for "parse, don't validate":
impl FileCacheActor {
async fn process_mqtt_message(
&mut self,
mqtt_message: MqttMessage,
) -> Result<(), RuntimeError> {
match ConfigUpdate::action_for(mqtt_message) {
Some(DownloadRemote { cloud_url, }) => {
...
}
}
}
Unfortunately, this is not the best time to fix that.
f8a4088
to
540a18f
Compare
540a18f
to
6276357
Compare
Ok(download) => download, | ||
}; | ||
|
||
self.create_symlink_for_config_update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this symlink?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand the question. It should be the same as before: we download the file to the data dir cache directory, and then create the symlink to this file in the file transfer directory. After the operation is complete, we delete the symlink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks. I though this has been introduced by this PR.
@rina23q do you know the answer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file-transfer service only allows you to share files under the /var/tedge/file-transfer
directory (or at least that was the case earlier). Since the cached files are downloaded to a different location (/var/tedge/cache
) which doesn't fall under the file-transfer
dir, we just create symlinks under that directory pointing to the cached file. One easy solution, to avoid this symlink business, would be to create the cache directory also under the /var/tedge/file-transfer
so that the same path can be shared with all clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @Bravo555 said is totally correct.
Creating a symlink from cache is more about the security. Since the files under /var/tedge/file-transfer
can be accessible from all other thin-edge components, that means they can modify the contents by using HTTP requests. Keeping /var/tedge/cache
out of the file transfer directory ensures the files cannot change by any HTTP requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping /var/tedge/cache out of the file transfer directory ensures the files cannot change by any HTTP requests.
Good point @rina23q
6276357
to
db98af7
Compare
let download_request = DownloadRequest::new(&request.tedge_url, temp_path.as_std_path()) | ||
let Some(tedge_url) = &request.tedge_url else { | ||
debug!("tedge_url not present in config update payload, ignoring"); | ||
return Ok(()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's too late now, but it feels like properly using the scheduled state (the agent updating the state from init
to schedule
after the download, and then this config manager actor reacting only to that scheduled
state) would have been clearer, in terms of the operation control flow, IMO. Though it works, reacting to the executing
state differently at different times looks a bit convoluted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But arguably, downloading the config the file is a normal part of execution of the config_file
operation, and most of the actual time of the operation execution is spent there, so I don't think downloading in init
or schedule
state would be better. For other complex operations there can be multiple steps for which we don't have enough distinct states. I do agree, however, that it's a bit hard to track and I expect that perhaps custom workflows and some other refactorings I plan for operation code would at least alleviate this somewhat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, indeed too late to fix that, but it would have been better to introduce a specific state - say downloading.
I do agree, however, that it's a bit hard to track and I expect that perhaps custom workflows and some other refactorings I plan for operation code would at least alleviate this somewhat.
I expect the same ;-)
However, the current design choice will make harder the integration with operation workflows as the latter assume that the action to handle a command in a given state can be derived only from the status (e.g. "downloading") and not after some constraints on the request payload (e.g. "there is a remote url but not local url").
Ok(download) => download, | ||
}; | ||
|
||
self.create_symlink_for_config_update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file-transfer service only allows you to share files under the /var/tedge/file-transfer
directory (or at least that was the case earlier). Since the cached files are downloaded to a different location (/var/tedge/cache
) which doesn't fall under the file-transfer
dir, we just create symlinks under that directory pointing to the cached file. One easy solution, to avoid this symlink business, would be to create the cache directory also under the /var/tedge/file-transfer
so that the same path can be shared with all clients.
It seems that the integration tests are very flaky, also failing on the main branch: https://github.com/thin-edge/thin-edge.io/actions/runs/7219433666/job/19670621507 Could it be due to recent changes to registration message handling? |
595dd9b
to
08944c0
Compare
Quickly looking at the failed tests, I think it might be due to the workflow stuff (#2496) |
So indeed related to checking that a restart can be triggered using a workflow. The test is red but for bad reason as the outcome is as expected (except that the message is doubled).
Here is a PR to make this test more robust : #2530 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
My only concern is #2511 (comment). But it's not for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We can improve this later with explicit downloading
and downloaded
states as proposed in the comments.
Signed-off-by: Marcel Guzik <[email protected]>
Signed-off-by: Marcel Guzik <[email protected]>
08944c0
to
0cce0b3
Compare
TODO
Proposed changes
Motivation
This PR moves file download from cloud - which needs to happen when handling
config_update
operation - from the mapper to the tedge-agent, in order to simplify the mapper. However, the complexity is not eliminated and is merely moved into tedge-agent, which now needs to have special handling for Cumulocity. However, this is accepted for now, because it enables tedge-agent and tedge-mapper-c8y to be run on different containers, and eliminates the implicit dependency on File Transfer Service (a part of tedge-agent) in tedge-mapper-c8y.Summary
The updated operation works like this:
The
Download configuration type with type
Smartrest message is received by the mappertedge-mapper-c8y converts the smartrest message into the following MQTT message:
i.e.
tedgeUrl
property was made optional and in the initial message it isn't set.tedge-configuration-manager (part of tedge-agent) of the given entity, if the config type is supported, marks the operation as executing and waits for the operation to be updated with
tedgeUrl
tedge-agent on the main device, upon seing that
config_update
operation is being executed, but does not havetedgeUrl
, downloads the file inremoteUrl
, saves it into FTS, and updates the operation with a validtedgeUrl
pointing into FTStedge-configuration-manager of the given entity, upon seeing that
tedgeUrl
is now present, downloads the file from it and continues processing the operationThe workflow continues unchanged from this point.
Test changes
To test the changes,
configuration_with_file_transfer_https.robot
was modified to starttedge-agent
on a child device, andtedge-agent
was removed from the main device.Types of changes
Paste Link to the issue
Checklist
cargo fmt
as mentioned in CODING_GUIDELINEScargo clippy
as mentioned in CODING_GUIDELINESFurther comments
The current implementation of caching files which are further needed by child devices, is very ad-hoc and as said in #2071 (comment), can be improved by custom workflow. This is something we'll need to think more about, but for now, it's enough to remove the implicit dependency on the FTS in tedge-mapper-c8y.