-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Operation Workflow Specification & Implementation #2071
Operation Workflow Specification & Implementation #2071
Conversation
@didier-wenzek. I've added only a few minor comments. I really like the direction we're going 👍 |
next = ["Download", "Failed"] | ||
script = "/bin/schedule-configuration-update.sh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the contract is like, if the script execution succeeds, the operation is moved to the first state in the next
array and on failure it is moved to the second one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have to describe the contract.
The idea is that the script is given the current state as an argument and that it has to output the new state on stdout. If the output of the script cannot be decoded as a state, the command is moved to "failed".
script = "/bin/schedule-configuration-update.sh" | ||
|
||
[Download] | ||
owner = tedge-configuration-plugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the tedge-agent
do anything with this owner
info, other than just determining if it is "tedge" or not? I mean, it doesn't care about the value as long as it is not tedge
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed in the POC nothing is done if the owner is not tedge
.
However, if the MQTT identifier of this owner is given (i.e. device/main/service/tedge-configuration-plugin
) a sophisticated implementation could use this to check if thing has a chance to work.
=> This needs to be clarified
The priority rules give a higher priority to the workflow that are user-defined than to those pre-defined by thin-edge. | ||
If several user-defined workflows are matching a command state, | ||
then the alphabetic order of the workflow definition file names is used: | ||
`001-configuration-update.toml` being of higher priority than `002-configuration-update.toml`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with this priority mechanism, it's not fully clear to me as to how we can use different workflows for different targets of the same operation type. For example, with the config-update
operation, there'd be different kinds of custom scripts to perform updates of different types of configs. So, how would the workflow be handed over to the respective script for that given type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, with the
config-update
operation, there'd be different kinds of custom scripts to perform updates of different types of configs.
The proposal is to have a single set of actions for a (command, status) pair. If you want to behave differently depending on some property of the state payload (here the config type) then this has to be handle inside the script. If you want to handle this at the workflow level, you can introduce new state (e.g. update_mosquitto_config
and update_other_config
) but you need then to run a script on the state update_config
that move to one or the other state depending on the config type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions 👍
next = ["Install", "Failed"] | ||
|
||
[Install] | ||
owner = "tedge-configuration-plugin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided tedge-configuration-plguin
is a daemon, this owner
field is just indicating the daemon name for logging purpose or like that? I assume that the state machine doesn't really care about the process owner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. See #2071 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something still needs to be clarified.
Indeed, the tedge-configuration-plugin
can be implemented as a:
- as a daemon - as of today
- as a script - as you suggested here
- as an actor running inside the agent - as implemented by the POC
These 3 alternatives have different pros/cons as well as implementation impacts. I will dive into these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The comments are mostly rewording suggestions and a few queries for clarification. Happy to approve once those queries are clarified.
- possible extra instructions on how to process the command at this stage, e.g. | ||
- run a script | ||
|
||
```toml title="file: firmware_update_example.toml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The states mentioned in this file are not the same that were described in the example above. Keeping them consistent would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though this is in a section called "workflow overriding", so it is meant to show different states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, in that case it might be a good idea to add a workflow file example conforming to that same simple example somewhere earlier: may be, in the Operation API
section itself, right after the intro of the APIs, so that we have something basic before getting into advanced overriding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can add this in later.
|
||
[init] | ||
script = "/usr/bin/firmware_handler.sh plan" | ||
next = ["scheduled", "failed"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be obvious from these examples, but it might be better to explicitly document that the first value is the target state on a successful state action and the second one is for failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of values does not mean anything at this point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing something here then. My understanding was that, if the script exits with non-zero exit code, without any JSON output with explicit status
, then the workflow moves to the first state mentioned in the array. In the case of a script execution failure, it moves to the second. Isn't that the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't implemented yet, and we are still deciding on what to do there. This will be done in a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, this list is mostly documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. Any further improvements can be done in followup PRs
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
The motivation is to be able to register a custom version of a built-in workflow. The mappers still create commands in the Init state, but the built-in operations only react on the Schedule state. If no custom version of the workflow has been provided, the agent moves operation requests from the Init to the Scheduled state. If a custom version is provided, then this user-defined workflow can add extra checks and steps before triggering the built-in operation steps (by putting the operation in the Schedule state). As of now, the log and config operation are unchanged (starting on Init and ignoring the Schedule state). Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
- Basic concepts are moved under the operation API introduction - Workflow specifications is focused on user-specific workflow definition and start with an example. Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Reuben Miller <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
83e365d
to
17e3d66
Compare
Here are the follow-up tasks for all the unresolved comments: #2478 |
QA has thoroughly checked the feature and here are the results:
|
Proposed changes
Specification for MQTT-driven workflows that let the user extend the builtin operation support provided by thin-edge.
See the POC for more details on the idea.
Types of changes
Paste Link to the issue
Checklist
cargo fmt
as mentioned in CODING_GUIDELINEScargo clippy
as mentioned in CODING_GUIDELINESFurther comments