-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
don't process operation status duplicates #3053
don't process operation status duplicates #3053
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
|
Once the operation is marked final state (either SUCCESSFUL or FAILED), SmartREST cannot change the state (doable via REST API though). However, I believe we should not update the operation state after operation gets final state. In short, ignore further operation status (ideally logging warning for this case). |
Robot Results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some queries. Also noticed that the codecov coverage has gone down. Something to look at.
|
||
let running_operation = self.running_operations.remove(&topic); | ||
let status = match GenericCommandState::from_command_message(&message.message) { | ||
Ok(command) if command.status.is_empty() && is_clearing_message(&message.message) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok(command) if command.status.is_empty() && is_clearing_message(&message.message) => { | |
Ok(command) if command.is_cleared() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0a18530
operation | ||
.handle | ||
.await | ||
.expect("operation task should not panic"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although not done in this PR, I'm having second thoughts about that expect
call. Isn't it safer to just log the operation handler failure, if at all one happens, rather than crashing the whole mapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context.update()
only panics when it can't send a message to the MQTT publisher via a channel. The MQTT sender it uses is a cloned from C8yMapper, so if this happens we can't send any MQTT messages. This is an unrecoverable failure, so it should be a panic. The actor runtime should restart the entrire actor.
The fact that we only panic on MQTT channel failure should become more clear after it's extracted.
|
||
// we got a new status, await we handled the previous one and handle the new one | ||
let context = Arc::clone(&self.context); | ||
let handle = tokio::spawn(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't funny understand why you moved to this model where new tasks are spawned for each state transition as opposed to the previous model of single task per operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we need to extract status from every message to prevent illegal state transitions, it makes sense to do it once in the handle method of the handler for all operation types. And for that we need to know what is the current state of the operation.
It also achieves separation of concerns: operation handler keeps state and handles the lifecycle of the operation, and context.update()
is just a function that sends a smartrest response when given an operation message.
Ultimately, we only spawn a task so we that we can eventually generate a smartrest response (possibly after downloads and uploads) for that operation message. The logic to determine if a state transition is valid should be lifted up.
// TODO(marcel): | ||
//- what happens when clearing message is posted by other client while we're | ||
// handling a message? E.g. if we're doing a download to handle SUCCESSFUL status, do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd process it only after the previous state transition is fully processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already was happening, TODO removed and clarification comment in 0a18530
//- what happens when clearing message is posted by other client while we're | ||
// handling a message? E.g. if we're doing a download to handle SUCCESSFUL status, do | ||
// we abort or keep going? | ||
// - what if we get a message that moves back status, i.e. SUCCESSFUL -> EXECUTING? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That shouldn't be allowed. Log a warning and ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0a18530
#[allow(clippy::match_like_matches_macro)] | ||
match (previous, next) { | ||
("successful", "executing") => false, | ||
("failed", "executing") => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[successful -> failed] and [failed -> successful] cases should be also forbidden. To be precise, any state transitions from the final state (successful/failed) is forbidden. So, something like below is more accurate. (not sure if this is valid Rust, however, hope you understand what I wanted to say)
("successful", _) => false,
("failed", _) => false,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0ebe371
0ebe371
to
4f1f11b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look fine, except for a minor issue regarding duplicate state tranistions.
#[allow(clippy::match_like_matches_macro)] | ||
match (previous, next) { | ||
// not really a transition but true for simplicity | ||
(prev, next) if prev == next => true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(prev, next) if prev == next => true, | |
(prev, next) if prev == next => false, |
Allowing duplicate state transitions is problematic when we use SmartREST. For example, if the mapper processes two executing
responses back-to-back, the first mapped 501
message will properly change the status of the original request. But if we send another 501
message for the duplicated executing
response, then it might wrongly change the status of the next operation of the same type, if multiple requests of the same type are queued together on the cloud. This is a side effect of the id-less design of SmartREST that does not target a specific operation id, but always tries to change that status of the operations in the order in which they were triggered. So, we should just skip mapping duplicate state transitions. Fortunately, the check in line no. 174 is already doing that, which bypasses this case here. But, it's still better to keep it consistent here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it depends what we mean by "transition". Here I meant more like an identity transition of the status on the operation itself, not really related to reporting of the transition to Cumulocity. As the TODO says, if it were declared in tedge_api it would be a bit more clear, but here it looks like the logic is duplicated, which I agree with.
Still, for consistency's sake, changed the case to false in 46f1f2a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
If c8y mapper receives operation message with the same status again, don't send smartrest operation status message again. It's still not clear what should be done if operation is cleared while processing or if some component publishes a previous status (e.g. c8y mapper received operation message with status SUCCESSFUL, but later received operation with the same ID with status EXECUTING). Signed-off-by: Marcel Guzik <[email protected]>
46f1f2a
to
79fb63c
Compare
Proposed changes
If c8y mapper receives operation message with the same status again, don't send smartrest operation status message again.
It's still not clear what should be done if operation is cleared while processing or if some component publishes a previous status (e.g. c8y mapper received operation message with status SUCCESSFUL, but later received operation with the same ID with status EXECUTING).
Types of changes
Paste Link to the issue
Checklist
cargo fmt
as mentioned in CODING_GUIDELINEScargo clippy
as mentioned in CODING_GUIDELINESFurther comments