
don't process operation status duplicates #3053

Merged

merged 1 commit into thin-edge:main from Bravo555:fix/operations-status-duplicates on Aug 8, 2024

Conversation

Bravo555
Contributor

@Bravo555 Bravo555 commented Aug 6, 2024

Proposed changes

If the c8y mapper receives an operation message with the same status again, don't send the SmartREST operation status message again.

It's still not clear what should be done if an operation is cleared while it is being processed, or if some component publishes a previous status (e.g. the c8y mapper received an operation message with status SUCCESSFUL, but later received an operation message with the same ID and status EXECUTING).

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments


codecov bot commented Aug 6, 2024

Codecov Report

Attention: Patch coverage is 91.40127% with 27 lines in your changes missing coverage. Please review.

Project coverage is 78.6%. Comparing base (5358ddc) to head (79fb63c).
Report is 62 commits behind head on main.

Files Patch % Lines
...xtensions/c8y_mapper_ext/src/operations/handler.rs 93.0% 6 Missing and 13 partials ⚠️
crates/extensions/c8y_mapper_ext/src/tests.rs 85.7% 0 Missing and 5 partials ⚠️
...ions/c8y_mapper_ext/src/operations/handlers/mod.rs 40.0% 3 Missing ⚠️
Additional details and impacted files
Files Coverage Δ
...ions/c8y_mapper_ext/src/operations/handlers/mod.rs 76.6% <40.0%> (-0.4%) ⬇️
crates/extensions/c8y_mapper_ext/src/tests.rs 92.6% <85.7%> (-0.1%) ⬇️
...xtensions/c8y_mapper_ext/src/operations/handler.rs 94.3% <93.0%> (-0.9%) ⬇️

... and 7 files with indirect coverage changes

@rina23q
Member

rina23q commented Aug 6, 2024

It's still not clear what should be done if an operation is cleared while it is being processed, or if some component publishes a previous status (e.g. the c8y mapper received an operation message with status SUCCESSFUL, but later received an operation message with the same ID and status EXECUTING).

Once the operation is marked as being in a final state (either SUCCESSFUL or FAILED), SmartREST cannot change the state (it is doable via the REST API, though). However, I believe we should not update the operation state after the operation reaches a final state. In short, ignore further operation status updates (ideally logging a warning for this case).

Contributor

github-actions bot commented Aug 6, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
491 0 2 491 100 1h24m39.436972s

Contributor

@albinsuresh albinsuresh left a comment


Added some queries. Also noticed that the codecov coverage has gone down. Something to look at.


let running_operation = self.running_operations.remove(&topic);
let status = match GenericCommandState::from_command_message(&message.message) {
Ok(command) if command.status.is_empty() && is_clearing_message(&message.message) => {
Contributor

Suggested change
Ok(command) if command.status.is_empty() && is_clearing_message(&message.message) => {
Ok(command) if command.is_cleared() => {

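The suggested `is_cleared()` helper could be sketched roughly as below. This is a hypothetical reconstruction, not the actual thin-edge API: the real `GenericCommandState` has more fields, and the "cleared" state is modelled here simply as an empty retained payload with an empty status.

```rust
// Hypothetical sketch of the suggested helper. A command topic is
// considered cleared once an empty retained message has been published
// to it, modelled here as both status and payload being empty.
pub struct GenericCommandState {
    pub status: String,
    pub payload: String,
}

impl GenericCommandState {
    pub fn is_cleared(&self) -> bool {
        self.status.is_empty() && self.payload.is_empty()
    }
}
```

Folding both conditions into one method keeps the match guard in the handler readable, which is the point of the suggestion.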
Contributor Author

Fixed in 0a18530

operation
.handle
.await
.expect("operation task should not panic");
Contributor

Although not done in this PR, I'm having second thoughts about that expect call. Isn't it safer to just log the operation handler failure, if at all one happens, rather than crashing the whole mapper?

Contributor Author

context.update() only panics when it can't send a message to the MQTT publisher via a channel. The MQTT sender it uses is cloned from C8yMapper, so if this happens we can't send any MQTT messages at all. This is an unrecoverable failure, so it should be a panic. The actor runtime should then restart the entire actor.

The fact that we only panic on MQTT channel failure should become clearer after it's extracted.
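The failure mode argued for above — panicking when the channel to the MQTT publisher is closed — can be sketched with a std channel standing in for the actor's MQTT sender. All names here are illustrative, not the mapper's actual API:

```rust
use std::sync::mpsc;

// Illustrative stand-in for the mapper's MQTT sender: once the receiving
// side (the MQTT connection actor) is gone, no message can ever be
// delivered again, so the send failure is treated as unrecoverable and
// turned into a panic for the actor runtime to handle.
pub struct Publisher {
    pub tx: mpsc::Sender<String>,
}

impl Publisher {
    pub fn publish(&self, msg: &str) {
        self.tx
            .send(msg.to_string())
            .expect("MQTT channel closed: unrecoverable, let the actor restart");
    }
}
```

While the channel is open the send is infallible; the `expect` only fires when the receiver has been dropped, which matches the "we can't send any MQTT messages at all" case.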


// we got a new status: wait until the previous one is handled, then handle the new one
let context = Arc::clone(&self.context);
let handle = tokio::spawn(async move {
Contributor

I didn't fully understand why you moved to this model where new tasks are spawned for each state transition, as opposed to the previous model of a single task per operation.

Contributor Author

Because we need to extract the status from every message to prevent illegal state transitions, it makes sense to do it once in the handle method of the handler for all operation types. And for that we need to know the current state of the operation.
It also achieves separation of concerns: the operation handler keeps state and handles the lifecycle of the operation, and context.update() is just a function that sends a SmartREST response when given an operation message.
Ultimately, we only spawn a task so that we can eventually generate a SmartREST response (possibly after downloads and uploads) for that operation message. The logic to determine whether a state transition is valid should be lifted up.
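The model being discussed — each new status message spawns a task that first awaits the task handling the previous status — can be sketched with std threads standing in for tokio tasks. The names and the `sent` log are hypothetical; in the real mapper the worker would call `context.update()`:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical sketch: one entry per operation topic. Each new status
// message spawns a worker that first joins the worker for the previous
// status, so transitions for one operation are processed strictly in order.
pub struct OperationHandler {
    pub running: HashMap<String, JoinHandle<()>>,
    // stands in for the SmartREST responses sent by context.update()
    pub sent: Arc<Mutex<Vec<String>>>,
}

impl OperationHandler {
    pub fn handle(&mut self, topic: &str, status: &str) {
        let previous = self.running.remove(topic);
        let sent = Arc::clone(&self.sent);
        let status = status.to_string();
        let handle = thread::spawn(move || {
            if let Some(prev) = previous {
                // wait until the previous transition was fully handled
                prev.join().expect("operation task should not panic");
            }
            // the real handler would send a SmartREST response here,
            // possibly after performing downloads/uploads
            sent.lock().unwrap().push(status);
        });
        self.running.insert(topic.to_string(), handle);
    }
}
```

Chaining on the previous handle is what guarantees that a clearing message or a later status is only processed after the earlier transition completes, which is the ordering behaviour the review thread settles on below.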

Comment on lines 184 to 186
// TODO(marcel):
//- what happens when clearing message is posted by other client while we're
// handling a message? E.g. if we're doing a download to handle SUCCESSFUL status, do
Contributor

I'd process it only after the previous state transition is fully processed.

Contributor Author

Already was happening, TODO removed and clarification comment in 0a18530

//- what happens when clearing message is posted by other client while we're
// handling a message? E.g. if we're doing a download to handle SUCCESSFUL status, do
// we abort or keep going?
// - what if we get a message that moves back status, i.e. SUCCESSFUL -> EXECUTING?
Contributor

That shouldn't be allowed. Log a warning and ignore.

Contributor Author

Fixed in 0a18530

#[allow(clippy::match_like_matches_macro)]
match (previous, next) {
("successful", "executing") => false,
("failed", "executing") => false,
Member

[successful -> failed] and [failed -> successful] cases should also be forbidden. To be precise, any state transition from a final state (successful/failed) is forbidden. So, something like the below is more accurate. (Not sure if this is valid Rust; however, I hope you understand what I wanted to say.)

        ("successful", _) => false,
        ("failed", _) => false,

Contributor Author

Fixed in 0ebe371

Contributor

@albinsuresh albinsuresh left a comment

The changes look fine, except for a minor issue regarding duplicate state transitions.

#[allow(clippy::match_like_matches_macro)]
match (previous, next) {
// not really a transition but true for simplicity
(prev, next) if prev == next => true,
Contributor

Suggested change
(prev, next) if prev == next => true,
(prev, next) if prev == next => false,

Allowing duplicate state transitions is problematic when we use SmartREST. For example, if the mapper processes two executing responses back-to-back, the first mapped 501 message will properly change the status of the original request. But if we send another 501 message for the duplicated executing response, it might wrongly change the status of the next operation of the same type, if multiple requests of the same type are queued together on the cloud. This is a side effect of the id-less design of SmartREST, which does not target a specific operation id but always tries to change the status of operations in the order in which they were triggered. So, we should just skip mapping duplicate state transitions. Fortunately, the check in line no. 174 is already doing that, which bypasses this case here. But it's still better to keep it consistent here as well.

Contributor Author

Yeah, it depends on what we mean by "transition". Here I meant more of an identity transition of the status on the operation itself, not really related to reporting of the transition to Cumulocity. As the TODO says, if it were declared in tedge_api it would be a bit clearer, but here it looks like the logic is duplicated, which I agree with.
Still, for consistency's sake, I changed the case to false in 46f1f2a.
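Putting both review points together — skip duplicates, and forbid leaving a final state — the transition check could end up looking like the sketch below. This is a reconstruction from the thread, not necessarily the merged code verbatim:

```rust
// Hypothetical reconstruction of the transition check discussed above:
// duplicate statuses are skipped (so no duplicate SmartREST 501/502/503
// message is sent) and no transition out of a final state is allowed.
fn is_operation_status_transition_valid(previous: &str, next: &str) -> bool {
    match (previous, next) {
        // duplicate status: skip, to avoid re-sending the SmartREST response
        (prev, next) if prev == next => false,
        // final states (successful/failed) cannot be left
        ("successful", _) | ("failed", _) => false,
        _ => true,
    }
}
```

Note that the duplicate arm must come first: `("successful", "successful")` should be rejected as a duplicate, and the wildcard final-state arms would otherwise also cover it, so ordering here only matters for which reason applies, not for the result.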

Contributor

@albinsuresh albinsuresh left a comment

LGTM.

If c8y mapper receives operation message with the same status again,
don't send smartrest operation status message again.

It's still not clear what should be done if operation is cleared while
processing or if some component publishes a previous status (e.g. c8y
mapper received operation message with status SUCCESSFUL, but later
received operation with the same ID with status EXECUTING).

Signed-off-by: Marcel Guzik <[email protected]>
@Bravo555 Bravo555 force-pushed the fix/operations-status-duplicates branch from 46f1f2a to 79fb63c Compare August 8, 2024 11:34
@Bravo555 Bravo555 temporarily deployed to Test Pull Request August 8, 2024 11:34 — with GitHub Actions Inactive
@Bravo555 Bravo555 added this pull request to the merge queue Aug 8, 2024
Merged via the queue into thin-edge:main with commit 95dd5f9 Aug 8, 2024
33 checks passed
@reubenmiller reubenmiller added the theme:c8y Theme: Cumulocity related topics label Aug 27, 2024