Skip to content

Conversation

@DaanHoogland
Copy link
Contributor

@DaanHoogland DaanHoogland commented Dec 7, 2024

Resolve the issue : Should not shutting down a server which is already shut.

Description

This PR replaces #10055 on request of user.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@DaanHoogland DaanHoogland added this to the 4.19.2 milestone Dec 7, 2024
@DaanHoogland DaanHoogland mentioned this pull request Dec 7, 2024
14 tasks
@codecov
Copy link

codecov bot commented Dec 7, 2024

Codecov Report

❌ Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 15.17%. Comparing base (4c072b5) to head (f5dee3b).
⚠️ Report is 81 commits behind head on 4.19.

Files with missing lines Patch % Lines
...va/org/apache/cloudstack/kvm/ha/KVMHAProvider.java 9.09% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.19   #10059   +/-   ##
=========================================
  Coverage     15.16%   15.17%           
- Complexity    11335    11340    +5     
=========================================
  Files          5412     5412           
  Lines        475042   475045    +3     
  Branches      57963    57964    +1     
=========================================
+ Hits          72060    72080   +20     
+ Misses       394924   394905   -19     
- Partials       8058     8060    +2     
Flag Coverage Δ
uitests 4.29% <ø> (ø)
unittests 15.89% <9.09%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@apache apache deleted a comment from blueorangutan Dec 8, 2024
@apache apache deleted a comment from blueorangutan Dec 8, 2024
@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11748

@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 8, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ B)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@DaanHoogland DaanHoogland requested a review from slavkap December 9, 2024 08:02
// host exists and is managed OOB
if (r != null && outOfBandManagementService.isOutOfBandManagementEnabled(r)) {
// check host status
if (Status.Down.equals(r.getStatus())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland, my concern here is that host's status could be Down but in reality to be still running. With this there is an option the virtual machines started to a neighbor host to be left running on the previous one.
Maybe it is better to check the OOBM status rather the host's.
Unfortunately now I don't have environment with IPMI setup and cannot test this change.
Probably here and @rp- could give some advice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True what @slavkap said, and now I'm a bit confused what actual the original issue was?
Wouldn't it be not just a noop if the host is already down and fence it again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to change this condition to something else , @slavkap ?
If it is !Down a power operation will be performed. should it happen here as well?
I am not sure if we have a OOB status retrieval possibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rp-, there was a discussion about an issue with the fencing that the Host is Powered off and CS tries to shut it down again. But I've seen cases where the host could be in the down state before the IPMI tries to power it off.
@DaanHoogland, yes, I think there is an option to check the OOBM status and the condition should be changed with the result of it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it is not part of outOfBandManagementService.

@slavkap slavkap self-requested a review December 10, 2024 10:59
@DaanHoogland
Copy link
Contributor Author

@rp- @slavkap , any idea how to continue with this? I kind of copied it from the OP but did not give it much thought.

@rp-
Copy link
Contributor

rp- commented Jan 22, 2025

@rp- @slavkap , any idea how to continue with this? I kind of copied it from the OP but did not give it much thought.

IDK, I'm not sure I still fully understand the issue here.

@DaanHoogland
Copy link
Contributor Author

@slavkap @rp- , I am moving this forward. I had re-based the code out of courtesy , but don't understand the issue beyond what I read in the code.

@DaanHoogland DaanHoogland modified the milestones: 4.19.2, 4.19.3 Feb 4, 2025
@Pearl1594 Pearl1594 moved this to In Progress in ACS 4.20.1 Mar 17, 2025
@DaanHoogland DaanHoogland changed the base branch from 4.19 to 4.20 May 1, 2025 13:19
@DaanHoogland DaanHoogland changed the base branch from 4.20 to 4.19 May 1, 2025 13:19
@DaanHoogland
Copy link
Contributor Author

@slavkap @rp- any thoughts on moving this forward? cna we merge and improve later, or should we implement an extra oob operation first?

cc @Pearl1594 @weizhouapache @rohityadavcloud

leo79901 and others added 2 commits May 1, 2025 15:22
Resolve the issur : Should not shutting down a server which is already shut.
Thanks for DaanHoogland which provide code.
@sonarqubecloud
Copy link

sonarqubecloud bot commented May 1, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ B)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@DaanHoogland
Copy link
Contributor Author

@leo79901 , can you please share any test results or other experience with this?
This has led to an unresolved discussion (#10059 (comment)) and any input is welcome.

@DaanHoogland DaanHoogland changed the title add check befor fencing add check before fencing May 2, 2025
@sureshanaparti sureshanaparti modified the milestones: 4.19.3, 4.19.4 Jun 5, 2025
@weizhouapache weizhouapache modified the milestones: 4.19.4, 4.20.2 Sep 2, 2025
@weizhouapache weizhouapache modified the milestones: 4.20.2, 4.20.3 Sep 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

7 participants