Post-release 24.04 HA #3237

Draft
wants to merge 43 commits into
base: staging

cg-tw
Collaborator

@cg-tw cg-tw commented Apr 18, 2024

Description

Please include a short summary of the changes and what is the purpose of the PR. Any relevant information should be added to help reviewers.

Target version (i.e. version that this PR changes)

  • 22.04.x
  • 22.10.x
  • 23.04.x
  • 23.10.x
  • 24.04.x
  • Cloud
  • Monitoring Connectors

Contributor

github-actions bot commented Apr 22, 2024

PR Previews
🚀 Deployed preview to https://docs-preview-int.centreon.com/previews/pr-3237/staging/
at Wed, 30 Oct 2024 15:30:32 GMT

NOTE: Previews are deleted after 30 days of inactivity

@cg-tw cg-tw changed the title from "HA 24.04" to "Post-release 24.04 HA" Apr 30, 2024
@cg-tw cg-tw added the HA label Apr 30, 2024
The situation has stabilized, and you can perform a failover by moving the **centreon** resource.

```shell
pcs resource move centreon
```

Member

Not sure why we do this here. But if we do, we must clear the constraints afterwards...
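A minimal sketch of the sequence this comment asks for, assuming the resource is named `centreon` as in the excerpt. On pcs releases where `pcs resource move` leaves a location constraint in place (newer releases may remove it automatically), `pcs resource clear` removes that constraint. The wrapper `failover_then_clear` and the `APPLY` variable are hypothetical; by default the wrapper only prints the commands:

```shell
# Hypothetical wrapper: prints the commands by default, runs them when APPLY=1.
failover_then_clear() {
  resource="$1"
  for cmd in "pcs resource move $resource" "pcs resource clear $resource"; do
    if [ "${APPLY:-0}" = "1" ]; then
      $cmd || return 1   # stop at the first failing step
    else
      echo "$cmd"        # dry run: show what would be executed
    fi
  done
}

# Dry run for the resource used in the documentation excerpt.
failover_then_clear centreon
```

In practice one would probably check the cluster state between the two steps, so that the constraint is cleared only once the move has completed.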

Collaborator Author

@tanguyvda it seems there is a problem with the "loss of the active node" procedure. Can you check it thoroughly and flag any parts that need to be removed?

Contributor

Yeah, that can be confusing. The purpose of this part is to go back to the initial state, and to do so the documentation tells us to move the centreon resource. That can feel odd, since we were talking about the ms_mysql resource a few lines earlier.
But because of the constraint mechanism (the example uses a two-node cluster), moving the centreon resource moves the ms_mysql resource too.

What is not good is that the documentation switches from a four-node cluster to a two-node cluster at some point, so many of the command-line outputs no longer match what the documentation describes.
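To see the coupling described above on a live cluster, one could list the constraints and the resource placement before and after the move. A minimal sketch, assuming `pcs` and `crm_mon` are installed on the node; the helper name `show_layout` is hypothetical, and the output will differ between a two-node and a four-node setup:

```shell
# Hypothetical helper: print the constraints and the current resource placement.
show_layout() {
  # Bail out early on machines that are not cluster nodes.
  command -v pcs >/dev/null 2>&1 || { echo "pcs not found" >&2; return 1; }
  pcs constraint   # lists location, ordering and colocation constraints
  crm_mon -1       # one-shot cluster status, showing where each resource runs
}
```

Calling `show_layout` before and after `pcs resource move centreon` should show whether ms_mysql follows the centreon resource, and whether a location constraint was left behind.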

@cg-tw cg-tw force-pushed the MON-37984-ha-24-04 branch from 6b84ae7 to 9a60aa8 Compare September 5, 2024 08:27
Contributor

@tanguyvda tanguyvda left a comment

apart from the operating-guide, things look good


* All resources appear stopped on the passive node (this is because the passive node does not see the quorum device anymore, as "partition WITHOUT quorum" indicates below. The resources are stopped.)
* The active node is seen as `offline` (as the passive node is cut off from the rest of the cluster):

Contributor

@tanguyvda tanguyvda Sep 24, 2024

From this point on, every crm_mon/pcs status output comes from a two-node cluster instead of the four-node cluster used at the beginning.

We need to either change the beginning to match a two-node cluster or change the end to match a four-node cluster.

Member

@vuntz vuntz left a comment

I haven't re-read everything, but assuming my comments were addressed: all recent changes are good with me.
