Post-release 24.04 HA #3237

Draft pull request that wants to merge 43 commits into base: staging. The diff shown below covers the changes from 1 commit.

Commits (43)
b58d571  HA 24.04 (cg-tw, Apr 18, 2024)
3dbc050  Workaround for bugged link (cg-tw, Apr 19, 2024)
91935cd  Workaround for link bug (cg-tw, Apr 19, 2024)
285d900  Workaround for link bug (cg-tw, Apr 19, 2024)
d4e0add  Workaround for link bug (cg-tw, Apr 19, 2024)
15aa227  Fix broken link (cg-tw, Apr 19, 2024)
6c79e2d  Fix fix (cg-tw, Apr 19, 2024)
8f0ec71  Workaround for link bug (cg-tw, Apr 22, 2024)
6dc1c38  Workaround for link bug (cg-tw, Apr 22, 2024)
fea3e16  Merge branch 'staging' into MON-37984-ha-24-04 (cg-tw, May 6, 2024)
11e9ea8  Update (cg-tw, May 6, 2024)
ed62407  Fix links (cg-tw, May 6, 2024)
30558c3  Workaround for buggy link (cg-tw, May 6, 2024)
393a789  Workaround for buggy link (cg-tw, May 6, 2024)
24bccf0  Workaround for buggy link (cg-tw, May 6, 2024)
f0a8a42  Update (cg-tw, May 6, 2024)
492e391  Remove duplicate entry from ToC (cg-tw, May 6, 2024)
d028cb0  Update schema (cg-tw, May 6, 2024)
25115ad  Update, WIP (cg-tw, May 14, 2024)
2e4bdbb  Update (cg-tw, Jul 3, 2024)
65c2ae6  Add file (cg-tw, Jul 3, 2024)
fc0ad29  Typo (cg-tw, Jul 3, 2024)
5bde029  Update (cg-tw, Jul 3, 2024)
d857597  Fix build (link bug workaround) (cg-tw, Jul 4, 2024)
104e5d5  Update (cg-tw, Jul 9, 2024)
03436a6  Remove installation file + typos (cg-tw, Jul 10, 2024)
b657080  Remove dead link (cg-tw, Jul 11, 2024)
b6e3a5f  Delete comment (cg-tw, Jul 11, 2024)
04e51d1  Remove old file (cg-tw, Jul 16, 2024)
59ed2f7  Changes following review (cg-tw, Jul 16, 2024)
347f97c  Update (cg-tw, Jul 16, 2024)
8d0de8e  Update (cg-tw, Jul 16, 2024)
97b84e8  Update (cg-tw, Jul 16, 2024)
b9f97c4  Update versioned_docs/version-24.04/update/update-centreon-ha.md (cg-tw, Aug 26, 2024)
4fc3844  Typos (cg-tw, Aug 26, 2024)
6438cc2  How to repair DB replication (cg-tw, Aug 26, 2024)
9a60aa8  Merge branch 'MON-37984-ha-24-04' of github.com:centreon/centreon-doc… (cg-tw, Aug 26, 2024)
8916313  [ENH] HA optimize resource order (cedricmeschin, Sep 5, 2024)
807fa3c  Add prerequisites topic (cg-tw, Oct 2, 2024)
0ee60c7  Workaround for abnormally dead link (cg-tw, Oct 2, 2024)
06629cd  Workaround for abnormally dead link (cg-tw, Oct 2, 2024)
63ee689  Add Prerequisites topic to ToC (cg-tw, Oct 2, 2024)
154cc9a  Add troubleshooting case (cg-tw, Oct 30, 2024)

Diff view

@@ -250,7 +250,7 @@ To simulate a network failure that would isolate the passive central node, you c

We're assuming that node 1 is the active node and node 2 is the passive node ([check the state of the cluster](#how-do-i-know-the-state-of-the-cluster) if you need to).

-To perform this test, run the `iptables` commands on the **active central node**:
+To perform this test, run the `iptables` commands on the **passive central node**. Thanks to these rules, all traffic coming from the active central node and the quorum device will be ignored by the passive central node:

```bash
iptables -A INPUT -s @CENTRAL_NODE1_IPADDR@ -j DROP
```
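
Only the first `iptables` rule is visible in the collapsed hunk above; the sentence it illustrates implies that a similar rule is added for the quorum device as well. As a minimal sketch of how the isolation could be reverted after the test (assuming only plain `DROP` rules were added with `-A INPUT`, and using `@QDEVICE_IPADDR@` as a hypothetical placeholder for the quorum device address):

```bash
# Delete the test rules (same rule specification as -A, but with -D)
iptables -D INPUT -s @CENTRAL_NODE1_IPADDR@ -j DROP
# Hypothetical placeholder: adjust to the quorum device address actually blocked
iptables -D INPUT -s @QDEVICE_IPADDR@ -j DROP

# Check that no test rules remain in the INPUT chain
iptables -nL INPUT
```
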
@@ -267,44 +267,44 @@ The passive central node is now excluded from the cluster.

If you run `pcs status` on the active central node:

-* The resources and the cluster are still working.
+* The resources and the cluster are still working (the output shows that the node still sees the quorum device).
* The passive central node is seen `offline` on the active node:

```text
Cluster name: centreon_cluster
Stack: corosync
-Current DC: @CENTRAL_MASTER_NAME@ (version 1.1.23-1.el8_9.1-9acf116022) - partition with quorum
+Current DC: @CENTRAL_NODE1_NAME@ (version 1.1.23-1.el8_9.1-9acf116022) - partition with quorum
Last updated: Thu May 5 10:34:05 2022
-Last change: Thu May 5 09:09:50 2022 by root via crm_resource on @CENTRAL_MASTER_NAME@
+Last change: Thu May 5 09:09:50 2022 by root via crm_resource on @CENTRAL_NODE1_NAME@

4 nodes configured
21 resource instances configured

-Online: [ @DATABASE_MASTER_NAME@ @CENTRAL_MASTER_NAME@ @DATABASE_SLAVE_NAME@ ]
-OFFLINE: [ @CENTRAL_SLAVE_NAME@ ]
+Online: [ @DATABASE_NODE1_NAME@ @CENTRAL_NODE1_NAME@ @DATABASE_NODE2_NAME@ ]
+OFFLINE: [ @CENTRAL_NODE2_NAME@ ]

Full list of resources:

Master/Slave Set: ms_mysql-clone [ms_mysql]
-Masters: [ @DATABASE_MASTER_NAME@ ]
-Slaves: [ @DATABASE_SLAVE_NAME@ ]
-Stopped: [ @CENTRAL_MASTER_NAME@ @CENTRAL_SLAVE_NAME@ ]
-vip_mysql (ocf::heartbeat:IPaddr2): Started @DATABASE_MASTER_NAME@
+Masters: [ @DATABASE_NODE1_NAME@ ]
+Slaves: [ @DATABASE_NODE2_NAME@ ]
+Stopped: [ @CENTRAL_NODE1_NAME@ @CENTRAL_NODE2_NAME@ ]
+vip_mysql (ocf::heartbeat:IPaddr2): Started @DATABASE_NODE1_NAME@
Clone Set: php-clone [php]
-Started: [ @CENTRAL_MASTER_NAME@ ]
-Stopped: [ @DATABASE_MASTER_NAME@ @DATABASE_SLAVE_NAME@ @CENTRAL_SLAVE_NAME@ ]
+Started: [ @CENTRAL_NODE1_NAME@ ]
+Stopped: [ @DATABASE_NODE1_NAME@ @DATABASE_NODE2_NAME@ @CENTRAL_NODE2_NAME@ ]
Clone Set: cbd_rrd-clone [cbd_rrd]
-Started: [ @CENTRAL_MASTER_NAME@ ]
-Stopped: [ @DATABASE_MASTER_NAME@ @DATABASE_SLAVE_NAME@ @CENTRAL_SLAVE_NAME@ ]
+Started: [ @CENTRAL_NODE1_NAME@ ]
+Stopped: [ @DATABASE_NODE1_NAME@ @DATABASE_NODE2_NAME@ @CENTRAL_NODE2_NAME@ ]
Resource Group: centreon
-vip (ocf::heartbeat:IPaddr2): Started @CENTRAL_MASTER_NAME@
-http (systemd:httpd24-httpd): Started @CENTRAL_MASTER_NAME@
-gorgone (systemd:gorgoned): Started @CENTRAL_MASTER_NAME@
-centreon_central_sync (systemd:centreon-central-sync): Started @CENTRAL_MASTER_NAME@
-cbd_central_broker (systemd:cbd-sql): Started @CENTRAL_MASTER_NAME@
-centengine (systemd:centengine): Started @CENTRAL_MASTER_NAME@
-centreontrapd (systemd:centreontrapd): Started @CENTRAL_MASTER_NAME@
-snmptrapd (systemd:snmptrapd): Started @CENTRAL_MASTER_NAME@
+vip (ocf::heartbeat:IPaddr2): Started @CENTRAL_NODE1_NAME@
+http (systemd:httpd24-httpd): Started @CENTRAL_NODE1_NAME@
+gorgone (systemd:gorgoned): Started @CENTRAL_NODE1_NAME@
+centreon_central_sync (systemd:centreon-central-sync): Started @CENTRAL_NODE1_NAME@
+cbd_central_broker (systemd:cbd-sql): Started @CENTRAL_NODE1_NAME@
+centengine (systemd:centengine): Started @CENTRAL_NODE1_NAME@
+centreontrapd (systemd:centreontrapd): Started @CENTRAL_NODE1_NAME@
+snmptrapd (systemd:snmptrapd): Started @CENTRAL_NODE1_NAME@

Daemon Status:
corosync: active/enabled
```

@@ -314,7 +314,7 @@

If you run `pcs status` on the passive node:

-* All resources appear stopped on the passive node
+* All resources appear stopped on the passive node (this is because the passive node no longer sees the quorum device, as "partition WITHOUT quorum" indicates in the output below).
* The active node is seen as `offline` (as the passive node is cut off from the rest of the cluster):

@tanguyvda (Contributor) commented on Sep 24, 2024:

From now on, every `crm_mon`/`pcs status` output is taken from a 2-node cluster instead of the 4-node cluster shown at the beginning.

We need to either change the beginning to match a 2-node cluster, or change the end to match a 4-node cluster.

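The passive node's `pcs status` output is collapsed at this point in the diff. As a minimal sketch of how to confirm the loss of quorum directly on the passive node (assuming the standard Pacemaker/Corosync tools used throughout this guide are installed), you could run:

```bash
# Quorum summary as seen by this node; it should report that the node is not quorate while it is isolated
pcs quorum status

# Lower-level view straight from Corosync
corosync-quorumtool -s
```
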
@@ -444,7 +444,7 @@ This test checks that the resources are switched to the passive node if the acti

We're assuming that central node 1 is the active central node and central node 2 is the passive central node ([check the state of the cluster](#how-do-i-know-the-state-of-the-cluster) if you need to).

-To perform this test, run the commands on the active central node:
+To perform this test, run the commands on the **active central node**:

```bash
iptables -A INPUT -s @CENTRAL_NODE2_IPADDR@ -j DROP
```
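
Only the first rule of this hunk is visible as well. While the rules are in place, a quick way to watch the resources move to central node 2 (a sketch relying on standard Pacemaker/pcs tooling, not a command prescribed by this PR) is:

```bash
# One-shot snapshot of resource placement; remove -1 to keep the view refreshing
crm_mon -1

# pcs equivalent, limited to resource state
pcs status resources
```
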
@@ -31,4 +31,4 @@ You need to take action and fix the problem so that the central node that failed

* Central node 1 is still the passive node: the cluster does **not** switch back automatically.
* If you are using EL8 or Debian, you need to clear manually the constraint created by the failover (using `pcs resource clear centreon`).
-* In a production context, you do not **have** to go back to central node 1 being the active node - but you can do it if you want to (e.g. if central node 2 has limited performance), by [performing a failover](../../administration/centreon-ha/acceptance-guide.md#perform-a-failover) on central node 2.
+* In a production context, you do not **have** to go back to central node 1 being the active node - but you can do it if you want to (e.g. if central node 2 has limited performance), by [performing a failover](../../administration/centreon-ha/operating-guide.md#how-to-perform-a-manual-failover) on central node 2.
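
A minimal sketch of the switch-back sequence described in the bullets above, assuming the resource group is named `centreon` (as elsewhere in this guide) and that the node-name placeholders follow the same convention:

```bash
# EL8/Debian: remove the location constraint left behind by the failover
pcs resource clear centreon

# Optional: move the resource group back to central node 1
pcs resource move centreon @CENTRAL_NODE1_NAME@

# Once the resources are running on node 1 again, clear the constraint created by the move
pcs resource clear centreon
```

`pcs resource move` places its own location constraint on the target node, which is why it is cleared again once the switch is complete.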