Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-3684] Reinitialise raft #611

Open
wants to merge 103 commits into
base: main
Choose a base branch
from
Open

Conversation

dragomirp
Copy link
Contributor

@dragomirp dragomirp commented Sep 3, 2024

Syncobj RAFT implementation, used as a standalone DCS for Patroni, cannot elect a leader if the cluster loses quorum and becomes read only. This will prevent Patroni from automatically switching over, even in cases where sync_standbys are available in the cluster and could take over as primary.

The PR adds logic to detect when the RAFT cluster becomes read only and to reinitialise it, if a sync_standby is available to become a primary.

Copy link

codecov bot commented Sep 3, 2024

Codecov Report

Attention: Patch coverage is 82.18391% with 31 lines in your changes missing coverage. Please review.

Project coverage is 72.58%. Comparing base (b8c4846) to head (6ffaf17).

Files with missing lines Patch % Lines
src/charm.py 78.37% 15 Missing and 9 partials ⚠️
src/cluster.py 88.88% 4 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #611      +/-   ##
==========================================
+ Coverage   71.82%   72.58%   +0.75%     
==========================================
  Files          13       13              
  Lines        3219     3392     +173     
  Branches      477      525      +48     
==========================================
+ Hits         2312     2462     +150     
- Misses        791      806      +15     
- Partials      116      124       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dragomirp dragomirp force-pushed the dpe-3684-reinitialise-raft branch from 4c7e201 to 29aaf18 Compare September 3, 2024 19:36
@dragomirp dragomirp force-pushed the dpe-3684-reinitialise-raft branch from e69800c to adbf7eb Compare September 10, 2024 23:56
Comment on lines +562 to +564
for unit in self._peers.units:
self._add_to_members_ips(self._get_unit_ip(unit))
self._add_to_members_ips(self._get_unit_ip(self.unit))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the primary and leader are different, cluster will be unable to reconfigure, since the leader patroni is down and outside the cluster, so we have to keep the list here.

@nobuto-m
Copy link

nobuto-m commented Oct 24, 2024

@taurus-forever gave me a charm branch (14/edge/pr611) of testing this in the context of the following issue.
#571

However, it's not working for me. Could you please take a look into my steps if I'm doing something wrong or a patch could be incomplete?

postgresql_debug.log

unit-postgresql-2: 08:25:06 INFO juju.worker.uniter found queued "leader-elected" hook
unit-postgresql-2: 08:25:10 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:15 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:20 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:26 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:31 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:31 INFO juju.worker.uniter.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:05 WARNING unit.postgresql/2.juju-log database-peers:0: Remove raft member: Stuck raft cluster detected
unit-postgresql-2: 08:27:05 WARNING unit.postgresql/2.juju-log database-peers:0: Stuck raft not yet detected on all units
unit-postgresql-2: 08:27:10 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:15 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:20 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:25 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:30 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:30 INFO juju.worker.uniter.operation ran "database-peers-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:31 INFO juju.worker.uniter.operation ran "restart-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:32 INFO juju.worker.uniter.operation ran "upgrade-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:33 WARNING unit.postgresql/2.juju-log database-peers:0: Stuck raft not yet detected on all units
unit-postgresql-2: 08:27:33 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:49 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:32:12 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:35 INFO juju.worker.uniter.operation ran "database-peers-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:36 INFO juju.worker.uniter.operation ran "restart-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:37 INFO juju.worker.uniter.operation ran "upgrade-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:49 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)

Steps

  1. juju deploy postgresql -n 3 --channel 14/edge/pr611 --base [email protected]
  2. check the topology
$ sudo -u snap_daemon patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml topology
+ Cluster: postgresql (7429252738803380683) -+-----------+----+-----------+
| Member         | Host       | Role         | State     | TL | Lag in MB |
+----------------+------------+--------------+-----------+----+-----------+
| postgresql-1   | 10.0.9.118 | Leader       | running   |  1 |           |
| + postgresql-0 | 10.0.9.67  | Sync Standby | streaming |  1 |         0 |
| + postgresql-2 | 10.0.9.77  | Replica      | streaming |  1 |         0 |
+----------------+------------+--------------+-----------+----+-----------+
  1. take down two nodes including the leader
lxc stop -f juju-c3fd60-1
lxc stop -f juju-c3fd60-0
  1. check the status
$ juju status
Model        Controller  Cloud/Region         Version  SLA          Timestamp
postgres-pr  localhost   localhost/localhost  3.5.4    unsupported  08:24:32Z

App         Version  Status  Scale  Charm       Channel        Rev  Exposed  Message
postgresql  14.13    active    1/3  postgresql  14/edge/pr611  504  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0   unknown   lost   0        10.0.9.67       5432/tcp  agent lost, see 'juju show-status-log postgresql/0'
postgresql/1*  unknown   lost   1        10.0.9.118      5432/tcp  agent lost, see 'juju show-status-log postgresql/1'
postgresql/2   active    idle   2        10.0.9.77       5432/tcp  

Machine  State    Address     Inst id        Base          AZ  Message
0        down     10.0.9.67   juju-c3fd60-0  [email protected]      Running
1        down     10.0.9.118  juju-c3fd60-1  [email protected]      Running
2        started  10.0.9.77   juju-c3fd60-2  [email protected]      Running
$ sudo -u snap_daemon env PATRONI_LOG_LEVEL=DEBUG patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml topology
2024-10-24 08:26:12,698 - DEBUG - Loading configuration from file /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml
2024-10-24 08:26:17,946 - INFO - waiting on raft
2024-10-24 08:26:22,947 - INFO - waiting on raft
2024-10-24 08:26:27,948 - INFO - waiting on raft
^C
Aborted!

Both are expected.

  1. Remove units one by one
$ juju remove-unit postgresql/0 --force
WARNING This command will perform the following actions:
will remove unit postgresql/0
- will remove storage pgdata/0

Continue [y/N]? y
$ juju remove-unit postgresql/1 --force
WARNING This command will perform the following actions:
will remove unit postgresql/1
- will remove storage pgdata/1

Continue [y/N]? y
  1. check the status once again
$ juju status
Model        Controller  Cloud/Region         Version  SLA          Timestamp
postgres-pr  localhost   localhost/localhost  3.5.4    unsupported  08:45:42Z

App         Version  Status  Scale  Charm       Channel        Rev  Exposed  Message
postgresql  14.13    active      1  postgresql  14/edge/pr611  504  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/2*  active    idle   2        10.0.9.77       5432/tcp  

Machine  State    Address    Inst id        Base          AZ  Message
2        started  10.0.9.77  juju-c3fd60-2  [email protected]      Running
$ juju run postgresql/leader get-primary
Running operation 3 with 1 task
  - task 4 on unit-postgresql-2

Waiting for task 4...
primary: postgresql/1

^^^ This should return postgresql/2 since that's the only member remaining in the model.

And the raft membership has a problem.

$ sudo -u snap_daemon env PATRONI_LOG_LEVEL=DEBUG patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml topology
2024-10-24 08:39:04,529 - DEBUG - Loading configuration from file /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml
2024-10-24 08:39:09,759 - INFO - waiting on raft
2024-10-24 08:39:14,759 - INFO - waiting on raft
2024-10-24 08:39:19,760 - INFO - waiting on raft
^C
Aborted!
$ syncobj_admin -conn localhost:2222 -status -pass Eu9IEl4ef4SqV6iL
commit_idx: 269
enabled_code_version: 0
has_quorum: False
last_applied: 269
leader: None
leader_commit_idx: 269
log_len: 2
match_idx_count: 0
next_node_idx_count: 0
partner_node_status_server_10.0.9.118:2222: 0
partner_node_status_server_10.0.9.67:2222: 0
partner_nodes_count: 2
raft_term: 108
readonly_nodes_count: 0
revision: deprecated
self: 10.0.9.77:2222
self_code_version: 0
state: 1
uptime: 3343
version: 0.3.12

^^^ partner_nodes_count: 2

@taurus-forever
Copy link
Contributor

Dear @nobuto-m , thank you for testing and sharing us feedback!

Please check this the PR description: The PR adds logic to detect when the RAFT cluster becomes read only and to reinitialise it, **if a sync_standby is available** to become a primary.

The reported test case has stopped both Leader and Sync_Standby simultaneously.
Automated promotion in this case is not triggered by design,
as we cannot guarantee consistency for Replica type of the node.

We discussed this case on the last sync call:

  • (recommended) user restores one of the offline node (raft unlocks and old Primary/Sync_Standby will be elected as a new Primary)
  • Patroni/Raft cluster should be re-initialised (manually now). We will create the new force-something action (separate PR). It should be executed as a last resort if user accepts risks of loosing the last transaction(s).
  • in three nodes setup always keep: Primary + 2*Sync_Standby => noticeable write performance impact.
    No other options on the table.

We are going to improve UX and Juju statuses to inform users better about the Replica only left in the cluster.
Do you see other improvements/fixes necessary here? Thank you!

@taurus-forever
Copy link
Contributor

L # Primary  Sync-Standby    Replica
* 0                                   # All OK: RW+RO
* 1                             X     # Degraded mode RW+RO: no changes.
  2               X                   # Degraded mode RW+RO: Replica -> Sync-Standby
  3   X                               # Degraded mode RW+RO: Sync-Standby -> Primary, Replica -> Sync-Standby
* 4               X             X     # Disaster mode: re-add/restore units ASAP
  5   X                         X     # Disaster mode: Sync-Standby -> Primary AFTER juju remove-units (PR #611)
  6   X           X                   # Disaster mode: requires manual confirmation of Replica -> Primary. NEW action.
  7   X           X             X     # Disaster mode: reinstall and restore from backup or reuse Primary/SyncStandby disk

this PR addresses case #5, can you please check it from your side.

@taurus-forever
Copy link
Contributor

taurus-forever commented Oct 24, 2024

Based on the discussion I have re-checked the cluster installation speed and the case 4:

  • The fresh 3-nodes cluster installation on my laptop takes 00:05:12.
    The snap downloading time is close to 1 minute on each unit.

  • Re-tested the case 4: the raft issues detected after juju remove-unit --force:

unit-postgresql-2: 19:07:29 WARNING unit.postgresql/2.juju-log database-peers:0: Stuck raft has no candidate                         
unit-postgresql-2: 19:07:29 DEBUG unit.postgresql/2.juju-log database-peers:0: Early exit on_peer_relation_changed: stuck raft recovery  

but for some reason the raft re-initialization didn't happen in full.

@dragomirp can you please check the case 4 in #611 (comment) and make sure the same logic applied here as in case 5. Tnx!

  • as for the case 6: we will approve the dedicated juju action name and prepare the separate PR/commit in the same GH issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants