K8SPS-359: Fix rebooting cluster if it's partially online #725

Merged (4 commits) on Aug 19, 2024

@egegunes (Contributor) commented Aug 13, 2024


CHANGE DESCRIPTION

Problem:
If 2 out of 3 MySQL pods are terminated abruptly, the operator sporadically fails to reboot the cluster.

Cause:
Although the failed pods enter crash recovery mode once they come back up, MySQL complains that the cluster is already ONLINE when the operator tries to run dba.rebootClusterFromCompleteOutage().

Solution:
If the operator can't run dba.rebootClusterFromCompleteOutage() because the cluster is already ONLINE, it deletes all MySQL pods to force recovery.
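
The decision described above can be illustrated with a small shell sketch. This is not the operator's actual code (the operator itself is written in Go); the function name `decide_recovery_action`, the action names it prints, and the idea of matching the substring `ONLINE` in the error text are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the fallback logic; names and the matched
# error substring are assumptions, not taken from the operator source.

# decide_recovery_action EXIT_CODE ERROR_TEXT
# Prints the action to take after a reboot attempt:
#   none         - the reboot call succeeded
#   delete-pods  - the reboot was refused because the cluster is reported ONLINE
#   retry        - any other failure
decide_recovery_action() {
	local exit_code="$1"
	local error_text="$2"

	if [ "$exit_code" -eq 0 ]; then
		echo "none"
	elif printf '%s' "$error_text" | grep -q "ONLINE"; then
		# Assumption: the refusal can be recognized by "ONLINE" in the message.
		echo "delete-pods"
	else
		echo "retry"
	fi
}

# Example call with a made-up error message.
decide_recovery_action 1 "cluster is already ONLINE"
```

In a real setup, the `delete-pods` branch would correspond to deleting every MySQL pod so that all members restart and go through recovery together.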

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PS version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size (bot) added the size/L (100-499 lines) label on Aug 13, 2024
Comment on lines +579 to +581
"SELECT MEMBER_HOST FROM performance_schema.replication_group_members where MEMBER_ROLE='PRIMARY';" \
"-h $(get_mysql_router_service $(get_cluster_name)) -P 6446 -uroot -proot_password" \
| cut -d'.' -f1

[shfmt] reported by reviewdog 🐶

Suggested change (whitespace only):
"SELECT MEMBER_HOST FROM performance_schema.replication_group_members where MEMBER_ROLE='PRIMARY';" \
"-h $(get_mysql_router_service $(get_cluster_name)) -P 6446 -uroot -proot_password" \
| cut -d'.' -f1
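
For readers unfamiliar with the trailing `cut -d'.' -f1` in the lines under review: it keeps only the first dot-separated field of the returned MEMBER_HOST, which presumably reduces a fully qualified host name to the pod name. A minimal sketch, assuming a made-up host name and a hypothetical helper called `pod_name_from_host`:

```shell
#!/usr/bin/env bash
# Hypothetical helper; the host name below is invented for illustration.

# pod_name_from_host HOST
# Prints everything before the first dot in HOST.
pod_name_from_host() {
	printf '%s\n' "$1" | cut -d'.' -f1
}

pod_name_from_host "cluster1-mysql-0.cluster1-mysql.default"
# prints: cluster1-mysql-0
```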

deploy/rbac.yaml Outdated
@@ -80,6 +78,20 @@ rules:
- patch
- update
- watch
- apiGroups:

Do we need those in cw-rbac.yaml for cluster-wide?

@JNKPercona (Collaborator)

Test name Status
version-service passed
async-ignore-annotations passed
auto-config passed
config passed
config-router passed
demand-backup passed
gr-demand-backup passed
gr-demand-backup-haproxy passed
gr-finalizer passed
gr-haproxy passed
gr-ignore-annotations passed
gr-init-deploy passed
gr-one-pod passed
gr-recreate passed
gr-scaling passed
gr-scheduled-backup passed
gr-security-context passed
gr-self-healing passed
gr-tls-cert-manager passed
gr-users passed
haproxy passed
init-deploy passed
limits passed
monitoring passed
one-pod passed
operator-self-healing passed
recreate passed
scaling passed
scheduled-backup passed
service-per-pod passed
sidecars passed
smart-update passed
tls-cert-manager passed
users passed
We run 34 out of 34

commit: 75bb02d
image: perconalab/percona-server-mysql-operator:PR-725-75bb02dd

@hors merged commit d17dc59 into main on Aug 19, 2024
16 checks passed
@hors deleted the K8SPS-359 branch on August 19, 2024 17:40
5 participants