
K8SPS-204: don't depend on orchestrator for topology discovery #339

Closed
wants to merge 23 commits into from

Conversation

pooknull
Contributor

@pooknull pooknull commented Mar 29, 2023

K8SPS-204

https://jira.percona.com/browse/K8SPS-204

DESCRIPTION

Problem:
We don't want to depend on the orchestrator for topology discovery, as it would be a single point of failure.

Solution:
We should improve the pkg/mysql/topology package and use it instead of sending requests to the orchestrator.
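To illustrate the idea behind direct topology discovery, here is a minimal sketch in Go. The type and function names are hypothetical and do not reflect the actual `pkg/mysql/topology` API: it only shows how the primary and replicas can be inferred from each host's own replication status, with no orchestrator in the loop.

```go
package main

import "fmt"

// ReplicationStatus is a hypothetical, simplified view of what a
// "SHOW REPLICA STATUS" query returns for one MySQL host.
type ReplicationStatus struct {
	Host   string
	Source string // empty if the host replicates from no one, i.e. it is the primary
}

// DiscoverTopology infers the primary and its replicas directly from the
// hosts' replication status, without asking an orchestrator.
func DiscoverTopology(statuses []ReplicationStatus) (primary string, replicas []string) {
	for _, s := range statuses {
		if s.Source == "" {
			primary = s.Host
		} else {
			replicas = append(replicas, s.Host)
		}
	}
	return primary, replicas
}

func main() {
	p, r := DiscoverTopology([]ReplicationStatus{
		{Host: "mysql-0", Source: ""},
		{Host: "mysql-1", Source: "mysql-0"},
		{Host: "mysql-2", Source: "mysql-0"},
	})
	fmt.Println(p, r)
}
```

In the real package each status would come from a SQL query against the pod itself, which is exactly what removes the orchestrator as a discovery dependency.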

Changes:

  1. Removed build/haproxy_check_primary.sh and build/haproxy_check_replicas.sh; added the cmd/haproxy-check/main.go binary, which uses the new topology package.
  2. cmd/bootstrap now uses the new topology package.
  3. Added the TOPOLOGY_EXPERIMENTAL environment variable to the operator. Setting it to true forces the operator to discover the topology without the orchestrator.
  4. Added the self-healing-chaos and gr-self-healing-chaos tests.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are the manifests (crd/bundle) regenerated if needed?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PS version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size pull-request-size bot added the size/XL 500-999 lines label Mar 29, 2023
@pooknull pooknull marked this pull request as ready for review April 4, 2023 07:44
@pooknull pooknull force-pushed the dev/K8SPS-204-improve-topology branch from b4a5330 to cc640a9 on July 10, 2023 13:27
@pooknull pooknull force-pushed the dev/K8SPS-204-improve-topology branch from e2320fe to a6e616a on July 19, 2023 19:24
@pull-request-size pull-request-size bot added size/XXL 1000+ lines and removed size/XL 500-999 lines labels Jul 19, 2023
e2e-tests/functions (review thread: outdated, resolved)
@egegunes
Contributor

  • The idea of TOPOLOGY_EXPERIMENTAL is: if it's not enabled, we should preserve the old behavior. As I see it, we use the new experimental topology manager in bootstrap and in the HAProxy checks even when it's not enabled.

  • We shouldn't delete orc-handler. The idea is to not depend on the orchestrator for discovery; it's still the orchestrator that manages the async topology, and it should be the orchestrator that labels the primary pod.

  • I see the new tests are failing; you probably need to drop finalizers as the last step, as we do in other tests.

@egegunes
Contributor

egegunes commented Aug 4, 2023

  • Please fix conflicts.
  • orc-handler should come back (right now ps-init-entrypoint.sh fails because we still try to install it even though we removed it from the Dockerfile).
  • bootstrap fails on async replication.

The first pod goes into CrashLoopBackOff with the following error:

2023/08/04 07:32:54 Peers: [3237353861636563.cluster2-mysql-unready.ps]
2023/08/04 07:32:54 FQDN: cluster2-mysql-0.cluster2-mysql.ps
2023/08/04 07:32:54 Primary: cluster2-mysql-0.cluster2-mysql.ps Replicas: []
2023/08/04 07:32:54 lookup cluster2-mysql-0 [10.28.0.8]
2023/08/04 07:32:54 PodIP: 10.28.0.8
2023/08/04 07:32:54 bootstrap finished in 0.039863 seconds
2023/08/04 07:32:54 bootstrap failed: get primary IP: lookup cluster2-mysql-0.cluster2-mysql.ps: lookup cluster2-mysql-0.cluster2-mysql.ps on 10.24.112.10:53: no such host
2023/08/04 07:33:14 Peers: [3237353861636563.cluster2-mysql-unready.ps]
2023/08/04 07:33:14 FQDN: cluster2-mysql-0.cluster2-mysql.ps
2023/08/04 07:33:14 Primary: cluster2-mysql-0.cluster2-mysql.ps Replicas: []
2023/08/04 07:33:14 lookup cluster2-mysql-0 [10.28.0.8]
2023/08/04 07:33:14 PodIP: 10.28.0.8
2023/08/04 07:33:14 bootstrap finished in 0.071393 seconds
2023/08/04 07:33:14 bootstrap failed: get primary IP: lookup cluster2-mysql-0.cluster2-mysql.ps: lookup cluster2-mysql-0.cluster2-mysql.ps on 10.24.112.10:53: no such host

P.S.: During bootstrap the pod is not ready, therefore cluster2-mysql-0.cluster2-mysql.ps is not resolvable.

@JNKPercona
Collaborator

Test name Status
async-ignore-annotations passed
auto-config passed
config passed
config-router passed
demand-backup passed
gr-bootstrap passed
gr-demand-backup passed
gr-haproxy failure
gr-ignore-annotations passed
gr-init-deploy passed
gr-one-pod passed
gr-scaling passed
gr-tls-cert-manager passed
gr-self-healing-chaos failure
haproxy passed
init-deploy passed
limits passed
monitoring passed
one-pod passed
scaling passed
service-per-pod passed
sidecars passed
tls-cert-manager passed
users passed
version-service passed
self-healing-chaos passed
We ran 26 out of 26 tests.

commit: 3a8483b
image: perconalab/percona-server-mysql-operator:PR-339-3a8483b

@tplavcic tplavcic mentioned this pull request Aug 17, 2023
@hors hors closed this Feb 20, 2024