This repository has been archived by the owner on Jun 10, 2020. It is now read-only.

Clients connect to unready endpoints #20

Open
TomHutter opened this issue Jun 12, 2018 · 1 comment

@TomHutter

Hi everybody,

Because of the annotation service.alpha.kubernetes.io/tolerate-unready-endpoints: "true" on the service, requests seem to be distributed to nodes even when they are not ready. This leads to connection or SQL errors when client requests reach nodes that are shutting down.
I therefore created a second service without the annotation, and clients use this service to connect:

# create a service for clients which honors readiness
apiVersion: v1
kind: Service
metadata:
  name: "{{ template "dnsname" . }}-service"
  labels:
    app: {{ template "fullname" . }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    release: "{{ .Release.Name }}"
    heritage: "{{ .Release.Service }}"
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: {{ template "fullname" . }}
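
One way to check that a client-facing service like this really excludes unready pods is to compare the ready and not-ready addresses in its Endpoints object. The following is a minimal sketch, not part of the chart: the function name `list_endpoint_ips` is hypothetical, and it assumes `kubectl` is on the PATH and configured for the namespace the release runs in.

```shell
#!/bin/bash
# Hypothetical helper (not part of the chart): print the ready and not-ready
# pod IPs behind a Service by reading its Endpoints object.
# Assumes kubectl is on PATH and pointed at the right namespace.
list_endpoint_ips() {
  local service="$1"
  local ready not_ready
  # Addresses of pods that currently pass their readiness probe
  ready=$(kubectl get endpoints "$service" -o jsonpath='{.subsets[*].addresses[*].ip}')
  # Addresses of pods that are known to the service but not ready
  not_ready=$(kubectl get endpoints "$service" -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}')
  echo "ready: ${ready}"
  echo "not ready: ${not_ready}"
}
```

Calling `list_endpoint_ips` with the name of the client service should show that a pod whose readiness probe fails no longer appears on the `ready` line.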

Additionally, I modified the readiness probe to check for a semaphore file and to verify that the node is in sync:

#!/bin/bash
#
# Adfinis SyGroup AG
# openshift-mariadb-galera: mysqld readinessProbe
#

MYSQL_USER="readinessProbe"
MYSQL_PASS="readinessProbe"
MYSQL_HOST="localhost"

# The preStop hook creates this semaphore file; report "not ready" while it exists
if [ -f "/tmp/wsrep_off" ]; then
  exit 1
fi

# Fail if the server does not answer a simple query
if ! mysql --protocol=socket --socket=/var/run/mysqld/mysqld.sock -u"${MYSQL_USER}" -p"${MYSQL_PASS}" -h"${MYSQL_HOST}" -e"SHOW DATABASES;"; then
  exit 1
fi

# Read the Galera state; only "Synced" counts as ready
SYNCED=$( mysql -s --skip-column-names --protocol=socket --socket=/var/run/mysqld/mysqld.sock -u"${MYSQL_USER}" -p"${MYSQL_PASS}" -h"${MYSQL_HOST}" -e"SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';" | awk '{ print $2 }' )

if [ "${SYNCED}" != "Synced" ]; then
  exit 1
fi
exit 0
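
The sync check at the end of the script comes down to parsing one line of `mysql` output. As a small sketch of just that step, the parsing can be exercised without a running database. The function name `wsrep_state_from_status` is made up for illustration, and the sample input assumes the tab-separated two-column output that `mysql -s --skip-column-names` prints for this query.

```shell
#!/bin/bash
# Hypothetical helper: extract the state from the output of
#   SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
# Reads the "name<TAB>value" line on stdin and prints the value column.
wsrep_state_from_status() {
  awk '$1 == "wsrep_local_state_comment" { print $2 }'
}

# Simulated query output; a real probe would pipe the mysql command in instead.
state=$(printf 'wsrep_local_state_comment\tSynced\n' | wsrep_state_from_status)

if [ "${state}" = "Synced" ]; then
  echo "ready"
else
  echo "not ready"
fi
```

Any other value, such as `Donor/Desynced` or an empty string when the query fails, takes the not-ready branch.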

Then I added a preStop hook to the StatefulSet, increased terminationGracePeriodSeconds to 60 to give the nodes enough time to shut down, and set the readinessProbe period to 10 seconds:

...
      terminationGracePeriodSeconds: 60
...
      containers:
        lifecycle:
          preStop:
            exec:
              command:
                - /bin/sh
                - -c
                - touch /tmp/wsrep_off && sleep 20
...
        readinessProbe:
          exec:
            command:
              - /usr/share/container-scripts/mysql/readiness-probe.sh
          timeoutSeconds: 5
          periodSeconds: 10
          failureThreshold: 1
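
How these numbers interact can be worked out with a little arithmetic. The sketch below rests on two assumptions about Kubernetes behaviour rather than anything measured in this setup: that a pod is marked unready after at most `periodSeconds * failureThreshold` seconds once the probe starts failing (the probe returns immediately when the semaphore file exists), and that the time spent in the preStop hook counts against `terminationGracePeriodSeconds`.

```shell
#!/bin/bash
# Values taken from the StatefulSet fragment above.
grace_period=60      # terminationGracePeriodSeconds
pre_stop_sleep=20    # sleep in the preStop hook
period=10            # readinessProbe periodSeconds
failure_threshold=1  # readinessProbe failureThreshold

# Assumption: worst case, the next probe runs a full period after the
# semaphore file is created, and failure_threshold failures are needed.
worst_case_unready=$(( period * failure_threshold ))

# Time left in the preStop sleep after the pod has been marked unready.
drain_margin=$(( pre_stop_sleep - worst_case_unready ))

# Assumption: the preStop hook runs inside the grace period, so mysqld
# gets whatever remains after the sleep to shut down on SIGTERM.
shutdown_budget=$(( grace_period - pre_stop_sleep ))

echo "marked unready after at most ${worst_case_unready}s"
echo "drain margin before SIGTERM: ${drain_margin}s"
echo "time left for mysqld shutdown: ${shutdown_budget}s"
```

Under those assumptions, the values from this thread leave roughly 10 seconds for the endpoint change to propagate and in-flight queries to finish, and 40 seconds for the shutdown itself. Raising `failureThreshold` or `periodSeconds` without also lengthening the sleep would eat into that margin.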

Now the nodes themselves connect to each other through the galera-mdb-ga service, which tolerates unready nodes, while clients connect through galera-mdb-ga-service, which distributes requests only to ready nodes.

@tongpu
Member

tongpu commented Jun 12, 2018

The galera service is required by the StatefulSet and should not be used for client connections. Creating a second service for client access is the right way to go.

We've also discussed updating the readinessProbe in #8, but haven't worked on it yet. A PR with your changes would be very much appreciated.

@tongpu tongpu self-assigned this Jun 12, 2018