Skip to content
This repository has been archived by the owner on May 2, 2023. It is now read-only.

!! Leader Election Issue in Schema Registry, On leader deletion. #926

Open
Shivam0609 opened this issue Jan 12, 2023 · 1 comment
Open

Comments

@Shivam0609
Copy link

Shivam0609 commented Jan 12, 2023

Problem Description :

When running multiple replicas of Schema registry for HA, 1 pod is elected as leader while other are workers.
When we are deleting leader pod then it is electing one of the other pod as a new leader.
But when new pod scales up , again new pod is elected as leader which causes few seconds of downtime and results in failure of schemas creation while rebalancing. Seems rebalancing is happenning twice.

Expected behaviour :

When running in HA , i.e, multiple replicas of schema registry. Deleting leader pod should elect new leader only once and process further requests without switching leader untill it's deleted again.

Steps to Reproduce :

  1. Run multiple replicas of Schema registry using Deployent/Statefulset.
  2. Delete leader pod and immediately create multiple schemas.
  3. Failure is observed for some of the schemas creation.

Logs

kubectl get po -n kafka -o wide
NAME                        READY   STATUS     IP        
cp-schema-registry-0        1/1     Running    10.0.7.153
cp-schema-registry-1        1/1     Running    10.0.8.175
cp-schema-registry-2        1/1     Running    10.0.32.182 (Current Leader)

kubectl logs cp-schema-registry-0 -n kafka | grep leader
[2023-01-12 06:41:55,177] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-6f6c7d7c-e739-400b-9904-620bc282350c', leaderIdentity=version=1,host=10.0.32.182,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)

Deleted Leader Pod, New pod scales up with ip (10.0.36.86)

kubectl get po -n kafka -o wide
NAME                        READY   STATUS    RESTARTS   AGE     IP        
cp-schema-registry-0        1/1     Running   0          21h     10.0.7.153
cp-schema-registry-1        1/1     Running   0          6m39s   10.0.8.175
cp-schema-registry-2        1/1     Running   0          36s     10.0.36.86

Logs for leader election

[2023-01-12 06:40:55,135] INFO Rebalance started (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
[2023-01-12 06:41:55,177] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-6f6c7d7c-e739-400b-9904-620bc282350c', leaderIdentity=version=1,host=10.0.32.182,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
.....
..
.
Caused by: io.confluent.kafka.schemaregistry.exceptions.UnknownLeaderException: Register schema request failed since leader is unknown
....
..
###### form below you can see existing pod is elected as leader (10.0.7.153- cp-schema-registry-0)
io.confluent.kafka.schemaregistry.rest.exceptions.RestRequestForwardingException: Error while forwarding register schema request to the leader
[2023-01-12 06:44:25,188] INFO Rebalance started (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
[2023-01-12 06:44:25,216] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-a5c21494-c85b-4296-9471-7e7c523c3178', leaderIdentity=version=1,host=10.0.7.153,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
....
..
.
##### Once new pod is up and running again rebalancing happened (10.0.36.86- cp-schema-registry-2) is elected as leader
[2023-01-12 06:44:46,213] INFO Rebalance started (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
[2023-01-12 06:44:46,218] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-bf1e2a4d-a3f0-4817-a856-87cd1aaab60d', leaderIdentity=version=1,host=10.0.36.86,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
...
..
.

Sometime leader is not even gets elected until new pod is up and running.

Additional Information :

Image Used : 6.2.5

Kubernetes cluster : AWS EKS

Kubernetes version :

root:~ kubectl version --short

Client Version: v1.23.9
Server Version: v1.23.13-eks-fb459a0
@Shivam0609 Shivam0609 changed the title Leader Election !! Leader Election Issue in Schema Registry, On leader deletion. Jan 12, 2023
@Shivam0609
Copy link
Author

Any update on this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant