2 node HA with OVN DB Availability #4631

Open
kannanvr opened this issue Oct 17, 2024 · 2 comments
Labels: bug, ipam

Comments

@kannanvr

Kube-OVN Version

1.12.26

Kubernetes Version

1.27.10

Operating System/Kernel Version

5.15.0

Description

We have a two-node HA Kubernetes setup. We are not using etcd to bring up the cluster; instead we use PostgreSQL with kine to achieve HA with two nodes. A keepalived component with a VIP helps us promote the primary PostgreSQL DB when a node reboots.

Now, when we install Kube-OVN on this two-node HA k8s setup, we face an issue: when either node goes down, the OVN-DB/OVN-CENTRAL NB and SB databases do not come back up. The ovn-central pod keeps restarting, and a new pod on the one remaining node does not get an IP address.

It seems OVN-DB/OVN-CENTRAL uses the raft consensus algorithm, which requires a majority of members; with only two members, losing either node loses quorum, so the databases become unavailable. Is it possible to use an active-passive style of connection instead? Or is there any way to solve this case with the VIP from keepalived?

Please share your valuable feedback on this issue.

Steps To Reproduce

1. Install a two-node HA cluster
2. Shut down one of the nodes
3. Bring up a new pod
4. The pod does not get an IP

Current Behavior

When a node goes down, the new pod is not allocated an IP because of the DB issue.

Expected Behavior

When a node goes down, the new pod is still allocated an IP despite the DB failure.

@kannanvr kannanvr added the bug label Oct 17, 2024
@dosubot dosubot bot added the ipam label Oct 17, 2024

dosubot bot commented Oct 17, 2024

Hey @kannanvr! I'm here to assist you with any bugs, questions, or contributions. Let's work together to resolve your issue.

I couldn't find specific information on using an active-passive configuration for OVN databases in Kube-OVN. However, Kube-OVN handles high availability for the OVN databases using a script that monitors the status of the databases and manages leadership roles. This script checks the status of the OVN Northbound (NB) and Southbound (SB) databases, determines the leader, and ensures data consistency by managing the service endpoint and Kubernetes labels [1][2].
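As an illustration of that leadership mechanism (the exact label keys below are an assumption based on how ovn-central pods are commonly labeled by Kube-OVN, not confirmed in this thread), you can check which pod currently holds NB/SB leadership and where the service endpoints point:

```bash
# Assumption: Kube-OVN labels the ovn-central pod that leads each raft
# cluster with ovn-nb-leader / ovn-sb-leader, and the ovn-nb / ovn-sb
# Services select on those labels.
kubectl -n kube-system get pods -l ovn-nb-leader=true -o wide
kubectl -n kube-system get pods -l ovn-sb-leader=true -o wide

# The service endpoints should track the current leader pod:
kubectl -n kube-system get endpoints ovn-nb ovn-sb
```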

For diagnosing issues with OVN-DB availability, Kube-OVN provides specific logs and debugging tools. You can use commands to check the status of the OVN Southbound DB and get the OVN Central DB status. Additionally, the ovs-healthcheck.sh script can verify the connection and status of the OVN databases and offer corrective actions for common issues [3][4].
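For example, with the kubectl-ko plugin that ships with Kube-OVN, a quick status check might look like the sketch below; the raw ovn-appctl invocation is an alternative if the plugin is unavailable (the pod name is illustrative):

```bash
# Show raft cluster status of the NB and SB databases
kubectl ko nb status
kubectl ko sb status

# Run the built-in connectivity diagnostics across all components
kubectl ko diagnose all

# Equivalent raw query from inside an ovn-central pod
# (replace ovn-central-0 with an actual pod name):
kubectl -n kube-system exec ovn-central-0 -- \
  ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
```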

Regarding the use of keepalived and VIPs, there was a similar issue where pods on other nodes could not access the keepalived VIP after a node shutdown. The suggested solution was to use tcpdump to trace the packet and ensure that the VIP is treated as an external address, with packets being SNATed to the node address to visit the VIP [5].
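A sketch of that kind of trace, using the plugin's tcpdump subcommand (the namespace/pod name, VIP 192.168.1.100, and interface name are placeholders):

```bash
# Capture traffic from a pod toward the keepalived VIP to see where
# packets are dropped or whether they are SNATed to the node address.
kubectl ko tcpdump default/test-pod host 192.168.1.100 -nnve

# On the node itself, watch the VIP on the physical interface:
tcpdump -i eth0 -nnve host 192.168.1.100
```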

You might want to explore these debugging tools and configurations to address the issue you're facing.

To continue talking to Dosu, mention @dosu.

@oilbeater
Collaborator

Switching to active-backup mode would require significant modifications to the current ovn-central startup and deployment, so we do not recommend this approach. If your scenario requires high availability in a two-node setup, you can refer to the ovsdb documentation for the necessary modifications: OVSDB Documentation.
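For reference, ovsdb-server(1) documents active-backup replication commands. A minimal sketch of how a keepalived-driven failover could use them for the NB database (the VIP address and control-socket path are assumptions, and the ovn-central startup scripts would need to be adapted around this, which is the modification effort cautioned about above):

```bash
# On the backup node: replicate the NB database from the active server
# behind the keepalived VIP (address is a placeholder).
ovn-appctl -t /var/run/ovn/ovnnb_db.ctl \
  ovsdb-server/set-active-ovsdb-server tcp:192.168.1.100:6641
ovn-appctl -t /var/run/ovn/ovnnb_db.ctl \
  ovsdb-server/connect-active-ovsdb-server

# Check replication state
ovn-appctl -t /var/run/ovn/ovnnb_db.ctl ovsdb-server/sync-status

# On failover (keepalived promotes this node): stop replicating and
# start serving reads/writes as the active server.
ovn-appctl -t /var/run/ovn/ovnnb_db.ctl \
  ovsdb-server/disconnect-active-ovsdb-server
```

The same sequence would apply to the SB database on its own control socket and port.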
