RKE1: failed to set up cluster with calico network as it failed to check etcd health: failed to get /health for host 10.200.0.2 #3758

syedsalman3753 opened this issue Dec 11, 2024 · 0 comments
RKE version: v1.3.10

Docker version: (docker version, docker info preferred)

  $ docker version
    Client:
     Version:           20.10.21
     API version:       1.41
     Go version:        go1.18.1
     Git commit:        20.10.21-0ubuntu1~20.04.2
     Built:             Thu Apr 27 05:56:19 2023
     OS/Arch:           linux/amd64
     Context:           default
     Experimental:      true
    
    Server:
     Engine:
      Version:          20.10.21
      API version:      1.41 (minimum version 1.12)
      Go version:       go1.18.1
      Git commit:       20.10.21-0ubuntu1~20.04.2
      Built:            Thu Apr 27 05:37:01 2023
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.7.12
      GitCommit:        
     runc:
      Version:          1.1.12-0ubuntu2~20.04.1
      GitCommit:        
     docker-init:
      Version:          0.19.0
      GitCommit:        

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Kernel version:

mosipuser@VLAN2002:~$ uname -r
5.4.0-146-generic

Type/provider of hosts: Bare metal (KVM used for VM creation)

cluster.yml file:

  # If you intended to deploy Kubernetes in an air-gapped environment,
  # please consult the documentation on how to configure custom RKE images.
  nodes:
  - address: 10.200.0.2
    port: "22"
    internal_address: 10.200.0.2
    role:
    - controlplane
    - worker
    - etcd
    hostname_override: node1
    user: mosipuser
    docker_socket: /var/run/docker.sock
    ssh_key: ""
    ssh_key_path: /home/mosipuser/.ssh/id_rsa
    ssh_cert: ""
    ssh_cert_path: ""
    labels: {}
    taints: []

  - address: 10.200.0.3
    port: "22"
    internal_address: 10.200.0.3
    role:
    - controlplane
    - worker
    - etcd
    hostname_override: node2
    user: mosipuser
    docker_socket: /var/run/docker.sock
    ssh_key: ""
    ssh_key_path: /home/mosipuser/.ssh/id_rsa
    ssh_cert: ""
    ssh_cert_path: ""
    labels: {}
    taints: []

  - address: 10.100.0.2
    port: "22"
    internal_address: 10.100.0.2
    role:
    - controlplane
    - worker
    - etcd
    hostname_override: node3
    user: mosipuser
    docker_socket: /var/run/docker.sock
    ssh_key: ""
    ssh_key_path: /home/mosipuser/.ssh/id_rsa
    ssh_cert: ""
    ssh_cert_path: ""
    labels: {}
    taints: []

  - address: 10.100.0.3
    port: "22"
    internal_address: 10.100.0.3
    role:
    - controlplane
    - worker
    - etcd
    hostname_override: node4
    user: mosipuser
    docker_socket: /var/run/docker.sock
    ssh_key: ""
    ssh_key_path: /home/mosipuser/.ssh/id_rsa
    ssh_cert: ""
    ssh_cert_path: ""
    labels: {}
    taints: []

  - address: 10.101.0.2
    port: "22"
    internal_address: 10.101.0.2
    role:
    - controlplane
    - worker
    - etcd
    hostname_override: node5
    user: mosipuser
    docker_socket: /var/run/docker.sock
    ssh_key: ""
    ssh_key_path: /home/mosipuser/.ssh/id_rsa
    ssh_cert: ""
    ssh_cert_path: ""
    labels: {}
    taints: []

  - address: 10.101.0.3
    port: "22"
    internal_address: 10.101.0.3
    role:
    - controlplane
    - worker
    - etcd
    hostname_override: node6
    user: mosipuser
    docker_socket: /var/run/docker.sock
    ssh_key: ""
    ssh_key_path: /home/mosipuser/.ssh/id_rsa
    ssh_cert: ""
    ssh_cert_path: ""
    labels: {}
    taints: []


  services:
    etcd:
      image: ""
      extra_args: {}
      extra_binds: []
      extra_env: []
      win_extra_args: {}
      win_extra_binds: []
      win_extra_env: []
      external_urls: []
      ca_cert: ""
      cert: ""
      key: ""
      path: ""
      uid: 0
      gid: 0
      snapshot: null
      retention: ""
      creation: ""
      backup_config: null
    kube-api:
      image: ""
      extra_args: {}
      extra_binds: []
      extra_env: []
      win_extra_args: {}
      win_extra_binds: []
      win_extra_env: []
      service_cluster_ip_range: 10.43.0.0/16
      service_node_port_range: ""
      pod_security_policy: false
      always_pull_images: false
      secrets_encryption_config: null
      audit_log: null
      admission_configuration: null
      event_rate_limit: null
    kube-controller:
      image: ""
      extra_args: {}
      extra_binds: []
      extra_env: []
      win_extra_args: {}
      win_extra_binds: []
      win_extra_env: []
      cluster_cidr: 10.42.0.0/16
      service_cluster_ip_range: 10.43.0.0/16
    scheduler:
      image: ""
      extra_args: {}
      extra_binds: []
      extra_env: []
      win_extra_args: {}
      win_extra_binds: []
      win_extra_env: []
    kubelet:
      image: ""
      extra_args: {}
      extra_binds: []
      extra_env: []
      win_extra_args: {}
      win_extra_binds: []
      win_extra_env: []
      cluster_domain: cluster.local
      infra_container_image: ""
      cluster_dns_server: 10.43.0.10
      fail_swap_on: false
      generate_serving_certificate: false
    kubeproxy:
      image: ""
      extra_args: {}
      extra_binds: []
      extra_env: []
      win_extra_args: {}
      win_extra_binds: []
      win_extra_env: []
  network:
    plugin: calico
    options: {}
    mtu: 0
    node_selector: {}
    update_strategy: null
    tolerations: []
  authentication:
    strategy: x509
    sans: []
    webhook: null
  addons: ""
  addons_include: []
  system_images:
    etcd: rancher/mirrored-coreos-etcd:v3.5.3
    alpine: rancher/rke-tools:v0.1.80
    nginx_proxy: rancher/rke-tools:v0.1.80
    cert_downloader: rancher/rke-tools:v0.1.80
    kubernetes_services_sidecar: rancher/rke-tools:v0.1.80
    kubedns: rancher/mirrored-k8s-dns-kube-dns:1.17.4
    dnsmasq: rancher/mirrored-k8s-dns-dnsmasq-nanny:1.17.4
    kubedns_sidecar: rancher/mirrored-k8s-dns-sidecar:1.17.4
    kubedns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.3
    coredns: rancher/mirrored-coredns-coredns:1.8.6
    coredns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.5
    nodelocal: rancher/mirrored-k8s-dns-node-cache:1.21.1
    kubernetes: rancher/hyperkube:v1.22.9-rancher1
    flannel: rancher/mirrored-coreos-flannel:v0.15.1
    flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
    calico_node: rancher/mirrored-calico-node:v3.21.1
    calico_cni: rancher/mirrored-calico-cni:v3.21.1
    calico_controllers: rancher/mirrored-calico-kube-controllers:v3.21.1
    calico_ctl: rancher/mirrored-calico-ctl:v3.21.1
    calico_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.21.1
    canal_node: rancher/mirrored-calico-node:v3.21.1
    canal_cni: rancher/mirrored-calico-cni:v3.21.1
    canal_controllers: rancher/mirrored-calico-kube-controllers:v3.21.1
    canal_flannel: rancher/mirrored-flannelcni-flannel:v0.17.0
    canal_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.21.1
    weave_node: weaveworks/weave-kube:2.8.1
    weave_cni: weaveworks/weave-npc:2.8.1
    pod_infra_container: rancher/mirrored-pause:3.6
    ingress: rancher/nginx-ingress-controller:nginx-1.2.0-rancher1
    ingress_backend: rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1
    ingress_webhook: rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
    metrics_server: rancher/mirrored-metrics-server:v0.5.1
    windows_pod_infra_container: rancher/mirrored-pause:3.6
    aci_cni_deploy_container: noiro/cnideploy:5.1.1.0.1ae238a
    aci_host_container: noiro/aci-containers-host:5.1.1.0.1ae238a
    aci_opflex_container: noiro/opflex:5.1.1.0.1ae238a
    aci_mcast_container: noiro/opflex:5.1.1.0.1ae238a
    aci_ovs_container: noiro/openvswitch:5.1.1.0.1ae238a
    aci_controller_container: noiro/aci-containers-controller:5.1.1.0.1ae238a
    aci_gbp_server_container: noiro/gbp-server:5.1.1.0.1ae238a
    aci_opflex_server_container: noiro/opflex-server:5.1.1.0.1ae238a
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert_path: ""
  ssh_agent_auth: false
  authorization:
    mode: rbac
    options: {}
  ignore_docker_version: null
  enable_cri_dockerd: null
  kubernetes_version: ""
  private_registries: []
  ingress:
    provider: none
    options: {}
    node_selector: {}
    extra_args: {}
    dns_policy: ""
    extra_envs: []
    extra_volumes: []
    extra_volume_mounts: []
    update_strategy: null
    http_port: 0
    https_port: 0
    network_mode: ""
    tolerations: []
    default_backend: null
    default_http_backend_priority_class_name: ""
    nginx_ingress_controller_priority_class_name: ""
    default_ingress_class: null
  cluster_name: "test"
  cloud_provider:
    name: ""
  prefix_path: ""
  win_prefix_path: ""
  addon_job_timeout: 0
  bastion_host:
    address: ""
    port: ""
    user: ""
    ssh_key: ""
    ssh_key_path: ""
    ssh_cert: ""
    ssh_cert_path: ""
    ignore_proxy_env_vars: false
  monitoring:
    provider: ""
    options: {}
    node_selector: {}
    update_strategy: null
    replicas: null
    tolerations: []
    metrics_server_priority_class_name: ""
  restore:
    restore: false
    snapshot_name: ""
  rotate_encryption_key: false
  dns: null   

Steps to Reproduce:

  • Attempted to set up a Kubernetes v1.22.9 cluster across a multi-VLAN network using the calico network plugin, as the default canal plugin lacks support for network policies.
  • Assigned the worker and etcd roles to all nodes, with a subset also configured as controlplane nodes.
  • Cluster creation failed with an etcd health-check error (see the manual probe sketched after this list).
  • ETCD logs: etcd.log
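
Before digging further, RKE's /health probe can be reproduced by hand to separate a TLS problem from a plain network problem. This is a minimal sketch, assuming the default RKE1 certificate layout under /etc/kubernetes/ssl, where the kube-etcd-* file names use the node address with dots replaced by dashes (adjust if your layout differs):

    # On node1 (10.200.0.2): replicate the health check RKE performs
    curl -v --connect-timeout 10 \
      --cacert /etc/kubernetes/ssl/kube-ca.pem \
      --cert /etc/kubernetes/ssl/kube-etcd-10-200-0-2.pem \
      --key /etc/kubernetes/ssl/kube-etcd-10-200-0-2-key.pem \
      https://10.200.0.2:2379/health

    # Because the nodes span multiple VLANs, also verify that full-size frames
    # survive the inter-VLAN path; a TLS handshake timeout while ping works is
    # often an MTU/fragmentation symptom:
    ping -c 3 -M do -s 1472 10.100.0.2   # 1472 B payload + 28 B headers = 1500 B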

Results:

  • Error:
    INFO[0092] [etcd] Successfully started etcd plane.. Checking etcd cluster health 
    WARN[0374] [etcd] host [10.200.0.2] failed to check etcd health: failed to get /health for host [10.200.0.2]: Get "https://10.200.0.2:2379/health": net/http: TLS handshake timeout 
    WARN[0656] [etcd] host [10.200.0.3] failed to check etcd health: failed to get /health for host [10.200.0.3]: Get "https://10.200.0.3:2379/health": net/http: TLS handshake timeout 
    WARN[0930] [etcd] host [10.100.0.2] failed to check etcd health: failed to get /health for host [10.100.0.2]: Get "https://10.100.0.2:2379/health": net/http: TLS handshake timeout 
    WARN[1203] [etcd] host [10.100.0.3] failed to check etcd health: failed to get /health for host [10.100.0.3]: Get "https://10.100.0.3:2379/health": net/http: TLS handshake timeout 
    WARN[1477] [etcd] host [10.101.0.2] failed to check etcd health: failed to get /health for host [10.101.0.2]: Get "https://10.101.0.2:2379/health": net/http: TLS handshake timeout 
    
    FATA[1750] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [10.200.0.2,10.200.0.3,10.100.0.2,10.100.0.3,10.101.0.2,10.101.0.3] failed to report healthy. 
    Check etcd container logs on each host for more information 
    
    • If we choose the canal network plugin instead, the cluster comes up fine.
    • Need help resolving this issue; the etcd container logs can be inspected on each host as sketched below.
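
If the manual probe above also times out, the etcd containers themselves are worth inspecting on each host, as the fatal error suggests. A minimal sketch, assuming RKE's default container name etcd and that the etcdctl TLS environment variables RKE normally injects into the container are present:

    # On each node: inspect the etcd container logs for TLS or clustering errors
    docker logs --tail 100 etcd

    # Query cluster membership and endpoint health from inside the container
    docker exec etcd etcdctl member list
    docker exec etcd etcdctl endpoint health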