API-Gateway pod is stuck in Init state and only changes to Running state once we delete the api-gateway service. #3934

Open
tejnar opened this issue Apr 22, 2024 · 2 comments
Labels
type/bug Something isn't working


tejnar commented Apr 22, 2024

Question

The API-Gateway pod is stuck in the Init state and only changes to the Running state once we delete the api-gateway service.

CLI Commands (consul-k8s, consul-k8s-control-plane, helm)

Helm Configuration

The attached values.yaml file is the one used to deploy Consul to EKS.
values.yaml.txt

Chart Details:

version: 1.4.1
appVersion: 1.18.1

Logs

[api-gw]$ kubectl -n mesh-client get all
NAME                                                      READY   STATUS                  RESTARTS        AGE
pod/api-gateway-68bcd79b4d-p62bm                          0/1     Init:CrashLoopBackOff   5 (2m24s ago)   17m
pod/consul-consul-connect-injector-8cf9849c4-ksxd9        1/1     Running                 0               30h
pod/consul-consul-connect-injector-8cf9849c4-nm6zq        1/1     Running                 0               3d9h
pod/consul-consul-server-0                                1/1     Running                 0               3d9h
pod/consul-consul-server-1                                1/1     Running                 0               37h
pod/consul-consul-server-2                                1/1     Running                 0               3d14h
pod/consul-consul-webhook-cert-manager-5dc74f9bbb-ltssk   1/1     Running                 0               30h

NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP                               PORT(S)                                                                            AGE
service/api-gateway                      LoadBalancer   172.20.137.66   api-gateway.elb.us-east-1.amazonaws.com   80:32370/TCP                                                                       111m
service/consul-consul-connect-injector   ClusterIP      172.20.22.178   <none>                                    443/TCP                                                                            3d14h
service/consul-consul-server             ClusterIP      None            <none>                                    8500/TCP,8502/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP   3d14h
service/consul-consul-ui                 NodePort       172.20.63.191   <none>                                    80:30904/TCP                                                                       3d14h

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/api-gateway                          0/1     1            0           3d
deployment.apps/consul-consul-connect-injector       2/2     2            2           3d14h
deployment.apps/consul-consul-webhook-cert-manager   1/1     1            1           3d14h

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/api-gateway-68bcd79b4d                          1         1         0       3d
replicaset.apps/consul-consul-connect-injector-8cf9849c4        2         2         2       3d14h
replicaset.apps/consul-consul-webhook-cert-manager-5dc74f9bbb   1         1         1       3d14h

NAME                                    READY   AGE
statefulset.apps/consul-consul-server   3/3     3d14h
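
The `Init:CrashLoopBackOff` status above means an init container keeps failing, so the next diagnostic step is usually to find out which one and why. The following is a minimal sketch, not part of the original report: it assumes the pod object was exported with `kubectl -n mesh-client get pod <pod-name> -o json` and reads the standard `status.initContainerStatuses` field. The helper name `stuck_init_containers` and the container name `init-example` are hypothetical.

```python
def stuck_init_containers(pod):
    """Return (container name, waiting reason) for every init container
    that is currently in a 'waiting' state."""
    stuck = []
    statuses = pod.get("status", {}).get("initContainerStatuses", [])
    for status in statuses:
        waiting = status.get("state", {}).get("waiting")
        if waiting:
            stuck.append((status["name"], waiting.get("reason", "")))
    return stuck


# Illustrative sample only; this is NOT output from the cluster in this issue,
# and "init-example" is a made-up container name.
sample_pod = {
    "status": {
        "initContainerStatuses": [
            {
                "name": "init-example",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            }
        ]
    }
}

for name, reason in stuck_init_containers(sample_pod):
    print(f"{name}: {reason}")
```

Once the failing init container is known, `kubectl -n mesh-client logs <pod-name> -c <init-container-name>` should show its output.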

Current understanding and Expected behavior

We use spot instances in our cluster, so the api-gateway pod can be rescheduled onto any other node in the cluster.
My expectation is that the api-gateway pod should reach the Running state, since the associated service (api-gateway) already exists. I've also defined an HTTPRoute as described in the documentation (https://developer.hashicorp.com/consul/tutorials/kubernetes/kubernetes-api-gateway#deploy-api-gateway).

Once I delete the service (api-gateway), the pod reaches the Running state and works as expected. I am also able to get responses from the services deployed inside the EKS cluster.

This issue happens only when the service is exposed as a LoadBalancer; with NodePort it works as expected.

Environment details

EKS version: 1.29 with Calico CNI enabled

Additional Context

The connect-inject-deployment.yaml was modified to use hostNetwork: true.


pawellegowski89 commented May 8, 2024

After adding an API gateway CRD and then deleting it, the role remains in Consul, and adding the same CRD again causes an error.

The error also occurs without any intervention: if we shut down the environment and bring it back up, the API gateway no longer runs but hangs in Init, trying to re-add the existing role, which produces the same error again:

Reconciler error {"controller": "gateway", "controllerGroup": "gateway.networking.k8s.io", "controllerKind": "Gateway", "Gateway": {"name":"mesh-api-gateway","namespace":"data"}, "namespace": "data", "name": "mesh-api-gateway", "reconcileID": "739cd7fb-540e-46f2-b6dd-653baf933f1a", "error": "Unexpected response code: 500 (Invalid Role: A Role with Name \"managed-gateway-acl-role-mesh-api-gateway\" already exists)"}
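
To identify which role is blocking the reconciler, the role name can be pulled out of a log line like the one above. This is a small sketch that assumes the line has the shape shown here (the text `Reconciler error` followed by a JSON object with an `error` field). The function name `conflicting_role` is hypothetical, and the sample line is shortened from the one in this comment.

```python
import json
import re
from typing import Optional

# Matches the Consul error text quoted in the log line above.
ROLE_EXISTS = re.compile(r'A Role with Name "([^"]+)" already exists')


def conflicting_role(log_line: str) -> Optional[str]:
    """Return the ACL role name from a 'Reconciler error' line, or None."""
    start = log_line.find("{")
    if start == -1:
        return None
    try:
        fields = json.loads(log_line[start:])
    except json.JSONDecodeError:
        return None
    match = ROLE_EXISTS.search(fields.get("error", ""))
    return match.group(1) if match else None


# Shortened copy of the log line from this comment (raw string keeps the \" escapes).
line = r'Reconciler error {"controller": "gateway", "error": "Unexpected response code: 500 (Invalid Role: A Role with Name \"managed-gateway-acl-role-mesh-api-gateway\" already exists)"}'

print(conflicting_role(line))
```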

Manually removing the role in the UI helps, but it is only a workaround.
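
The same manual cleanup can be scripted against the Consul HTTP API instead of the UI. The sketch below is still only a workaround, not a fix: it looks the role up by name (`GET /v1/acl/role/name/:name`) and deletes it by ID (`DELETE /v1/acl/role/:id`). The base URL, the token, and all function names are assumptions for illustration; the token needs ACL write permission.

```python
import json
import urllib.parse
import urllib.request


def role_by_name_url(base_url: str, name: str) -> str:
    """Build the URL for reading an ACL role by its name."""
    quoted = urllib.parse.quote(name, safe="")
    return f"{base_url.rstrip('/')}/v1/acl/role/name/{quoted}"


def role_delete_request(base_url: str, role_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for a role ID."""
    quoted = urllib.parse.quote(role_id, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/acl/role/{quoted}",
        method="DELETE",
        headers={"X-Consul-Token": token},
    )


def delete_role_by_name(base_url: str, name: str, token: str) -> bool:
    """Look up a role by name and delete it.

    urlopen raises urllib.error.HTTPError on non-2xx responses, for example
    when no role with that name exists.
    """
    lookup = urllib.request.Request(
        role_by_name_url(base_url, name),
        headers={"X-Consul-Token": token},
    )
    with urllib.request.urlopen(lookup) as resp:
        role = json.load(resp)
    with urllib.request.urlopen(role_delete_request(base_url, role["ID"], token)) as resp:
        return resp.status == 200


if __name__ == "__main__":
    # Hypothetical address and token; replace with real values before running.
    # delete_role_by_name("http://localhost:8500",
    #                     "managed-gateway-acl-role-mesh-api-gateway",
    #                     "<acl-token>")
    pass
```

Building the requests in separate functions keeps the URL construction checkable without a running Consul server.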

@pawellegowski89

In chart version 1.5.3 this works fine.
