
Unable to remove duplicate or stale instance entries of a service in Consul catalog when Consul Connect inject enabled pod moves from one node to another. Currently running with an agentless setup. #4219

Open
MageshSrinivasulu opened this issue Jul 30, 2024 · 4 comments
Labels
type/bug Something isn't working

Comments

@MageshSrinivasulu

After upgrading Consul from version 1.14.10 to 1.16.6 using an agentless setup, I noticed duplicate entries of the same instances under a service. I am unable to remove them, and one of the entries is orphaned, pointing to a pod that is no longer running in the cluster.

[screenshot: duplicate instance entries for the service in the Consul UI]

How can I resolve this?
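
For reference, this is roughly how I am listing the registered instances outside of the UI. It is a minimal sketch that assumes the Consul HTTP API is reachable at http://localhost:8500; the service name and ACL token below are placeholders.

```python
# Minimal sketch: list every registered instance of a service straight from the
# Consul catalog, so duplicate/stale entries are visible outside the UI.
# Assumptions: the Consul HTTP API is reachable at localhost:8500, and the
# service name / ACL token below are placeholders.
import requests

CONSUL_ADDR = "http://localhost:8500"   # placeholder: wherever the Consul HTTP API is exposed
SERVICE_NAME = "my-service"             # placeholder service name
TOKEN = "my-acl-token"                  # placeholder ACL token (only needed if ACLs are enabled)

resp = requests.get(
    f"{CONSUL_ADDR}/v1/catalog/service/{SERVICE_NAME}",
    headers={"X-Consul-Token": TOKEN},
    timeout=10,
)
resp.raise_for_status()

for inst in resp.json():
    # ServiceAddress is the pod IP for connect-injected workloads in my setup
    print(inst["Node"], inst["ServiceID"], inst["ServiceAddress"], inst["ServicePort"])
```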

@MageshSrinivasulu added the type/bug (Something isn't working) label on Jul 30, 2024
@MageshSrinivasulu (Author) commented Jul 31, 2024

Apart from deleting the node that no longer exists, what helps is scaling the impacted service down to zero and back up again, which removes the duplicate or stale entries. These are only temporary fixes; the issue persists.
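
For completeness, this is roughly what the manual cleanup looks like when I delete a stale registration by hand. It is a minimal sketch against the catalog deregister endpoint; the Consul address, node name, service ID, and token are placeholders taken from whatever the listing above shows as stale.

```python
# Minimal sketch of the manual cleanup: deregister one stale service instance
# via the Consul catalog API. Address, node name, service ID, and token are
# placeholders; omitting "ServiceID" would deregister the whole node instead.
import requests

CONSUL_ADDR = "http://localhost:8500"       # placeholder address
TOKEN = "my-acl-token"                      # placeholder ACL token

payload = {
    "Node": "stale-node-name",              # node the stale entry is registered against
    "ServiceID": "stale-service-id",        # the orphaned instance's service ID
}

resp = requests.put(
    f"{CONSUL_ADDR}/v1/catalog/deregister",
    json=payload,
    headers={"X-Consul-Token": TOKEN},
    timeout=10,
)
resp.raise_for_status()
print("deregistered:", resp.json())         # Consul returns true on success
```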

Below is how I can consistently reproduce the issue:

  1. Pod A is running on node A.
  2. Cordon node A.
  3. Let pod A get rescheduled onto node B.
  4. This leaves two entries for the instance in the Consul catalog: one with the old pod A IP and one with the new pod A IP. The health of the new pod A entry flips between healthy and unhealthy, while the old pod A entry is always unhealthy.

This is crazy. In Kubernetes, a pod can move between nodes at any point in time. When it moves, Consul must deregister the old entry and create a new one.
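
To make the stale entry obvious during the reproduction above, I cross-check the pod IPs Kubernetes currently reports against what Consul has registered. This is only a rough sketch; it assumes the official kubernetes Python client, a local kubeconfig, and placeholder namespace, label selector, service name, address, and token values.

```python
# Minimal sketch: compare the pod IPs Kubernetes currently reports for the
# workload against the addresses registered in the Consul catalog, so the
# stale (old pod A) entry stands out. Namespace, label selector, service
# name, Consul address, and token below are all placeholders.
import requests
from kubernetes import client, config

CONSUL_ADDR = "http://localhost:8500"
TOKEN = "my-acl-token"
SERVICE_NAME = "my-service"
K8S_NAMESPACE = "my-namespace"
LABEL_SELECTOR = "app=my-service"

# Pod IPs Kubernetes knows about right now
config.load_kube_config()
pods = client.CoreV1Api().list_namespaced_pod(K8S_NAMESPACE, label_selector=LABEL_SELECTOR)
live_ips = {p.status.pod_ip for p in pods.items if p.status.pod_ip}

# Instances Consul has registered, with their health checks
resp = requests.get(
    f"{CONSUL_ADDR}/v1/health/service/{SERVICE_NAME}",
    headers={"X-Consul-Token": TOKEN},
    timeout=10,
)
resp.raise_for_status()

for entry in resp.json():
    addr = entry["Service"]["Address"] or entry["Node"]["Address"]
    statuses = sorted({c["Status"] for c in entry["Checks"]})
    stale = "STALE" if addr not in live_ips else "live"
    print(f"{entry['Service']['ID']}  {addr}  checks={statuses}  {stale}")
```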

@MageshSrinivasulu changed the title from "Unable to remove the duplicate instance entries of a service in consul catalog after upgrading consul with agentless setup" to "Unable to remove duplicate or stale instance entries of a service in Consul catalog when Consul Connect inject enabled pod moves from one node to another. Currently running with an agentless setup." on Jul 31, 2024
@MageshSrinivasulu (Author) commented Jul 31, 2024

@david-yu @blake Can you please guide me on this?

@MageshSrinivasulu (Author)

Below is what I found in the connect-inject pod logs. I have masked the actual service name:

2024-08-01T01:06:32.083Z ERROR controller.endpoints failed to deregister endpoints {"name": "SERVICE", "ns": "NAMESPACE", "error": "2 errors occurred:\n\t* failed to update service health status for pod NAMESPACE/POD to critical: Unexpected response code: 500 (rpc error making call: Unknown service ID 'SERVICE ID' for check ID 'NAMESPACE/SERVICE ID')\n\t* failed to update service health status for pod NAMESPACE/POD to critical: Unexpected response code: 500 (rpc error making call: Unknown service ID 'SERVICE ID-sidecar-proxy' for check ID 'NAMESPACE/SERVICE ID-sidecar-proxy')\n\n"}
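
To compare the service IDs in that error against what Consul actually still has checks registered for, I dump the checks attached to the service. Again a minimal sketch with placeholder address, service name, and token.

```python
# Minimal sketch: list the health checks Consul has registered for the service,
# to compare their ServiceIDs against the "Unknown service ID" values in the
# endpoints-controller error above. Address, service name, and token are placeholders.
import requests

CONSUL_ADDR = "http://localhost:8500"
SERVICE_NAME = "my-service"
TOKEN = "my-acl-token"

resp = requests.get(
    f"{CONSUL_ADDR}/v1/health/checks/{SERVICE_NAME}",
    headers={"X-Consul-Token": TOKEN},
    timeout=10,
)
resp.raise_for_status()

for check in resp.json():
    print(check["Node"], check["ServiceID"], check["CheckID"], check["Status"])
```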

@MageshSrinivasulu (Author) commented Aug 6, 2024

This is how I found a working version of Consul when trying to upgrade from 1.14.10 to 1.16.6.

The nearest working version is 1.15.9. All the versions from 1.15.10 to 1.16.6 have one issue or another; they are not stable and the results are not consistent.

I was able to deploy the agentless feature of Consul with 1.15.9 and didn't observe any major issues.

[screenshot]

The issue linked below is predominant in the 1.16 release:

hashicorp/consul#19717

I kindly request that some attention be given to this issue.
