What happened:
When I create a Service with custom Endpoints using an address outside the AWS IP ranges, the Service is unreachable in some cases: some Pods can reach the target, and some can't. This behavior is stable. I've found no pattern between nodes; sometimes a Pod on a specific node can connect to the Service while another Pod on the same node can't. If I log in on the node itself, it can reach the Service both via the Service address and via the target address in the Endpoints resource. Doing a request from a failing Pod directly to the target address also succeeds.
Using tcpdump to trace the traffic of a failing Pod shows only outgoing SYN packets, with no replies. I noticed that succeeding and failing Pods use IP addresses from different interfaces, so I suspect the problem is linked to the combination of the Pod IP address, the Service address, and the per-interface routing tables.
Solving my own issue, but it was kind of hidden in the documentation. Leaving this part for others to find:
Apparently, in an environment with an AWS Direct Connect setup where Kubernetes Pods live in private subnets that are also routed through a NAT gateway, you need to run:
kubectl set env daemonset -n kube-system aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true
As we have direct routes to our Direct Connect network, and Pods were perfectly capable of connecting to real external addresses, I did not think the documentation section "Enable outbound internet access for Pods" applied to our setup, especially because Pods associated with addresses on the primary interface could connect just fine.
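The same setting can also be applied declaratively instead of via `kubectl set env`. A minimal sketch of a strategic-merge patch for the `aws-node` DaemonSet (applied with something like `kubectl patch daemonset aws-node -n kube-system --patch-file snat-patch.yaml`); the file name is a placeholder:

```yaml
# snat-patch.yaml (hypothetical file name)
# Disables SNAT to the primary ENI so Pod traffic to non-VPC destinations
# (e.g. via Direct Connect) keeps the Pod's own source IP.
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: AWS_VPC_K8S_CNI_EXTERNALSNAT
              value: "true"
```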
Attach logs
File name: eks_i-01840548d3701fc8d_2025-02-17_1018-UTC_0.7.8.tar.gz
What you expected to happen:
A working connection to the external address listed in the custom Endpoints.
How to reproduce it (as minimally and precisely as possible):
Create a custom Service+Endpoints:
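A minimal sketch of such a manifest: a Service without a selector plus a manually managed Endpoints object of the same name pointing at an address outside the AWS IP ranges. The name, IP, and port below are hypothetical placeholders, not the reporter's actual values:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: external-db        # must match the Endpoints name below
spec:
  ports:
    - port: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-db        # same name links it to the Service above
subsets:
  - addresses:
      - ip: 10.200.0.10    # placeholder: address reachable via Direct Connect
    ports:
      - port: 5432
```

Because the Service has no selector, Kubernetes does not manage the Endpoints; kube-proxy simply forwards Service traffic to the listed address.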
Fire up a Deployment or DaemonSet and try to curl the Service from its Pods. Some of them will succeed, some of them will fail.
Anything else we need to know?:
Environment:
CNI v1.19.2-eksbuild.5
AMI amazon-eks-node-al2023-x86_64-standard-1.32-v20250203