
Service with custom Endpoints is unreachable in some cases. #3205

Open
robdewit opened this issue Feb 17, 2025 · 1 comment
@robdewit

What happened:
When I create a Service with custom Endpoints using an address outside the AWS IP ranges, the Service is unreachable in some cases: some Pods can reach the target and some can't. This behavior is stable. I've found no correlation with nodes; sometimes a Pod on a specific node can connect to the Service while another Pod on the same node can't. If I log in on the node, it can reach the Service both via the Service address and via the target address in the Endpoints resource. A request from a failing Pod directly to the target address also succeeds.

Tracing the traffic of a failing Pod with tcpdump shows only outgoing SYN packets. I noticed that successful and failing Pods use IP addresses from different interfaces, so I suspect the problem is linked to the Pod IP address, the Service address, and the combination of those with the per-interface routing tables.
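The trace above can be reproduced roughly like this on the node hosting a failing Pod; the Pod IP below is a placeholder, not a value from the report:

```shell
# Sketch only: capture traffic for one failing Pod across all node interfaces.
# POD_IP is an assumed placeholder -- replace it with the failing Pod's IP.
POD_IP=10.0.42.17
sudo tcpdump -nn -i any "host ${POD_IP} and tcp port 443"
# A failing Pod shows repeated outgoing SYNs with no SYN-ACK coming back,
# which points at return traffic being dropped or misrouted.
```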

Attach logs
File name: eks_i-01840548d3701fc8d_2025-02-17_1018-UTC_0.7.8.tar.gz

What you expected to happen:
A working connection to the external address specified in the custom Endpoints.

How to reproduce it (as minimally and precisely as possible):
Create a custom Service+Endpoints:

apiVersion: v1
kind: Service
metadata:
  name: TEST
  namespace: ns1
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
---
apiVersion: v1
kind: Endpoints
metadata:
  name: TEST
  namespace: ns1
subsets:
- addresses:
  - ip: EXTERNAL_ADDRESS
  ports:
  - port: 443
    protocol: TCP
    name: https

Fire up some Deployment or DaemonSet and try to curl the Service. Some Pods will succeed, some will fail.
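A spot-check across the Pods of such a test workload could look like the sketch below. The label `app=curl-test` is an assumption about how the test Deployment is labeled; `TEST` and `ns1` are the placeholders from the manifest above:

```shell
# Sketch only: curl the Service from every pod of an assumed test workload
# (label app=curl-test is hypothetical) and print per-pod results.
for pod in $(kubectl get pods -n ns1 -l app=curl-test -o name); do
  echo -n "${pod}: "
  kubectl exec -n ns1 "${pod#pod/}" -- \
    curl -sk --max-time 5 -o /dev/null -w '%{http_code}\n' \
    https://TEST.ns1.svc.cluster.local/ || echo "FAILED"
done
```

With the reported behavior, the output is a stable mix of successes and timeouts that does not correlate with the node a Pod runs on.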

Anything else we need to know?:

  • The iptables config created by kube-proxy is identical on all nodes.
  • There are no blocking security groups or routing ACLs, which is also shown by the fact that some Pods succeed and connections made directly from the nodes succeed.
  • The longer nodes run, the worse the problem seems to get.
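The first bullet can be checked per node like this; the grep pattern relies on kube-proxy annotating its NAT chains with a `namespace/name` comment, and `ns1/TEST` is the placeholder Service from the manifest above:

```shell
# Sketch only: dump the kube-proxy NAT rules for the Service on each node
# and diff the output between nodes. kube-proxy names its chains
# KUBE-SVC-<hash>/KUBE-SEP-<hash>, so filter by the namespace/name comment.
sudo iptables-save -t nat | grep 'ns1/TEST'
```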

Environment:
CNI v1.19.2-eksbuild.5
AMI amazon-eks-node-al2023-x86_64-standard-1.32-v20250203

@robdewit robdewit added the bug label Feb 17, 2025
@robdewit (Author)

Solving my own issue; the answer was somewhat hidden in the documentation. Leaving this here for others to find:

Apparently, in an environment with an AWS Direct Connect setup where Kubernetes Pods live in private subnets that are also routed through a NAT gateway, we need to run:

kubectl set env daemonset -n kube-system aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true

As documented here: https://docs.aws.amazon.com/eks/latest/userguide/external-snat.html
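The fix above can be verified on the aws-node DaemonSet before rolling Pods; the grep is just a sketch against the variable name documented at the link above:

```shell
# Sketch only: confirm the env var landed on the CNI DaemonSet.
kubectl set env daemonset -n kube-system aws-node --list \
  | grep AWS_VPC_K8S_CNI_EXTERNALSNAT
# With external SNAT enabled, Pod traffic to non-VPC destinations keeps the
# Pod's own source IP instead of being SNATed to the primary ENI address,
# so return traffic over Direct Connect can route back to the right interface.
```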

As we have direct routes to our Direct Connect network and Pods were perfectly capable of connecting to real external addresses, I did not think the section "Enable outbound internet access for Pods" applied to our setup, especially because the Pods associated with addresses on the primary interface could connect just fine.
