Pods with security groups cannot resolve/too slow to resolve domain names #3126
Comments
Does this only happen during the initial phase of pod creation, or is it consistent? It would really help if you could generate a log bundle and email it to us as described here: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md#collecting-node-level-tech-support-bundle-for-offline-troubleshooting. Thanks
Thanks for answering.
It is consistent; it doesn't get better over time. Pods that have security groups attached have been running for days and are still very slow to resolve DNS. I'll try to send the log bundle. Edit: I've sent the two bundles for two nodes to the email address provided in the guide.
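To put numbers on "very slow to resolve", a small probe like the one below could be run inside an affected pod and inside an unaffected one. This is only a sketch: `probe_dns` and its `resolver` parameter are hypothetical names introduced here, and the default resolver simply calls the standard library's `socket.getaddrinfo`, which uses the pod's configured system resolver.

```python
import socket
import time
from typing import Callable, Dict, List, Optional


def probe_dns(hostname: str, attempts: int = 10,
              resolver: Optional[Callable[[str], object]] = None) -> Dict[str, object]:
    """Resolve hostname repeatedly, counting failures and tracking latency."""
    if resolver is None:
        # Default: the system resolver configured in the pod.
        resolver = lambda name: socket.getaddrinfo(name, 80)
    latencies: List[float] = []
    failures = 0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolver(hostname)
            latencies.append(time.monotonic() - start)
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            failures += 1
    return {
        "attempts": attempts,
        "failures": failures,
        "max_latency_s": max(latencies) if latencies else None,
    }
```

Calling `probe_dns("example.com")` in both kinds of pod and comparing the failure counts and maximum latency would show whether the slowness is intermittent packet loss (many failures, a few fast successes) or uniformly high latency.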
@uyilmaz - Did you notice this change after an upgrade, or did it start at a particular point in time? (Note that the VPC Resource Controller is managed on the AWS side.) If the problem was not present previously but has started showing up now, do you know the timeline?
Can you also share the network policy that gets applied to pods with a security group?
I first updated the cluster from version Then I changed the node type to r6g.medium and enabled prefix mode. After that I began to experience the problem. If I delete the SecurityGroupPolicy of a pod, it starts resolving DNS normally.
Here is the network policy of a pod that experiences the problem:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  creationTimestamp: "2024-11-21T07:19:11Z"
  finalizers:
  - networking.k8s.aws/resources
  generation: 5
  name: network-policy-xxx
  namespace: mynamespace
  resourceVersion: "169997767"
  uid: 2f3c82ee-6b96-478a-991e-1d396aeca33e
spec:
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.0.0/16
  ingress:
  - from:
    - podSelector:
        matchLabels:
          mypod: xxx
    - ipBlock:
        cidr: 10.9.196.0/23
    - ipBlock:
        cidr: 10.9.198.0/23
    - ipBlock:
        cidr: 10.9.128.0/20
    - ipBlock:
        cidr: 10.9.0.0/20
  podSelector:
    matchLabels:
      mypod: xxx
  policyTypes:
  - Ingress
  - Egress
The CIDRs in the ingress block are the internet-facing subnets of the EKS cluster, plus a couple of subnets that I wanted to allow access from inside the VPC. As I said before, without the SecurityGroupPolicy, it works normally.
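As a quick sanity check on the egress rule in this policy, the sketch below models only the `ipBlock` matching (a `cidr` with `except` ranges) using Python's `ipaddress` module. It ignores ports and does not reproduce how the network policy agent actually enforces rules. The helper name `ip_block_allows` and the destination address `10.9.1.5` are hypothetical, chosen only as an example of an in-VPC pod IP.

```python
import ipaddress
from typing import Sequence


def ip_block_allows(ip: str, cidr: str, excepts: Sequence[str] = ()) -> bool:
    """Return True if ip falls inside cidr and outside every except range."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(e) for e in excepts)


# Egress rule from the policy: everything except the link-local range.
EGRESS_CIDR = "0.0.0.0/0"
EGRESS_EXCEPT = ["169.254.0.0/16"]

# Hypothetical in-VPC destination (for example, a DNS pod IP): matched by the rule.
in_vpc_allowed = ip_block_allows("10.9.1.5", EGRESS_CIDR, EGRESS_EXCEPT)

# Any link-local destination is excluded by the except entry.
link_local_allowed = ip_block_allows("169.254.169.254", EGRESS_CIDR, EGRESS_EXCEPT)
```

Under this simplified model, traffic to pod IPs inside the VPC is matched by the egress rule, which is consistent with the report that the problem disappears when only the SecurityGroupPolicy is removed.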
I updated my second cluster to 1.31 and the same problem began to occur there as well. ENABLE_PREFIX_DELEGATION is set to false.
I think I understand why this happens. I have two worker nodes in my cluster and two CoreDNS pods. Both CoreDNS pods are running on the first node. The second node itself can reach the CoreDNS pods using their pod IPs, but pods (with a security group attached) on the second node cannot. When I added a new rule to worker node security group (
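The comment above is cut off, so the exact rule that was added is not known. Assuming the missing piece is inbound DNS (UDP and TCP port 53) from the pod security group to the worker node security group, a check along these lines could confirm whether such a rule is present. The rule dictionaries are modeled on the `IpPermissions` shape returned by the EC2 DescribeSecurityGroups API; the function name `allows_dns_from` and the group IDs are hypothetical.

```python
from typing import Dict, List

DNS_PORT = 53


def allows_dns_from(ip_permissions: List[Dict], source_group_id: str) -> Dict[str, bool]:
    """Report whether the rules admit UDP and TCP port 53 from source_group_id."""
    allowed = {"udp": False, "tcp": False}
    for perm in ip_permissions:
        sources = {pair.get("GroupId") for pair in perm.get("UserIdGroupPairs", [])}
        if source_group_id not in sources:
            continue
        proto = perm.get("IpProtocol")
        if proto == "-1":
            # "-1" means all protocols and all ports.
            allowed = {"udp": True, "tcp": True}
        elif proto in allowed:
            if perm.get("FromPort", 0) <= DNS_PORT <= perm.get("ToPort", 65535):
                allowed[proto] = True
    return allowed


# Hypothetical inbound rules on the worker node security group.
node_sg_rules = [
    {"IpProtocol": "tcp", "FromPort": 10250, "ToPort": 10250,
     "UserIdGroupPairs": [{"GroupId": "sg-node-example"}]},
    {"IpProtocol": "udp", "FromPort": 53, "ToPort": 53,
     "UserIdGroupPairs": [{"GroupId": "sg-pod-example"}]},
]
dns_status = allows_dns_from(node_sg_rules, "sg-pod-example")
```

With the sample rules, UDP 53 is admitted from the pod group but TCP 53 is not. A resolver that falls back to TCP for large or truncated responses would still fail in that case, so checking both protocols seems worthwhile.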
What happened:
I have the following setup, using "security groups for pods" and "prefix delegation":
Pods without a security group work normally. However, pods with a security group resolve DNS names so slowly that I first thought they couldn't resolve at all; after numerous retries I get a few successful resolutions. For example, curl example.com times out with "could not resolve address" most of the time.
Environment:
kubectl version: v1.31.2-eks-7f9249a
cat /etc/os-release: Amazon Linux 2
uname -a: Linux ip-x-x-xxx-xx.ap-northeast-1.compute.internal x.xx.xxx-xxx.xxx.amzn2.aarch64 #1 SMP Tue Oct 22 16:38:25 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
I have another cluster with the following setup that does not have the same problem (differences in bold):
Environment:
kubectl version: v1.28.15-eks-7f9249a
cat /etc/os-release: Amazon Linux 2