- Troubleshooting Windows
- Troubleshooting Security Group for Pods
- Troubleshooting Prefix Delegation for Windows
- Verify Windows prefix delegation is enabled in the ConfigMap
- Check both pod events and node events for any specific error
- Verify Node has the required Resource Capacity
- Verify Pod has the required resource limits
- Verify Pod has the required IPv4 Address Annotation
- Verify the configuration options set for windows prefix delegation
- Look for networking issues on the Windows Host
- List of Common Issues
Work through the steps in this guide in the order given to debug issues with Windows Nodes and Pods.
To get the Platform Version of your EKS cluster
aws eks describe-cluster --name cluster-name --region us-west-2 | jq .cluster.platformVersion
Your Platform Version should be equal to or greater than the Platform Version specified here.
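The comparison described above can be scripted. The following is a minimal sketch, assuming platform versions use the eks.N form (for example eks.5). The helper name platform_at_least is hypothetical, and the eks.3 value in the usage comment is only a placeholder for the required version, not the real minimum.

```shell
# platform_at_least CURRENT REQUIRED
# Succeeds (exit status 0) when CURRENT is equal to or greater than REQUIRED.
# Hypothetical helper; assumes both values look like eks.N.
platform_at_least() {
  local cur req
  # Plain jq prints JSON strings with quotes, so strip them first.
  cur=$(printf '%s' "$1" | tr -d '"')
  req=$(printf '%s' "$2" | tr -d '"')
  # Compare the numeric part that follows the "eks." prefix.
  [ "${cur#eks.}" -ge "${req#eks.}" ] 2>/dev/null
}

# Usage (eks.3 is a placeholder, not the real minimum):
#   current=$(aws eks describe-cluster --name cluster-name --region us-west-2 | jq .cluster.platformVersion)
#   platform_at_least "$current" eks.3 && echo "platform version is sufficient"
```

Values that do not follow the eks.N form make the helper fail, which is the safer default for a prerequisite check.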
Resolution
If your Platform Version is lower, you can
- Create a new EKS Cluster or
- Update to the new K8s Version if possible or
- Enable legacy controller support on your EKS Cluster using this guide.
To get the ConfigMap and the data field
kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data"
You should have the ConfigMap with the following data,
enable-windows-ipam:true
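To check a single key without reading the output by eye, the data string can be piped through a small filter. This is a sketch only: it assumes the data column prints as a map such as map[enable-windows-ipam:true], and configmap_flag_enabled is a hypothetical helper name.

```shell
# configmap_flag_enabled KEY
# Reads the ConfigMap data string on stdin and succeeds when KEY is set to true.
# Hypothetical helper; assumes output shaped like map[key:value key:value].
configmap_flag_enabled() {
  # The key must be preceded by "[", a space or a comma (or start of line)
  # and "true" must be followed by "]", a space or a comma (or end of line).
  grep -Eq "(^|[[ ,])${1}:true([] ,]|\$)"
}

# Usage:
#   kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data" \
#     | configmap_flag_enabled enable-windows-ipam && echo "Windows IPAM is enabled"
```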
Resolution
If the ConfigMap is missing or doesn't have the above field, you can
- Create or Update ConfigMap with the required fields by following this guide.
Describe the Windows Node,
kubectl describe node node-name
You should see a non-zero capacity for resource vpc.amazonaws.com/PrivateIPv4Address
Capacity:
vpc.amazonaws.com/PrivateIPv4Address: 9
Allocatable:
vpc.amazonaws.com/PrivateIPv4Address: 9
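The capacity value can also be pulled out of the describe output with a short filter. This is a minimal sketch, assuming the usual kubectl describe layout in which Capacity: starts at the beginning of a line and its entries are indented beneath it. The helper name resource_capacity is hypothetical.

```shell
# resource_capacity RESOURCE
# Reads `kubectl describe node` output on stdin and prints the Capacity value
# of RESOURCE. Prints nothing when the resource is not listed.
# Hypothetical helper; assumes section headers are not indented.
resource_capacity() {
  awk -v res="$1:" '
    /^Capacity:/ { in_cap = 1; next }   # entering the Capacity section
    /^[^ ]/      { in_cap = 0 }         # any other top-level header ends it
    in_cap && $1 == res { print $2; exit }
  '
}

# Usage:
#   kubectl describe node node-name | resource_capacity vpc.amazonaws.com/PrivateIPv4Address
```

An empty result or a value of 0 means the node has no capacity for the resource.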
Resolution
If the node doesn't have the resource capacity, validate the following:
- The Windows Node has the label kubernetes.io/os: windows or beta.kubernetes.io/os: windows.
- There are Sufficient ENI/IP.
- Sufficient permissions in the Cluster Role.
Describe the Windows Pod,
kubectl describe pod windows-pod
You should see a limit and a request of 1 for the resource vpc.amazonaws.com/PrivateIPv4Address
Limits:
vpc.amazonaws.com/PrivateIPv4Address: 1
Requests:
vpc.amazonaws.com/PrivateIPv4Address: 1
Resolution
If limit/request is missing,
- Validate Pod has nodeSelector.
nodeSelector:
  kubernetes.io/os: windows
- Validate Mutating Webhook Configuration is not accidentally deleted.
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io vpc-resource-mutating-webhook
NAME WEBHOOKS AGE
vpc-resource-mutating-webhook 1 59d
Describe the Windows Pod,
kubectl describe pod windows-pod
The Pod should have an annotation similar to the following.
Annotations: vpc.amazonaws.com/PrivateIPv4Address: 192.168.25.15/19
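The annotation can be extracted from the describe output with a filter that only looks inside the Annotations block, so that the identically named entry under Limits is not mistaken for it. This is a sketch under the assumption that kubectl describe prints Annotations: at the start of a line with further annotations indented below it. The helper name pod_annotation_value is hypothetical.

```shell
# pod_annotation_value KEY
# Reads `kubectl describe pod` output on stdin and prints the first word of
# the value of annotation KEY. Prints nothing when the annotation is absent.
# Hypothetical helper; only the Annotations block is searched.
pod_annotation_value() {
  awk -v key="$1:" '
    /^[^ ]/ { in_ann = ($1 == "Annotations:") }   # track the current top-level section
    in_ann {
      for (i = 1; i < NF; i++)
        if ($i == key) { print $(i + 1); exit }
    }
  '
}

# Usage:
#   kubectl describe pod windows-pod | pod_annotation_value vpc.amazonaws.com/PrivateIPv4Address
```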
Resolution
If the Annotation is missing,
- Check the Pod Events for errors emitted by the vpc-resource-controller
- There is no PSP blocking the annotation.
- There are Sufficient ENI/IP.
- Sufficient permissions in the Cluster Role.
Resolution
If the Pod is still stuck in ContainerCreating, you can:
- Fetch more detailed logs on the Host using the EKS Log collector script
- Check the CNI Logs from the collected logs.
- Open an Issue in this repository if the logs do not point to a cause.
Work through the steps in this guide in the order given to debug issues with Security Group for Pods.
Describe the aws-node daemonset
kubectl get ds -n kube-system aws-node -o yaml
The following environment variable must be set.
containers:
- name: aws-node
  env:
  - name: ENABLE_POD_ENI
    value: "true"
Resolution
If the environment variable is not set,
- Follow the guide to enable the SGP feature.
Describe the Node,
kubectl describe node node-name
The following label will be set if Trunk ENI is created,
Labels: vpc.amazonaws.com/has-trunk-attached=true
Resolution
If the label is missing or set to false, check the following:
- The instance type supports ENI Trunking. Only Nitro-based instances support this feature. See the documentation for supported instance types.
On nodes created before the feature was enabled,
- Check if there's capacity to create one more ENI.
aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=instance-id
On nodes created after the feature was enabled,
- There are Sufficient ENI/IP.
- Sufficient permissions in the Cluster Role.
Describe the SGP Pod
kubectl describe pod sgp-pod
You should see a limit and a request of 1 for the resource vpc.amazonaws.com/pod-eni
Limits:
vpc.amazonaws.com/pod-eni: 1
Requests:
vpc.amazonaws.com/pod-eni: 1
Resolution
If limit/request is missing,
- Validate that you have a Security Group Policy that matches the Pod's labels or service account.
- Validate the RBAC Role and RoleBindings are not accidentally deleted.
kubectl get rolebindings.rbac.authorization.k8s.io -n kube-system eks-vpc-resource-controller-rolebinding
NAME ROLE AGE
eks-vpc-resource-controller-rolebinding Role/eks-vpc-resource-controller-role 59d
kubectl get roles.rbac.authorization.k8s.io -n kube-system eks-vpc-resource-controller-role
NAME CREATED AT
eks-vpc-resource-controller-role 2021-11-08T07:40:41Z
- Validate Mutating Webhook Configuration is not accidentally deleted.
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io vpc-resource-mutating-webhook
NAME WEBHOOKS AGE
vpc-resource-mutating-webhook 1 59d
Describe the SGP Pod,
kubectl describe pod sgp-pod
The Pod should have the following annotation.
Annotations: vpc.amazonaws.com/pod-eni: [Branch ENI Details]
Resolution
If the Annotation is missing,
- Check the Pod Events for errors emitted by the vpc-resource-controller
- There is no PSP blocking the annotation.
- There are Sufficient ENI/IP.
- Sufficient permissions in the Cluster Role.
Resolution
If the Pod is still stuck in ContainerCreating, you can:
- Fetch more detailed logs on the Host using the EKS Log collector script
- Check the CNI Logs from the collected logs.
- Open an Issue in this repository if the problem still persists.
Follow the troubleshooting steps here for issues with Windows Nodes and Pods when using prefix delegation mode.
Work through the following steps in the order given to find any issues with the workflow.
To get the ConfigMap and the data field
kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data"
You should have the ConfigMap with the following data in the string,
enable-windows-ipam:true enable-windows-prefix-delegation:true
Resolution
If the ConfigMap is missing or doesn't have the above fields, create or update the amazon-vpc-cni ConfigMap with the required fields:
enable-windows-ipam: "true"
enable-windows-prefix-delegation: "true"
Note: Windows IPAM must be enabled in order to use the Windows prefix delegation feature.
If the controller encounters an error during its prefix delegation workflow that requires action from the customer, it emits the error as a pod event and/or a node event. Checking these events is therefore a good starting point for finding the root cause.
You can obtain the events using the following command.
kubectl get events --all-namespaces
If an event reports an explicit error, investigate that error first.
For example, if the error states that there is insufficient space in the subnet to carve out a /28 prefix, check the subnet to ensure that it has free /28 ranges that can be allocated as prefixes.
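A /28 prefix contains 16 IPv4 addresses, which gives a rough way to estimate how many free /28 ranges a node needs. The following is a back-of-the-envelope sketch, assuming every address in a prefix is usable by a pod and ignoring any warm or minimum targets set in the ConfigMap. The helper name prefixes_needed is hypothetical.

```shell
# prefixes_needed PODS
# Prints how many /28 prefixes are needed to give PODS pods one address each.
# Hypothetical helper; assumes 16 usable addresses per /28 prefix.
prefixes_needed() {
  # Integer ceiling of PODS / 16.
  echo $(( ($1 + 15) / 16 ))
}

# Usage:
#   prefixes_needed 40
```

The number of free addresses in a subnet does not by itself guarantee that this many aligned /28 ranges are free, because fragmentation can leave addresses scattered across ranges.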
Same as Verify Node has the Resource Capacity
Same as Verify Pod has the resource limits
Same as Verify Pod has the IPv4 Address Annotation
Configuration options can be used to fine-tune the behaviour of prefix delegation on Windows. The details about the options are available here.
To get the ConfigMap and the data field
kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data"
If you see any of the following keys in the data:
minimum-ip-target
warm-ip-target
warm-prefix-target
Then the configuration options have been set.
Resolution
Verify that the configuration is correct, as described in the documentation.
Alternatively, to isolate the issue, try removing the above keys from the config map.
Same as Look for Issues on the Windows Host
If you have a PSP that blocks annotations on Pods, you must allow annotations from the user eks:vpc-resource-controller
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated
- kind: User
  name: eks:vpc-resource-controller
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: eks-vpc-resource-controller
To get the cluster role for your EKS Cluster
aws eks describe-cluster --name cluster-name --region us-west-2 | jq .cluster.roleArn
To find the policies attached to the cluster role
aws iam list-attached-role-policies --role-name role-name-from-above
The policy arn:aws:iam::aws:policy/AmazonEKSVPCResourceController must be present for the Windows/SGP feature to work. If it's missing, attach the policy to the cluster role.
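The describe-cluster command returns a role ARN, while list-attached-role-policies expects only the role name, which is the part of the ARN after the last slash. The following sketch shows one way to bridge the two. The helper name role_name_from_arn is hypothetical.

```shell
# role_name_from_arn ARN
# Prints the role name, i.e. everything after the last "/" in the ARN.
# Hypothetical helper; also strips the quotes that plain jq prints.
role_name_from_arn() {
  local arn
  arn=$(printf '%s' "$1" | tr -d '"')
  printf '%s\n' "${arn##*/}"
}

# Usage:
#   arn=$(aws eks describe-cluster --name cluster-name --region us-west-2 | jq .cluster.roleArn)
#   aws iam list-attached-role-policies --role-name "$(role_name_from_arn "$arn")" \
#     --query 'AttachedPolicies[].PolicyArn' --output text
```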
Creating a new ENI or assigning a secondary IPv4 address can fail if your Subnet does not have enough free IPv4 addresses.
To find the number of IPv4 addresses available
aws ec2 describe-subnets --subnet-ids subnet-id-here
In the response, the field AvailableIpAddressCount shows how many IPv4 addresses are available in the Subnet.
You should check if the feature is enabled via ConfigMap. To get the ConfigMap and the data field
kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data"
If you have the ConfigMap with the following data in the string,
enable-windows-prefix-delegation:true
then the feature is enabled.
Resolution
You can disable the feature by editing your ConfigMap and setting enable-windows-prefix-delegation to "false".