FailedCreatePodSandBox from kubelet with error: plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: #555

Open
jaecrane opened this issue May 3, 2024 · 3 comments

Comments

@jaecrane

jaecrane commented May 3, 2024

What happened?

Hello there! I am currently working on an SR-IOV setup with one master node and one worker node.

On the worker node the VFs are created, and the master node can see them. However, when I deploy a pod that requests a VF, the pod is stuck in the "ContainerCreating" state. The events shown by "kubectl describe" for that pod contain the following:
"type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400"

I took the following steps to deploy the pod that needs a VF:
step 1. create sriov-cni-daemonset.yaml (from sriov-cni)
step 2. create configMap.yaml (from sriov-network-device-plugin)
step 3. create sriovdp-daemonset.yaml (from sriov-network-device-plugin)
step 4. create sriov-crd.yaml (NetworkAttachmentDefinition)
step 5. create multus-daemonset-thick-thick.yml (from multus-cni)
step 6. create the pod (deployment) that needs a VF

Any advice would be appreciated.

thank you!

What did you expect to happen?

I suspect a version mismatch between Multus, Calico, and Kubernetes.

What are the minimal steps needed to reproduce the bug?

Just follow the simple tutorial in the "quickstart guide" of sriov-network-device-plugin.

Anything else we need to know?

The pod with a VF allocated is scheduled correctly, but kubelet cannot create the pod.

Component Versions

| Component | Version |
|---|---|
| SR-IOV Network Device Plugin | v3.6.2 (latest version, in master node) |
| SR-IOV CNI Plugin | v2.7.0 (latest version, in master node) |
| Multus | v4.0.2 (latest version, in master node) |
| Kubernetes | v1.29.4 (latest version, in worker and master node) |
| containerd | v1.6.31 in worker and master node |
| golang | v1.21.9 in worker and master node |
| OS | ubuntu 22.04 with 6.5.0-28-generic, worker and master node |
10GbE intel ixgbe driver(v5.19.9) with x520A2 nic, in worker-node

Config Files, configMap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [{
                "resourceName": "intel_sriov_netdevice",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "1515"],
                    "drivers": ["i40evf", "ixgbevf"],
                    "pfNames": ["enp3s0f0#0-3","enp3s0f1"]
                }
            },
            {
                "resourceName": "intel_sriov_dpdk",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed", "1889", "1515"],
                    "drivers": ["vfio-pci"],
                    "pfNames": ["enp3s0f0","enp3s0f1"]
                }
            }
        ]
    }
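
For readers unfamiliar with the `pfNames` entry `enp3s0f0#0-3`: as far as I understand the device plugin's selector syntax, `<PF>#<first>-<last>` limits the pool to that range of VF indices on the PF, while a bare PF name such as `enp3s0f1` places no restriction on VF index. The sketch below only illustrates that reading; `expand_pf_selector` is a hypothetical helper, not part of the plugin.

```python
def expand_pf_selector(selector: str):
    """Split a pfNames selector into (pf_name, vf_indices).

    vf_indices is None when the selector is a bare PF name (no VF restriction).
    Hypothetical helper for illustration; the plugin has its own parser.
    """
    pf_name, sep, ranges = selector.partition("#")
    if not sep:
        return pf_name, None
    indices = []
    for part in ranges.split(","):
        first, dash, last = part.partition("-")
        if dash:
            # "0-3" style range, inclusive on both ends
            indices.extend(range(int(first), int(last) + 1))
        else:
            # single VF index such as "7"
            indices.append(int(first))
    return pf_name, indices


for sel in ["enp3s0f0#0-3", "enp3s0f1"]:
    print(sel, "->", expand_pf_selector(sel))
```

Under that reading, `enp3s0f0#0-3` covers four VFs, which is consistent with the `intel.com/intel_sriov_netdevice: 4` capacity reported in the node description further down.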

Device pool config file location (Try '/etc/pcidp/config.json')

{
    "resourceList": [{
            "resourceName": "intel_sriov_netdevice",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["1515"],
                "drivers": ["ixgbevf"],
                "pfNames": ["enp3s0f0#0-3","enp3s0f1"]
            }
        }
    ]
}

Multus config, 00-multus.conf

{"capabilities":{"bandwidth":true,"portMappings":true},"cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","clusterNetwork":"/host/etc/cni/net.d/10-calico.conflist","type":"multus-shim"}

CNI config (Try '/etc/cni/net.d/'), 10-calico.conflist

{
"name": "k8s-pod-network",
"cniVersion": "0.3.1",
"plugins": [{"container_settings":{"allow_ip_forwarding":false},"datastore_type":"kubernetes","ipam":{"assign_ipv4":"true","assign_ipv6":"false","type":"calico-ipam"},"kubernetes":{"k8s_api_root":"https://10.96.0.1:443","kubeconfig":"/etc/cni/net.d/calico-kubeconfig"},"log_file_max_age":30,"log_file_max_count":10,"log_file_max_size":100,"log_file_path":"/var/log/calico/cni/cni.log","log_level":"Info","mtu":0,"nodename_file_optional":false,"policy":{"type":"k8s"},"type":"calico"},{"capabilities":{"bandwidth":true},"type":"bandwidth"},{"capabilities":{"portMappings":true},"snat":true,"type":"portmap"}]

Kubernetes deployment type ( Bare Metal, Kubeadm etc.)

Kubeadm with no Hypervisor

SR-IOV Network Custom Resource Definition

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
spec:
  config: '{
      "type": "sriov",
      "cniVersion": "0.3.1",
      "name": "sriov-network",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.30.0/24",
        "routes": [{
          "dst": "0.0.0.0/0"
        }],
        "gateway": "192.168.30.254"
      }
    }'

POD configuration file

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/intel_sriov_netdevice: '1'
      limits:
        intel.com/intel_sriov_netdevice: '1'
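
To rule out a simple wiring mistake between the NetworkAttachmentDefinition and the pod, the two manifests can be cross-checked. The following is a rough sketch, not a Multus tool: `check_wiring` is a hypothetical helper, the manifests are trimmed copies written as plain dicts, and only the simple comma-separated form of the networks annotation is handled.

```python
import json

# Trimmed copies of the two manifests above, as plain dicts (no YAML parser needed).
nad = {
    "metadata": {
        "name": "sriov-net1",
        "annotations": {
            "k8s.v1.cni.cncf.io/resourceName": "intel.com/intel_sriov_netdevice"
        },
    },
    "spec": {
        "config": '{"type": "sriov", "cniVersion": "0.3.1", "name": "sriov-network"}'
    },
}
pod = {
    "metadata": {"annotations": {"k8s.v1.cni.cncf.io/networks": "sriov-net1"}},
    "spec": {
        "containers": [
            {
                "name": "appcntr1",
                "resources": {
                    "requests": {"intel.com/intel_sriov_netdevice": "1"},
                    "limits": {"intel.com/intel_sriov_netdevice": "1"},
                },
            }
        ]
    },
}


def check_wiring(nad: dict, pod: dict) -> list:
    """Return a list of mismatches between a NetworkAttachmentDefinition and a Pod."""
    problems = []

    # 1. The pod's networks annotation must name the NAD.
    wanted = pod["metadata"].get("annotations", {}).get("k8s.v1.cni.cncf.io/networks", "")
    names = [n.strip() for n in wanted.split(",") if n.strip()]
    if nad["metadata"]["name"] not in names:
        problems.append("pod does not reference the NAD")

    # 2. The NAD's resourceName must be requested by at least one container.
    resource = nad["metadata"].get("annotations", {}).get("k8s.v1.cni.cncf.io/resourceName")
    requested = any(
        resource in c.get("resources", {}).get("requests", {})
        for c in pod["spec"]["containers"]
    )
    if resource and not requested:
        problems.append(f"no container requests {resource}")

    # 3. spec.config must be valid JSON that names a CNI plugin type.
    try:
        if "type" not in json.loads(nad["spec"]["config"]):
            problems.append("NAD config has no 'type'")
    except json.JSONDecodeError as exc:
        problems.append(f"NAD config is not valid JSON: {exc}")

    return problems


print(check_wiring(nad, pod))
```

For the manifests in this issue the list comes back empty, which suggests the names and resource requests line up and the failure lies elsewhere.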

Logs

Kubelet logs (journalctl -u kubelet)

May 03 17:40:32 dnccom kubelet[5493]: rpc error: code = Unknown desc = failed to setup network for sandbox "c9c9bb184458ed201e45e9be1134ce650615d174cb889f957576420bd0797286": plugin type="multus-shim" name="multus-cni-network" failed >
May 03 17:40:32 dnccom kubelet[5493]: E0503 17:40:32.221354 5493 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err=<
May 03 17:40:32 dnccom kubelet[5493]: >
May 03 17:40:32 dnccom kubelet[5493]: ': StdinData: {"capabilities":{"bandwidth":true,"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-calico.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":">
May 03 17:40:32 dnccom kubelet[5493]:
May 03 17:40:32 dnccom kubelet[5493]: rpc error: code = Unknown desc = failed to setup network for sandbox "c9c9bb184458ed201e45e9be1134ce650615d174cb889f957576420bd0797286": plugin type="multus-shim" name="multus-cni-network" failed >
May 03 17:40:32 dnccom kubelet[5493]: E0503 17:40:32.221314 5493 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err=<
May 03 17:40:16 dnccom kubelet[5493]: E0503 17:40:16.702886 5493 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "testpod1_default(a9081041-e9d7-406b-91e7-33d5dea78a84)" with CreatePodSandboxErr>
May 03 17:40:16 dnccom kubelet[5493]: > pod="default/testpod1"
May 03 17:40:16 dnccom kubelet[5493]: ': StdinData: {"capabilities":{"bandwidth":true,"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-calico.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":">
May 03 17:40:16 dnccom kubelet[5493]:
May 03 17:40:16 dnccom kubelet[5493]: rpc error: code = Unknown desc = failed to setup network for sandbox "9561c24af74830052382641f69defcde575b7a2df5bf2cc2873adcf8d9808771": plugin type="multus-shim" name="multus-cni-network" failed >
May 03 17:40:16 dnccom kubelet[5493]: E0503 17:40:16.702822 5493 kuberuntime_manager.go:1172] "CreatePodSandbox for pod failed" err=<
May 03 17:40:16 dnccom kubelet[5493]: > pod="default/testpod1"
May 03 17:40:16 dnccom kubelet[5493]: ': StdinData: {"capabilities":{"bandwidth":true,"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-calico.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":">
May 03 17:40:16 dnccom kubelet[5493]:
May 03 17:40:16 dnccom kubelet[5493]: rpc error: code = Unknown desc = failed to setup network for sandbox "9561c24af74830052382641f69defcde575b7a2df5bf2cc2873adcf8d9808771": plugin type="multus-shim" name="multus-cni-network" failed >
May 03 17:40:16 dnccom kubelet[5493]: E0503 17:40:16.702803 5493 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err=<
May 03 17:40:16 dnccom kubelet[5493]: >
May 03 17:40:16 dnccom kubelet[5493]: ': StdinData: {"capabilities":{"bandwidth":true,"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-calico.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":">
May 03 17:40:16 dnccom kubelet[5493]:
May 03 17:40:16 dnccom kubelet[5493]: rpc error: code = Unknown desc = failed to setup network for sandbox "9561c24af74830052382641f69defcde575b7a2df5bf2cc2873adcf8d9808771": plugin type="multus-shim" name="multus-cni-network" failed >
May 03 17:40:16 dnccom kubelet[5493]: E0503 17:40:16.702758 5493 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err=<
May 03 17:40:03 dnccom kubelet[5493]: E0503 17:40:03.805170 5493 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "testpod1_default(a9081041-e9d7-406b-91e7-33d5dea78a84)" with CreatePodSandboxErr>
May 03 17:40:03 dnccom kubelet[5493]: > pod="default/testpod1"
May 03 17:40:03 dnccom kubelet[5493]: ': StdinData: {"capabilities":{"bandwidth":true,"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-calico.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":">

CLI: "kubectl describe pod "

Name: testpod1
Namespace: default
Priority: 0
Service Account: default
Node: dnccom/###.###.30.79
Start Time: Fri, 03 May 2024 18:32:56 +0900
Labels:
Annotations: cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
k8s.v1.cni.cncf.io/networks: sriov-net1
Status: Pending
IP:
IPs:
Containers:
appcntr1:
Container ID:
Image: centos/tools
Image ID:
Port:
Host Port:
Command:
/bin/bash
-c
--
Args:
while true; do sleep 300000; done;
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
intel.com/intel_sriov_netdevice: 1
Requests:
intel.com/intel_sriov_netdevice: 1
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fj9zv (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-fj9zv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 3m32s default-scheduler Successfully assigned default/testpod1 to dnccom
Warning FailedCreatePodSandBox 3m31s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "62bc023b4a52365a7c92287d7a89b8e2a1a43acdbd8b428da5ee85d78c26d6e2": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"62bc023b4a52365a7c92287d7a89b8e2a1a43acdbd8b428da5ee85d78c26d6e2" Netns:"/var/run/netns/cni-0d8488d1-0eaa-c857-facb-bf26bb02401c" IfName:"eth0" Args:"K8S_POD_NAMESPACE=default;K8S_POD_NAME=testpod1;K8S_POD_INFRA_CONTAINER_ID=62bc023b4a52365a7c92287d7a89b8e2a1a43acdbd8b428da5ee85d78c26d6e2;K8S_POD_UID=c2943d63-8c45-4e7f-82b8-e4f2d22844e8;IgnoreUnknown=1" Path:"" ERRORED: error configuring pod [default/testpod1] networking: Multus: [default/testpod1/c2943d63-8c45-4e7f-82b8-e4f2d22844e8]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: failed to get a ResourceClient instance: getKubeletClient: error getting pod resources from client: getPodResources: failed to list pod resources, &{0xc000380800}.Get(_) = _, rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"

': StdinData: {"capabilities":{"bandwidth":true,"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-calico.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","type":"multus-shim"}
Warning FailedCreatePodSandBox 3m29s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "89ebb6d44babc1612bca579adff6d668bce7f7520c4d88121b8a21174205adb2": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"89ebb6d44babc1612bca579adff6d668bce7f7520c4d88121b8a21174205adb2" Netns:"/var/run/netns/cni-7e36d7aa-99df-7531-e4c0-43a811feb4b3" IfName:"eth0" Args:"K8S_POD_NAME=testpod1;K8S_POD_INFRA_CONTAINER_ID=89ebb6d44babc1612bca579adff6d668bce7f7520c4d88121b8a21174205adb2;K8S_POD_UID=c2943d63-8c45-4e7f-82b8-e4f2d22844e8;IgnoreUnknown=1;K8S_POD_NAMESPACE=default" Path:"" ERRORED: error configuring pod [default/testpod1] networking: Multus: [default/testpod1/c2943d63-8c45-4e7f-82b8-e4f2d22844e8]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: failed to get a ResourceClient instance: getKubeletClient: error getting pod resources from client: getPodResources: failed to list pod resources, &{0xc0008b0400}.Get(_) = _, rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"
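
The events above end in `dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused`, so one thing worth checking is whether that socket accepts connections at all. A minimal probe, assuming Python is available wherever the check is run (on the node, or inside a container that mounts the same path); `probe_unix_socket` is a hypothetical helper and only the path comes from the error message.

```python
import socket


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Try to connect to a Unix stream socket and report what happened."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except FileNotFoundError:
        # No socket file at this path (for example, the directory is not mounted).
        return "missing"
    except ConnectionRefusedError:
        # A socket file exists, but nothing is listening on it.
        return "refused"
    except OSError as exc:
        # Permission problems, timeouts, and anything else.
        return f"error: {exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the error message in the events above.
    print(probe_unix_socket("/var/lib/kubelet/pod-resources/kubelet.sock"))
```

A result of "refused" would match the events: the socket file is visible but no process is accepting connections on it from that vantage point. Comparing the result on the host with the result inside the Multus daemon pod may show whether the problem is on the kubelet side or in how the path is mounted.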

CLI: "kubectl describe node "

Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=dnccom
kubernetes.io/os=linux
Annotations: csi.volume.kubernetes.io/nodeid: {"csi.tigera.io":"dnccom"}
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: ###.###.30.79/24
projectcalico.org/IPv4VXLANTunnelAddr: 192.168.150.128
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 02 May 2024 23:24:20 +0900
Taints:
Unschedulable: false
Lease:
HolderIdentity: dnccom
AcquireTime:
RenewTime: Fri, 03 May 2024 18:34:48 +0900
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message


NetworkUnavailable False Thu, 02 May 2024 23:49:43 +0900 Thu, 02 May 2024 23:49:43 +0900 CalicoIsUp Calico is running on this node
MemoryPressure False Fri, 03 May 2024 18:32:09 +0900 Fri, 03 May 2024 13:05:29 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 03 May 2024 18:32:09 +0900 Fri, 03 May 2024 13:05:29 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 03 May 2024 18:32:09 +0900 Fri, 03 May 2024 13:05:29 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 03 May 2024 18:32:09 +0900 Fri, 03 May 2024 13:05:29 +0900 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: ###.###.##.79
Hostname: dnccom
Capacity:
cpu: 20
ephemeral-storage: 479079112Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
intel.com/intel_sriov_netdevice: 4
intel.com/sriov: 0
memory: 49192176Ki
pods: 110
Allocatable:
cpu: 20
ephemeral-storage: 441519308889
hugepages-1Gi: 0
hugepages-2Mi: 0
intel.com/intel_sriov_netdevice: 4
intel.com/sriov: 0
memory: 49089776Ki
pods: 110
System Info:
Machine ID: 6dbf3df58ecb4321aa69d80d0db63840
System UUID: cb2dedaa-277f-17f0-9572-3c7c3f526e61
Boot ID: 0b5ae480-0e83-4749-a260-1bd7d836c4f8
Kernel Version: 6.5.0-28-generic
OS Image: Ubuntu 22.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.31
Kubelet Version: v1.29.4
Kube-Proxy Version: v1.29.4
PodCIDR: 192.168.5.0/24
PodCIDRs: 192.168.5.0/24
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age


calico-system calico-node-gqztj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
calico-system csi-node-driver-kmm7g 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
default mysql-64f9d5b444-nq6qt 0 (0%) 0 (0%) 0 (0%) 0 (0%) 152m
default testpod1 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2m
kube-system kube-multus-ds-vmmn5 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 6h56m
kube-system kube-proxy-72dhk 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-sriov-cni-ds-amd64-h6529 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 19h
kube-system kube-sriov-device-plugin-lwwwn 250m (1%) 1 (5%) 40Mi (0%) 200Mi (0%) 6h42m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits


cpu 450m (2%) 1200m (6%)
memory 140Mi (0%) 300Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
intel.com/intel_sriov_netdevice 1 1
intel.com/sriov 0 0
Events:

@SchSeba
Collaborator

SchSeba commented May 3, 2024

Hi @jaecrane,

Based on this error:

Normal Scheduled 3m32s default-scheduler Successfully assigned default/testpod1 to dnccom
Warning FailedCreatePodSandBox 3m31s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "62bc023b4a52365a7c92287d7a89b8e2a1a43acdbd8b428da5ee85d78c26d6e2": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"62bc023b4a52365a7c92287d7a89b8e2a1a43acdbd8b428da5ee85d78c26d6e2" Netns:"/var/run/netns/cni-0d8488d1-0eaa-c857-facb-bf26bb02401c" IfName:"eth0" Args:"K8S_POD_NAMESPACE=default;K8S_POD_NAME=testpod1;K8S_POD_INFRA_CONTAINER_ID=62bc023b4a52365a7c92287d7a89b8e2a1a43acdbd8b428da5ee85d78c26d6e2;K8S_POD_UID=c2943d63-8c45-4e7f-82b8-e4f2d22844e8;IgnoreUnknown=1" Path:"" ERRORED: error configuring pod [default/testpod1] networking: Multus: [default/testpod1/c2943d63-8c45-4e7f-82b8-e4f2d22844e8]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: failed to get a ResourceClient instance: getKubeletClient: error getting pod resources from client: getPodResources: failed to list pod resources, &{0xc000380800}.Get(_) = _, rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"

I don't see how the device plugin is the problem here: the error comes from Multus failing to connect to kubelet's pod-resources socket.
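
The long event message wraps many layers of context around one underlying failure. As an illustration only (the helper `root_cause` and its regular expression are assumptions made for this sketch, not anything Multus provides), the innermost cause can be pulled out of the quoted text like this:

```python
import re

# Tail of the event message quoted above.
message = (
    'getPodResources: failed to list pod resources, &{0xc000380800}.Get(_) = _, '
    'rpc error: code = Unavailable desc = connection error: desc = '
    '"transport: Error while dialing: dial unix '
    '/var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"'
)


def root_cause(msg: str):
    """Return (socket_path, reason) for a failed Unix-socket dial, or None."""
    match = re.search(r'dial unix (?P<path>\S+): connect: (?P<reason>[^"]+)', msg)
    if match is None:
        return None
    return match.group("path"), match.group("reason").strip()


print(root_cause(message))
```

The extracted pair points at the kubelet pod-resources socket and a refused connection, not at resource discovery or allocation by the device plugin.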

@NieShiyang

I have run into the same issue. Did you manage to solve it?

@SchSeba
Collaborator

SchSeba commented Aug 19, 2024

Hi @jaecrane @NieShiyang, any update on this one, or can we close the issue?
