[1.18] RHEL 8.10 + Kubernetes 1.29.0 + CRI-O 1.32: Container creation error: writing file devices.allow: Operation not permitted #1599

Closed
Abdullahxz opened this issue Nov 1, 2024 · 13 comments

Abdullahxz commented Nov 1, 2024

We have been experiencing this after patching the OS. Any chance this is related to #1589?

Nov 01 11:09:12 [Redacted server name] kubelet[3963]: E1101 11:09:12.299138    3963 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CreateContainerError: \"container create failed: writing file `devices.allow`: Operation not permitted\\n\"" pod="kube-system/etcd-[Redacted server name]" podUID="7c82959225af26de371bcf09048168e5"
Nov 01 11:09:14 [Redacted server name] kubelet[3963]: E1101 11:09:14.591782    3963 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://10.179.6.29:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/[Redacted server name]?timeout=10s\": context deadline exceeded - error from a previous attempt: EOF" interval="7s"
Nov 01 11:09:16 [Redacted server name] kubelet[3963]: E1101 11:09:16.176304    3963 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"[Redacted server name]\" not found"
Nov 01 11:09:16 [Redacted server name] kubelet[3963]: E1101 11:09:16.293893    3963 remote_runtime.go:319] "CreateContainer in sandbox from runtime service failed" err=<
Nov 01 11:09:16 [Redacted server name] kubelet[3963]:         rpc error: code = Unknown desc = container create failed: writing file `devices.allow`: Operation not permitted
Nov 01 11:09:16 [Redacted server name] kubelet[3963]:  > podSandboxID="c99bca33d15b0a59499f314b2625367561b0529eff0905fec23934bcfc6ae58d"

cgroup-related information:

[[Redacted user]@[Redacted server name] ~]$ uname -a
Linux [Redacted server name] 4.18.0-553.22.1.el8_10.x86_64 #1 SMP Wed Sep 11 18:02:00 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

[[Redacted user]@[Redacted server name] ~]$  cat /etc/*os-release*
NAME="Red Hat Enterprise Linux"
VERSION="8.10 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.10 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8"
BUG_REPORT_URL="https://issues.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.10
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.10"

[[Redacted user]@[Redacted server name] ~]$ grep cgroup /proc/filesystems
nodev	cgroup
nodev	cgroup2

[[Redacted user]@[Redacted server name] ~]$ stat -fc %T /sys/fs/cgroup/
tmpfs
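
For reference: tmpfs here means the host is on the legacy cgroup v1 hierarchy. A host booted with the unified cgroup v2 hierarchy would instead report:

$ stat -fc %T /sys/fs/cgroup/
cgroup2fs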

[[Redacted user]@[Redacted server name] ~]$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

Abdullahxz (Author) commented:

@giuseppe @hswong3i Is there any help you might be able to provide? 🙏

hswong3i (Contributor) commented Nov 1, 2024

If you just need an RPM for RHEL 8: https://build.opensuse.org/projects/home:alvistack/packages/containers-crun-1.18.2/repositories/AlmaLinux_8/binaries

BTW, Kubernetes 1.29.x should be paired with CRI-O 1.29.x, not CRI-O 1.32.x...
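
As a quick sanity check that a node's kubelet and CRI-O minor versions match (a sketch; the version strings shown are illustrative):

$ kubelet --version
Kubernetes v1.29.0
$ crio --version | head -n1
crio version 1.29.9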

Abdullahxz (Author) commented:

@hswong3i Good catch, it seems cri-o got updated with the recent OS patches. I have downgraded it back to v1.29.9 and the cluster is healthy now. Thank you!
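
For others hitting this, the downgrade and version pin on EL8 might look like the following (a sketch; it assumes the cri-o-1.29.9 package is still available in the configured repository, and the versionlock plugin is optional):

# dnf downgrade -y cri-o-1.29.9
# systemctl restart crio
# dnf install -y python3-dnf-plugin-versionlock
# dnf versionlock add 'cri-o-1.29.9*'    # keep OS patching from pulling 1.29.10 back in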

jthiltges commented:

We also ran into this issue after updating to cri-o-1.29.10-150500.1.1.x86_64 on EL8. Possibly a regression from the cgroup changes between crun 1.17 and 1.18.2?

Enabling cgroups v2 seemed to be a straightforward workaround.

strace of the error:

[pid 2555150] execve("/usr/libexec/crio/crun", ["/usr/libexec/crio/crun", "--systemd-cgroup", "--root=/run/crun", "create", "--bundle", "/run/containers/storage/overlay-containers/e852010d7494af595fc8dee29e506ff33f17ba73401419f9be6e29b955d8353a/userdata", "--pid-file", "/run/containers/storage/overlay-containers/e852010d7494af595fc8dee29e506ff33f17ba73401419f9be6e29b955d8353a/userdata/pidfile", "e852010d7494af595fc8dee29e506ff33f17ba73401419f9be6e29b955d8353a"], 0x7ffcc5b4da88 /* 2 vars */ <unfinished ...>
...
[pid 2555150] openat(AT_FDCWD, "/sys/fs/cgroup/devices/system.slice/crun-e852010d7494af595fc8dee29e506ff33f17ba73401419f9be6e29b955d8353a.scope/container", O_RDONLY|O_CLOEXEC|O_DIRECTORY) = 4
[pid 2555150] openat(4, "devices.deny", O_WRONLY|O_CLOEXEC) = 6
[pid 2555150] write(6, "a", 1)          = 1
[pid 2555150] close(6)                  = 0
[pid 2555150] openat(4, "devices.allow", O_WRONLY|O_CLOEXEC) = 6
[pid 2555150] write(6, "c *:* m", 7)    = -1 EPERM (Operation not permitted)
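
The EPERM matches the hierarchy rule of the cgroup v1 devices controller: a child cgroup cannot be granted an exception that its parent does not allow, and the trace shows crun writing into a nested "container" cgroup under the systemd-managed scope. A minimal illustration of the rule (the demo paths are hypothetical; run as root on a cgroup v1 host):

# mkdir -p /sys/fs/cgroup/devices/demo/child
# echo 'a' > /sys/fs/cgroup/devices/demo/devices.deny            # parent now denies all devices
# echo 'c *:* m' > /sys/fs/cgroup/devices/demo/child/devices.allow
-bash: echo: write error: Operation not permitted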

hswong3i (Contributor) commented Nov 8, 2024

> We also ran into this issue after updating to cri-o-1.29.10-150500.1.1.x86_64 on EL8. Possibly a regression from the cgroup changes between crun 1.17 and 1.18.2?

At least #1589 reported Ubuntu 20.04 + cgroup v1 + crun 1.18.0 (bug) / 1.18.2 (fixed).

If the issue is still happening with EL8, I guess the kernel version does matter? @giuseppe any idea?

giuseppe (Member) commented Nov 8, 2024

That might be something different.

@jthiltges can you confirm it was crun 1.18.2?

jthiltges commented:

The testing above was with EL8.10 (Alma), using the OBS build of cri-o:

[root@red-kube-vm001 ~]# rpm -q cri-o
cri-o-1.29.10-150500.1.1.x86_64
[root@red-kube-vm001 ~]# /usr/libexec/crio/crun --version
crun version 1.18.2
[root@red-kube-vm001 ~]# uname -a
Linux red-kube-vm001.unl.edu 4.18.0-553.27.1.el8_10.x86_64 #1 SMP Tue Nov 5 04:50:16 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

Downgrading to cri-o-1.29.9 gave crun 1.17, and the issue did not appear, matching Abdullahxz's comment above.

[root@flatiron-vm001 ~]# rpm -q cri-o
cri-o-1.29.9-150500.1.1.x86_64
[root@flatiron-vm001 ~]# /usr/libexec/crio/crun --version
crun version 1.17

discostur commented Nov 14, 2024

@giuseppe I can confirm the issue that @jthiltges posted (AlmaLinux 8.10):

broken:

$ rpm -q cri-o
cri-o-1.29.10-150500.1.1.x86_64

$ /usr/libexec/crio/crun --version
crun version 1.18.2
commit: 00ab38af875ddd0d1a8226addda52e1de18339b5
rundir: /run/user/0/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL

$ uname -a
Linux k8s-master01.intern.customer-virt.eu 4.18.0-553.27.1.el8_10.x86_64 #1 SMP Tue Nov 5 04:50:16 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

working:

$ rpm -q cri-o
cri-o-1.29.9-150500.1.1.x86_64

$ /usr/libexec/crio/crun --version
crun version 1.17
commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
rundir: /run/user/0/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL

Error:

crio[783]: time="2024-11-14 18:29:39.290954314+01:00" level=error msg="Container creation error: writing file `devices.allow`: Operation not permitted\n" id=87240ab3-4116-49a5-901e-3e2024880466 name=/runtime.v1.RuntimeService/CreateContainer

giuseppe (Member) commented:

@discostur thanks, it seems to affect cgroup v1.

Could you share the pod spec so I can try to reproduce locally?

discostur commented:

@giuseppe it was not specific to a pod; the error affected all pods running on that node, so CRI-O / kubelet was not able to start any pod.

For example kube-proxy:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-11-13T18:29:44Z"
  generateName: kube-proxy-
  labels:
    controller-revision-hash: 6bcd97f568
    k8s-app: kube-proxy
    pod-template-generation: "27"
  name: kube-proxy-7t5ns
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: kube-proxy
    uid: 10b1a9a0-5de5-11e8-b0e4-c2e62762236c
  resourceVersion: "1306273433"
  uid: cdfc80b3-9b55-42bd-9d50-3d02dae073a2
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - k8s-node01.intern
  containers:
  - command:
    - /usr/local/bin/kube-proxy
    - --config=/var/lib/kube-proxy/config.conf
    - --hostname-override=$(NODE_NAME)
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    image: registry.k8s.io/kube-proxy:v1.29.10
    imagePullPolicy: IfNotPresent
    name: kube-proxy
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/kube-proxy
      name: kube-proxy
    - mountPath: /run/xtables.lock
      name: xtables-lock
    - mountPath: /lib/modules
      name: lib-modules
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-r7lt7
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  nodeName: k8s-node01.intern
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: kube-proxy
  serviceAccountName: kube-proxy
  terminationGracePeriodSeconds: 30
  tolerations:
  - operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - configMap:
      defaultMode: 420
      name: kube-proxy
    name: kube-proxy
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: xtables-lock
  - hostPath:
      path: /lib/modules
      type: ""
    name: lib-modules
  - name: kube-api-access-r7lt7
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

giuseppe (Member) commented:

I wonder if the systemd version is too old.

discostur commented:

@giuseppe

$ rpm -q systemd
systemd-239-82.el8_10.2.x86_64

discostur commented:

Seems to be fixed if you switch to / enable cgroup v2:

cri-o/cri-o#8743

https://access.redhat.com/solutions/6898151
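
Following the Red Hat article above, the switch to cgroup v2 on EL8 is a kernel argument plus a reboot (a sketch; run as root and verify afterwards):

# grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
# reboot
...
# stat -fc %T /sys/fs/cgroup/
cgroup2fs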
