health-checker posts wrong status for kubelet - KubeletUnhealthy #1078

@younsl

Description

Summary

Node Problem Detector v0.8.20 on EKS reports conflicting node status conditions: the health-checker plugin sets KubeletUnhealthy=True even though kubelet is running and the node reports Ready.

Environment

  • Node Problem Detector v0.8.20
  • EKS Optimized AL2 AMI v20250519

Details

The following monitors are currently enabled in the NPD helm chart:

# node-problem-detector/values_my.yaml
settings:
  log_monitors:
    - /config/kernel-monitor.json
    - /config/docker-monitor.json
    - /config/readonly-monitor.json
    # An example of activating a custom log monitor definition in
    # Node Problem Detector
    # - /custom-config/docker-monitor-filelog.json
  custom_plugin_monitors:
    - /config/health-checker-kubelet.json
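
For reference, the health-checker-kubelet.json that ships with NPD looks roughly like the sketch below (reconstructed from the v0.8.x defaults; the exact binary path and flags baked into the chart's image may differ). Note the --enable-repair=true default, which matches the "repair flag : true" text in the condition message shown further down:

# config/health-checker-kubelet.json (NPD v0.8.x default, approximate)
{
  "plugin": "custom",
  "pluginConfig": {
    "invoke_interval": "10s",
    "timeout": "3m",
    "max_output_length": 80,
    "concurrency": 1
  },
  "source": "health-checker",
  "metricsReporting": true,
  "conditions": [
    {
      "type": "KubeletUnhealthy",
      "reason": "KubeletIsHealthy",
      "message": "kubelet on the node is functioning properly"
    }
  ],
  "rules": [
    {
      "type": "permanent",
      "condition": "KubeletUnhealthy",
      "reason": "KubeletUnhealthy",
      "path": "/home/kubernetes/bin/health-checker",
      "args": [
        "--component=kubelet",
        "--enable-repair=true",
        "--cooldown-time=1m",
        "--loopback-time=0",
        "--health-check-timeout=10s"
      ],
      "timeout": "3m"
    }
  ]
}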

kubectl get node:

NAME                                               STATUS   ROLES    AGE   VERSION
ip-10-xxx-xx-xxx.ap-northeast-2.compute.internal   Ready    <none>   16m   v1.32.3-eks-473151a

Node conditions:

Conditions:
  Type                    Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                    ------  -----------------                 ------------------                ------                       -------
  ReadonlyFilesystem      False   Wed, 16 Jul 2025 15:57:17 +0900   Wed, 16 Jul 2025 15:52:15 +0900   FilesystemIsNotReadOnly      Filesystem is not read-only
  KubeletUnhealthy        True    Wed, 16 Jul 2025 16:04:16 +0900   Wed, 16 Jul 2025 15:59:15 +0900   KubeletUnhealthy             kubelet:kubelet was found unhealthy; repair flag : true
  CorruptDockerOverlay2   False   Wed, 16 Jul 2025 16:04:16 +0900   Wed, 16 Jul 2025 15:59:15 +0900   NoCorruptDockerOverlay2      docker overlay2 is functioning properly
  KernelDeadlock          False   Wed, 16 Jul 2025 15:57:17 +0900   Wed, 16 Jul 2025 15:52:15 +0900   KernelHasNoDeadlock          kernel has no deadlock
  MemoryPressure          False   Wed, 16 Jul 2025 16:02:49 +0900   Wed, 16 Jul 2025 15:51:36 +0900   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure            False   Wed, 16 Jul 2025 16:02:49 +0900   Wed, 16 Jul 2025 15:51:36 +0900   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure             False   Wed, 16 Jul 2025 16:02:49 +0900   Wed, 16 Jul 2025 15:51:36 +0900   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                   True    Wed, 16 Jul 2025 16:02:49 +0900   Wed, 16 Jul 2025 15:51:54 +0900   KubeletReady                 kubelet is posting ready status
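
To isolate the conflicting condition, it can also be queried on its own (node name redacted as above):

$ kubectl get node ip-10-xxx-xx-xxx.ap-northeast-2.compute.internal \
    -o jsonpath='{.status.conditions[?(@.type=="KubeletUnhealthy")].message}'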

Accessing the node directly via node-shell:

$ which systemctl
/usr/bin/systemctl
$ systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubelet-args.conf, 30-kubelet-extra-args.conf
   Active: active (running) since Wed 2025-07-16 07:53:47 UTC; 54min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 4013 (kubelet)
   CGroup: /runtime.slice/kubelet.service
           └─4013 /usr/bin/kubelet --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime-endpoin...

Jul 16 08:47:43 ip-xx-xxx-xx-xxx.ap-northeast-2.compute.internal kubelet[4013]: I0716 08:47:43.657399    4013 util.go:30] "No sandbox for pod can be fo...gwao"

Checking directly on the node confirms that the systemctl binary exists at the expected path and that kubelet is active (running), so the KubeletUnhealthy condition does not match the actual state of the service.
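
One thing worth noting is that the custom plugin runs its check from inside the NPD container, not on the host, so a successful host-side systemctl does not necessarily mean the in-container check succeeds. As a sketch (the pod name is a placeholder, the binary path follows the NPD image defaults, and this assumes the image includes a shell), the same checks can be repeated inside the container:

$ kubectl -n kube-system exec <npd-pod> -- sh -c 'which systemctl && systemctl status kubelet'
$ kubectl -n kube-system exec <npd-pod> -- \
    /home/kubernetes/bin/health-checker --component=kubelet --enable-repair=false

If systemctl is missing from the container image, or cannot reach the host's systemd from inside the container, the health check would fail there even though the same command succeeds on the host.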

How can I solve this issue?
