What happened:
When the node is rebooted, HAMi may fail to schedule pods because of the device status it has reported. After I manually restart the scheduler and the device plugin, I still have to delete the Pending pods before they can be scheduled; otherwise they remain in the Pending state indefinitely.
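The manual workaround described above can be sketched as a shell snippet. This is only an illustration of the steps, not a supported HAMi procedure: the namespace `kube-system` and the workload names `hami-scheduler` / `hami-device-plugin` are assumptions and must be adjusted to your deployment.

```shell
# Sketch of the manual recovery steps (names/namespaces are assumptions).
# 1. Restart the HAMi scheduler and device plugin.
kubectl -n kube-system rollout restart deployment hami-scheduler
kubectl -n kube-system rollout restart daemonset hami-device-plugin

# 2. Delete pods stuck in Pending so they are re-created and re-queued
#    for scheduling (only safe for pods managed by a controller).
kubectl get pods -A --field-selector=status.phase=Pending \
  -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' \
  | while read -r ns name; do
      kubectl -n "$ns" delete pod "$name"
    done
```

Deleting the pods is what this issue asks to make unnecessary; ideally HAMi would retry the Pending pods on its own once it is healthy again.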
What you expected to happen:
HAMi should automatically retry scheduling Pods that are stuck in Pending once it becomes healthy again, instead of leaving them Pending.
How to reproduce it (as minimally and precisely as possible):
See above.
Anything else we need to know?:
- The output of `nvidia-smi -a` on your host
- Your docker or containerd configuration file (e.g. `/etc/docker/daemon.json`)
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g. `sudo journalctl -r -u kubelet`)
- Any relevant kernel output lines from `dmesg`
Environment:
- HAMi version: v2.4.0
- nvidia driver or other AI device driver version:
- Docker version from `docker version`
- Docker command, image and tag used
- Kernel version from `uname -a`
- Others: