What happened:
When the node is rebooted, HAMi may fail to schedule pods because of the device status it has reported. After I manually restart the scheduler and the device plugin, I still have to delete the Pending pods before they can be scheduled; otherwise they remain in the Pending state indefinitely.
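The manual workaround described above can be sketched as a shell snippet. This is only an illustration of the steps, not a supported HAMi procedure: the namespace `kube-system` and the workload names `hami-scheduler` / `hami-device-plugin` are assumptions and must be adjusted to your deployment.

```shell
# Sketch of the manual recovery steps (names/namespaces are assumptions).
# 1. Restart the HAMi scheduler and device plugin.
kubectl -n kube-system rollout restart deployment hami-scheduler
kubectl -n kube-system rollout restart daemonset hami-device-plugin

# 2. Delete pods stuck in Pending so they are re-created and re-queued
#    for scheduling (only safe for pods managed by a controller).
kubectl get pods -A --field-selector=status.phase=Pending \
  -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' \
  | while read -r ns name; do
      kubectl -n "$ns" delete pod "$name"
    done
```

Deleting the pods is what this issue asks to make unnecessary; ideally HAMi would retry the Pending pods on its own once it is healthy again.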
What you expected to happen:
HAMi should automatically retry scheduling Pods that are stuck in Pending once it becomes healthy again, instead of leaving them Pending.
How to reproduce it (as minimally and precisely as possible):
See above.
Anything else we need to know?:
- The output of `nvidia-smi -a` on your host
- Your docker or containerd configuration file (e.g. `/etc/docker/daemon.json`)
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g. `sudo journalctl -r -u kubelet`)
- Any relevant kernel output lines from `dmesg`
Environment:
- HAMi version: v2.4.0
- nvidia driver or other AI device driver version:
- Docker version from `docker version`
- Docker command, image and tag used
- Kernel version from `uname -a`
- Others: