Hi team,

We use NVIDIA GPUs in our Kubernetes platform. We have installed `nvidia-driver-535` on our `Ubuntu 22.04` machine, and we also installed `nvidia-container-toolkit` 15.0 for `containerd`. When we tried to build an image with `nvidia/cuda 12.4`, PyTorch was unable to find `libcuda.so`. To resolve this, we created a symbolic link using the following command:

After doing this, the error disappeared.

I'd like to understand what exactly happened here and whether there are any potential issues with this approach. Also, could you guide me on which part of the code to check? It seems that `nvidia-container-toolkit` copies some files from the host to the container. Why does this happen?
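
To help narrow this down, here is a minimal diagnostic sketch that could be run inside the container to see which `libcuda` names the dynamic loader can resolve. It is not part of `nvidia-container-toolkit` or PyTorch. The helper name `probe_libcuda` is hypothetical, and the candidate names `libcuda.so` and `libcuda.so.1` are assumptions based on the usual driver library naming on Linux.

```python
import ctypes
import ctypes.util


def probe_libcuda(names=("libcuda.so", "libcuda.so.1")):
    """Try to load each candidate library name and record whether it succeeded.

    `names` is an assumed list of candidates; adjust it for your image.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # asks the dynamic loader to open the library
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_libcuda().items():
        print(f"{name}: {'loadable' if ok else 'not found'}")
    # find_library returns the file name ctypes locates for "cuda", or None
    print("find_library('cuda') ->", ctypes.util.find_library("cuda"))
```

If only the versioned name loads, that would be consistent with the unversioned `libcuda.so` link being absent from the container, which is what the manual symbolic link works around. Running the same check at image build time and again in a running GPU pod may also show whether the library is only present once the container is started with the NVIDIA runtime.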