Question about compatibility #675

suchen-sci · 2024-09-04T10:28:50Z

Hi team,

We use NVIDIA GPUs on our Kubernetes platform. We installed nvidia-driver-535 on our Ubuntu 22.04 machine, along with nvidia-container-toolkit 15.0 for containerd. When we built an image based on nvidia/cuda 12.4, PyTorch was unable to find libcuda.so. To resolve this, we created a symbolic link with the following command:

```shell
ln -sf /usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01 /usr/lib/x86_64-linux-gnu/libcuda.so.1
```

After doing this, the error disappeared.
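To see what the symlink actually changed, it can help to check which file the driver soname `libcuda.so.1` resolves to inside the container. The snippet below is a minimal sketch. `check_libcuda` is a hypothetical helper name, not part of nvidia-container-toolkit, and its default directory assumes the same x86_64 Ubuntu multiarch layout as the `ln` command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-container-toolkit): report which
# file the driver soname libcuda.so.1 resolves to in a given directory.
# The default directory assumes the x86_64 Ubuntu layout used above.
check_libcuda() {
    libdir="${1:-/usr/lib/x86_64-linux-gnu}"
    link="$libdir/libcuda.so.1"
    if [ ! -e "$link" ]; then
        # -e follows symlinks, so a dangling link is also reported as missing
        echo "missing: $link"
        return 1
    fi
    # Print the fully resolved target, e.g. .../libcuda.so.<driver version>
    readlink -f "$link"
}
```

Inside a running container, `check_libcuda` with no argument prints the resolved target (or a "missing" line). Running `ldconfig -p | grep libcuda` alongside it shows what the dynamic linker cache lists for libcuda.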

I’d like to understand what exactly happened here and whether this approach has any potential issues. Could you also point me to the part of the code I should look at? It seems that nvidia-container-toolkit copies some files from the host into the container. Why does this happen?
