Question about compatibility #675

Open
suchen-sci opened this issue Sep 4, 2024 · 0 comments
Comments

@suchen-sci

Hi team,

We run NVIDIA GPUs on our Kubernetes platform. Our Ubuntu 22.04 machine has nvidia-driver-535 installed, along with nvidia-container-toolkit 15.0 for containerd. When we tried to build an image with nvidia/cuda 12.4, PyTorch was unable to find libcuda.so. To resolve this, we created a symbolic link with the following command:

ln -sf /usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01 /usr/lib/x86_64-linux-gnu/libcuda.so.1

After doing this, the error disappeared.
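
For anyone trying to reproduce this, here is a minimal Python sketch of how the state of the library directory could be inspected before and after creating the link. It is not part of nvidia-container-toolkit; the helper names `list_libcuda` and `can_load` are hypothetical, and it assumes the application resolves the driver library by the soname `libcuda.so.1` through the dynamic loader.

```python
import ctypes
import os
from pathlib import Path


def list_libcuda(lib_dir):
    """Return (name, target) pairs for libcuda.so* entries in lib_dir.

    target is the symlink destination, or None for a regular file.
    """
    entries = []
    for path in sorted(Path(lib_dir).glob("libcuda.so*")):
        target = os.readlink(path) if path.is_symlink() else None
        entries.append((path.name, target))
    return entries


def can_load(soname="libcuda.so.1"):
    """Try to load the library through the dynamic loader (assumed lookup name)."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Directory taken from the ln command above.
    lib_dir = "/usr/lib/x86_64-linux-gnu"
    for name, target in list_libcuda(lib_dir):
        print(name, "->", target if target else "(regular file)")
    print("libcuda.so.1 loadable:", can_load())
```

If `libcuda.so.535.183.01` is listed but `libcuda.so.1` is not, that would be consistent with the symptom described above: the versioned file is present, but nothing exposes it under the name being looked up.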

I’d like to understand what exactly happened here and whether this approach has any potential issues. Also, could you point me to the part of the code I should look at? It seems that nvidia-container-toolkit copies some files from the host into the container. Why does it do that?
