Machines:
Windows 11 laptop: Docker Desktop with nvidia-container-toolkit and the WSL2 back-end.
Remote server: Ubuntu 20.04.6 LTS, Docker with nvidia-container-toolkit installed.
Both systems run containers with the --gpus all flag fine, and nvidia-smi outputs correctly.
Problem:
Saved the image on the laptop with docker save -o saved_image.tar image_name:latest and uploaded the tar to the remote server.
Loaded the image on the remote server with docker load -i saved_image.tar.
Ran the image with docker run --gpus all -it image_name:latest.
Inside the container, nvidia-smi throws this error:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH.
However, nvcc -V works fine.
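For context: nvcc is the compiler from the CUDA toolkit installed in the image and prints its version without loading the driver, whereas nvidia-smi needs NVML (libnvidia-ml.so.1), a driver library that the NVIDIA container runtime normally provides when a container is started with --gpus all. So nvcc working says little about whether the driver libraries made it into the container. A small helper like the one below (lib_status is a hypothetical name, not an existing tool) shows what a given library path actually is inside the container:

```shell
#!/bin/sh
# Hypothetical helper: describe what a library path really is inside the container.
# Prints one of: "symlink -> TARGET", "missing", "empty", "non-empty file".
lib_status() {
    path="$1"
    if [ -L "$path" ]; then
        # Check symlinks first: [ -e ] follows links and would hide a dangling one.
        echo "symlink -> $(readlink "$path")"
    elif [ ! -e "$path" ]; then
        echo "missing"
    elif [ ! -s "$path" ]; then
        # Exists but has size zero, like the files ldconfig reports as "empty".
        echo "empty"
    else
        echo "non-empty file"
    fi
}
```

For example, lib_status /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 run in a container started with --gpus all on each machine would show whether that path is a real library or a zero-byte file.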
Investigation so far:
I found posts from years ago about similar issues where nvcc works but nvidia-smi does not, but none of the suggested mitigations helped:
Adding /usr/lib/x86... to PATH
Running ldconfig and then nvidia-smi
Inside the container on the remote server, ldconfig produces the following:
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libdxcore.so is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so is empty, not checked.
/sbin/ldconfig.real: /usr/lib/x86_64-linux-gnu/libcuda.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is not a symbolic link
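Each "is empty, not checked" line means ldconfig found a zero-byte regular file at that path instead of a real shared library, and "is not a symbolic link" means a file named after a library's soname (such as libnvidia-ml.so.1), which ldconfig expects to be a symlink it manages, is a regular file. To list every empty placeholder in one go, something like the sketch below should work; list_empty_libs is a hypothetical name, and it relies on GNU find's -maxdepth and -empty, which Ubuntu ships.

```shell
#!/bin/sh
# Hypothetical helper: list zero-byte shared-library files in one directory.
# ldconfig skips such files with "is empty, not checked".
list_empty_libs() {
    dir="${1:-/usr/lib/x86_64-linux-gnu}"
    # -maxdepth 1: don't descend into subdirectories.
    # -type f -empty: regular files of size zero; -name '*.so*': shared-library names.
    find "$dir" -maxdepth 1 -type f -name '*.so*' -empty | sort
}
```

Running list_empty_libs inside the container on both machines and comparing the output would show exactly which placeholders differ between them.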
On the laptop where the image was saved, ldconfig returns only the following:
/sbin/ldconfig.real: /usr/lib/x86_64-linux-gnu/libcuda.so.1 is not a symbolic link
I blindly tried copying the "empty" files above from the original image into the new image; it turned out they are empty in the original image as well.
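Since the files are empty in the original image too, it may be worth checking whether they are part of the image layers themselves rather than something created at run time. One way, sketched below, is to start the image without --gpus, so the NVIDIA runtime injects nothing, and list the relevant files. show_baked_nvidia_libs is a hypothetical helper name, and the glob patterns simply mirror the ldconfig output above.

```shell
#!/bin/sh
# Hypothetical helper: list NVIDIA/WSL driver files that ship inside an image.
# The container is started WITHOUT --gpus, so nothing is mounted in by the
# NVIDIA runtime; anything listed here comes from the image layers.
show_baked_nvidia_libs() {
    if [ -z "$1" ]; then
        echo "usage: show_baked_nvidia_libs IMAGE" >&2
        return 2
    fi
    # Override the entrypoint with sh so the globs are expanded inside the container.
    docker run --rm --entrypoint sh "$1" -c \
        'ls -l /usr/lib/x86_64-linux-gnu/libcuda* /usr/lib/x86_64-linux-gnu/libnvidia-ml* /usr/lib/x86_64-linux-gnu/libdxcore*'
}
```

For example, show_baked_nvidia_libs image_name:latest on either machine; entries with size 0 in this listing would mean the placeholders are baked into the image.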
I also tried loading the tar directly on the laptop, and it works fine there. So I suspect this might be a problem with images exported from the WSL2 back-end?
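If the empty files do turn out to be baked into the image (one possible way this happens, which is an assumption since the post doesn't say how the image was built, is a docker commit of a container that had been started with --gpus all on the laptop), a workaround worth trying is to delete them in a derived image so the runtime on the remote server can provide the real driver libraries. This is an unverified sketch: remove_empty_driver_libs is a hypothetical name, and the file list just copies the ldconfig output above. It only removes files that are both empty and regular (not symlinks), so real libraries are never touched.

```shell
#!/bin/sh
# Hypothetical helper: delete zero-byte driver placeholders from a lib directory.
# Only the file names reported by ldconfig are considered, and only when the
# path is a regular file (not a symlink) with size zero.
remove_empty_driver_libs() {
    dir="${1:-/usr/lib/x86_64-linux-gnu}"
    for f in libcuda.so libcuda.so.1 libcudadebugger.so.1 \
             libnvidia-ml.so libnvidia-ml.so.1 libdxcore.so; do
        path="$dir/$f"
        if [ -f "$path" ] && [ ! -L "$path" ] && [ ! -s "$path" ]; then
            rm -f "$path" && echo "removed $path"
        fi
    done
}
```

It could be run in a container started from the image without --gpus, followed by docker commit, or in a RUN step of a derived image; afterwards, docker run --gpus all on the remote server would show whether nvidia-smi finds libnvidia-ml.so.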