[v1.17.0, mode = "cdi"] OCI runtime create failed: 'failed to create link'/'failed to check if link exists: unexpected link target' #772
@elezar It works fine with v1.16.2, so I rolled back to that version.
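For anyone else who needs to stay on the previous release until this is resolved, a minimal sketch of the rollback (this assumes an apt-based install; the exact version string and set of packages to pin may differ on your distribution):

```sh
# Check which 1.16.2 build is available in your configured repository.
apt-cache madison nvidia-container-toolkit

# Downgrade the toolkit packages (the "1.16.2-1" version string is an
# assumption; use whatever the previous command reports).
sudo apt-get install -y --allow-downgrades \
  nvidia-container-toolkit=1.16.2-1 \
  nvidia-container-toolkit-base=1.16.2-1

# Regenerate the CDI specification with the downgraded toolkit.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```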
@benz0li could you provide information on the image that you're running? I would assume that a symlink the hook tries to create already exists in the image.
This happened to us as well. I think we should roll back this image to prevent more people from hitting this.
Image: glcr.b-data.ch/jupyterlab/cuda/r/verse
Yes. Due to the installation of …
The difference between the old (1.16.2) and new (1.17.0) CDI specification:

```diff
diff --git a/etc/cdi/nvidia-1.16.2.yaml.bak b/etc/cdi/nvidia.yaml
index 8eb6b1e..2b6936b 100644
--- a/etc/cdi/nvidia-1.16.2.yaml.bak
+++ b/etc/cdi/nvidia.yaml
@@ -13,8 +13,6 @@ containerEdits:
     - nvidia-cdi-hook
     - create-symlinks
     - --link
-    - libnvidia-allocator.so.535.183.01::/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1
-    - --link
     - ../libnvidia-allocator.so.1::/usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so
     - --link
     - libnvidia-vulkan-producer.so.535.183.01::/usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so
@@ -22,11 +20,24 @@ containerEdits:
     - libglxserver_nvidia.so.535.183.01::/usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
     hookName: createContainer
     path: /usr/bin/nvidia-cdi-hook
+  - args:
+    - nvidia-cdi-hook
+    - create-symlinks
+    - --link
+    - libGLX_nvidia.so.535.183.01::/usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0
+    - --link
+    - libnvidia-opticalflow.so.1::/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so
+    - --link
+    - libcuda.so.1::/usr/lib/x86_64-linux-gnu/libcuda.so
+    hookName: createContainer
+    path: /usr/bin/nvidia-cdi-hook
   - args:
     - nvidia-cdi-hook
     - update-ldcache
     - --folder
     - /usr/lib/x86_64-linux-gnu
+    - --folder
+    - /usr/lib/x86_64-linux-gnu/vdpau
     hookName: createContainer
     path: /usr/bin/nvidia-cdi-hook
   mounts:
@@ -332,6 +343,13 @@ containerEdits:
     - nosuid
     - nodev
     - bind
+  - containerPath: /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.535.183.01
+    hostPath: /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.535.183.01
+    options:
+    - ro
+    - nosuid
+    - nodev
+    - bind
   - containerPath: /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
     hostPath: /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
     options:
```
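One way to check whether the three newly added --link targets collide with files the image already ships (which would explain both 'failed to create link' and 'failed to check if link exists: unexpected link target') is to inspect those paths with no CDI device requested, so the hooks never run. A sketch using the image from this thread:

```sh
# No GPU is requested here, so the create-symlinks hook does not fire;
# this only shows what the image itself provides at the new link targets.
docker run --rm glcr.b-data.ch/jupyterlab/cuda/r/verse \
  ls -l /usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0 \
        /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so \
        /usr/lib/x86_64-linux-gnu/libcuda.so
```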
@elezar Using v1.17.0, it works with docker run --rm -ti glcr.b-data.ch/jupyterlab/cuda/r/verse bash when no GPU is requested.
It does not work with docker run --rm --device nvidia.com/gpu=all -ti glcr.b-data.ch/jupyterlab/cuda/r/verse bash
It does not work with podman run --rm --device nvidia.com/gpu=all -ti glcr.b-data.ch/jupyterlab/cuda/r/verse bash
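Two quick checks that may help narrow this down (assuming the nvidia-ctk binary from the same toolkit install):

```sh
# Confirm the generated spec still parses and the devices are visible;
# an error here would point at /etc/cdi/nvidia.yaml rather than the runtime.
nvidia-ctk cdi list

# Re-run podman with debug logging to see which hook invocation and
# which --link entry actually fails.
podman --log-level=debug run --rm --device nvidia.com/gpu=all \
  glcr.b-data.ch/jupyterlab/cuda/r/verse nvidia-smi
```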
@elezar Isn't … I.e. using Mesa instead. Is my understanding correct?
Workaround for v1.17.0: Delete the offending --link lines in /etc/cdi/nvidia.yaml.
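A sketch of that workaround (note that re-running nvidia-ctk cdi generate will recreate the deleted lines, so this has to be repeated after every regeneration):

```sh
# Keep a backup of the generated spec before editing it by hand.
sudo cp /etc/cdi/nvidia.yaml /etc/cdi/nvidia.yaml.bak

# Remove the failing --link entries; the exact lines depend on your
# generated spec, so review the file rather than scripting the deletion.
sudo "${EDITOR:-vi}" /etc/cdi/nvidia.yaml
```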
In the v1.17.0 update we changed the behaviour of our create-symlinks hook.
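The failure mode itself is easy to reproduce with plain ln (hypothetical library names, for illustration only): creating a link at a path that already exists fails, which matches 'failed to create link', and the pre-existing link pointing somewhere else matches 'unexpected link target':

```sh
mkdir /tmp/linkdemo && cd /tmp/linkdemo

# Simulate a library link the image already ships.
ln -s libGLX_mesa.so.0 libGLX_indirect.so.0

# Simulate the hook creating the same link for the NVIDIA driver:
# this fails with "File exists", analogous to "failed to create link".
ln -s libGLX_nvidia.so.535.183.01 libGLX_indirect.so.0

# The existing link also points at an "unexpected" target:
readlink libGLX_indirect.so.0   # -> libGLX_mesa.so.0
```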
@elezar No worries. Very solid work on the NVIDIA Container Toolkit. Almost never had a problem with it. Thank you!
Full error message:
Further information: