Can't convert compose service with CDI device #107

Open
rany2 opened this issue Sep 15, 2024 · 8 comments

rany2 commented Sep 15, 2024

Consider the following service:

  jellyfin:
    image: docker.io/jellyfin/jellyfin:latest
    container_name: jellyfin
    restart: unless-stopped
    #user: 973:973  # media:media
    group_add:
      - video
    ports:
      - 127.0.0.1:8096:8096
    volumes:
      - ./jellyfin/config:/config
      - ./jellyfin/cache:/cache
      - /mnt/hdd/media:/data/media
    devices:
      - nvidia.com/gpu=all
    security_opt:
      - label=disable

Ignoring the fact that the user entry would fail with podlet due to #106, another validation failure is triggered by the devices entry.

Error: 
   0: error converting compose file
   1: error reading compose file
   2: File `/compose.yml` is not a valid compose file
   3: services.jellyfin.devices[0]: device must have a container path at line 45 column 9

Location:
   src/cli/compose.rs:203

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

rany2 commented Sep 16, 2024

For anyone facing this issue, the following workaround seems to work OK.

Define a new runtime at /etc/containers/containers.conf.d/50-nvidia-runtime.conf:

[engine.runtimes]
nvidia = ["/usr/bin/nvidia-container-runtime"]

Use runtime: nvidia in the compose service instead of the CDI device.

  jellyfin:
    image: docker.io/jellyfin/jellyfin:latest
    container_name: jellyfin
    restart: always
    #user: 973:973  # media:media
    runtime: nvidia
    group_add:
      - video
    ports:
      - 127.0.0.1:8096:8096
    volumes:
      - ./jellyfin/config:/config
      - ./jellyfin/cache:/cache
      - /mnt/hdd/media:/data/media
    security_opt:
      - label=disable
    labels:
      - io.containers.autoupdate=registry

I haven't tested the generated quadlet service, but podlet returns the following, which seems correct (ignore the volume paths; I didn't pass --absolute-host-paths):

# jellyfin.container
[Container]
AutoUpdate=registry
ContainerName=jellyfin
Image=docker.io/jellyfin/jellyfin:latest
PodmanArgs=--group-add video
PublishPort=127.0.0.1:8096:8096
SecurityLabelDisable=true
Volume=./jellyfin/config:/config
Volume=./jellyfin/cache:/cache
Volume=/mnt/hdd/media:/data/media
GlobalArgs=--runtime nvidia

[Service]
Restart=always
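
A quick, untested way to verify the runtime mapping works before relying on the quadlet could be something like the following (the CUDA image tag is only an illustrative example):

# Assumes the [engine.runtimes] entry above is in place
podman run --rm --runtime nvidia -e NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi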

k9withabone (Member) commented

According to the Compose Specification, devices must be in the form HOST_PATH:CONTAINER_PATH[:CGROUP_PERMISSIONS].
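
For illustration only (this example is not from the spec text itself), a devices entry in that form would look like:

    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128:rwm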

k9withabone (Member) commented

Specifically for Podman, there is podman run --gpus (added in Podman v5.0.0), so you could add PodmanArgs=--gpus all to the generated .container Quadlet file.
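
As an untested sketch, applied to the generated jellyfin.container from the earlier comment, that would be an extra line under [Container]:

# jellyfin.container (excerpt)
[Container]
PodmanArgs=--group-add video
PodmanArgs=--gpus all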

rany2 commented Sep 21, 2024

According to the Compose Specification, devices must be in the form HOST_PATH:CONTAINER_PATH[:CGROUP_PERMISSIONS].

Shouldn't the spec be corrected given that CDI devices exist? CDI is a relatively recent standard (less than 5 years old), and it's only very recently that Nvidia started recommending it for Podman users. It seems like a case of the spec being out of date.

Docker also supports CDI devices, but I'm not sure whether docker-compose does this same type of validation.

IMO it should be valid given that both podman run and docker run accept it as valid.

rany2 commented Sep 21, 2024

Specifically for Podman, there is podman run --gpus (added in Podman v5.0.0), so you could add PodmanArgs=--gpus all to the generated .container Quadlet file.

I actually preferred the runtime approach as it doesn't require me to create some kind of package-update hook or systemd service to keep the CDI YAML file up to date. The issue with CDI is that the file needs to be regenerated every time CUDA or the Nvidia driver is updated.
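
For reference (I haven't automated this here), the NVIDIA Container Toolkit can regenerate that spec file with something along the lines of:

# Regenerate the CDI spec after a driver/CUDA update
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml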

Either way, this issue doesn't impact me anymore, but I kept it open since it seems simple to fix. Someone might need CDI devices for some other vendor and wouldn't be able to use the runtime workaround.

(Edit: --gpus=all just adds the Nvidia CDI devices behind the scenes. containers/podman#21180)

k9withabone (Member) commented

Thanks for the information! I haven't tried to use a GPU in a container myself and hadn't heard of CDI before.

Shouldn't the spec be corrected given that CDI devices exist?

Probably. You should create an issue in the compose-spec repo since you understand this better than I do.

IMO it should be valid given that both podman run and docker run accept it as valid.

Is there documentation on this? I can't find anything about CDI in the docker-run(1) or podman-run(1) man pages.

rany2 commented Sep 22, 2024

Is there documentation on this? I can't find anything about CDI in the docker-run(1) or podman-run(1) man pages.

In the podman-run man page, the reference to CDI devices is subtle:

--device=host-device[:container-device][:permissions]

With CDI devices, container-device and permissions need to be omitted. It's strange that it isn't mentioned more directly, though.
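
For example, an invocation like this is accepted, assuming Nvidia's generated CDI spec exists under /etc/cdi/ (a sketch, not taken from the man page):

# CDI device name as the host-device; no container path or permissions
podman run --rm --security-opt label=disable --device nvidia.com/gpu=all docker.io/library/ubuntu nvidia-smi -L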

rany2 commented Sep 22, 2024

I made a ticket here: compose-spec/compose-spec#532
