
Add nvidia container toolkit to Dockerfile #358

Draft · wants to merge 1 commit into main from pr-add-nvidia-container-toolkit

Conversation

@nbbrooks (Member) commented Sep 2, 2024

Mujoco camera performance with an Nvidia GPU inside a Docker container is really bad. I am hoping this will fix it. Taken from Chance's initial work.

…o test Mujoco rendering performance with Nvidia GPU inside Docker container
@nbbrooks force-pushed the pr-add-nvidia-container-toolkit branch from 07222ab to f3bf74a on September 2, 2024 at 22:25
@nbbrooks (Member, Author) commented Sep 3, 2024

Since I have not had much luck getting this configured, I tried a simpler workflow (no CUDA, just the nvidia drivers) and got the render speed-up I wanted via https://github.com/PickNikRobotics/moveit_studio_ur_ws/tree/pr-add-dockerfile-with-nvidia-drivers

Unfortunately, I do not think we can just merge the nvidia-driver change into the main Dockerfile. Ironically, this is not because it will break things for non-Nvidia users (I successfully used the image built with nvidia drivers on a laptop with Intel integrated graphics), but because launching Pro will segfault if the driver version installed by the Dockerfile (e.g. 555) does not match the driver version on your host system (e.g. 535).

Let's talk about how we want to handle this; I don't love having to keep two Dockerfiles up to date. Perhaps initially, unless there is something simple I missed with nvidia-toolkit, we could just add commented-out entries for the nvidia-driver installation to the primary Dockerfile, with instructions in the Quick Start to uncomment the lines and sync the driver version if you have an Nvidia card and want high render rates in the Pro sim.
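
For illustration only (this sketch is not part of the PR's diff): the commented-out entries might look something like the following, where the package version (535 here) is a placeholder that must match the driver version reported by nvidia-smi on the host.

    # Optional: uncomment if you have an NVIDIA card and want high render rates
    # in the Pro sim. The driver version below is illustrative and MUST match
    # your host's driver version (check with `nvidia-smi` on the host).
    # RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    #     --mount=type=cache,target=/var/lib/apt,sharing=locked \
    #     apt-get update && apt-get install --no-install-recommends -y nvidia-driver-535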

@@ -49,6 +49,25 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
/home/${USERNAME}/.ros && \
chown -R $USER_UID:$USER_GID /home/${USERNAME} /opt/overlay_ws/

Review comment (Contributor):

For this image we also need to install the nvidia-container-toolkit:

Suggested change
RUN curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# hadolint ignore=DL3008
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update && apt-get install --no-install-recommends -y software-properties-common wget
Review comment (Contributor):

Suggested change
-   apt-get update && apt-get install --no-install-recommends -y software-properties-common wget
+   apt-get update && apt-get install --no-install-recommends -y software-properties-common wget nvidia-container-toolkit

@EzraBrooks (Member) commented:
Unsure how general a fix it is, but I was able to help another customer get much better performance just by setting runtime: nvidia in the docker-compose file, without modifying the Dockerfile.
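
For concreteness, a minimal compose fragment along those lines (the service and image names here are hypothetical, not from this PR, and this assumes the NVIDIA Container Toolkit is already installed on the host):

    services:
      moveit_pro:                             # hypothetical service name
        image: your-workspace-image           # placeholder image
        runtime: nvidia                       # hand the container to the NVIDIA runtime
        environment:
          - NVIDIA_VISIBLE_DEVICES=all        # expose all host GPUs
          - NVIDIA_DRIVER_CAPABILITIES=all    # include graphics/display capabilities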

@chancecardona (Contributor) commented:
That is very interesting.

Yeah, I think the solution to this is multi-step.

We probably need to provide an /etc/docker/daemon.json configuring the nvidia and cri-o options needed, in addition to setting the runtime in the docker compose file as Ezra just mentioned. (nvidia-ctk can do some of this; see the sketch after the list below.)

Overall I think the solution to this is to have:

  • the nvidia container toolkit installed on the host
  • the nvidia drivers installed on the host (the cuda toolkit is not required)
  • the files/daemons configured correctly (as mentioned above)
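
As a sketch of that last item, the standard NVIDIA Container Toolkit workflow (per NVIDIA's documentation, not something verified in this PR) is:

    # Register the nvidia runtime in /etc/docker/daemon.json
    sudo nvidia-ctk runtime configure --runtime=docker
    # Restart the daemon so the new runtime takes effect
    sudo systemctl restart docker
    # Sanity check: the host GPU should be visible from inside a container
    docker run --rm --gpus all ubuntu:22.04 nvidia-smi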

@chancecardona (Contributor) commented:
Here is an example of my ZFS + CUDA-enabled Docker daemon.json:

{
    "default-runtime": "nvidia",
    "exec-opts": [
        "native.cgroupdriver=cgroupfs"
    ],
    "features": {
        "cdi": true
    },
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "builder": {"Entitlements": {"security-insecure": true }},
    "storage-driver": "zfs"
}
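
One follow-up worth noting (standard Docker behavior, not from the thread): changes to daemon.json only take effect after a daemon restart, and the registered runtimes can be checked afterwards:

    sudo systemctl restart docker
    docker info --format '{{json .Runtimes}}'   # should list "nvidia"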
