
Add a Dockerfile for AMD ROCm #3750

Closed
wants to merge 4 commits into dev from contribute

Conversation

dark-penguin

Description

Provide a Dockerfile for AMD ROCm. Finding a good base image is not trivial because, unlike the Torch image for CUDA, the Torch image for ROCm is 71 GB for whatever reason.

Additionally, having a Dockerfile that "works" is a great reference for when you are trying to install something on bare metal.

Notes

Build with: docker build -t sdnext -f Dockerfile.rocm .
Run with (example): docker run -it --rm --device /dev/dri --group-add video -v /sdnext:/mnt -p 7860:7860 sdnext

  • --device /dev/dri - that's the way to "mount" the graphics card devices into the container (instead of the NVIDIA Container Toolkit)
  • --group-add video - the user inside the container needs access to that device
  • -v /sdnext:/mnt - mount a volume or a directory to keep persistent data
  • -p 7860:7860 - publish the port

The Dockerfile is derived from the "official" NVidia Dockerfile with minimal changes, to keep the difference small.

The Torch image for ROCm is 71 GB for some reason, so one difference I had to make is to use a smaller image with only the essentials of ROCm installed (3 GB). Torch is installed at build time (~2 GB download size). The total size of the built image is 23 GB (apparently the Torch wheel is compressed really well).

Environment and Testing

Tested on Debian 12 Bookworm (I had to remove the --skip-all option from the CMD while testing since it's currently broken in master).

@dark-penguin
Author

Oops, I guess I should have opened the PR against the dev branch...

@vladmandic vladmandic changed the base branch from master to dev February 8, 2025 21:19
Dockerfile.rocm Outdated
LABEL org.opencontainers.image.licenses="AGPL-3.0"
LABEL org.opencontainers.image.title="SD.Next"
LABEL org.opencontainers.image.description="SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models"
LABEL org.opencontainers.image.base.name="https://hub.docker.com/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime"
Contributor

This doesn't seem correct here, does it? (given that this uses ROCm)

Author

Right!
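For the record, the fix is just to point the label at whatever ROCm base the Dockerfile ends up using; the image name below is only an example:

```dockerfile
# base.name should reference the actual ROCm base image, not the CUDA one;
# the exact image is whichever FROM the final Dockerfile uses.
LABEL org.opencontainers.image.base.name="https://hub.docker.com/r/rocm/dev-ubuntu-22.04"
```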

@lbeltrame
Contributor

You may want to add a comment at the top of the Dockerfile mentioning the HSA_OVERRIDE_GFX_VERSION env variable, because it needs to be set in case people don't run an officially supported card.
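Something along these lines at the top of the Dockerfile would do (the values are illustrative; 10.3.0 is the usual override for RDNA 2 consumer cards, 11.0.0 for RDNA 3):

```dockerfile
# GPU not officially supported by ROCm? Override the detected GFX version at runtime:
#   docker run -e HSA_OVERRIDE_GFX_VERSION=10.3.0 ...   # RDNA 2 (gfx103x)
#   docker run -e HSA_OVERRIDE_GFX_VERSION=11.0.0 ...   # RDNA 3 (gfx110x)
```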

@dark-penguin
Author

Good point, but that's up to @vladmandic I guess. A comment in the Dockerfile or a note in the Wiki?

@vladmandic
Owner

a) yes, rocm overrides should be exposed as it's quite a common thing.
b) if we're to include docker for anything except cuda, the wiki page needs a rewrite as well. adding a dockerfile without that is pointless. https://github.com/vladmandic/sdnext/wiki/Docker

@vladmandic
Owner

ok, i've pretty much rewritten https://github.com/vladmandic/sdnext/wiki/Docker so it's not cuda-specific
this pr should target this file, not create a new one in root: https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.rocm

@Disty0
Collaborator

Disty0 commented Feb 12, 2025

Added Dockerfile.rocm: https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.rocm

Went with a different approach than CUDA because of flash attention.

We can save 30 GB of disk space by installing flash attention in the rocm-complete image and then sharing the venv with the smaller rocm runtime image.
The venv can also be shared between different instances if you have multiple GPUs.

Also using Ubuntu 24 with Python 3.12 because onnxruntime-rocm needs Python 3.12.
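For illustration, the rough shape of that approach (image tags, paths, and the install step are placeholders, not the actual Dockerfile.rocm):

```bash
# Sketch only - see configs/Dockerfile.rocm for the real thing.
# 1. Build the venv (Torch + flash attention) once, inside the full "complete" ROCm
#    image, keeping it on a host volume so the ~30 GB of build deps are not baked in:
docker run --rm --device /dev/dri --device /dev/kfd \
    -v /sdnext/venv:/venv rocm/dev-ubuntu-24.04:complete \
    bash -c 'python3.12 -m venv /venv && /venv/bin/pip install <torch + flash-attn for ROCm>'

# 2. Run SD.Next from the much smaller runtime image, reusing the same venv;
#    a second instance (e.g. on another GPU) can mount the same volume.
docker run -it --rm --device /dev/dri --device /dev/kfd \
    -v /sdnext/venv:/venv -p 7860:7860 <sdnext-rocm-runtime-image>
```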

If you want to make changes, please target the new file.

@dark-penguin
Author

dark-penguin commented Feb 13, 2025

Just for education, is there any reason to prefer jemalloc for ROCm and tcmalloc for CUDA?

The wiki page should probably mention that the ROCm image is enormous by itself and requires even more space to build flash-attn.

Does the NVIDIA Container Toolkit support AMD/Intel GPUs? I thought it didn't, so you have to "mount" devices "the old way" (--device /dev/dri instead of --gpus all). And if someone wants to run everything in the container as a non-root user - which is strongly recommended, at least in production, for security reasons - then they would also have to add the in-container user to the video group, either in the Dockerfile or simply by launching with --group-add video.
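For illustration, a non-root run along those lines might look like this (image name taken from the build notes above; the render group is for /dev/kfd):

```bash
# Run as the invoking user instead of root; the container user then needs
# the groups that own the GPU device nodes (video for /dev/dri, render for /dev/kfd).
# Use the numeric GID if a group name doesn't exist inside the image.
docker run -it --rm \
    --device /dev/dri --device /dev/kfd \
    --group-add video --group-add render \
    --user "$(id -u):$(id -g)" \
    -v /sdnext:/mnt -p 7860:7860 sdnext
```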

@Disty0
Collaborator

Disty0 commented Feb 13, 2025

JeMalloc? Not really. The only difference is that JeMalloc doesn't hold onto unused memory like TCMalloc does; it releases it immediately instead.
The IPEX image uses JeMalloc for ipexrun, and I didn't change it to TCMalloc when converting the IPEX Dockerfile to ROCm.
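(For reference, enabling JeMalloc in a Dockerfile is typically just a package install plus an LD_PRELOAD; the .so path below is Ubuntu's and may differ per distro:)

```dockerfile
# Assumes Ubuntu's libjemalloc2 package; the library path differs between distros.
RUN apt-get update && apt-get install -y --no-install-recommends libjemalloc2 \
 && rm -rf /var/lib/apt/lists/*
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
```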

Added Docker to the AMD ROCm wiki (this one uses the 1 GB compressed / 3 GB uncompressed ROCm image): https://github.com/vladmandic/sdnext/wiki/AMD-ROCm#docker-installation

AMD uses --device /dev/dri and --device /dev/kfd
Intel uses --device /dev/dri

I might look into running Docker with a user instead of root. My Dockerfile should be compatible with a non-root user, as it is built around a venv instead of root.
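A minimal sketch of what that could look like in the Dockerfile (user name and IDs are arbitrary):

```dockerfile
# Illustrative non-root setup; names/IDs are arbitrary.
# groupadd -f tolerates the group already existing in the base image.
RUN groupadd -f render \
 && useradd -m -G video,render sdnext
USER sdnext
```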

@dark-penguin
Author

dark-penguin commented Feb 13, 2025

AMD uses --device /dev/dri and --device /dev/kfd

I've seen mentions of --device /dev/kfd, but omitting it does not seem to make any difference, and even the render group (required to access /dev/kfd) is missing in this image IIRC. Do you know what it is really for, and whether SD.Next actually uses it?

A couple of inconsistencies:

Build instructions tag the image with sdnext/sdnext-rocm, but here you launch disty0/sdnext-rocm:latest, which might cause confusion - especially among those doing this for the first time.

Added Docker to AMD ROCm wiki. (This one uses the 1GB compressed / 3GB uncompressed ROCm image)

The image itself is 3 GB, but it does not have Torch installed. So it contradicts quite a few points here:

build process is very simple and fast, typically around ~1min

Build Image: takes between few seconds (using cached image) to ~1.5min (initial build) to complete

More like half an hour to download a 3 GB Docker image and a 2 GB Torch wheel and install them, unless your Internet connection is crazy fast. The resulting image is almost 20 GB, which is also worth mentioning, because it means you need well over 20 GB of free space to build and launch it.

Prerequisites: nVidia Container ToolKit to enable GPU support

Only for NVidia

"Run Local" instructions are NVidia-only. They would be quite different for anything else.

@Disty0
Collaborator

Disty0 commented Feb 13, 2025

I've seen mentions of --device /dev/kfd

Required by the ROCm dev image to build flash attention.

Build instructions tag the image with sdnext/sdnext-rocm, but here you launch disty0/sdnext-rocm:latest, which might cause confusion - especially among those doing this for the first time.

sdnext/sdnext-rocm is the custom image you have built yourself. disty0/sdnext-rocm is the prebuilt image from docker hub: https://hub.docker.com/r/disty0/sdnext-rocm/

More like half an hour to download a 3 GB Docker image and a 2 GB Torch wheel and install them, unless your Internet connection is crazy fast.

I don't think a 24 Mbps ADSL connection is still relevant.

The 1.5 minute figure is for a typical VPS with a 500 Mbps average download speed.

The resulting image is almost 20 GB, which is also worth mentioning.

I can add the docker image + venv sizes to platform wikis.

ipex: 8 GB venv + 1.1 GB docker
rocm: 20 GB venv + 3.15 GB docker
openvino: 2.5 GB venv + 1.1 GB docker

@dark-penguin
Author

dark-penguin commented Feb 13, 2025

disty0/sdnext-rocm is the prebuilt image from docker hub

Oh, you have a prebuilt image now! ❤️

I can add the docker image + venv sizes to platform wikis

I believe that would be beneficial.

The 1.5 minute figure is for a typical VPS with a 500 Mbps average download speed

Do most people run it on a VPS?
My reasoning is: a typical VPS does not have a graphics card, so most people would run it on their PC. The average home Internet speed is probably around 100 Mbps - sometimes 200, sometimes 50. That means 10 minutes on average, and from 5 to 20 minutes in general, so 1 minute for a 500-Mbps VPS is actually "crazy fast".

On top of that, there are other factors - you can't always download at your full theoretical connection speed, and on my PC (probably about average), just unpacking and installing a cached ROCm Torch wheel takes more than a minute IIRC. So I think it's better to either not mention the "about 1 minute" part at all, or say something like "up to 10 minutes or more unless you are well connected". 🙂
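(For the arithmetic: the 3 GB image plus the 2 GB wheel is about 40 Gbit; at 100 Mbps that alone is roughly 400 seconds, i.e. ~7 minutes of pure download time, before decompression and pip install.)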

Thanks for clearing up the /dev/kfd mystery! ❤️

@vladmandic
Owner

vladmandic commented Feb 13, 2025

i'll make edits to docker wiki page
update: done - check now and propose edits as needed.

@dark-penguin
Author

Looks good now! I guess we can close this PR.

@vladmandic vladmandic closed this Feb 13, 2025
@dark-penguin dark-penguin deleted the contribute branch February 13, 2025 15:25