
Add a Dockerfile for AMD ROCm #3750

Closed
wants to merge 4 commits into dev from contribute

Conversation

dark-penguin

Description

Provide a Dockerfile for AMD ROCm. Finding a good base image is not trivial because, unlike the Torch image for CUDA, the Torch image for ROCm is 71 GB for whatever reason.

Additionally, having a Dockerfile that "works" is a great reference for when you are trying to install something on bare metal.

Notes

Build with: docker build -t sdnext -f Dockerfile.rocm .
Run with (example): docker run -it --rm --device /dev/dri --group-add video -v /sdnext:/mnt -p 7860:7860 sdnext

  • --device /dev/dri - that's the way to "mount" the graphics card devices into the container (instead of the NVIDIA Container Toolkit)
  • --group-add video - the user inside the container needs access to that device
  • -v /sdnext:/mnt - mount a volume or a directory to keep persistent data
  • -p 7860:7860 - publish the port

The Dockerfile is derived from the "official" NVidia Dockerfile with minimal changes, to keep the difference small.

The Torch image for ROCm is 71 GB for some reason, so one difference I had to make is to use a smaller image with only the essentials of ROCm installed (3 GB). Torch is installed at build time (~2 GB download size). The total size of the built image is 23 GB (apparently the Torch wheel is compressed really well).

Environment and Testing

Tested on Debian 12 Bookworm (I had to remove the --skip-all option from the CMD while testing since it's currently broken in master).

@dark-penguin
Author

Oops, I guess I should have opened the PR against the dev branch...

@vladmandic vladmandic changed the base branch from master to dev February 8, 2025 21:19
Dockerfile.rocm Outdated
LABEL org.opencontainers.image.licenses="AGPL-3.0"
LABEL org.opencontainers.image.title="SD.Next"
LABEL org.opencontainers.image.description="SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models"
LABEL org.opencontainers.image.base.name="https://hub.docker.com/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime"
Contributor

This doesn't seem correct here, does it? (given that this uses ROCm)

Author

Right!
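For the record, the fix is just to point the label at whatever ROCm base the Dockerfile ends up using; the image name below is only an example:

```dockerfile
# base.name should reference the actual ROCm base image, not the CUDA one;
# the exact image is whichever FROM the final Dockerfile uses.
LABEL org.opencontainers.image.base.name="https://hub.docker.com/r/rocm/dev-ubuntu-22.04"
```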

@lbeltrame
Contributor

You may want to add a comment at the top of the Dockerfile mentioning the HSA_OVERRIDE_GFX_VERSION env variable, because it needs to be set in case people don't run an officially supported card.
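Something along these lines at the top of the Dockerfile would do (the values are illustrative; 10.3.0 is the usual override for RDNA 2 consumer cards, 11.0.0 for RDNA 3):

```dockerfile
# GPU not officially supported by ROCm? Override the detected GFX version at runtime:
#   docker run -e HSA_OVERRIDE_GFX_VERSION=10.3.0 ...   # RDNA 2 (gfx103x)
#   docker run -e HSA_OVERRIDE_GFX_VERSION=11.0.0 ...   # RDNA 3 (gfx110x)
```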

@dark-penguin
Author

Good point, but that's up to @vladmandic I guess. A comment in the Dockerfile or a note in the Wiki?

@vladmandic
Owner

a) yes, rocm overrides should be exposed as it's quite a common thing.
b) if we're to include docker for anything except cuda, the wiki page needs a rewrite as well. adding a dockerfile without that is pointless. https://github.com/vladmandic/sdnext/wiki/Docker

@vladmandic
Owner

ok, i've pretty much rewritten https://github.com/vladmandic/sdnext/wiki/Docker so it's not cuda-specific
this pr should target this file, not create a new one in root: https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.rocm

@Disty0
Collaborator

Disty0 commented Feb 12, 2025

Added Dockerfile.rocm: https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.rocm

Went with a different approach than CUDA because of flash attention.

We can save 30 GB of disk space by installing flash attention in the rocm-complete image and then sharing the venv with the smaller rocm runtime image.
The venv can also be shared between different instances if you have multiple GPUs.

Also using Ubuntu 24 with Python 3.12 because onnxruntime-rocm needs Python 3.12.
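For illustration, the rough shape of that approach (image tags, paths, and the install step are placeholders, not the actual Dockerfile.rocm):

```bash
# Sketch only - see configs/Dockerfile.rocm for the real thing.
# 1. Build the venv (Torch + flash attention) once, inside the full "complete" ROCm
#    image, keeping it on a host volume so the ~30 GB of build deps are not baked in:
docker run --rm --device /dev/dri --device /dev/kfd \
    -v /sdnext/venv:/venv rocm/dev-ubuntu-24.04:complete \
    bash -c 'python3.12 -m venv /venv && /venv/bin/pip install <torch + flash-attn for ROCm>'

# 2. Run SD.Next from the much smaller runtime image, reusing the same venv;
#    a second instance (e.g. on another GPU) can mount the same volume.
docker run -it --rm --device /dev/dri --device /dev/kfd \
    -v /sdnext/venv:/venv -p 7860:7860 <sdnext-rocm-runtime-image>
```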

If you want to make changes, please target the new file.

@dark-penguin
Author

dark-penguin commented Feb 13, 2025

Just for education, is there any reason to prefer jemalloc for ROCm and tcmalloc for CUDA?

The wiki page should probably mention that the ROCm image is enormous by itself and requires even more space to build flash-attn.

Does the NVIDIA Container Toolkit support AMD/Intel GPUs? I thought it didn't, so you have to "mount" devices "the old way" (--device /dev/dri instead of --gpus all). And if someone wants to run everything in the container as a non-root user - which is strongly recommended, at least in production, for security reasons - then they would also have to add the in-container user to the video group, either in the Dockerfile or simply by launching with --group-add video.
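For illustration, a non-root run along those lines might look like this (image name taken from the build notes above; the render group is for /dev/kfd):

```bash
# Run as the invoking user instead of root; the container user then needs
# the groups that own the GPU device nodes (video for /dev/dri, render for /dev/kfd).
# Use the numeric GID if a group name doesn't exist inside the image.
docker run -it --rm \
    --device /dev/dri --device /dev/kfd \
    --group-add video --group-add render \
    --user "$(id -u):$(id -g)" \
    -v /sdnext:/mnt -p 7860:7860 sdnext
```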

@Disty0
Collaborator

Disty0 commented Feb 13, 2025

JeMalloc? Not really. The only difference is that JeMalloc doesn't hold onto unused memory like TCMalloc does; it releases it immediately instead.
The IPEX image uses JeMalloc for ipexrun, and I didn't change it to TCMalloc when converting the IPEX Dockerfile to ROCm.
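(For reference, enabling JeMalloc in a Dockerfile is typically just a package install plus an LD_PRELOAD; the .so path below is Ubuntu's and may differ per distro:)

```dockerfile
# Assumes Ubuntu's libjemalloc2 package; the library path differs between distros.
RUN apt-get update && apt-get install -y --no-install-recommends libjemalloc2 \
 && rm -rf /var/lib/apt/lists/*
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
```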

Added Docker to the AMD ROCm wiki (this one uses the 1 GB compressed / 3 GB uncompressed ROCm image): https://github.com/vladmandic/sdnext/wiki/AMD-ROCm#docker-installation

AMD uses --device /dev/dri and --device /dev/kfd
Intel uses --device /dev/dri

I might look into running Docker with a user instead of root. My Dockerfile should be compatible with a non-root user, as it is built around a venv instead of root.
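A minimal sketch of what that could look like in the Dockerfile (user name and IDs are arbitrary):

```dockerfile
# Illustrative non-root setup; names/IDs are arbitrary.
# groupadd -f tolerates the group already existing in the base image.
RUN groupadd -f render \
 && useradd -m -G video,render sdnext
USER sdnext
```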

@dark-penguin
Author

dark-penguin commented Feb 13, 2025

AMD uses --device /dev/dri and --device /dev/kfd

I've seen mentions of --device /dev/kfd, but omitting it does not seem to make any difference, and even the render group (required to access /dev/kfd) is missing in this image IIRC. Do you know what it is really for, and whether SD.Next actually uses it?

A couple of inconsistencies:

Build instructions tag the image with sdnext/sdnext-rocm, but here you launch disty0/sdnext-rocm:latest, which might cause confusion - especially among those doing this for the first time.

Added Docker to AMD ROCm wiki. (This one uses the 1GB compressed / 3GB uncompressed ROCm image)

The image itself is 3 GB, but it does not have Torch installed. So it contradicts quite a few points here:

build process is very simple and fast, typically around ~1min

Build Image: takes between few seconds (using cached image) to ~1.5min (initial build) to complete

More like half an hour to download a 3 GB Docker image and a 2 GB Torch wheel and install them, unless your Internet connection is crazy fast. The resulting image is almost 20 GB, which is also worth mentioning, because it means you need well over 20 GB of free space to build and launch it.

Prerequisites: nVidia Container ToolKit to enable GPU support

Only for NVidia

"Run Local" instructions are NVidia-only. They would be quite different for anything else.

@Disty0
Collaborator

Disty0 commented Feb 13, 2025

I've seen mentions of --device /dev/kfd

Required by the ROCm dev image to build flash attention.

Build instructions tag the image with sdnext/sdnext-rocm, but here you launch disty0/sdnext-rocm:latest, which might cause confusion - especially among those doing this for the first time.

sdnext/sdnext-rocm is the custom image you have built yourself. disty0/sdnext-rocm is the prebuilt image from docker hub: https://hub.docker.com/r/disty0/sdnext-rocm/

More like half an hour to download a 3 GB Docker image and a 2 GB Torch wheel and install them, unless your Internet connection is crazy fast.

I don't think a 24 Mbps ADSL connection is still relevant.

The 1.5 minute figure is for a typical VPS with a 500 Mbps average download speed.

The resulting image is almost 20 GB, which is also worth mentioning.

I can add the docker image + venv sizes to platform wikis.

ipex: 8 GB venv + 1.1 GB docker
rocm: 20 GB venv + 3.15 GB docker
openvino: 2.5 GB venv + 1.1 GB docker

@dark-penguin
Author

dark-penguin commented Feb 13, 2025

disty0/sdnext-rocm is the prebuilt image from docker hub

Oh, you have a prebuilt image now! ❤️

I can add the docker image + venv sizes to platform wikis

I believe that would be beneficial.

The 1.5 minute figure is for a typical VPS with a 500 Mbps average download speed

Do most people run it on a VPS?
My reasoning is: a typical VPS does not have a graphics card, so most people would run it on their PC. The average home Internet speed is probably around 100 Mbps - sometimes 200, sometimes 50. That means 10 minutes on average, and from 5 to 20 minutes in general, so 1 minute for a 500-Mbps VPS is actually "crazy fast".

On top of that, there are other factors - you can't always download at your full theoretical connection speed, and on my PC (probably about average), just unpacking and installing a cached ROCm Torch wheel takes more than a minute IIRC. So I think it's better to either not mention the "about 1 minute" part at all, or say something like "up to 10 minutes or more unless you are well connected". 🙂
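(For the arithmetic: the 3 GB image plus the 2 GB wheel is about 40 Gbit; at 100 Mbps that alone is roughly 400 seconds, i.e. ~7 minutes of pure download time, before decompression and pip install.)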

Thanks for clearing up the /dev/kfd mystery! ❤️

@vladmandic
Owner

vladmandic commented Feb 13, 2025

i'll make edits to docker wiki page
update: done - check now and propose edits as needed.

@dark-penguin
Author

Looks good now! I guess we can close this PR.

@vladmandic vladmandic closed this Feb 13, 2025
@dark-penguin dark-penguin deleted the contribute branch February 13, 2025 15:25