
[CI/Build] Dockerfile build for ARM64 / GH200 #10499

Closed · wants to merge 4 commits

Conversation

drikster80 (Contributor) commented Nov 20, 2024

Updates the Dockerfile with $TARGETPLATFORM conditionals that will compile the necessary modules and extensions for aarch64 / ARM64 systems. This has been tested on the Nvidia GH200 platform.

Docker builds should use --platform "linux/arm64" to trigger the arm64 build process.

FIX #2021

Changes Overview:

  • Added a new requirements-cuda-arm64.txt that pulls the PyTorch nightly wheels compatible with ARM64+CUDA. This is temporary until those wheels reach a stable release (at which point the file can be removed).
  • Updated the existing requirements files so torch/torchvision are only installed when platform_machine != 'aarch64', i.e. they are skipped on aarch64 so the nightly builds can take their place (see the sketch after this list).
  • Used conditionals to decide whether to build specific modules that are not currently shipped as aarch64 wheels.
  • Updated the docs with notes and an example command.
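For illustration, the environment-marker change in the stock CUDA requirements file looks roughly like the lines below (exact package pins are omitted here and may differ from the actual diff):

    # requirements-cuda.txt (sketch): skip the x86_64 torch wheels on aarch64 so the
    # nightly ARM64+CUDA builds from requirements-cuda-arm64.txt can be installed instead
    torch; platform_machine != 'aarch64'
    torchvision; platform_machine != 'aarch64'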

The following command was used to build and was confirmed working on an Nvidia GH200:

# Build time: ~40 min
# Max memory usage: 180GB
sudo docker build . --target vllm-openai --platform "linux/arm64" \
    -t drikster80/vllm-gh200-openai:v0.6.4.post1 \
    --build-arg max_jobs=66 \
    --build-arg nvcc_threads=2 \
    --build-arg torch_cuda_arch_list="9.0+PTX" \
    --build-arg vllm_fa_cmake_gpu_arches="90-real" \
    --build-arg RUN_WHEEL_CHECK='false'

NOTE: The order in which requirements-cuda-arm64.txt is installed matters: it needs to stomp over the already-installed torch version that other modules depend on.

    if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
        pip uninstall -y torch && \
        python3 -m pip install -r requirements-cuda-arm64.txt; \
    fi


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

mergify bot added the documentation and ci/build labels Nov 20, 2024
simon-mo self-assigned this Nov 20, 2024
drikster80 (Contributor Author)

Missed a sign-off on 1 commit, so rebased and force-pushed to pass the DCO check.

drikster80 (Contributor Author)

Noticed a bug where the flashinfer x86_64 wheel was not being installed by default. Since installing it was the previous default behavior on non-arm64 systems, I updated the conditional so that it always applies unless the target platform is 'linux/arm64'. A sketch of the inverted conditional is below.
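A minimal sketch of that inverted conditional (FLASHINFER_WHEEL stands in for the wheel spec already used in the Dockerfile, which is not reproduced here):

    # Sketch only: run the existing flashinfer install step on every platform except linux/arm64
    if [ "$TARGETPLATFORM" != "linux/arm64" ]; then \
        python3 -m pip install "${FLASHINFER_WHEEL}"; \
    fi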

if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
apt-get update && apt-get install zlib1g-dev && \
python3 -m pip install packaging pybind11 && \
git clone https://github.com/openai/triton && \
Member

can we directly use pytorch nightly as base image so that we don't need to build triton, etc?

Contributor Author

I'm confused. Triton doesn't provide aarch64 whl files, so we'll always need to compile it if we want to use the latest version: https://pypi.org/project/triton/#files

It's probably a good idea to pin to the latest release tag of triton instead of main, though. I'll update that.

My goal with this was to keep it as close as possible to the x86_64 implementation of vLLM, so I didn't want to use the nvidia pytorch container. That's what I was doing in the previous repo. Although it worked, it doubled the size of the final image (9.74 GB vs 4.89 GB).
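As a sketch, pinning triton to a release tag rather than main could look like this (TRITON_TAG is a placeholder, not an actual release name):

    # Sketch only: clone a fixed triton release tag instead of the default branch
    git clone --depth 1 --branch "${TRITON_TAG}" https://github.com/openai/triton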

Dockerfile Outdated
    RUN --mount=type=cache,target=/root/.cache/pip \
        --mount=type=bind,source=.git,target=.git \
        if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
            pip --verbose wheel --use-pep517 --no-deps -w /workspace/dist --no-build-isolation git+https://github.com/vllm-project/flash-attention.git ; \
Member

the vllm build already includes vllm-flash-attention

Contributor

I believe the torch version should be unpinned from the source in CMakeLists.txt, setup.py, and pyproject.toml.

Contributor Author

> the vllm build already includes vllm-flash-attention

Ah, good point. I'll remove that and test.

youkaichao (Member)

@drikster80 overall it makes sense to me, but we don't need to build so many things in the Docker image. Just using the default should be fine; it already comes with the flash-attention backend.

We don't need to build flashinfer / bitsandbytes / triton.

requirements-cuda-arm64.txt (new file):

    @@ -0,0 +1,3 @@
    --index-url https://download.pytorch.org/whl/nightly/cu124
    torchvision; platform_machine == 'aarch64'
    torch; platform_machine == 'aarch64'
Contributor

You can add xformers for aarch64 to the /vllm-project directory, similar to flash-attention, for the aarch64 build until an upstream pip package is available.
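A sketch of that suggestion, following the same wheel-build pattern used for flash-attention elsewhere in this Dockerfile (the upstream xformers repository is shown here; the comment actually proposes a vllm-project fork, whose URL is not given):

    # Sketch only (suggested, not part of this PR): build an aarch64 xformers wheel the same
    # way the vllm-project flash-attention wheel is built above
    if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
        pip --verbose wheel --use-pep517 --no-deps -w /workspace/dist --no-build-isolation \
            git+https://github.com/facebookresearch/xformers.git ; \
    fi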

drikster80 (Contributor Author)

> @drikster80 overall it makes sense to me, but we don't need to build so many things in the Docker image. Just using the default should be fine; it already comes with the flash-attention backend.
>
> We don't need to build flashinfer / bitsandbytes / triton.

None of these ship aarch64 whls. When you say "use the default", are these all built into vllm as well? When I attempt to run the container without building these, it fails.

youkaichao (Member)

The goal here is to have a runnable image for vLLM on arm64 / GH200. We don't need to have full features here. Since the community is not fully ready for arm64, it would be a maintenance disaster if we built so many things here ourselves. If a library does not support arm64, people should reach out to that library and ask it to become compatible with arm64.

That's why I want to use the pytorch nightly docker image directly. Docker image size is not my concern.

> My goal with this was to keep it as close as possible to the x86_64 implementation of vLLM

This is not my goal. The first step is being able to run vllm serve meta-llama/Llama-3.1-8B on GH200; that's good enough.
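For reference, that smoke test against the image built earlier in this PR might look like the sketch below (the image tag comes from the build command above; the port, cache mount, and token variable are illustrative):

    # Sketch only: serve Llama-3.1-8B from the arm64 image on a GH200 node
    # HF_TOKEN is only needed because the Llama weights are gated on Hugging Face
    sudo docker run --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
        -p 8000:8000 \
        drikster80/vllm-gh200-openai:v0.6.4.post1 \
        --model meta-llama/Llama-3.1-8B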

drikster80 (Contributor Author)

> The goal here is to have a runnable image for vLLM on arm64 / GH200. We don't need to have full features here. Since the community is not fully ready for arm64, it would be a maintenance disaster if we built so many things here ourselves. If a library does not support arm64, people should reach out to that library and ask it to become compatible with arm64.
>
> That's why I want to use the pytorch nightly docker image directly. Docker image size is not my concern.
>
> > My goal with this was to keep it as close as possible to the x86_64 implementation of vLLM
>
> This is not my goal. The first step is being able to run vllm serve meta-llama/Llama-3.1-8B on GH200; that's good enough.

Okay, it sounds like our goals just weren't aligned. I agree it could become a maintainability issue this way. FWIW, it looks like the other libraries do support ARM64 but don't provide whls for it on PyPI (probably due to GitHub Actions limitations). I'll create tickets on the other repos requesting that aarch64 whls be built/provided.

I had originally moved away from using the nvidia-pytorch container because they were slower at updating torch than vLLM was. It looks like they just came out with a version compatible with torch v2.6, so I can try to use that version.

In the meantime, I'll continue maintaining the fork and hosting a full-featured version on my Docker Hub that matches the releases of vLLM.

youkaichao (Member)

> I had originally moved away from using the nvidia-pytorch container because they were slower at updating torch than vLLM was.

We don't need the nvidia-pytorch container. A basic nvidia container is good enough, and we can just install the nightly pytorch wheels (a minimal sketch follows this comment).

> In the meantime, I'll continue maintaining the fork and hosting a full-featured version on my Docker Hub that matches the releases of vLLM.

Thanks for your efforts! For this PR, let's get the basic support first 👍
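A minimal sketch of that approach, assuming a hypothetical CUDA base-image tag (the nightly index URL is the one already used by requirements-cuda-arm64.txt in this PR):

    # Sketch only, not from this PR: start from a plain NVIDIA CUDA base image and install
    # the nightly ARM64+CUDA torch wheels from the PyTorch nightly index
    FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
    RUN apt-get update && apt-get install -y python3 python3-pip
    RUN python3 -m pip install --index-url https://download.pytorch.org/whl/nightly/cu124 \
        torch torchvision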

cennn added a commit to cennn/vllm that referenced this pull request Dec 14, 2024
[CI/Build] Dockerfile build for ARM64 / GH200 vllm-project#10499 by cenzhiyao
youkaichao (Member)

Closing, as #11212 has been merged. @drikster80 thanks for your efforts! Please continue to maintain your full-featured branch.

Labels: ci/build, documentation (Improvements or additions to documentation)
Successfully merging this pull request may close these issues:
  • ARM aarch-64 server build failed (host OS: Ubuntu 22.04.3)
4 participants