Add Dockerfiles for inference engines #217
base: main
Conversation
Added so contributors can enhance these further; they may want to implement/enable Intel's oneAPI, MUSA, etc.

Signed-off-by: Eric Curtin <[email protected]>
Reviewer's Guide

This PR adds comprehensive build automation and containerization support for the llama.cpp inference engine by introducing an OS-aware Makefile, CMake configurations, a GPU info utility, helper scripts, various Dockerfiles, and accompanying documentation.

Class diagram for nv-gpu-info utility and NVAPI usage

```mermaid
classDiagram
    class nv_gpu_info {
        +main()
        -NvAPI_Status status
        -NvAPI_ShortString error_str
        -NvU32 driver_version
        -NvAPI_ShortString build_branch
        -NV_PHYSICAL_GPUS_V1 nvPhysicalGPUs
        -NvPhysicalGpuHandle gpu
        -NvAPI_ShortString gpu_name
        -NvU32 devid
        -NvU32 subsysid
        -NvU32 revid
        -NvU32 extid
        -NV_GPU_MEMORY_INFO_EX_V1 nvMemoryInfo
    }
    class NvAPI {
        +Initialize()
        +GetErrorMessage(status, error_str)
        +SYS_GetDriverAndBranchVersion(driver_version, build_branch)
        +SYS_GetPhysicalGPUs(nvPhysicalGPUs)
        +GPU_GetFullName(gpu, gpu_name)
        +GPU_GetPCIIdentifiers(gpu, devid, subsysid, revid, extid)
        +GPU_GetMemoryInfoEx(gpu, nvMemoryInfo)
    }
    nv_gpu_info --> NvAPI
```
Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the project's ability to support diverse inference engine deployments by introducing Dockerfiles and build configurations for the llama.cpp inference engine.
Pull Request Overview
This PR adds Docker infrastructure for building llama.cpp inference engines with support for different acceleration backends (CPU, CUDA, Vulkan). The primary purpose is to enable contributors to implement and enhance support for various hardware acceleration platforms like Intel OneAPI, MUSA, etc.
- Adds Docker build configurations for generic CPU and CUDA-accelerated builds
- Includes native build system with CMake configuration and platform-specific installation scripts
- Provides utilities for NVIDIA GPU information gathering on Windows
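To make these build configurations concrete, here is a rough sketch of how the generic image might be built with Docker Buildx. The Dockerfile path comes from the file table below; the build context, platform, and image tag are assumptions for illustration, not something documented in this PR. `TARGETARCH` is populated automatically by Buildx from `--platform`.

```sh
# Hypothetical invocation: context, platform, and tag are assumptions.
docker buildx build \
  --platform linux/amd64 \
  -f inference-engine/llamacpp/native/generic.Dockerfile \
  -t llama-server:generic-amd64 \
  inference-engine/llamacpp
```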
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
File | Description
---|---
`inference-engine/llamacpp/promote-rc.Dockerfile` | Simple promotion Dockerfile for release candidates
`inference-engine/llamacpp/native/generic.Dockerfile` | Multi-stage build for CPU/Vulkan-accelerated inference engine
`inference-engine/llamacpp/native/cuda.Dockerfile` | CUDA-accelerated build using NVIDIA base images
`inference-engine/llamacpp/native/install-*.sh` | Installation scripts for Vulkan and Clang dependencies
`inference-engine/llamacpp/native/src/nv-gpu-info/` | Windows utility for NVIDIA GPU information gathering
`inference-engine/llamacpp/native/CMakeLists.txt` | CMake configuration for building the inference server
`inference-engine/llamacpp/Makefile` | Cross-platform build system (macOS implementation)
`inference-engine/llamacpp/Dockerfile` | Main Dockerfile for copying artifacts from build stages
```c
status = NvAPI_Initialize();
if (status != NVAPI_OK) {
    NvAPI_GetErrorMessage(status, error_str);
    printf("Failed to initialise NVAPI: %s\n", error_str);
```
Copilot AI, Oct 13, 2025
Corrected spelling of 'initialise' to 'initialize'.
printf("Failed to initialise NVAPI: %s\n", error_str); | |
printf("Failed to initialize NVAPI: %s\n", error_str); |
```
This repo contains implementations of the llama.cpp inference runtime.

* native/ - contains an implementaion based on `llama.cpp`'s native server
```
Copilot AI, Oct 13, 2025
Corrected spelling of 'implementaion' to 'implementation'.
```suggestion
* native/ - contains an implementation based on `llama.cpp`'s native server
```
Hey there - I've reviewed your changes - here's some feedback:
- The Makefile only implements the macOS build path while Linux and Windows targets exit immediately—consider adding at least stub Linux support or clearer disable flags to avoid surprise failures.
- The manual `.git` submodule patching in the Dockerfiles is brittle; using a proper `git submodule update --init --recursive` or cloning with `--recurse-submodules` would simplify maintenance (see the sketch below).
- Generic and CUDA Dockerfiles share many identical steps; consolidate common instructions into a base image or use build arguments to reduce duplication and ease future changes.
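For the submodule point, a minimal sketch of what such a clone step could look like in a Dockerfile; the upstream URL, `--depth`, and destination path are illustrative placeholders, not taken from this PR:

```dockerfile
# Sketch only: fetch llama.cpp with its submodules instead of patching .git by hand.
# The URL, --depth, and /src/llama.cpp are placeholders for illustration.
RUN git clone --recurse-submodules --depth 1 \
        https://github.com/ggml-org/llama.cpp /src/llama.cpp

# Or, within an existing clone:
# RUN git -C /src/llama.cpp submodule update --init --recursive
```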
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Makefile only implements the macOS build path while Linux and Windows targets exit immediately—consider adding at least stub Linux support or clearer disable flags to avoid surprise failures.
- The manual `.git` submodule patching in the Dockerfiles is brittle; using a proper `git submodule update --init --recursive` or cloning with `--recurse-submodules` would simplify maintenance.
- Generic and CUDA Dockerfiles share many identical steps—consolidate common instructions into a base image or use build arguments to reduce duplication and ease future changes.
## Individual Comments
### Comment 1
<location> `inference-engine/llamacpp/native/install-clang.sh:6` </location>
<code_context>
+main() {
+ set -eux -o pipefail
+
+ apt-get update && apt-get install -y cmake ninja-build git wget gnupg2
+ wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Missing noninteractive flags may cause issues in CI environments.
Add DEBIAN_FRONTEND=noninteractive to apt-get install to prevent prompts during automated builds.
```suggestion
apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y cmake ninja-build git wget gnupg2
```
</issue_to_address>
### Comment 2
<location> `inference-engine/llamacpp/native/install-vulkan.sh:6` </location>
<code_context>
+main() {
+ set -eux -o pipefail
+
+ apt-get install -y glslc libvulkan-dev
+}
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Missing apt-get update before install may cause package issues.
Add 'apt-get update' before installing packages to avoid outdated package lists.
```suggestion
apt-get update
apt-get install -y glslc libvulkan-dev
```
</issue_to_address>
### Comment 3
<location> `inference-engine/llamacpp/native/generic.Dockerfile:9` </location>
<code_context>
+
+ARG TARGETARCH
+
+RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl
+
+COPY native/install-vulkan.sh .
</code_context>
<issue_to_address>
**suggestion (performance):** No cleanup of apt cache after installation.
Add 'rm -rf /var/lib/apt/lists/*' after installing packages to minimize image size and prevent outdated package lists.
```suggestion
RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl && rm -rf /var/lib/apt/lists/*
```
</issue_to_address>
### Comment 4
<location> `inference-engine/llamacpp/README.md:5` </location>
<code_context>
+
+This repo contains implementations of the llama.cpp inference runtime.
+
+* native/ - contains an implementaion based on `llama.cpp`'s native server
+ implementation
</code_context>
<issue_to_address>
**issue (typo):** Typo: 'implementaion' should be 'implementation'.
Change 'implementaion' to 'implementation' in the directory description.
</issue_to_address>
```sh
main() {
    set -eux -o pipefail

    apt-get update && apt-get install -y cmake ninja-build git wget gnupg2
```
suggestion (bug_risk): Missing noninteractive flags may cause issues in CI environments.
Add DEBIAN_FRONTEND=noninteractive to apt-get install to prevent prompts during automated builds.
```suggestion
apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y cmake ninja-build git wget gnupg2
```
```sh
main() {
    set -eux -o pipefail

    apt-get install -y glslc libvulkan-dev
```
suggestion (bug_risk): Missing apt-get update before install may cause package issues.
Add 'apt-get update' before installing packages to avoid outdated package lists.
```suggestion
apt-get update
apt-get install -y glslc libvulkan-dev
```
```dockerfile
ARG TARGETARCH

RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl
```
suggestion (performance): No cleanup of apt cache after installation.
Add 'rm -rf /var/lib/apt/lists/*' after installing packages to minimize image size and prevent outdated package lists.
```suggestion
RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl && rm -rf /var/lib/apt/lists/*
```
```
This repo contains implementations of the llama.cpp inference runtime.

* native/ - contains an implementaion based on `llama.cpp`'s native server
```
issue (typo): Typo: 'implementaion' should be 'implementation'.
Change 'implementaion' to 'implementation' in the directory description.
Code Review
This pull request introduces Dockerfiles and build scripts for creating inference engine artifacts for `llama.cpp`. The changes are well-structured, setting up builds for different environments like CUDA, generic CPU/Vulkan, and local macOS development. My review includes suggestions for optimizing the Dockerfiles by reducing layers, improving script robustness, enhancing build script flexibility for different architectures, and fixing a minor typo in the documentation. I've also pointed out an opportunity to refactor some C code for better maintainability.
```sh
main() {
    set -eux -o pipefail

    apt-get install -y glslc libvulkan-dev
```
It's a best practice to run `apt-get update` before `apt-get install` to ensure you are getting the latest package lists. While the calling Dockerfile might do this, making the script self-contained improves robustness and prevents potential build failures due to stale package lists.
```suggestion
apt-get update && apt-get install -y glslc libvulkan-dev
```
```cmake
add_library(nvapi STATIC IMPORTED)
set_target_properties(nvapi PROPERTIES
    IMPORTED_LOCATION "${CMAKE_SOURCE_DIR}/vendor/nvapi/amd64/nvapi64.lib"
```
```dockerfile
ARG TARGETOS
ARG TARGETARCH
ARG ACCEL
COPY --from=release-artifacts /com.docker.llama-server.native.$TARGETOS.$ACCEL.$TARGETARCH /com.docker.llama-server.native.$TARGETOS.$ACCEL.$TARGETARCH
```
The destination path for the `COPY` command is the same as the source directory name. This will result in a deeply nested and cumbersome path for the executables inside the final image (e.g., `/com.docker.llama-server.native.linux.cpu.amd64/bin/com.docker.llama-server`). It's generally better to copy the contents of the artifact directory into a standard location like `/` to make the binaries easier to locate and execute. Adding a trailing slash to the source path will copy the directory's contents.
```suggestion
COPY --from=release-artifacts /com.docker.llama-server.native.$TARGETOS.$ACCEL.$TARGETARCH/ /
```
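As a follow-up illustration (the binary name is inferred from the example path in the comment above; the actual artifact layout may differ), copying the contents to `/` would let later instructions reference a stable location:

```dockerfile
# Hypothetical: with the artifact contents copied to /, the server binary would
# sit at /bin/com.docker.llama-server and could serve as a fixed entrypoint.
ENTRYPOINT ["/bin/com.docker.llama-server"]
```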
```
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_OSX_DEPLOYMENT_TARGET=13.3 \
```
This repo contains implementations of the llama.cpp inference runtime. | ||
|
||
* native/ - contains an implementaion based on `llama.cpp`'s native server |
```dockerfile
RUN rm install/bin/*.py
RUN rm -r install/lib/cmake
RUN rm -r install/lib/pkgconfig
RUN rm -r install/include
```
```dockerfile
RUN echo "-B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_NATIVE=OFF \
    -DGGML_OPENMP=OFF \
    -DLLAMA_CURL=OFF \
    -DGGML_VULKAN=ON \
    -GNinja \
    -S ." > cmake-flags
RUN if [ "${TARGETARCH}" = "amd64" ]; then \
        echo " -DBUILD_SHARED_LIBS=ON \
        -DGGML_BACKEND_DL=ON \
        -DGGML_CPU_ALL_VARIANTS=ON" >> cmake-flags; \
    elif [ "${TARGETARCH}" = "arm64" ]; then \
        echo " -DBUILD_SHARED_LIBS=OFF" >> cmake-flags; \
    else \
        echo "${TARGETARCH} is not supported"; \
        exit 1; \
    fi
RUN cmake $(cat cmake-flags)
RUN cmake --build build --config Release -j 4
RUN cmake --install build --config Release --prefix install
```
To optimize the Docker image by reducing layers, the logic for setting CMake flags, configuring, building, and installing can be combined into a single `RUN` instruction. This avoids creating multiple layers and temporary files like `cmake-flags`.
```dockerfile
RUN CMAKE_FLAGS="-B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_OPENMP=OFF -DLLAMA_CURL=OFF -DGGML_VULKAN=ON -GNinja -S ." && \
    if [ "${TARGETARCH}" = "amd64" ]; then \
      CMAKE_FLAGS="$CMAKE_FLAGS -DBUILD_SHARED_LIBS=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON"; \
    elif [ "${TARGETARCH}" = "arm64" ]; then \
      CMAKE_FLAGS="$CMAKE_FLAGS -DBUILD_SHARED_LIBS=OFF"; \
    else \
      echo "${TARGETARCH} is not supported"; \
      exit 1; \
    fi && \
    cmake $CMAKE_FLAGS && \
    cmake --build build --config Release -j 4 && \
    cmake --install build --config Release --prefix install
```
```dockerfile
RUN rm install/bin/*.py
RUN rm -r install/lib/cmake
RUN rm -r install/lib/pkgconfig
RUN rm -r install/include
```
```sh
    set -eux -o pipefail

    apt-get update && apt-get install -y cmake ninja-build git wget gnupg2
    wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc
```
The `tee` command writes the GPG key to the specified file but also to standard output, which is not necessary here. You can redirect the standard output of `tee` to `/dev/null` to keep the build logs cleaner.
```suggestion
wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc > /dev/null
```
```c
status = NvAPI_Initialize();
if (status != NVAPI_OK) {
    NvAPI_GetErrorMessage(status, error_str);
    printf("Failed to initialise NVAPI: %s\n", error_str);
    return -1;
}

NvU32 driver_version;
NvAPI_ShortString build_branch;

status = NvAPI_SYS_GetDriverAndBranchVersion(&driver_version, build_branch);
if (status != NVAPI_OK) {
    NvAPI_GetErrorMessage(status, error_str);
    printf("Failed to retrieve driver info: %s\n", error_str);
    return -1;
}
```
The pattern for checking `NvAPI_Status` and printing an error message is repeated multiple times throughout the file. To improve code maintainability and reduce duplication, consider extracting this logic into a helper function. For example:
```c
bool check_nvapi_status(NvAPI_Status status, const char* message) {
    if (status != NVAPI_OK) {
        NvAPI_ShortString error_str = { 0 };
        NvAPI_GetErrorMessage(status, error_str);
        printf("%s: %s\n", message, error_str);
        return false;
    }
    return true;
}
```
Added so contributors can enhance these further; they may want to implement/enable Intel's oneAPI, MUSA, etc.
Summary by Sourcery
Add Docker-based build and deployment support for llama.cpp inference engines and the native server, including CPU (with Vulkan) and CUDA variants, along with local build infrastructure and a GPU utility.