Add Dockerfiles for inference engines #217
base: main
Conversation
Added so contributors can enhance these further; they may want to implement/enable Intel's oneAPI, MUSA, etc.

Signed-off-by: Eric Curtin <[email protected]>
Reviewer's Guide

This PR adds comprehensive build automation and containerization support for the llama.cpp inference engine by introducing an OS-aware Makefile, CMake configurations, a GPU info utility, helper scripts, various Dockerfiles, and accompanying documentation.

Class diagram for nv-gpu-info utility and NVAPI usage

```mermaid
classDiagram
    class nv_gpu_info {
        +main()
        -NvAPI_Status status
        -NvAPI_ShortString error_str
        -NvU32 driver_version
        -NvAPI_ShortString build_branch
        -NV_PHYSICAL_GPUS_V1 nvPhysicalGPUs
        -NvPhysicalGpuHandle gpu
        -NvAPI_ShortString gpu_name
        -NvU32 devid
        -NvU32 subsysid
        -NvU32 revid
        -NvU32 extid
        -NV_GPU_MEMORY_INFO_EX_V1 nvMemoryInfo
    }
    class NvAPI {
        +Initialize()
        +GetErrorMessage(status, error_str)
        +SYS_GetDriverAndBranchVersion(driver_version, build_branch)
        +SYS_GetPhysicalGPUs(nvPhysicalGPUs)
        +GPU_GetFullName(gpu, gpu_name)
        +GPU_GetPCIIdentifiers(gpu, devid, subsysid, revid, extid)
        +GPU_GetMemoryInfoEx(gpu, nvMemoryInfo)
    }
    nv_gpu_info --> NvAPI
```
Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the project's ability to support diverse inference engine deployments by introducing Dockerfiles and build configurations for the llama.cpp inference engine.
Pull Request Overview
This PR adds Docker infrastructure for building llama.cpp inference engines with support for different acceleration backends (CPU, CUDA, Vulkan). The primary purpose is to enable contributors to implement and enhance support for various hardware acceleration platforms like Intel OneAPI, MUSA, etc.
- Adds Docker build configurations for generic CPU and CUDA-accelerated builds
- Includes native build system with CMake configuration and platform-specific installation scripts
- Provides utilities for NVIDIA GPU information gathering on Windows
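To make these build configurations concrete, here is a rough sketch of how the generic image might be built with Docker Buildx. The Dockerfile path comes from the file table below; the build context, platform, and image tag are assumptions for illustration, not something documented in this PR. `TARGETARCH` is populated automatically by Buildx from `--platform`.

```sh
# Hypothetical invocation: context, platform, and tag are assumptions.
docker buildx build \
  --platform linux/amd64 \
  -f inference-engine/llamacpp/native/generic.Dockerfile \
  -t llama-server:generic-amd64 \
  inference-engine/llamacpp
```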
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
File | Description
---|---
`inference-engine/llamacpp/promote-rc.Dockerfile` | Simple promotion Dockerfile for release candidates
`inference-engine/llamacpp/native/generic.Dockerfile` | Multi-stage build for CPU/Vulkan-accelerated inference engine
`inference-engine/llamacpp/native/cuda.Dockerfile` | CUDA-accelerated build using NVIDIA base images
`inference-engine/llamacpp/native/install-*.sh` | Installation scripts for Vulkan and Clang dependencies
`inference-engine/llamacpp/native/src/nv-gpu-info/` | Windows utility for NVIDIA GPU information gathering
`inference-engine/llamacpp/native/CMakeLists.txt` | CMake configuration for building the inference server
`inference-engine/llamacpp/Makefile` | Cross-platform build system (macOS implementation)
`inference-engine/llamacpp/Dockerfile` | Main Dockerfile for copying artifacts from build stages
```c
status = NvAPI_Initialize();
if (status != NVAPI_OK) {
    NvAPI_GetErrorMessage(status, error_str);
    printf("Failed to initialise NVAPI: %s\n", error_str);
```
Copilot AI, Oct 13, 2025
Corrected spelling of 'initialise' to 'initialize'.
printf("Failed to initialise NVAPI: %s\n", error_str); | |
printf("Failed to initialize NVAPI: %s\n", error_str); |
```
This repo contains implementations of the llama.cpp inference runtime.

* native/ - contains an implementaion based on `llama.cpp`'s native server
```
Copilot AI, Oct 13, 2025
Corrected spelling of 'implementaion' to 'implementation'.
```suggestion
* native/ - contains an implementation based on `llama.cpp`'s native server
```
Hey there - I've reviewed your changes - here's some feedback:
- The Makefile only implements the macOS build path while Linux and Windows targets exit immediately—consider adding at least stub Linux support or clearer disable flags to avoid surprise failures.
- The manual `.git` submodule patching in the Dockerfiles is brittle; using a proper `git submodule update --init --recursive` or cloning with `--recurse-submodules` would simplify maintenance (see the sketch below).
- Generic and CUDA Dockerfiles share many identical steps; consolidate common instructions into a base image or use build arguments to reduce duplication and ease future changes.
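For the submodule point, a minimal sketch of what such a clone step could look like in a Dockerfile; the upstream URL, `--depth`, and destination path are illustrative placeholders, not taken from this PR:

```dockerfile
# Sketch only: fetch llama.cpp with its submodules instead of patching .git by hand.
# The URL, --depth, and /src/llama.cpp are placeholders for illustration.
RUN git clone --recurse-submodules --depth 1 \
        https://github.com/ggml-org/llama.cpp /src/llama.cpp

# Or, within an existing clone:
# RUN git -C /src/llama.cpp submodule update --init --recursive
```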
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Makefile only implements the macOS build path while Linux and Windows targets exit immediately—consider adding at least stub Linux support or clearer disable flags to avoid surprise failures.
- The manual `.git` submodule patching in the Dockerfiles is brittle; using a proper `git submodule update --init --recursive` or cloning with `--recurse-submodules` would simplify maintenance.
- Generic and CUDA Dockerfiles share many identical steps—consolidate common instructions into a base image or use build arguments to reduce duplication and ease future changes.
## Individual Comments
### Comment 1
<location> `inference-engine/llamacpp/native/install-clang.sh:6` </location>
<code_context>
+main() {
+ set -eux -o pipefail
+
+ apt-get update && apt-get install -y cmake ninja-build git wget gnupg2
+ wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Missing noninteractive flags may cause issues in CI environments.
Add DEBIAN_FRONTEND=noninteractive to apt-get install to prevent prompts during automated builds.
```suggestion
apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y cmake ninja-build git wget gnupg2
```
</issue_to_address>
### Comment 2
<location> `inference-engine/llamacpp/native/install-vulkan.sh:6` </location>
<code_context>
+main() {
+ set -eux -o pipefail
+
+ apt-get install -y glslc libvulkan-dev
+}
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Missing apt-get update before install may cause package issues.
Add 'apt-get update' before installing packages to avoid outdated package lists.
```suggestion
apt-get update
apt-get install -y glslc libvulkan-dev
```
</issue_to_address>
### Comment 3
<location> `inference-engine/llamacpp/native/generic.Dockerfile:9` </location>
<code_context>
+
+ARG TARGETARCH
+
+RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl
+
+COPY native/install-vulkan.sh .
</code_context>
<issue_to_address>
**suggestion (performance):** No cleanup of apt cache after installation.
Add 'rm -rf /var/lib/apt/lists/*' after installing packages to minimize image size and prevent outdated package lists.
```suggestion
RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl && rm -rf /var/lib/apt/lists/*
```
</issue_to_address>
### Comment 4
<location> `inference-engine/llamacpp/README.md:5` </location>
<code_context>
+
+This repo contains implementations of the llama.cpp inference runtime.
+
+* native/ - contains an implementaion based on `llama.cpp`'s native server
+ implementation
</code_context>
<issue_to_address>
**issue (typo):** Typo: 'implementaion' should be 'implementation'.
Change 'implementaion' to 'implementation' in the directory description.
</issue_to_address>
```sh
main() {
    set -eux -o pipefail

    apt-get update && apt-get install -y cmake ninja-build git wget gnupg2
```
suggestion (bug_risk): Missing noninteractive flags may cause issues in CI environments.
Add DEBIAN_FRONTEND=noninteractive to apt-get install to prevent prompts during automated builds.
```suggestion
apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y cmake ninja-build git wget gnupg2
```
```sh
main() {
    set -eux -o pipefail

    apt-get install -y glslc libvulkan-dev
```
suggestion (bug_risk): Missing apt-get update before install may cause package issues.
Add 'apt-get update' before installing packages to avoid outdated package lists.
```suggestion
apt-get update
apt-get install -y glslc libvulkan-dev
```
```dockerfile
ARG TARGETARCH

RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl
```
suggestion (performance): No cleanup of apt cache after installation.
Add 'rm -rf /var/lib/apt/lists/*' after installing packages to minimize image size and prevent outdated package lists.
```suggestion
RUN apt-get update && apt-get install -y cmake ninja-build git build-essential curl && rm -rf /var/lib/apt/lists/*
```
```
This repo contains implementations of the llama.cpp inference runtime.

* native/ - contains an implementaion based on `llama.cpp`'s native server
```
issue (typo): Typo: 'implementaion' should be 'implementation'.
Change 'implementaion' to 'implementation' in the directory description.
Code Review
This pull request introduces Dockerfiles and build scripts for creating inference engine artifacts for `llama.cpp`. The changes are well-structured, setting up builds for different environments like CUDA, generic CPU/Vulkan, and local macOS development. My review includes suggestions for optimizing the Dockerfiles by reducing layers, improving script robustness, enhancing build script flexibility for different architectures, and fixing a minor typo in the documentation. I've also pointed out an opportunity to refactor some C code for better maintainability.
```sh
main() {
    set -eux -o pipefail

    apt-get install -y glslc libvulkan-dev
```
It's a best practice to run `apt-get update` before `apt-get install` to ensure you are getting the latest package lists. While the calling Dockerfile might do this, making the script self-contained improves robustness and prevents potential build failures due to stale package lists.
```suggestion
apt-get update && apt-get install -y glslc libvulkan-dev
```
```cmake
add_library(nvapi STATIC IMPORTED)
set_target_properties(nvapi PROPERTIES
    IMPORTED_LOCATION "${CMAKE_SOURCE_DIR}/vendor/nvapi/amd64/nvapi64.lib"
```
```dockerfile
ARG TARGETOS
ARG TARGETARCH
ARG ACCEL
COPY --from=release-artifacts /com.docker.llama-server.native.$TARGETOS.$ACCEL.$TARGETARCH /com.docker.llama-server.native.$TARGETOS.$ACCEL.$TARGETARCH
```
The destination path for the `COPY` command is the same as the source directory name. This will result in a deeply nested and cumbersome path for the executables inside the final image (e.g., `/com.docker.llama-server.native.linux.cpu.amd64/bin/com.docker.llama-server`). It's generally better to copy the contents of the artifact directory into a standard location like `/` to make the binaries easier to locate and execute. Adding a trailing slash to the source path will copy the directory's contents.
```suggestion
COPY --from=release-artifacts /com.docker.llama-server.native.$TARGETOS.$ACCEL.$TARGETARCH/ /
```
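As a follow-up illustration (the binary name is inferred from the example path in the comment above; the actual artifact layout may differ), copying the contents to `/` would let later instructions reference a stable location:

```dockerfile
# Hypothetical: with the artifact contents copied to /, the server binary would
# sit at /bin/com.docker.llama-server and could serve as a fixed entrypoint.
ENTRYPOINT ["/bin/com.docker.llama-server"]
```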
```
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_OSX_DEPLOYMENT_TARGET=13.3 \
```
This repo contains implementations of the llama.cpp inference runtime. | ||
|
||
* native/ - contains an implementaion based on `llama.cpp`'s native server |
```dockerfile
RUN rm install/bin/*.py
RUN rm -r install/lib/cmake
RUN rm -r install/lib/pkgconfig
RUN rm -r install/include
```
```dockerfile
RUN echo "-B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_NATIVE=OFF \
    -DGGML_OPENMP=OFF \
    -DLLAMA_CURL=OFF \
    -DGGML_VULKAN=ON \
    -GNinja \
    -S ." > cmake-flags
RUN if [ "${TARGETARCH}" = "amd64" ]; then \
        echo " -DBUILD_SHARED_LIBS=ON \
        -DGGML_BACKEND_DL=ON \
        -DGGML_CPU_ALL_VARIANTS=ON" >> cmake-flags; \
    elif [ "${TARGETARCH}" = "arm64" ]; then \
        echo " -DBUILD_SHARED_LIBS=OFF" >> cmake-flags; \
    else \
        echo "${TARGETARCH} is not supported"; \
        exit 1; \
    fi
RUN cmake $(cat cmake-flags)
RUN cmake --build build --config Release -j 4
RUN cmake --install build --config Release --prefix install
```
To optimize the Docker image by reducing layers, the logic for setting CMake flags, configuring, building, and installing can be combined into a single `RUN` instruction. This avoids creating multiple layers and temporary files like `cmake-flags`.
```dockerfile
RUN CMAKE_FLAGS="-B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_OPENMP=OFF -DLLAMA_CURL=OFF -DGGML_VULKAN=ON -GNinja -S ." && \
    if [ "${TARGETARCH}" = "amd64" ]; then \
      CMAKE_FLAGS="$CMAKE_FLAGS -DBUILD_SHARED_LIBS=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON"; \
    elif [ "${TARGETARCH}" = "arm64" ]; then \
      CMAKE_FLAGS="$CMAKE_FLAGS -DBUILD_SHARED_LIBS=OFF"; \
    else \
      echo "${TARGETARCH} is not supported"; \
      exit 1; \
    fi && \
    cmake $CMAKE_FLAGS && \
    cmake --build build --config Release -j 4 && \
    cmake --install build --config Release --prefix install
```
```dockerfile
RUN rm install/bin/*.py
RUN rm -r install/lib/cmake
RUN rm -r install/lib/pkgconfig
RUN rm -r install/include
```
```sh
    set -eux -o pipefail

    apt-get update && apt-get install -y cmake ninja-build git wget gnupg2
    wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc
```
The `tee` command writes the GPG key to the specified file but also to standard output, which is not necessary here. You can redirect the standard output of `tee` to `/dev/null` to keep the build logs cleaner.
```suggestion
wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc > /dev/null
```
```c
status = NvAPI_Initialize();
if (status != NVAPI_OK) {
    NvAPI_GetErrorMessage(status, error_str);
    printf("Failed to initialise NVAPI: %s\n", error_str);
    return -1;
}

NvU32 driver_version;
NvAPI_ShortString build_branch;

status = NvAPI_SYS_GetDriverAndBranchVersion(&driver_version, build_branch);
if (status != NVAPI_OK) {
    NvAPI_GetErrorMessage(status, error_str);
    printf("Failed to retrieve driver info: %s\n", error_str);
    return -1;
}
```
The pattern for checking `NvAPI_Status` and printing an error message is repeated multiple times throughout the file. To improve code maintainability and reduce duplication, consider extracting this logic into a helper function. For example:
```c
bool check_nvapi_status(NvAPI_Status status, const char* message) {
    if (status != NVAPI_OK) {
        NvAPI_ShortString error_str = { 0 };
        NvAPI_GetErrorMessage(status, error_str);
        printf("%s: %s\n", message, error_str);
        return false;
    }
    return true;
}
```
Added so contributors can enhance these further; they may want to implement/enable Intel's oneAPI, MUSA, etc.
Summary by Sourcery
Add Docker-based build and deployment support for llama.cpp inference engines and the native server, including CPU (with Vulkan) and CUDA variants, along with local build infrastructure and a GPU utility.