Add Docker support #27

FarrelRamdhani · 2025-09-07T10:44:51Z

This pull request introduces Docker support for Kolosal Server with CUDA GPU acceleration, adds a minimal config for embedding-only deployments, and updates the default configuration to optimize for Docker and GPU usage. The changes make it easier to deploy the server on GPU-enabled hosts using Docker, streamline configuration for common use cases, and improve documentation for containerized environments.

Introduces a multi-stage Dockerfile for building and running kolosal-server with CUDA support. Updates README with instructions for building and running the server in a Docker container, including health checks and config mounting.

Changed server port to 8080 for Docker compatibility and updated SearXNG URL to a public instance. Added CUDA-based inference engine as default, adjusted library paths for Docker/Linux, and set model inference engines to use CUDA. Updated comments and parameters for better clarity and GPU layer offloading.

Corrected the indentation of the 'library_path' field under the 'llama-cpu' inference engine to ensure proper YAML parsing and configuration loading.

Added CMAKE_VERSION argument and logic to ensure CMake is upgraded to version 3.27.9 or higher, as required by PoDoFo. This improves build reliability by guaranteeing a compatible CMake version.
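The version gate described above could be sketched as follows. This is a minimal illustration, not the logic in the Dockerfile itself: the helper names `version_ge` and `needs_cmake_upgrade` are hypothetical, and the comparison assumes `sort -V` from GNU coreutils is available in the build image.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Uses `sort -V` (version sort), so 3.100.0 correctly ranks above 3.27.9.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeeds when the installed CMake is too old.
# $1: installed version, $2: required minimum (defaults to the 3.27.9
#     floor named in the commit message).
needs_cmake_upgrade() {
    ! version_ge "$1" "${2:-3.27.9}"
}
```

A build step might call `needs_cmake_upgrade "$(cmake --version | head -n 1 | awk '{print $3}')"` and install a newer CMake only when the check succeeds.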

Adds steps to the Dockerfile to automatically clone zlib and pugixml repositories if their sources are not found in the build context. This ensures required external dependencies are available for the build process.

Eliminates the steps that clone zlib and pugixml if not present, assuming these dependencies are now handled elsewhere or are always available in the build context.

Sets the CUDA_CUDA_LIBRARY variable in the CMake configuration to explicitly specify the location of libcuda.so. This helps ensure correct linking when building with CUDA support.

Adds the -DGGML_CUDA_NO_VMM=ON flag to the CMake configuration in the Dockerfile to disable CUDA virtual memory management (VMM) support during the build.

Added libblas3, liblapack3, and libgfortran5 to the minimal runtime dependencies in the Dockerfile to support applications requiring these scientific libraries.

Changed 'allow_public_access' to true in config_rms.yaml to permit access from other devices on the same network.

Introduces configs/config_basic.yaml for minimal embedding-only deployments. Updates the Dockerfile to prefer config_basic.yaml as the default configuration, with fallback logic for other config files, and adjusts the entrypoint to use config_basic.yaml by default.
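A rough sketch of what "prefer config_basic.yaml, then fall back" can look like in shell. The function name `pick_config` is hypothetical, and the fallback candidates after config_basic.yaml (and their order) are assumptions for illustration; the PR text does not list them.

```shell
# Hypothetical helper: print the first existing config file in directory $1.
# config_basic.yaml is tried first, as the PR describes. The remaining
# candidates and their order are assumed, not taken from the Dockerfile.
pick_config() {
    dir="$1"
    for name in config_basic.yaml config_rms.yaml config.yaml; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    echo "no config file found in $dir" >&2
    return 1
}
```

Trying the candidates in a fixed order keeps the default predictable: whichever file is present, the same directory contents always resolve to the same config.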

Expanded the README with detailed steps for using prebuilt Docker images from GitHub Container Registry, including prerequisites, image pulling, running with GPU support, mounting directories, configuration options, updating/rollback, and troubleshooting. This helps users deploy the server more easily using Docker.
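For orientation, a run command of the kind the README covers might be assembled like this. It is a sketch under stated assumptions: the image reference and the in-container config path are placeholders rather than values from this PR, and the function only prints the command instead of running it. The `--gpus all`, `-p`, and `-v` options are standard `docker run` flags, and 8080 is the server port set in this PR.

```shell
# Hypothetical helper: print (do not execute) a `docker run` command line
# for a GPU-enabled host.
# $1: image reference, $2: host config file, $3: host port (default 8080).
# The in-container config path below is a placeholder, not taken from the PR.
print_run_cmd() {
    image="$1"
    host_config="$2"
    host_port="${3:-8080}"
    printf 'docker run --rm --gpus all -p %s:8080 -v %s:%s:ro %s\n' \
        "$host_port" "$host_config" "/path/in/container/config.yaml" "$image"
}

# Example with placeholder image owner and tag.
print_run_cmd "ghcr.io/OWNER/kolosal-server:TAG" "$PWD/configs/config_basic.yaml"
```

Mounting the config read-only (`:ro`) keeps the container from modifying the host copy; drop it if the server needs to write to the file.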

Set 'rate_limit.enabled' to false in config_basic.yaml to turn off API rate limiting. This may be for testing or to allow unrestricted access during development.

This reverts commit 8c27814.

FarrelRamdhani added 15 commits September 3, 2025 16:24

Add Dockerfile for CUDA GPU deployment

f6a35cb

Introduces a multi-stage Dockerfile for building and running kolosal-server with CUDA support. Updates README with instructions for building and running the server in a Docker container, including health checks and config mounting.

Fix indentation for llama-cpu library_path in YAML

0f11149

Corrected the indentation of the 'library_path' field under the 'llama-cpu' inference engine to ensure proper YAML parsing and configuration loading.

Update Dockerfile

2558bc7

Pin and upgrade CMake version in Dockerfile

cc488b4

Added CMAKE_VERSION argument and logic to ensure CMake is upgraded to version 3.27.9 or higher, as required by PoDoFo. This improves build reliability by guaranteeing a compatible CMake version.

Clone zlib and pugixml if not present in Docker build

11c0731

Adds steps to the Dockerfile to automatically clone zlib and pugixml repositories if their sources are not found in the build context. This ensures required external dependencies are available for the build process.

Remove zlib and pugixml cloning from Dockerfile

473972e

Eliminates the steps that clone zlib and pugixml if not present, assuming these dependencies are now handled elsewhere or are always available in the build context.

Add CUDA library path to CMake build

68fc324

Sets the CUDA_CUDA_LIBRARY variable in the CMake configuration to explicitly specify the location of libcuda.so. This helps ensure correct linking when building with CUDA support.

Enable GGML_CUDA_NO_VMM in Docker build

0269d97

Adds the -DGGML_CUDA_NO_VMM=ON flag to the CMake configuration in the Dockerfile to disable CUDA virtual memory management (VMM) support during the build.

Add BLAS, LAPACK, and gfortran to Docker runtime deps

4bf9f83

Added libblas3, liblapack3, and libgfortran5 to the minimal runtime dependencies in the Dockerfile to support applications requiring these scientific libraries.

Enable public access in server config

63ff476

Changed 'allow_public_access' to true in config_rms.yaml to permit access from other devices on the same network.

Disable rate limiting in basic config

8c27814

Set 'rate_limit.enabled' to false in config_basic.yaml to turn off API rate limiting. This may be for testing or to allow unrestricted access during development.

Revert "Disable rate limiting in basic config"

40d6709

This reverts commit 8c27814.
