

@FarrelRamdhani
Contributor

This pull request introduces Docker support for Kolosal Server with CUDA GPU acceleration, adds a minimal config for embedding-only deployments, and updates the default configuration to optimize for Docker and GPU usage. The changes make it easier to deploy the server on GPU-enabled hosts using Docker, streamline configuration for common use cases, and improve documentation for containerized environments.

Introduces a multi-stage Dockerfile for building and running kolosal-server with CUDA support. Updates the README with instructions for building and running the server in a Docker container, including health checks and config mounting.
Changed the server port to 8080 for Docker compatibility and updated the SearXNG URL to a public instance. Added a CUDA-based inference engine as the default, adjusted library paths for Docker/Linux, and set the model inference engines to use CUDA. Updated comments for clarity and adjusted parameters to enable GPU layer offloading.
Corrected the indentation of the 'library_path' field under the 'llama-cpu' inference engine to ensure proper YAML parsing and configuration loading.
Added a CMAKE_VERSION argument and logic that upgrades CMake to version 3.27.9 or higher, as required by PoDoFo. This improves build reliability by guaranteeing a compatible CMake version.
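The version gate described above boils down to comparing dotted version numbers numerically rather than as strings. The actual Dockerfile logic is not shown in this PR description; the Python sketch below only illustrates the comparison, assuming the installed version is read from `cmake --version` output (for example `cmake version 3.22.1`). The helper names `parse_cmake_version` and `needs_upgrade` are hypothetical.

```python
import re

# Minimum CMake version named in this PR (required by PoDoFo).
MIN_CMAKE = (3, 27, 9)


def parse_cmake_version(output: str) -> tuple:
    """Return (major, minor, patch) parsed from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized cmake --version output: {output!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(output: str, minimum: tuple = MIN_CMAKE) -> bool:
    """Return True when the reported CMake version is older than `minimum`."""
    # Integer tuples compare element by element, so 3.27.10 is correctly
    # treated as newer than 3.27.9; a plain string comparison would not be.
    return parse_cmake_version(output) < minimum


if __name__ == "__main__":
    print(needs_upgrade("cmake version 3.22.1"))  # True: upgrade required
    print(needs_upgrade("cmake version 3.27.9"))  # False: already sufficient
```

Comparing integer tuples avoids the classic pitfall where `"3.27.10" < "3.27.9"` holds lexicographically.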
Adds steps to the Dockerfile that automatically clone the zlib and pugixml repositories if their sources are not found in the build context. This ensures the required external dependencies are available for the build.
Removes the steps that clone zlib and pugixml when they are missing, on the assumption that these dependencies are now handled elsewhere or are always present in the build context.
Sets the CUDA_CUDA_LIBRARY variable in the CMake configuration to explicitly specify the location of libcuda.so. This helps ensure correct linking when building with CUDA support.
Adds the -DGGML_CUDA_NO_VMM=ON flag to the CMake configuration in the Dockerfile to disable CUDA virtual memory management (VMM) support in the build.
Added libblas3, liblapack3, and libgfortran5 to the minimal runtime dependencies in the Dockerfile for components that require these BLAS/LAPACK libraries.
Changed 'allow_public_access' to true in config_rms.yaml to permit access from other devices on the same network.
Introduces configs/config_basic.yaml for minimal embedding-only deployments. Updates the Dockerfile to prefer config_basic.yaml as the default configuration, with fallback to other config files, and adjusts the entrypoint to use config_basic.yaml by default.
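The preference-with-fallback behaviour can be sketched as "return the first candidate file that exists". This is an illustrative Python sketch, not the Dockerfile's actual logic: only `configs/config_basic.yaml` comes from this PR, while the fallback paths, the `/app` root, and the `select_config` helper are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional

# Ordered candidates, most preferred first. Only configs/config_basic.yaml
# comes from this PR; the fallback paths below are hypothetical placeholders.
DEFAULT_CANDIDATES = (
    "configs/config_basic.yaml",
    "configs/config_rms.yaml",  # hypothetical location of config_rms.yaml
    "config.yaml",              # hypothetical generic fallback
)


def select_config(
    root: Path,
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
    exists: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the first candidate config under `root` that exists, else None."""
    for name in candidates:
        path = root / name
        # `exists` is injectable so the selection logic can be tested
        # without touching the real filesystem.
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    chosen = select_config(Path("/app"))  # hypothetical install root
    print(chosen if chosen is not None else "no config found")
```

Returning `None` instead of raising leaves the caller free to decide whether a missing config is fatal or whether to start with built-in defaults.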
Expanded the README with detailed steps for using prebuilt Docker images from the GitHub Container Registry, covering prerequisites, pulling the image, running with GPU support, mounting directories, configuration options, updating and rolling back, and troubleshooting. This makes it easier to deploy the server with Docker.
Set 'rate_limit.enabled' to false in config_basic.yaml to turn off API rate limiting. This may be for testing or to allow unrestricted access during development.
