Add Docker support #27
Open
FarrelRamdhani wants to merge 15 commits into main from docker
Introduces a multi-stage Dockerfile for building and running kolosal-server with CUDA support. Updates README with instructions for building and running the server in a Docker container, including health checks and config mounting.
Changed server port to 8080 for Docker compatibility and updated SearXNG URL to a public instance. Added CUDA-based inference engine as default, adjusted library paths for Docker/Linux, and set model inference engines to use CUDA. Updated comments and parameters for better clarity and GPU layer offloading.
Corrected the indentation of the 'library_path' field under the 'llama-cpu' inference engine to ensure proper YAML parsing and configuration loading.
Added CMAKE_VERSION argument and logic to ensure CMake is upgraded to version 3.27.9 or higher, as required by PoDoFo. This improves build reliability by guaranteeing a compatible CMake version.
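The version gate described in this commit could be sketched in shell roughly as follows. This is an illustration only, not the logic actually used in the Dockerfile: the helper name `version_ge` is hypothetical, and the sketch assumes GNU `sort` with `-V` (version sort), which typical Linux base images provide.

```sh
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on GNU sort -V (version sort).
version_ge() {
    # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED_CMAKE="3.27.9"   # minimum stated in the commit message

# `cmake --version` prints "cmake version X.Y.Z" on its first line.
if command -v cmake >/dev/null 2>&1; then
    installed="$(cmake --version | head -n 1 | awk '{print $3}')"
else
    installed="0"   # treat a missing cmake as older than any release
fi

if version_ge "$installed" "$REQUIRED_CMAKE"; then
    echo "CMake $installed is new enough"
else
    echo "CMake $installed is older than $REQUIRED_CMAKE; an upgrade step would run here"
fi
```

A plain string comparison would rank 3.9.0 above 3.27.9, which is why the sketch uses a version-aware sort.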
Adds steps to the Dockerfile to automatically clone zlib and pugixml repositories if their sources are not found in the build context. This ensures required external dependencies are available for the build process.
Removes the Dockerfile steps that clone zlib and pugixml when their sources are missing, on the assumption that these dependencies are handled elsewhere or are always present in the build context.
Sets the CUDA_CUDA_LIBRARY variable in the CMake configuration to explicitly specify the location of libcuda.so. This helps ensure correct linking when building with CUDA support.
Adds the -DGGML_CUDA_NO_VMM=ON flag to the CMake configuration in the Dockerfile to disable CUDA VMM support during build.
Added libblas3, liblapack3, and libgfortran5 to the minimal runtime dependencies in the Dockerfile to support applications requiring these scientific libraries.
Changed 'allow_public_access' to true in config_rms.yaml to permit access from other devices on the same network.
Introduces configs/config_basic.yaml for minimal embedding-only deployments. Updates Dockerfile to prefer config_basic.yaml as the default configuration, with fallback logic for other config files and adjusts entrypoint to use config_basic.yaml by default.
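The "prefer one config, fall back to others" behaviour described here might look something like the shell sketch below. The function name `pick_config` is hypothetical, and the location of `config_rms.yaml` under `configs/` is an assumption; the actual Dockerfile and entrypoint may order or locate the candidates differently.

```sh
#!/bin/sh
# Hypothetical helper: print the first argument that names an existing file.
pick_config() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1   # none of the candidates exist
}

# Preferred file first, fallbacks after it.
# Assumption: config_rms.yaml also lives under configs/.
if chosen="$(pick_config configs/config_basic.yaml configs/config_rms.yaml)"; then
    echo "using $chosen"
else
    echo "no config file found" >&2
fi
```

Listing the candidates in priority order keeps the fallback policy in one place, so changing the default only means reordering the arguments.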
Expanded the README with detailed steps for using prebuilt Docker images from GitHub Container Registry, including prerequisites, image pulling, running with GPU support, mounting directories, configuration options, updating/rollback, and troubleshooting. This helps users deploy the server more easily using Docker.
Set 'rate_limit.enabled' to false in config_basic.yaml to turn off API rate limiting. This may be for testing or to allow unrestricted access during development.
This reverts commit 8c27814.
This pull request introduces Docker support for Kolosal Server with CUDA GPU acceleration, adds a minimal config for embedding-only deployments, and updates the default configuration to optimize for Docker and GPU usage. The changes make it easier to deploy the server on GPU-enabled hosts using Docker, streamline configuration for common use cases, and improve documentation for containerized environments.