
# 2.3.40 Satellite llamaswap


Handle: llamaswap
URL: http://localhost:34401

llama-swap is a lightweight, transparent proxy server that provides automatic model swapping to llama.cpp's server.
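
In practice, this means you talk to llama-swap as you would to any OpenAI-compatible server: the `model` field of a request decides which configured model gets loaded, and llama-swap swaps the underlying llama.cpp instance as needed. A minimal sketch against the Harbor URL above, where `my-model` is a placeholder for an entry from your `config.yaml`:

```bash
# List the models defined in llama-swap's config.yaml
curl http://localhost:34401/v1/models

# The "model" field selects which entry to load; llama-swap
# starts (or swaps in) the matching llama.cpp server first
curl http://localhost:34401/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello!"}]}'
```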

## Starting

```bash
# [Optional] pre-pull the image
harbor pull llamaswap

# Run the service
harbor up llamaswap
```
- The `llamaswap` image in Harbor runs its own llama.cpp server, separate from the one used by the `llamacpp` service
- Harbor will connect `llamaswap` to Open WebUI when they are run together
- Harbor mounts the following local caches so they are available within the llama-swap container:
  - Ollama - `/root/.ollama`
  - Hugging Face - `/root/.cache/huggingface`
  - llama.cpp - `/root/.cache/llama.cpp`
  - vLLM - `/root/.cache/vllm`
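
Once the service is up, you can watch models being loaded and swapped in the proxy's output; a quick check, assuming Harbor's standard `logs` command:

```bash
# Follow llama-swap's logs to see swap events as requests arrive
harbor logs llamaswap
```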

## Configuration

The expected way to configure llama-swap is by editing its `config.yaml` file:

```bash
# Open in your default editor
open $(harbor home)/llamaswap/config.yaml
```

See the official configuration example for reference.
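
As a rough illustration of the format, each entry under `models` maps the name clients send in the `model` field to the command llama-swap runs to serve it. The model name, GGUF filename, and server binary below are placeholders; the cache path is one of the mounts listed above:

```yaml
models:
  "my-model":
    # ${PORT} is filled in by llama-swap at launch time
    cmd: |
      llama-server --port ${PORT}
      -m /root/.cache/llama.cpp/your-model.gguf
    # Optionally unload the model after 300s of inactivity
    ttl: 300
```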
