
# 2.3.40 Satellite llamaswap


Handle: llamaswap
URL: http://localhost:34401

llama-swap is a lightweight, transparent proxy server that provides automatic model swapping to llama.cpp's server.
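
In practice, this means you talk to llama-swap as you would to any OpenAI-compatible server: the `model` field of a request decides which configured model gets loaded, and llama-swap swaps the underlying llama.cpp instance as needed. A minimal sketch against the Harbor URL above, where `my-model` is a placeholder for an entry from your `config.yaml`:

```bash
# List the models defined in llama-swap's config.yaml
curl http://localhost:34401/v1/models

# The "model" field selects which entry to load; llama-swap
# starts (or swaps in) the matching llama.cpp server first
curl http://localhost:34401/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello!"}]}'
```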

## Starting

```bash
# [Optional] pre-pull the image
harbor pull llamaswap

# Run the service
harbor up llamaswap
```
- The `llamaswap` image in Harbor runs its own llama.cpp server, separate from the one used by the `llamacpp` service
- Harbor will connect `llamaswap` to Open WebUI when they are run together
- Harbor mounts the following local caches so they are available within the llama-swap container:
  - Ollama - `/root/.ollama`
  - Hugging Face - `/root/.cache/huggingface`
  - llama.cpp - `/root/.cache/llama.cpp`
  - vLLM - `/root/.cache/vllm`
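
Once the service is up, you can watch models being loaded and swapped in the proxy's output; a quick check, assuming Harbor's standard `logs` command:

```bash
# Follow llama-swap's logs to see swap events as requests arrive
harbor logs llamaswap
```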

## Configuration

The expected way to configure llama-swap is by editing its `config.yaml` file:

```bash
# Open in your default editor
open $(harbor home)/llamaswap/config.yaml
```

See the official configuration example for reference.
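
As a rough illustration of the format, each entry under `models` maps the name clients send in the `model` field to the command llama-swap runs to serve it. The model name, GGUF filename, and server binary below are placeholders; the cache path is one of the mounts listed above:

```yaml
models:
  "my-model":
    # ${PORT} is filled in by llama-swap at launch time
    cmd: |
      llama-server --port ${PORT}
      -m /root/.cache/llama.cpp/your-model.gguf
    # Optionally unload the model after 300s of inactivity
    ttl: 300
```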
