Run an LLM locally for development

Quickly run an LLM locally as a backend for development, along with a chat UI.

Using Ollama and LiteLLM.

Everything is installed via Docker Compose.

Requirements

Install

  1. Configure .env (a sample is sketched after this list).
  • COMPOSE_PROFILES: gpu (requires nvidia-container-toolkit to be installed) or cpu.
  2. Run docker compose.
docker compose up -d
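
For reference, a minimal .env sketch, assuming COMPOSE_PROFILES is the only variable you need to set:

# .env
COMPOSE_PROFILES=cpu   # or "gpu" if nvidia-container-toolkit is installed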

Access to the services

Other interesting commands

Common docker compose commands that are useful day to day:

  1. Download an Ollama model from the CLI:
docker compose exec ollama-gpu ollama pull <model_name>
  2. Stop the services:
docker compose stop
  3. Show logs:
docker compose logs -f
  4. Remove everything, including volumes:
docker compose down -v

Use your local LLM as an OpenAI replacement

Example using LangChain:

from langchain_openai import ChatOpenAI

# Point the OpenAI-compatible client at the local Ollama endpoint; an API key is required by the client but not validated locally.
llm = ChatOpenAI(openai_api_base="http://localhost:11434/v1", openai_api_key="ignored", model="<model_name>")  # e.g. "qwen2.5:0.5b"

print(llm.invoke("Who are you?"))

Run it with uv:

export MODEL=qwen2.5:0.5b
docker compose exec ollama-gpu ollama pull $MODEL
uv run --with 'langchain[openai]' test/simple.py
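
The same endpoint also works without LangChain. A minimal sketch using the plain openai client, assuming the openai package is available and the model has already been pulled as shown above:

from openai import OpenAI

# Same OpenAI-compatible Ollama endpoint as in the LangChain example above.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")

response = client.chat.completions.create(
    model="qwen2.5:0.5b",  # assumes this model was pulled as shown above
    messages=[{"role": "user", "content": "Who are you?"}],
)
print(response.choices[0].message.content)

It can be run the same way, e.g. uv run --with openai your_script.py (your_script.py is a hypothetical file name).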