Dockerfile for llm and service in docker compose #98

Merged
15 commits merged on Feb 25, 2024
6 changes: 3 additions & 3 deletions README.md
@@ -71,12 +71,12 @@ Upon app startup, OpenAI-compatible embedding API will be available at:

Check the docs here: <http://172.16.3.101:5001/docs>

#### llamacpp
#### Download the LLM model (required for the llm service to work!)

Download models (this can take >1h):
Download the model (this can take ~6 min):
pgronkievitz marked this conversation as resolved.

```sh
wget https://huggingface.co/TheBloke/sheep-duck-llama-2-70B-v1.1-GGUF/resolve/main/sheep-duck-llama-2-70b-v1.1.Q4_K_S.gguf
wget -P ./llm/models https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q3_K_L.gguf
```

suggestion: I'd add curl instructions as well (pro tip: use the -o flag). Some systems ship without wget and others without curl, so providing both means nobody is forced to modify the command or install a new package.
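For reference, a curl variant of the download step might look like this (a sketch only: same URL as above, with -L to follow Hugging Face's redirect and -o standing in for wget's -P destination):

```sh
# Hypothetical curl equivalent of the wget command above.
# -L follows redirects; -o writes to the same path wget -P would use.
curl -L -o ./llm/models/llama-2-7b.Q3_K_L.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q3_K_L.gguf
```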


#### Starting app
13 changes: 11 additions & 2 deletions docker-compose.yml
@@ -1,6 +1,6 @@
services:
discord-bot:
profiles: ["dev","prod"]
profiles: [ "dev", "prod" ]
build: ./discord_bot
env_file: .env

@@ -16,7 +16,7 @@ services:
- .env

api:
profiles: ["dev","prod"]
profiles: [ "dev", "prod" ]
build:
context: ./api/
env_file:
@@ -27,3 +27,12 @@ services:
- "8000:8000"
depends_on:
- db

llm:
profiles: [ "dev", "prod" ]
build:
context: ./llm/
volumes:
- ./llm/models:/models
ports:
- "9000:9000"
5 changes: 5 additions & 0 deletions llm/.gitignore
@@ -0,0 +1,5 @@
models/
*.ipynb
llama-cpp-python/
.pytest_cache/
__pycache__/
pgronkievitz marked this conversation as resolved.
28 changes: 28 additions & 0 deletions llm/Dockerfile
@@ -0,0 +1,28 @@
FROM python:3.11-buster as builder

RUN pip install poetry==1.6.1
RUN apt-get update && apt-get install -y git
RUN git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git

ENV POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_IN_PROJECT=1 \
POETRY_VIRTUALENVS_CREATE=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache

WORKDIR /app

COPY pyproject.toml poetry.lock ./
RUN touch README.md

# Cache Poetry downloads across builds so dependency installs stay fast.
RUN --mount=type=cache,target=$POETRY_CACHE_DIR poetry install

FROM python:3.11-slim-buster as runtime

ENV VIRTUAL_ENV=/app/.venv \
PATH="/app/.venv/bin:$PATH"

COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}

# Build llama-cpp-python with cuBLAS support so inference can run on the GPU.
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server]

ENTRYPOINT ["python3", "-m", "llama_cpp.server", "--host", "0.0.0.0", "--port", "9000", "--model", "models/llama-2-7b.Q3_K_L.gguf", "--n_gpu_layers", "9999999"]
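Once the image is built and the model file is mounted at /models, the server can be smoke-tested over its OpenAI-compatible API. A sketch, assuming the port mapping from docker-compose.yml (prompt and max_tokens are arbitrary):

```sh
# Hypothetical smoke test; the server answers OpenAI-style completion requests.
curl http://localhost:9000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, my name is", "max_tokens": 16}'
```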