diff --git a/README.md b/README.md
index be4eaf5..cf150b3 100644
--- a/README.md
+++ b/README.md
@@ -57,36 +57,42 @@
 ### Starting app production
 
-#### Embedding api
-
-Download models (need git-lfs):
+#### Starting app
 
 ```sh
-cd models
-git clone git@hf.co:intfloat/e5-large-v2
+docker compose --profile prod up
 ```
 
-Upon app startup, OpenAI-compatible embedding API will be available at:
-
+### Starting LLM and embedding
 
-Check the docs here:
+1. Download the model (required for the llm-embedding services to work)
 
-#### Download llm model (must have for servis llm to work !!!)
+   Download the model (file size: 3.6 GB):
 
-Download model (size of file 3.6GB ):
+   ```sh
+   curl -o ./llm/models/llama-2-7b.Q3_K_L.gguf -L https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q3_K_L.gguf
+   ```
 
-```sh
-curl -o ./llm/models/llama-2-7b.Q3_K_L.gguf -L https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q3_K_L.gguf
-```
+   or
 
-or
+   ```sh
+   wget -O ./llm/models/llama-2-7b.Q3_K_L.gguf https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q3_K_L.gguf
+   ```
 
-```sh
-wget -P ./llm/models/llama-2-7b.Q3_K_L.gguf https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q3_K_L.gguf
-```
+2. Launch the service
 
-#### Starting app
+   2.1. Running on CPU
 
-```sh
-docker compose --profile prod up
-```
+   ```sh
+   docker compose --profile cpu up
+   ```
+
+   2.2. Running on GPU (requires the NVIDIA Container Toolkit)
+
+   ```sh
+   docker compose --profile gpu up
+   ```
+
+#### LLM and embedding API Swagger
+
+Swagger UI with endpoints for completions (llm + embedding) and for embeddings only is available [here](http://0.0.0.0:9000/docs).
 
diff --git a/docker-compose.yml b/docker-compose.yml
index 5a05b3e..f68310f 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -5,7 +5,7 @@ services:
     env_file: .env
 
   db:
-    profiles: ["dev","prod"]
+    profiles: [ "dev", "prod" ]
    build:
       context: ./postgres
       dockerfile: postgres.Dockerfile
@@ -28,11 +28,27 @@ services:
     depends_on:
       - db
 
-  llm:
-    profiles: [ "dev", "prod" ]
+  llm-embedding-cpu:
+    profiles: [ "cpu" ]
+    build:
+      context: ./llm/
+    volumes:
+      - ./llm/models:/models
+    ports:
+      - "9000:9000"
+
+  llm-embedding-gpu:
+    profiles: [ "gpu" ]
     build:
       context: ./llm/
     volumes:
       - ./llm/models:/models
     ports:
       - "9000:9000"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [ gpu ]
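
A note for the download step above: both commands should leave the same file under `./llm/models`, which both llm-embedding services mount at `/models`. A quick size check is a cheap way to catch a failed or truncated download before starting the stack:

```sh
# The model should be roughly 3.6 GB; a file of a few KB is usually an
# HTML error page saved by mistake rather than the actual weights.
ls -lh ./llm/models/llama-2-7b.Q3_K_L.gguf
```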
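
Once either profile is up, a minimal smoke test against port 9000 could look like the sketch below. The `/v1/completions` and `/v1/embeddings` paths are an assumption based on the OpenAI-compatible API the previous README text mentioned; confirm the real routes in the Swagger UI at http://0.0.0.0:9000/docs.

```sh
# Assumed OpenAI-compatible endpoints; verify the actual paths at /docs.
curl -s http://0.0.0.0:9000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "max_tokens": 16}'

curl -s http://0.0.0.0:9000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello"}'
```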
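
The `deploy.resources.reservations.devices` block in the `gpu` profile only takes effect when the host has the NVIDIA Container Toolkit configured for Docker. One common way to verify that before running `docker compose --profile gpu up` (the CUDA image tag here is only an example):

```sh
# Should print the host's GPUs from inside a container; if it errors out,
# install and configure the NVIDIA Container Toolkit first.
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```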