introduce ollama cpp optimized for Xeon CPU #848

Open · wants to merge 2 commits into base: main

3 changes: 3 additions & 0 deletions comps/llms/text-generation/ollama/README.md
@@ -0,0 +1,3 @@
# Introduction

[Ollama](https://github.com/ollama/ollama) lets you run open-source large language models, such as Llama 3, locally. Ollama bundles model weights, configuration, and data into a single package defined by a Modelfile. It is a lightweight, extensible framework for building and running language models on a local machine, providing a simple API for creating, running, and managing models, along with a library of pre-built models that can be used in a variety of applications. This makes it a good choice for deploying large language models locally, for example on an AI PC.
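
For illustration only (not specific to this component), a typical Ollama workflow pulls a model and runs it with the standard CLI; `llama3` below is just an example model name:

```bash
# Pull a model from the Ollama library, then run a one-off prompt against it.
ollama pull llama3
ollama run llama3 "Why is the sky blue?"
```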
13 changes: 13 additions & 0 deletions comps/llms/text-generation/ollama/cpp/xeon/Dockerfile
@@ -0,0 +1,13 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM ubuntu:latest

# Build prerequisites: toolchain for the GGML backend plus tools to fetch Go and the sources.
RUN apt update && apt -y install wget git cmake build-essential

# Install Go 1.23.2, which is used to build Ollama from source.
RUN wget https://go.dev/dl/go1.23.2.linux-amd64.tar.gz && rm -rf /usr/local/go && tar -C /usr/local -xzf go1.23.2.linux-amd64.tar.gz
ENV PATH="$PATH:/usr/local/go/bin"

# Clone the Ollama sources (current main branch; pinning a release tag may be preferable for reproducibility)
# and build with AVX/AVX2/AVX-512 (including VNNI and VBMI) enabled for Intel Xeon CPUs.
WORKDIR /ollama
RUN git clone https://github.com/ollama/ollama.git .
RUN OLLAMA_CUSTOM_CPU_DEFS="-DGGML_AVX=on -DGGML_AVX2=on -DGGML_F16C=on -DGGML_FMA=on -DGGML_AVX512=on -DGGML_AVX512_VNNI=on -DGGML_AVX512_VBMI=on" go generate ./... && go build .
ENV PATH="$PATH:/ollama"

COPY entrypoint.sh .
ENTRYPOINT ["bash", "entrypoint.sh"]
35 changes: 35 additions & 0 deletions comps/llms/text-generation/ollama/cpp/xeon/README.md
@@ -0,0 +1,35 @@
# Introduction

This Ollama server is compiled from the [official Ollama repository](https://github.com/ollama/ollama) with additional CPU feature flags suited to Intel Xeon processors. The compilation flags are:

- `-DGGML_AVX=on`
- `-DGGML_AVX2=on`
- `-DGGML_F16C=on`
- `-DGGML_FMA=on`
- `-DGGML_AVX512=on`
- `-DGGML_AVX512_VNNI=on`
- `-DGGML_AVX512_VBMI=on`
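
For reference, these defines are passed to the build through the `OLLAMA_CUSTOM_CPU_DEFS` environment variable in this component's Dockerfile:

```bash
OLLAMA_CUSTOM_CPU_DEFS="-DGGML_AVX=on -DGGML_AVX2=on -DGGML_F16C=on -DGGML_FMA=on -DGGML_AVX512=on -DGGML_AVX512_VNNI=on -DGGML_AVX512_VBMI=on" go generate ./... && go build .
```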

## Usage

1. Start the microservice (build the image first; see the [Build Docker Image](#build-docker-image) section below):

```bash
docker run --network host opea/llm-ollama-cpp-xeon:latest
```

2. Send an `application/json` request to Ollama's `/api/generate` endpoint to interact with the model:

```bash
curl --noproxy "*" http://localhost:11434/api/generate -d '{
"model": "phi3",
"prompt":"Why is the sky blue?"
}'
```

## Build Docker Image

```bash
cd comps/llms/text-generation/ollama/cpp/xeon
docker build -t opea/llm-ollama-cpp-xeon:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
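
The resulting binary assumes the target Xeon host supports the AVX-512 extensions enabled above (including VNNI and VBMI). A quick way to check, using standard Linux tooling rather than anything specific to this component:

```bash
# List the AVX-512 feature flags reported by the host CPU.
lscpu | grep -o 'avx512[a-z_0-9]*' | sort -u
```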
6 changes: 6 additions & 0 deletions comps/llms/text-generation/ollama/cpp/xeon/entrypoint.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Start the Ollama API server (listens on port 11434 by default).
ollama serve