This example uses Ollama (https://ollama.ai/) to deploy a model on your own machine.
On Triton:
module use /scratch/shareddata/modules/core-models
# Choose the model that we want to use
module load model-llama2/13b-chat
# Choose the llama.cpp model quantization we want to use
module load model-llama.cpp/q4_1-2023-08-28
# Get the path to model weights
echo $MODEL_WEIGHTS
# Example output: /scratch/shareddata/LLMs_tools/models/llama2-llama.cpp-2023-08-28/llama-2-13b-chat/ggml-model-q4_1.gguf
On the machine where you want to run the model:
# Copy weights from Triton to your machine:
scp triton.aalto.fi:/scratch/shareddata/LLMs_tools/models/llama2-llama.cpp-2023-08-28/llama-2-13b-chat/ggml-model-q4_1.gguf llama-13b-chat-q4_1.gguf
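The weights file is several gigabytes, so if the transfer gets interrupted you may prefer rsync, which can resume a partial download (an optional alternative to the scp command above, using the same paths):
# Resumable copy; --partial keeps an incomplete file so a rerun continues where it stopped
rsync --partial --progress triton.aalto.fi:/scratch/shareddata/LLMs_tools/models/llama2-llama.cpp-2023-08-28/llama-2-13b-chat/ggml-model-q4_1.gguf llama-13b-chat-q4_1.gguf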
# Download the Ollama binary (Linux x86-64 build)
curl -L https://ollama.ai/download/ollama-linux-amd64 -o ollama
chmod +x ollama
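As a quick sanity check that the download worked, ask the binary for its version; it should print a version string (if the server is not running yet, it may also print a warning about not being able to connect, which is harmless here):
./ollama --version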
Create a file called Modelfile with the following contents:
FROM ./llama-13b-chat-q4_1.gguf
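The single FROM line is all that is required. A Modelfile can optionally also set sampling parameters and a system prompt; the values below are only illustrative:
FROM ./llama-13b-chat-q4_1.gguf
# Lower temperature gives more deterministic output
PARAMETER temperature 0.7
# System prompt prepended to every conversation
SYSTEM You are a helpful assistant.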
In one terminal, start up ollama:
./ollama serve
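By default the server listens on 127.0.0.1:11434. If that port is already in use, or you need to listen on a different address, it can be changed with the OLLAMA_HOST environment variable, for example:
OLLAMA_HOST=127.0.0.1:8080 ./ollama serve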
In another terminal, register the model with the Ollama server:
./ollama create llama-13b-chat-q4_1 -f Modelfile
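You can confirm that the model was registered by listing the models the server knows about:
./ollama list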
Run the model:
./ollama run llama-13b-chat-q4_1
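Besides the interactive prompt, the running server also exposes an HTTP API on port 11434, which is useful for scripting; the response is streamed back as a sequence of JSON objects. For example (the prompt text is just an illustration):
curl http://localhost:11434/api/generate -d '{"model": "llama-13b-chat-q4_1", "prompt": "Why is the sky blue?"}'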
For more information on installing Ollama, see the Ollama website: https://ollama.ai/