LLM Platform

Open Source LLM Platform to build and deploy applications at scale

Architecture

Integrations & Configuration

LLM Providers

OpenAI Platform

https://platform.openai.com/docs/api-reference

providers:
  - type: openai
    token: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      - gpt-4o
      - gpt-4o-mini
      - text-embedding-3-small
      - text-embedding-3-large
      - whisper-1
      - dall-e-3
      - tts-1
      - tts-1-hd

Azure OpenAI Service

https://azure.microsoft.com/en-us/products/ai-services/openai-service

providers:
  - type: openai
    url: https://xxxxxxxx.openai.azure.com
    token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      # https://docs.anthropic.com/en/docs/models-overview
      #
      # {alias}:
      #   - id: {azure oai deployment name}

      gpt-3.5-turbo:
        id: gpt-35-turbo-16k

      gpt-4:
        id: gpt-4-32k
        
      text-embedding-ada-002:
        id: text-embedding-ada-002

Anthropic

https://www.anthropic.com/api

providers:
  - type: anthropic
    token: sk-ant-apixx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      # https://docs.anthropic.com/en/docs/models-overview
      #
      # {alias}:
      #   - id: {anthropic api model name}

      claude-3-opus:
        id: claude-3-opus-20240229

Cohere

providers:
  - type: cohere
    token: ${COHERE_API_KEY}

    # https://docs.cohere.com/docs/models
    models:
      cohere-command-r-plus:
        id: command-r-plus
      
      cohere-embed-multilingual-v3:
        id: embed-multilingual-v3.0

Groq

providers:
  - type: groq
    token: ${GROQ_API_KEY}

    # https://console.groq.com/docs/models
    models:
      groq-llama-3-8b:
        id: llama3-8b-8192

      groq-whisper-1:
        id: whisper-large-v3

Mistral AI

providers:
  - type: mistral
    token: ${MISTRAL_API_KEY}

    # https://docs.mistral.ai/getting-started/models/
    models:
      mistral-large:
        id: mistral-large-latest

Replicate

https://replicate.com/

providers:
  - type: replicate
    token: ${REPLICATE_API_KEY}
    models:
      replicate-flux-pro:
        id: black-forest-labs/flux-pro

Ollama

https://ollama.ai

$ ollama start
$ ollama run mistral

providers:
  - type: ollama
    url: http://localhost:11434

    models:
      # https://ollama.com/library
      #
      # {alias}:
      #   - id: {ollama model name with optional version}

      mistral-7b-instruct:
        id: mistral:latest

LLAMA.CPP

https://github.com/ggerganov/llama.cpp/tree/master/examples/server

# using taskfile.dev
$ task llama:server

# LLAMA.CPP Server
$ llama-server --port 9081 --log-disable --model ./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf

# LLAMA.CPP Server (Multimodal Model)
$ llama-server --port 9081 --log-disable --model ./models/llava-v1.5-7b-Q4_K.gguf --mmproj ./models/llava-v1.5-7b-mmproj-Q4_0.gguf

# using Docker (might be slow)
$ docker run -it --rm -p 9081:9081 -v ./models/:/models/ ghcr.io/ggerganov/llama.cpp:server --host 0.0.0.0 --port 9081 --model /models/mistral-7b-instruct-v0.2.Q4_K_M.gguf

providers:
  - type: llama
    url: http://localhost:9081

    models:
      - mistral-7b-instruct

Mistral.RS

https://github.com/EricLBuehler/mistral.rs

$ mistralrs-server --port 1234 --isq Q4K plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama

providers:
  - type: mistralrs
    url: http://localhost:1234

    models:
      mistralrs-llama-3.1-8b:
        id: llama

WHISPER.CPP

https://github.com/ggerganov/whisper.cpp/tree/master/examples/server

# using taskfile.dev
$ task whisper:server

# WHISPER.CPP Server
$ whisper-server --port 9083 --convert --model ./models/whisper-large-v3-turbo.bin

providers:
  - type: whisper
    url: http://localhost:9083

    models:
      - whisper

Hugging Face

https://huggingface.co/

providers:
  - type: huggingface
    token: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    
    models:
      mistral-7B-instruct:
        id: mistralai/Mistral-7B-Instruct-v0.1
      
      huggingface-minilm-l6-2:
        id: sentence-transformers/all-MiniLM-L6-v2

Eleven Labs

providers:
  - type: elevenlabs
    token: ${ELEVENLABS_API_KEY}

    models:
      elevenlabs-sarah:
        id: EXAVITQu4vr4xnSDxMaL
      
      elevenlabs-charlie:
        id: IKne3meq5aSn9XLyUdCD

LangChain / LangServe

https://python.langchain.com/docs/langserve

providers:
  - type: langchain
    url: http://your-langchain-server:8000

    models:
      - langchain

Routers

Round-robin Load Balancer

routers:
  llama-lb:
    type: roundrobin
    models:
      - llama-3-8b
      - groq-llama-3-8b
      - huggingface-llama-3-8b

Vector Databses / Indexes

Chroma

https://www.trychroma.com

# using Docker
$ docker run -it --rm -p 9083:8000 -v chroma-data:/chroma/chroma ghcr.io/chroma-core/chroma

indexes:
  docs:
    type: chroma
    url: http://localhost:9083
    namespace: docs
    embedder: text-embedding-ada-002

Weaviate

https://weaviate.io

# using Docker
$ docker run -it --rm -p 9084:8080 -v weaviate-data:/var/lib/weaviate -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true -e PERSISTENCE_DATA_PATH=/var/lib/weaviate semitechnologies/weaviate

indexes:
  docs:
    type: weaviate
    url: http://localhost:9084
    namespace: Document
    embedder: text-embedding-ada-002

Qdrant

$ docker run -p 6333:6333 qdrant/qdrant:v1.11.4

indexes:
  docs:
    type: qdrant
    url: http://localhost:6333
    namespace: docs
    embedder: text-embedding-ada-002

In-Memory

indexes:
  docs:
    type: memory   
    embedder: text-embedding-ada-002

OpenSearch / Elasticsearch

# using Docker
docker run -it --rm -p 9200:9200 -v opensearch-data:/usr/share/opensearch/data -e "discovery.type=single-node" -e DISABLE_SECURITY_PLUGIN=true opensearchproject/opensearch:latest

indexes:
  docs:
    type: elasticsearch
    url: http://localhost:9200
    namespace: docs

Extractor

Tika

# using Docker
docker run -it --rm -p 9998:9998 apache/tika:3.0.0.0-BETA2-full

extractors:  
  tika:
    type: tika
    url: http://localhost:9998
    chunkSize: 4000
    chunkOverlap: 200

Unstructured

https://unstructured.io

# using Docker
docker run -it --rm -p 9085:8000 quay.io/unstructured-io/unstructured-api:0.0.80 --port 8000 --host 0.0.0.0

extractors:
  unstructured:
    type: unstructured
    url: http://localhost:9085

Name		Name	Last commit message	Last commit date
Latest commit History 397 Commits
cmd		cmd
config		config
docs		docs
examples		examples
pkg		pkg
server		server
test		test
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
Taskfile.llama.yml		Taskfile.llama.yml
Taskfile.nomic.yml		Taskfile.nomic.yml
Taskfile.qdrant.yml		Taskfile.qdrant.yml
Taskfile.reranker.yml		Taskfile.reranker.yml
Taskfile.tts.yml		Taskfile.tts.yml
Taskfile.unstructured.yml		Taskfile.unstructured.yml
Taskfile.whisper.yml		Taskfile.whisper.yml
Taskfile.yml		Taskfile.yml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Platform

Architecture

Integrations & Configuration

LLM Providers

OpenAI Platform

Azure OpenAI Service

Anthropic

Cohere

Groq

Mistral AI

Replicate

Ollama

LLAMA.CPP

Mistral.RS

WHISPER.CPP

Hugging Face

Eleven Labs

LangChain / LangServe

Routers

Round-robin Load Balancer

Vector Databses / Indexes

Chroma

Weaviate

Qdrant

In-Memory

OpenSearch / Elasticsearch

Extractor

Tika

Unstructured

About

Releases

Packages

Contributors 2

Languages

License

adrianliechti/llama

Folders and files

Latest commit

History

Repository files navigation

LLM Platform

Architecture

Integrations & Configuration

LLM Providers

OpenAI Platform

Azure OpenAI Service

Anthropic

Cohere

Groq

Mistral AI

Replicate

Ollama

LLAMA.CPP

Mistral.RS

WHISPER.CPP

Hugging Face

Eleven Labs

LangChain / LangServe

Routers

Round-robin Load Balancer

Vector Databses / Indexes

Chroma

Weaviate

Qdrant

In-Memory

OpenSearch / Elasticsearch

Extractor

Tika

Unstructured

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages