Skip to content

RAD-Ninjas/llm-on-prem

Repository files navigation

LLM-On-Prem


Introduction

The frontend is a boilerplate app generated by npx create-secure-chatgpt-app. It leverages several of Pangea's security services for securing ChatGPT usage.

Features

  • Frontend:

    • Authentication using Pangea AuthN service and NextJS framework
    • Secure chat page
    • Secure chat/generate api endpoint
    • Redacting user prompts using Pangea's Redact service
    • Auditing user prompts using Pangea's Secure Audit Log service
    • De-fanging malicious responses from OpenAI API using Pangea's Domain Intel and URL Intel services
  • Backend:

    • lm-sys FastChat server configured to operate as a stand-in for the OpenAI API
    • VLLM inference engine worker
    • Supports multiple HF transformer models

Installation

Once you've cloned the repo, perform the following steps to get the app up and running:

  1. Set-up your local environment file

    cp .env.example .env
    
  2. Update relevant keys and variables with those of your own


  • Note: If running docker desktop on Windows or Mac you will likely need to increase your docker memory allocation to at least 16GB and max out your CPU allocation

  1. Build and Run the containers:

    • CPU:
      • Build docker compose --profile cpu build
      • Run docker compose --profile cpu up

    - **CUDA:** - Build `docker compose --profile cuda build` - Run `docker compose --profile cuda up`
    - **OpenVINO:** - Build `docker compose --profile openvino build` - Run `docker compose --profile openvino up`
    - **Metal:** - Build `docker compose --profile metal build` - Run `docker compose --profile metal up`
    - **GGML:** - Build `docker compose --profile ggml build` - Run `docker compose --profile ggml up`

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published