
# LLM Cache Proxy

LLM Cache Proxy is a FastAPI-based application that serves as a caching layer for OpenAI's API. It intercepts requests to the OpenAI API, caches responses, and serves cached responses for identical requests, potentially reducing API costs and improving response times.

## TL;DR: Quick Start with Docker

Build the image first with `docker build -t llm-cache-proxy .` (see "Using Docker" below), then:

```bash
# Run the container
docker run -p 9999:9999 \
  -v $(pwd)/data:/app/data \
  llm-cache-proxy
```

or, passing the OpenAI credentials explicitly:

```bash
# Run the container
docker run -p 9999:9999 \
  -e OPENAI_API_KEY=your_api_key_here \
  -e OPENAI_BASE_URL=https://api.openai.com/v1 \
  -v $(pwd)/data:/app/data \
  llm-cache-proxy
```

The proxy is now available at http://localhost:9999:

- Use `/cache/chat/completions` for cached requests
- Use `/chat/completions` for uncached requests
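
To verify that the proxy is up, send a chat completion request to the cached endpoint. A minimal sketch using `curl` (the model name and prompt are placeholders; the body follows OpenAI's chat completions schema):

```bash
# Smoke test against the cached endpoint.
# Depending on how you started the container, the proxy may use its own
# OPENAI_API_KEY; otherwise add an "Authorization: Bearer ..." header.
curl -s http://localhost:9999/cache/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'
```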

## Features

- Caches responses from OpenAI's API
- Supports both streaming and non-streaming responses
- Compatible with OpenAI's chat completion endpoint
- Configurable via environment variables
- Dockerized for easy deployment
- Persistent cache storage
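
Streaming requests go through the same endpoints. A hedged sketch, assuming the proxy forwards OpenAI's standard `stream` parameter unchanged (model name is a placeholder):

```bash
# -N disables curl's output buffering so streamed chunks appear as they arrive
curl -N http://localhost:9999/cache/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "stream": true, "messages": [{"role": "user", "content": "Count to five"}]}'
```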

## Prerequisites

- Python 3.12+
- Docker (optional, for containerized deployment)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/llm-cache-proxy.git
   cd llm-cache-proxy
   ```

2. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

## Configuration

Set the following environment variables:

- `OPENAI_API_KEY`: your OpenAI API key
- `OPENAI_BASE_URL`: base URL of the upstream OpenAI-compatible API (e.g. `https://api.openai.com/v1`)

You can set these in a `.env` file in the project root.
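
For example, a minimal `.env` (values are placeholders):

```bash
# Create a .env file in the project root
cat > .env <<'EOF'
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1
EOF
```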

## Usage

### Running Locally

1. Start the server:

   ```bash
   python -m app.main
   ```

2. The server will be available at http://localhost:9999

### Using Docker

1. Build the Docker image:

   ```bash
   docker build -t llm-cache-proxy .
   ```

2. Run the container with persistent storage:

   ```bash
   docker run -p 9999:9999 \
     -e OPENAI_API_KEY=your_api_key_here \
     -e OPENAI_BASE_URL=https://api.openai.com/v1 \
     -v $(pwd)/data:/app/data \
     llm-cache-proxy
   ```

   This command mounts a `data` directory from your current working directory to the `/app/data` directory in the container, ensuring that the cache persists between container restarts.

## API Endpoints

- `/chat/completions`: proxies requests to OpenAI's chat completion API without caching
- `/cache/chat/completions`: proxies requests to OpenAI's chat completion API with caching

Both endpoints accept the same parameters as OpenAI's chat completion API.
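
As a rough illustration of the difference (not a benchmark; model name and prompt are placeholders), you can send the same request to both routes and repeat it against the cached one:

```bash
BODY='{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is a cache?"}]}'

# Uncached: every call is forwarded upstream
curl -s http://localhost:9999/chat/completions \
  -H "Content-Type: application/json" -d "$BODY" > /dev/null

# Cached: the first identical call is forwarded, subsequent ones are served from the cache
time curl -s http://localhost:9999/cache/chat/completions \
  -H "Content-Type: application/json" -d "$BODY" > /dev/null
time curl -s http://localhost:9999/cache/chat/completions \
  -H "Content-Type: application/json" -d "$BODY" > /dev/null
```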

## Development

To run the application in verbose mode, use the `--verbose` flag:

```bash
python -m app.main --verbose
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.