vLLM is a fast and easy-to-use library for LLM inference and serving. This container image runs vLLM's OpenAI-compatible API server.

This image is for TPU and CPU inference only. For GPU inference, please use the upstream image from vLLM.
Image URLs:

- `substratusai/vllm` (Docker Hub)
- `ghcr.io/substratusai/vllm` (GitHub Container Registry)
Only the TPU and CPU variants of vLLM are published under this image:

- `substratusai/vllm:main-tpu`
- `substratusai/vllm:main-cpu`

Versioned tags such as `v0.6.3-tpu` and `v0.6.3-cpu` are also available.
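For example, to pull a pinned CPU image using one of the tags listed above:

```bash
docker pull substratusai/vllm:v0.6.3-cpu
```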
For GPU inference, please use the upstream image from vLLM directly: `vllm/vllm-openai:latest`
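For reference, a GPU deployment with the upstream image might look like this (the upstream server listens on port 8000 by default; the model name is only an example):

```bash
docker run -d -p 8000:8000 --gpus=all \
  vllm/vllm-openai:latest \
  --model=mistralai/Mistral-7B-Instruct-v0.1
```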
Support the project by adding a star! ❤️
Deploy Mistral 7B Instruct on CPU using Docker:

```bash
docker run -d -p 8080:8080 \
  substratusai/vllm:main-cpu \
  --model=mistralai/Mistral-7B-Instruct-v0.1
```
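Once the container is running, you can verify the OpenAI-compatible endpoint with curl (this assumes the server listens on port 8080, as mapped above):

```bash
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.1",
    "prompt": "San Francisco is a",
    "max_tokens": 32
  }'
```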
Build the image yourself:

```bash
docker build -t ghcr.io/substratusai/vllm .
```
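The locally built `ghcr.io/substratusai/vllm` image can then be started with the same `docker run` command shown above, substituting the local tag for the published one.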