This repository contains the deployment configurations needed to run vLLM workloads on Kubernetes. The configurations are tailored for single-GPU and multi-GPU setups, enabling seamless deployment in GPU-accelerated environments such as Azure Kubernetes Service (AKS).
### Single-GPU Deployment

- **Description:** This YAML file defines the deployment configuration for running vLLM workloads on a single GPU. It is ideal for nodes with a single A100 GPU (e.g., `Standard_NC24ads_A100_v4`). A manifest sketch is shown after this list.
- **Key Arguments:**
  - `--gpu-memory-utilization=0.95`: Allocates 95% of the GPU memory to the model weights and KV cache.
  - `--enforce-eager`: Forces eager-mode execution (disables CUDA graph capture), trading some latency for lower GPU memory overhead.
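For reference, here is a minimal sketch of what such a single-GPU Deployment manifest could look like. The deployment name, image tag, model ID, and port are illustrative assumptions rather than the exact values in this repository; only the namespace, secret name, and vLLM arguments come from the sections on this page.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-single-gpu              # assumed name; use the name from the repo's manifest
  namespace: genai-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-single-gpu
  template:
    metadata:
      labels:
        app: vllm-single-gpu
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # official vLLM OpenAI-compatible server image
          args:
            - "--model=<model_id>"         # placeholder: the Hugging Face model to serve
            - "--gpu-memory-utilization=0.95"
            - "--enforce-eager"
          env:
            - name: HUGGING_FACE_HUB_TOKEN # read by the Hugging Face client in the container
              valueFrom:
                secretKeyRef:
                  name: vllm-model-pull-hf # the secret created in the step below
                  key: HUGGINGFACE_TOKEN
          resources:
            limits:
              nvidia.com/gpu: 1            # one A100 on Standard_NC24ads_A100_v4
          ports:
            - containerPort: 8000          # vLLM's default OpenAI-compatible API port
```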
### Multi-GPU Deployment

- **Description:** This YAML file defines the deployment configuration for running vLLM workloads across multiple GPUs. It is designed for nodes with T4 GPUs (e.g., `Standard_NC64as_T4_v3`) or other multi-GPU configurations. A manifest sketch is shown after this list.
- **Key Arguments:**
  - `--tensor-parallel-size=4`: Enables distributed inference by sharding the model across 4 GPUs.
  - `--max-model-len=24300`: Caps the model's context length (prompt plus generated tokens) at 24,300 tokens.
  - `--dtype=float`: Runs the model in FP32 (`float` is an alias for `float32` in vLLM), which avoids bfloat16, a type T4 GPUs do not support.
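A sketch of the multi-GPU variant follows, assuming the same skeleton as the single-GPU example above. As before, the deployment name, image, model ID, and port are assumptions; the `/dev/shm` volume is also an assumption, added because NCCL commonly needs more shared memory than the 64Mi pod default for tensor-parallel communication.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-multi-gpu               # assumed name; use the name from the repo's manifest
  namespace: genai-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-multi-gpu
  template:
    metadata:
      labels:
        app: vllm-multi-gpu
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model=<model_id>"         # placeholder: the Hugging Face model to serve
            - "--tensor-parallel-size=4"   # shard the model across all 4 GPUs
            - "--max-model-len=24300"
            - "--dtype=float"              # FP32; T4 GPUs do not support bfloat16
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: vllm-model-pull-hf
                  key: HUGGINGFACE_TOKEN
          resources:
            limits:
              nvidia.com/gpu: 4            # all four T4s on Standard_NC64as_T4_v3
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: shm
              mountPath: /dev/shm          # NCCL uses shared memory for inter-GPU communication
      volumes:
        - name: shm
          emptyDir:
            medium: Memory                 # back /dev/shm with RAM; the 64Mi default is too small
```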
### Hugging Face Token Secret

Before deploying, create a Kubernetes secret to store your Hugging Face token. This token is required to pull the model during container initialization. A reference manifest is available in this repository: https://github.com/palash-fin/vLLM_Deploy_AKS/blob/main/secret_hf.yaml

```bash
kubectl create secret generic vllm-model-pull-hf \
  --from-literal=HUGGINGFACE_TOKEN=<your_token> \
  -n genai-deployment
```
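Note that the command above requires the `genai-deployment` namespace to already exist (`kubectl create namespace genai-deployment`). Once the secret is created, you can sanity-check it before applying the deployments; `kubectl describe` lists the secret's keys without printing the token value:

```bash
# List the secret's keys and sizes; the token value itself is not printed
kubectl describe secret vllm-model-pull-hf -n genai-deployment
```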