Map resource profiles to images & add resource profile doc (#154)
Define imageNames for given model servers, reference imageNames in resourceProfiles. Switch reference resource profiles to be lowercase to support the possibility of introducing a ResourceProfile CRD in the future (would require a lowercase .metadata.name). Fixes #152 via a different technique.
Showing 14 changed files with 187 additions and 58 deletions.
@@ -0,0 +1,51 @@
# Resource Profiles

A resource profile maps a type of compute resource (e.g., an NVIDIA L4 GPU) to a collection of Kubernetes settings that are applied to inference server Pods. These profiles are defined in the KubeAI `config.yaml` file (via a ConfigMap). Each model specifies the resource profile that it requires.

A Model resource specifies both the resource profile and the number of units of that resource it requires:

```yaml
# model.yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct-fp8-l4
spec:
  engine: VLLM
  resourceProfile: nvidia-gpu-l4:1 # Specified as <profile>:<count>
  # ...
```
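
To make the `<profile>:<count>` format concrete, here is a minimal Python sketch that splits such a value into its two parts. It is an illustration only, not KubeAI's actual parsing code: the function name `parse_resource_profile` is hypothetical, and the sketch assumes that the count is required and must be a positive integer.

```python
def parse_resource_profile(value: str) -> tuple[str, int]:
    """Split a '<profile>:<count>' string into (profile_name, count).

    Assumption for this sketch: the count is mandatory and must be >= 1.
    """
    # Split on the last colon so the profile name is everything before it.
    name, sep, count = value.rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected '<profile>:<count>', got {value!r}")
    if not (count.isascii() and count.isdigit()) or int(count) < 1:
        raise ValueError(f"count must be a positive integer, got {count!r}")
    return name, int(count)


if __name__ == "__main__":
    print(parse_resource_profile("nvidia-gpu-l4:1"))
```

With the value from the Model above, the sketch yields the profile name `nvidia-gpu-l4` and a count of 1.
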
The settings in a given profile may differ slightly depending on the cluster or cloud provider that KubeAI is deployed on.

Example: A resource profile named `nvidia-gpu-l4` might contain the following settings on a GKE Kubernetes cluster:

```yaml
# KubeAI config.yaml
resourceProfiles:
  nvidia-gpu-l4:
    limits:
      # Typical across most Kubernetes clusters:
      nvidia.com/gpu: "1"
    requests:
      nvidia.com/gpu: "1"
    nodeSelector:
      # Specific to GKE:
      cloud.google.com/gke-accelerator: "nvidia-l4"
      cloud.google.com/gke-spot: "true"
    imageName: "nvidia-gpu"
```

In addition to node selectors and resource requirements, a resource profile may optionally specify an image name. This name selects the container image used to serve a model on that resource:

```yaml
# KubeAI config.yaml
modelServers:
  VLLM:
    images:
      default: "vllm/vllm-openai:v0.5.5"
      nvidia-gpu: "vllm/vllm-openai:v0.5.5" # <--
      cpu: "vllm/vllm-openai-cpu:v0.5.5"
  OLlama:
    images:
      # ...
```
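
The lookup described above can be sketched in Python as follows. This is a hedged illustration, not KubeAI's implementation: the function `select_image` and the dictionary layout are hypothetical stand-ins for the parsed `config.yaml`, and falling back to the `default` image when a profile omits `imageName` is an assumption suggested by the `default` key, not something the text confirms.

```python
def select_image(model_servers: dict, resource_profiles: dict,
                 engine: str, profile_name: str) -> str:
    """Return the container image for an engine running on a resource profile."""
    profile = resource_profiles[profile_name]
    images = model_servers[engine]["images"]
    # Assumption: a profile without imageName uses the engine's "default" image.
    image_name = profile.get("imageName", "default")
    return images[image_name]


# Plain-dict versions of the config fragments shown above.
MODEL_SERVERS = {
    "VLLM": {
        "images": {
            "default": "vllm/vllm-openai:v0.5.5",
            "nvidia-gpu": "vllm/vllm-openai:v0.5.5",
            "cpu": "vllm/vllm-openai-cpu:v0.5.5",
        }
    }
}
RESOURCE_PROFILES = {
    "nvidia-gpu-l4": {"imageName": "nvidia-gpu"},
    "cpu": {"imageName": "cpu"},          # hypothetical profile for illustration
    "no-image-name": {},                  # hypothetical profile without imageName
}

if __name__ == "__main__":
    print(select_image(MODEL_SERVERS, RESOURCE_PROFILES, "VLLM", "nvidia-gpu-l4"))
```

Keeping the image name as an indirection means a profile does not hard-code an image per model server; each engine decides which image a name such as `nvidia-gpu` resolves to.
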
@@ -0,0 +1,10 @@
models:
  catalog:
    llama-3.1-8b-instruct-fp8-l4:
      enabled: true

resourceProfiles:
  nvidia-gpu-l4:
    nodeSelector:
      cloud.google.com/gke-accelerator: "nvidia-l4"
      cloud.google.com/gke-spot: "true"