This is a fork of https://github.com/awslabs/aws-virtual-gpu-device-plugin.
AWS's original plugin does not support memory allocation through the plugin itself; instead, memory limits must be set via language-specific arguments in the application. This fork is in active development, with the following goals/challenges:
- Support memory allocation via plugin
- Support GPU allocation by model name
- Produce telemetry
The end goal is something like this:

```yaml
# On a server with 1 T4 and 2 V100 GPUs, with 10 vGPUs per physical device
resources:
  limits:
    # Request by vGPU count...
    k8s.kuartis.com/nvidia-t4: 10
    k8s.kuartis.com/nvidia-v100: 20
    # ...or by memory (MiB)
    k8s.kuartis.com/nvidia-t4: 16384
    k8s.kuartis.com/nvidia-v100: 32768
```
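As a sketch of how per-model resource names could reach the scheduler, the snippet below registers one resource name per GPU model with the kubelet's device plugin `Registration` service. This is a minimal illustration of the end goal, not the fork's actual code; the socket names and the model-to-resource map are assumptions.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// register announces one resource name to the kubelet. Endpoint is the
// plugin's own socket file under /var/lib/kubelet/device-plugins/.
func register(resourceName, endpoint string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, "unix:///var/lib/kubelet/device-plugins/kubelet.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()), grpc.WithBlock())
	if err != nil {
		return err
	}
	defer conn.Close()

	_, err = pluginapi.NewRegistrationClient(conn).Register(ctx, &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     endpoint,
		ResourceName: resourceName,
	})
	return err
}

func main() {
	// Hypothetical mapping: one registration (and one plugin server socket)
	// per GPU model discovered on the node.
	models := map[string]string{
		"k8s.kuartis.com/nvidia-t4":   "nvidia-t4.sock",
		"k8s.kuartis.com/nvidia-v100": "nvidia-v100.sock",
	}
	for resource, socket := range models {
		if err := register(resource, socket); err != nil {
			log.Fatalf("registering %s: %v", resource, err)
		}
	}
}
```

Each registered resource name would then need its own device plugin server answering `ListAndWatch` and `Allocate` for that model.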
Install the daemonset + service + service monitor (Prometheus):

```shell
kubectl create -f https://raw.githubusercontent.com/youscan/virtual-gpu-device-plugin/master/manifests/device-plugin.yml
```
Notes about the daemon set:
- Uses NVML to find which processes use GPU resources (see the sketch after this list).
- Mounts the host's /proc folder to resolve container information from a process ID.
- Uses the dockershim socket to read detailed container information.
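For illustration, here is a minimal sketch, assuming the go-nvml bindings (github.com/NVIDIA/go-nvml), of how GPU processes can be enumerated and mapped back to containers through the mounted /proc; the real daemon set logic is more involved:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		log.Fatalf("nvml init: %v", nvml.ErrorString(ret))
	}
	defer nvml.Shutdown()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		log.Fatalf("device count: %v", nvml.ErrorString(ret))
	}

	for i := 0; i < count; i++ {
		device, ret := nvml.DeviceGetHandleByIndex(i)
		if ret != nvml.SUCCESS {
			continue
		}
		procs, ret := device.GetComputeRunningProcesses()
		if ret != nvml.SUCCESS {
			continue
		}
		for _, p := range procs {
			// With the host's /proc mounted into the pod, the process's
			// cgroup file identifies the container the PID belongs to.
			cgroup, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", p.Pid))
			if err != nil {
				continue
			}
			fmt.Printf("gpu=%d pid=%d usedGpuMemory=%d cgroup=%s\n",
				i, p.Pid, p.UsedGpuMemory, cgroup)
		}
	}
}
```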
You can set these arguments:
- `--vgpu=<number_of_virtual_gpus_one_physical_gpu_can_have>`: default is 10, maximum is 48 (see the partitioning sketch after this list).
- `--allowmultigpu=<true|false>`: default is false, which prevents the vGPUs requested by one container from being spread across different physical GPUs.
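A minimal sketch of the `--vgpu` partitioning idea, assuming the standard Kubernetes device plugin API: each physical GPU is advertised to the kubelet as `vgpuPerDevice` virtual devices. The device ID scheme and the blocking behavior are simplifications, not the plugin's actual implementation.

```go
package plugin

import (
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

const vgpuPerDevice = 10 // corresponds to --vgpu=10

type vgpuPlugin struct {
	physicalGPUs int // discovered via NVML in the real plugin
}

// ListAndWatch streams the device list to the kubelet. One physical GPU
// becomes vgpuPerDevice independently schedulable devices.
func (p *vgpuPlugin) ListAndWatch(_ *pluginapi.Empty, stream pluginapi.DevicePlugin_ListAndWatchServer) error {
	var devices []*pluginapi.Device
	for gpu := 0; gpu < p.physicalGPUs; gpu++ {
		for v := 0; v < vgpuPerDevice; v++ {
			devices = append(devices, &pluginapi.Device{
				ID:     fmt.Sprintf("gpu-%d-vgpu-%d", gpu, v), // hypothetical ID scheme
				Health: pluginapi.Healthy,
			})
		}
	}
	if err := stream.Send(&pluginapi.ListAndWatchResponse{Devices: devices}); err != nil {
		return err
	}
	select {} // block forever; a real plugin would resend on health changes
}
```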
An example pod requesting one vGPU:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-device-query
spec:
  hostIPC: true # Required for MPS
  containers:
    - name: nvidia-device-query
      image: ghcr.io/kuartis/nvidia-device-query:1.0.0
      command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      env:
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT # Memory limit for the GPU
          value: 0=2G # See: https://developer.nvidia.com/blog/revealing-new-features-in-the-cuda-11-5-toolkit/
      resources:
        limits:
          # Partition your GPUs inside the daemon set with the --vgpu=<number> argument,
          # then request virtual GPUs here
          nvidia.com/gpu: '1'
      volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
  volumes:
    - name: nvidia-mps
      hostPath:
        path: /tmp/nvidia-mps
```
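In the manifest above, the MPS memory cap is set by hand through `CUDA_MPS_PINNED_DEVICE_MEM_LIMIT`. One way the fork's "memory allocation via plugin" goal could remove that manual step is to have the plugin's `Allocate` handler inject the variable itself, sized from the number of vGPUs granted. Below is a minimal sketch, assuming the standard device plugin API; the per-vGPU memory constant is hypothetical.

```go
package plugin

import (
	"context"
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// Hypothetical: a T4's 16384 MiB split across 10 vGPUs, rounded down.
const memoryPerVGPUMiB = 1638

type vgpuPlugin struct{}

// Allocate is called by the kubelet when a container is admitted. Instead of
// requiring pods to set CUDA_MPS_PINNED_DEVICE_MEM_LIMIT themselves, the
// plugin derives the cap from the number of vGPU device IDs granted.
func (p *vgpuPlugin) Allocate(_ context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, creq := range req.ContainerRequests {
		limitMiB := len(creq.DevicesIDs) * memoryPerVGPUMiB
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Envs: map[string]string{
				// Cap pinned device memory on GPU 0 for this MPS client.
				"CUDA_MPS_PINNED_DEVICE_MEM_LIMIT": fmt.Sprintf("0=%dMB", limitMiB),
			},
		})
	}
	return resp, nil
}
```

With something like this in place, the `env` entry in the pod spec would become unnecessary: requesting `n` vGPUs would implicitly cap the container at `n * memoryPerVGPUMiB` of device memory.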
This project is licensed under the Apache-2.0 License.