diff --git a/demo/specs/quickstart/README.md b/demo/specs/quickstart/README.md
index b7feeca5..d5690c46 100644
--- a/demo/specs/quickstart/README.md
+++ b/demo/specs/quickstart/README.md
@@ -1,3 +1,5 @@
+You can also run basic examples on a Linux desktop by following the instructions in this [folder](desktop/README.md).
+
 #### Show current state of the cluster
 ```console
 kubectl get pod -A
diff --git a/demo/specs/quickstart/desktop/README.md b/demo/specs/quickstart/desktop/README.md
new file mode 100644
index 00000000..3cf20302
--- /dev/null
+++ b/demo/specs/quickstart/desktop/README.md
@@ -0,0 +1,336 @@
+# Basic examples for a Linux desktop or workstation
+* [Prerequisites](#prerequisites)
+* [Examples with different DRA configurations](#examples-with-different-dra-configurations)
+  * [1. A single pod accesses a GPU via ResourceClaimTemplate](#example-1-spsc-gpu-a-single-pod-accesses-a-gpu-via-resourceclaimtemplate)
+  * [2. A single pod's multiple containers share a GPU via ResourceClaimTemplate](#example-2-spmc-shared-gpu-a-single-pods-multiple-containers-share-a-gpu-via-resourceclaimtemplate)
+  * [3. Multiple pods share a GPU via ResourceClaim](#example-3-mpsc-shared-gpu-multiple-pods-share-a-gpu-via-resourceclaim)
+  * [4. Multiple pods request dedicated GPU access](#example-4-mpsc-unshared-gpu-multiple-pods-request-dedicated-gpu-access)
+  * [5. A single pod's multiple containers share a GPU via MPS](#example-5-spmc-mps-gpu-a-single-pods-multiple-containers-share-a-gpu-via-mps)
+  * [6. Multiple pods share a GPU via MPS](#example-6-mpsc-mps-gpu-multiple-pods-share-a-gpu-via-mps)
+  * [7. A single pod's multiple containers share a GPU via TimeSlicing](#example-7-spmc-timeslicing-gpu-a-single-pods-multiple-containers-share-a-gpu-via-timeslicing)
+  * [8. Multiple pods share a GPU via TimeSlicing](#example-8-mpsc-timeslicing-gpu-multiple-pods-share-a-gpu-via-timeslicing)
+
+## Prerequisites
+
+You will need a Linux machine with an NVIDIA GPU, such as a GeForce card. Install the DRA driver and create a kind cluster by following the instructions in the [DRA driver setup](https://github.com/yuanchen8911/k8s-dra-driver?tab=readme-ov-file#demo).
+
+#### Show the current GPU configuration of the machine
+```console
+nvidia-smi -L
+```
+
+```
+GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-84f293a6-d610-e3dc-c4d8-c5d94409764b)
+```
+
+#### Show the cluster is up
+```console
+kubectl cluster-info
+kubectl get nodes
+```
+
+```
+Kubernetes control plane is running at https://127.0.0.1:34883
+CoreDNS is running at https://127.0.0.1:34883/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
+
+To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
+
+NAME                                   STATUS   ROLES           AGE    VERSION
+k8s-dra-driver-cluster-control-plane   Ready    control-plane   4d1h   v1.29.1
+k8s-dra-driver-cluster-worker          Ready    <none>          4d1h   v1.29.1
+```
+
+#### Show the DRA driver is running
+```console
+kubectl get pod -n nvidia-dra-driver
+```
+
+```
+NAME                                                READY   STATUS    RESTARTS   AGE
+nvidia-k8s-dra-driver-controller-6d5869d478-rr488   1/1     Running   0          4d1h
+nvidia-k8s-dra-driver-kubelet-plugin-qqq5b          1/1     Running   0          4d1h
+```
+
+## Examples with different DRA configurations
+
+#### Example 1 (SPSC-GPU): a single pod accesses a GPU via ResourceClaimTemplate
+
+```console
+kubectl apply -f single-pod-single-container-gpu.yaml
+sleep 2
+kubectl get pods -n spsc-gpu-test
+```
+
+The pod will be running.
+```
+NAME      READY   STATUS    RESTARTS   AGE
+gpu-pod   1/1     Running   0          6s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A   1474787      C   /cuda-samples/sample                        746MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-single-container-gpu.yaml
+```
+
+#### Example 2 (SPMC-Shared-GPU): a single pod's multiple containers share a GPU via ResourceClaimTemplate
+
+```console
+kubectl apply -f single-pod-multiple-containers-shared-gpu.yaml
+sleep 2
+kubectl get pods -n spmc-shared-gpu-test
+```
+
+The pod will be running.
+```
+NAME      READY   STATUS    RESTARTS      AGE
+gpu-pod   2/2     Running   2 (55s ago)   2m13s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A   1514114      C   /cuda-samples/sample                        746MiB |
+|    0   N/A  N/A   1514167      C   /cuda-samples/sample                        746MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-multiple-containers-shared-gpu.yaml
+```
+
+#### Example 3 (MPSC-Shared-GPU): multiple pods share a GPU via ResourceClaim
+
+```console
+kubectl apply -f multiple-pods-single-container-shared-gpu.yaml
+sleep 2
+kubectl get pods -n mpsc-shared-gpu-test
+```
+
+Two pods will be running.
+```
+$ kubectl get pods -n mpsc-shared-gpu-test
+NAME        READY   STATUS    RESTARTS   AGE
+gpu-pod-1   1/1     Running   0          11s
+gpu-pod-2   1/1     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A   1551456      C   /cuda-samples/sample                        746MiB |
+|    0   N/A  N/A   1551593      C   /cuda-samples/sample                        746MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pods:
+```console
+kubectl delete -f multiple-pods-single-container-shared-gpu.yaml
+```
+
+#### Example 4 (MPSC-Unshared-GPU): multiple pods request dedicated GPU access
+
+```console
+kubectl apply -f multiple-pods-single-container-unshared-gpu.yaml
+sleep 2
+kubectl get pods -n mpsc-unshared-gpu-test
+```
+
+One pod will be running and the other will be pending.
+```
+$ kubectl get pods -n mpsc-unshared-gpu-test
+NAME        READY   STATUS    RESTARTS   AGE
+gpu-pod-1   1/1     Running   0          11s
+gpu-pod-2   0/1     Pending   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A   1544488      C   /cuda-samples/sample                        746MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pods:
+```console
+kubectl delete -f multiple-pods-single-container-unshared-gpu.yaml
+```
+
+#### Example 5 (SPMC-MPS-GPU): a single pod's multiple containers share a GPU via MPS
+
+```console
+kubectl apply -f single-pod-multiple-containers-mps-gpu.yaml
+sleep 2
+kubectl get pods -n spmc-mps-gpu-test
+```
+
+The pod will be running.
+```
+$ kubectl get pods -n spmc-mps-gpu-test
+NAME      READY   STATUS    RESTARTS   AGE
+gpu-pod   2/2     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A   1559554    M+C   /cuda-samples/sample                        790MiB |
+|    0   N/A  N/A   1559585      C   nvidia-cuda-mps-server                       28MiB |
+|    0   N/A  N/A   1559610    M+C   /cuda-samples/sample                        790MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-multiple-containers-mps-gpu.yaml
+```
+
+#### Example 6 (MPSC-MPS-GPU): multiple pods share a GPU via MPS
+
+```console
+kubectl apply -f multiple-pods-single-container-mps-gpu.yaml
+sleep 2
+kubectl get pods -n mpsc-mps-gpu-test
+```
+
+Two pods will be running.
+```
+$ kubectl get pods -n mpsc-mps-gpu-test
+NAME        READY   STATUS    RESTARTS   AGE
+gpu-pod-1   1/1     Running   0          11s
+gpu-pod-2   1/1     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A   1568768    M+C   /cuda-samples/sample                        562MiB |
+|    0   N/A  N/A   1568771    M+C   /cuda-samples/sample                        562MiB |
+|    0   N/A  N/A   1568831      C   nvidia-cuda-mps-server                       28MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pods:
+```console
+kubectl delete -f multiple-pods-single-container-mps-gpu.yaml
+```
+
+#### Example 7 (SPMC-TimeSlicing-GPU): a single pod's multiple containers share a GPU via TimeSlicing
+
+```console
+kubectl apply -f single-pod-multiple-containers-timeslicing-gpu.yaml
+sleep 2
+kubectl get pods -n spmc-timeslicing-gpu-test
+```
+
+The pod will be running.
+```
+$ kubectl get pods -n spmc-timeslicing-gpu-test
+NAME      READY   STATUS    RESTARTS   AGE
+gpu-pod   2/2     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following (two containers sharing the GPU):
+```console
+nvidia-smi
+```
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A    306436      C   /cuda-samples/sample                        746MiB |
+|    0   N/A  N/A    306442      C   ./gpu_burn                                21206MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-multiple-containers-timeslicing-gpu.yaml
+```
+
+#### Example 8 (MPSC-TimeSlicing-GPU): multiple pods share a GPU via TimeSlicing
+
+```console
+kubectl apply -f multiple-pods-single-container-timeslicing-gpu.yaml
+sleep 2
+kubectl get pods -n mpsc-timeslicing-gpu-test
+```
+
+Two pods will be running.
+```
+$ kubectl get pods -n mpsc-timeslicing-gpu-test
+NAME        READY   STATUS    RESTARTS   AGE
+gpu-pod-1   1/1     Running   0          11s
+gpu-pod-2   1/1     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following (two pods sharing the GPU):
+```console
+nvidia-smi
+```
+```
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|    0   N/A  N/A    322523      C   ./gpu_burn                                21670MiB |
+|    0   N/A  N/A    322585      C   ./gpu_burn                                 2150MiB |
++---------------------------------------------------------------------------------------+
+```
+
+Delete the pods:
+```console
+kubectl delete -f multiple-pods-single-container-timeslicing-gpu.yaml
+```
diff --git a/demo/specs/quickstart/desktop/multiple-pods-single-container-mps-gpu.yaml b/demo/specs/quickstart/desktop/multiple-pods-single-container-mps-gpu.yaml
new file mode 100644
index 00000000..cfb1aca8
--- /dev/null
+++ b/demo/specs/quickstart/desktop/multiple-pods-single-container-mps-gpu.yaml
@@ -0,0 +1,96 @@
+# MPSC-MPS-GPU: multiple pods, each with a single container, share a GPU via MPS.
+
+# Two pods will be running.
+# $ kubectl get pods -n mpsc-mps-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          14s
+# gpu-pod-2   1/1     Running   0          14s
+
+# Running `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1568768    M+C   /cuda-samples/sample                        562MiB |
+# |    0   N/A  N/A   1568771    M+C   /cuda-samples/sample                        562MiB |
+# |    0   N/A  N/A   1568831      C   nvidia-cuda-mps-server                       28MiB |
+# +---------------------------------------------------------------------------------------+
+#
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: mpsc-mps-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-mps-sharing
+
+---
+apiVersion: gpu.resource.nvidia.com/v1alpha1
+kind: GpuClaimParameters
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  sharing:
+    strategy: MPS
+    mpsConfig:
+      defaultActiveThreadPercentage: 50
+      defaultPinnedDeviceMemoryLimit: 10Gi
+      # defaultPerDevicePinnedMemoryLimit:
+      #   0: 5Gi
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-pod-1
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimName: gpu-mps-sharing
+  restartPolicy: Never
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-pod-2
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimName: gpu-mps-sharing
+  restartPolicy: Never
diff --git a/demo/specs/quickstart/desktop/multiple-pods-single-container-shared-gpu.yaml b/demo/specs/quickstart/desktop/multiple-pods-single-container-shared-gpu.yaml
new file mode 100644
index 00000000..1152c287
--- /dev/null
+++ b/demo/specs/quickstart/desktop/multiple-pods-single-container-shared-gpu.yaml
@@ -0,0 +1,75 @@
+# MPSC-Shared-GPU: multiple pods, each with a single container, share a GPU via ResourceClaim.
+#
+# Two pods will be running.
+# $ kubectl get pods -n mpsc-shared-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          11s
+# gpu-pod-2   1/1     Running   0          11s
+
+# Running the command `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1551456      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A   1551593      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: mpsc-shared-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: mpsc-shared-gpu-test
+  name: shared-gpu
+spec:
+  resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-shared-gpu-test
+  name: gpu-pod-1
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimName: shared-gpu
+  restartPolicy: Never
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-shared-gpu-test
+  name: gpu-pod-2
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimName: shared-gpu
+  restartPolicy: Never
diff --git a/demo/specs/quickstart/desktop/multiple-pods-single-container-timeslicing-gpu.yaml b/demo/specs/quickstart/desktop/multiple-pods-single-container-timeslicing-gpu.yaml
new file mode 100644
index 00000000..aaa8c6f6
--- /dev/null
+++ b/demo/specs/quickstart/desktop/multiple-pods-single-container-timeslicing-gpu.yaml
@@ -0,0 +1,90 @@
+# MPSC-TimeSlicing-GPU: multiple pods share a GPU via TimeSlicing.
+
+# Two pods will be running.
+# $ kubectl get pods -n mpsc-timeslicing-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          10s
+# gpu-pod-2   1/1     Running   0          10s
+#
+# Running `nvidia-smi` will show something like the following (two pods sharing the GPU):
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A    322523      C   ./gpu_burn                                21670MiB |
+# |    0   N/A  N/A    322585      C   ./gpu_burn                                 2150MiB |
+# +---------------------------------------------------------------------------------------+
+
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: mpsc-timeslicing-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: mpsc-timeslicing-gpu-test
+  name: gpu-mpsc-timeslicing-gpu-test
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-mpsc-timeslicing-gpu-test
+---
+apiVersion: gpu.resource.nvidia.com/v1alpha1
+kind: GpuClaimParameters
+metadata:
+  namespace: mpsc-timeslicing-gpu-test
+  name: gpu-mpsc-timeslicing-gpu-test
+spec:
+  sharing:
+    strategy: TimeSlicing
+    timeSlicingConfig:
+      timeSlice: Short
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-timeslicing-gpu-test
+  name: gpu-pod-1
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: oguzpastirmaci/gpu-burn
+    args: ["30"]  # 30 seconds
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimName: gpu-mpsc-timeslicing-gpu-test
+  restartPolicy: Never
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-timeslicing-gpu-test
+  name: gpu-pod-2
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: oguzpastirmaci/gpu-burn
+    args: ["30"]  # 30 seconds
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimName: gpu-mpsc-timeslicing-gpu-test
+  restartPolicy: Never
diff --git a/demo/specs/quickstart/desktop/multiple-pods-single-container-unshared-gpu.yaml b/demo/specs/quickstart/desktop/multiple-pods-single-container-unshared-gpu.yaml
new file mode 100644
index 00000000..51491ac2
--- /dev/null
+++ b/demo/specs/quickstart/desktop/multiple-pods-single-container-unshared-gpu.yaml
@@ -0,0 +1,75 @@
+# MPSC-Unshared-GPU: multiple pods, each with a single container, request dedicated access to a GPU.
+#
+# One pod will be running and the other one will be pending.
+# $ kubectl get pods -n mpsc-unshared-gpu-test
+# NAME        READY   STATUS    RESTARTS      AGE
+# gpu-pod-1   1/1     Running   1 (21s ago)   58s
+# gpu-pod-2   0/1     Pending   0             25s
+#
+# Running the command `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1544488      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: mpsc-unshared-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaimTemplate
+metadata:
+  namespace: mpsc-unshared-gpu-test
+  name: gpu.nvidia.com
+spec:
+  spec:
+    resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-unshared-gpu-test
+  name: gpu-pod-1
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
+  restartPolicy: Never
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-unshared-gpu-test
+  name: gpu-pod-2
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
+  restartPolicy: Never
diff --git a/demo/specs/quickstart/desktop/single-pod-multiple-containers-mps-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-multiple-containers-mps-gpu.yaml
new file mode 100644
index 00000000..b3f64548
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-multiple-containers-mps-gpu.yaml
@@ -0,0 +1,79 @@
+# SPMC-MPS-GPU: a single pod's multiple containers share a GPU via MPS.
+
+# The pod will be running.
+# $ kubectl get pods -n spmc-mps-gpu-test
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   2/2     Running   0          8s
+
+# Running `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1559554    M+C   /cuda-samples/sample                        790MiB |
+# |    0   N/A  N/A   1559585      C   nvidia-cuda-mps-server                       28MiB |
+# |    0   N/A  N/A   1559610    M+C   /cuda-samples/sample                        790MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spmc-mps-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: spmc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-mps-sharing
+
+---
+apiVersion: gpu.resource.nvidia.com/v1alpha1
+kind: GpuClaimParameters
+metadata:
+  namespace: spmc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  sharing:
+    strategy: MPS
+    mpsConfig:
+      defaultActiveThreadPercentage: 50
+      defaultPinnedDeviceMemoryLimit: 10Gi
+      # defaultPerDevicePinnedMemoryLimit:
+      #   0: 5Gi
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spmc-mps-gpu-test
+  name: gpu-pod
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr0
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  - name: ctr1
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimName: gpu-mps-sharing
+  restartPolicy: Never
diff --git a/demo/specs/quickstart/desktop/single-pod-multiple-containers-shared-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-multiple-containers-shared-gpu.yaml
new file mode 100644
index 00000000..6f0a958a
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-multiple-containers-shared-gpu.yaml
@@ -0,0 +1,58 @@
+# SPMC-Shared-GPU: a single pod's multiple containers share access to a GPU via ResourceClaimTemplate.
+#
+# The pod will be running.
+# $ kubectl get pods -n spmc-shared-gpu-test
+# NAME      READY   STATUS    RESTARTS      AGE
+# gpu-pod   2/2     Running   2 (55s ago)   2m13s
+
+# Running `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1514114      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A   1514167      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spmc-shared-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaimTemplate
+metadata:
+  namespace: spmc-shared-gpu-test
+  name: gpu.nvidia.com
+spec:
+  spec:
+    resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spmc-shared-gpu-test
+  name: gpu-pod
+spec:
+  containers:
+  - name: ctr0
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  - name: ctr1
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
+  restartPolicy: Never
diff --git a/demo/specs/quickstart/desktop/single-pod-multiple-containers-timeslicing-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-multiple-containers-timeslicing-gpu.yaml
new file mode 100644
index 00000000..bdbca1a9
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-multiple-containers-timeslicing-gpu.yaml
@@ -0,0 +1,76 @@
+# SPMC-TimeSlicing-GPU: a single pod's multiple containers share a GPU via TimeSlicing.
+
+# The pod will be running.
+# $ kubectl get pods -n spmc-timeslicing-gpu-test
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   2/2     Running   0          10s
+#
+# Running `nvidia-smi` will show something like the following (two containers sharing the GPU):
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A    306436      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A    306442      C   ./gpu_burn                                21206MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spmc-timeslicing-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: spmc-timeslicing-gpu-test
+  name: gpu-ts-sharing
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-ts-sharing
+
+---
+apiVersion: gpu.resource.nvidia.com/v1alpha1
+kind: GpuClaimParameters
+metadata:
+  namespace: spmc-timeslicing-gpu-test
+  name: gpu-ts-sharing
+spec:
+  sharing:
+    strategy: TimeSlicing
+    timeSlicingConfig:
+      timeSlice: Short
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spmc-timeslicing-gpu-test
+  name: gpu-pod
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr0
+    #image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    #args: ["--benchmark", "--numbodies=2560000"]
+    image: oguzpastirmaci/gpu-burn
+    args: ["60"]  # 60 seconds
+    resources:
+      claims:
+      - name: shared-gpu
+  - name: ctr1
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimName: gpu-ts-sharing
+  restartPolicy: Never
diff --git a/demo/specs/quickstart/desktop/single-pod-single-container-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-single-container-gpu.yaml
new file mode 100644
index 00000000..84e854fa
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-single-container-gpu.yaml
@@ -0,0 +1,50 @@
+# SPSC-GPU: a single pod with a single container accesses a GPU via ResourceClaimTemplate.
+
+# $ kubectl get pods -n spsc-gpu-test
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   1/1     Running   0          6s
+
+# Running `nvidia-smi` should show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                            |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1474787      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spsc-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaimTemplate
+metadata:
+  namespace: spsc-gpu-test
+  name: gpu.nvidia.com
+spec:
+  spec:
+    resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spsc-gpu-test
+  name: gpu-pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: single-gpu
+  resourceClaims:
+  - name: single-gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
+  restartPolicy: Never