Enable GPU processing with nvidia-device-plugin
mocsharp committed Sep 25, 2023
1 parent b8b0cca commit 7b57142
Showing 6 changed files with 34 additions and 9 deletions.
9 changes: 6 additions & 3 deletions deploy/helm-charts/Chart.lock
@@ -1,6 +1,9 @@
dependencies:
- name: argo-workflows
repository: https://argoproj.github.io/argo-helm
version: 0.33.1
digest: sha256:bc9fd492011835b2ebb1d418f860eda28691ab8805d3215be2f488fc00cfe236
generated: "2023-09-07T10:32:16.447050486-07:00"
version: 0.33.3
- name: nvidia-device-plugin
repository: https://nvidia.github.io/k8s-device-plugin
version: 0.14.1
digest: sha256:d8e2875bf6b1affdb6bacda1b011a731bb4163165d6fa27b767a76a327597751
generated: "2023-09-22T18:47:17.529533776-07:00"
3 changes: 3 additions & 0 deletions deploy/helm-charts/Chart.yaml
@@ -33,3 +33,6 @@ dependencies:
- name: argo-workflows
version: 0.33.3
repository: https://argoproj.github.io/argo-helm
- name: nvidia-device-plugin
version: 0.14.1
repository: https://nvidia.github.io/k8s-device-plugin
22 changes: 18 additions & 4 deletions deploy/helm-charts/docs/01.installation.md
@@ -14,7 +14,7 @@ sudo apt-mark hold kubelet kubeadm kubectl
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Install Argo CLI
curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v3.4.10/argo-linux-amd64.gz
curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v3.4.11/argo-linux-amd64.gz
gunzip argo-linux-amd64.gz
chmod +x argo-linux-amd64
sudo mv ./argo-linux-amd64 /usr/local/bin/argo
@@ -26,9 +26,11 @@ Select one of the following Kubernetes distribution:

### [k3s](https://k3s.io/)

See [Requirements](https://docs.k3s.io/installation/requirements) for K3s for hardware requirements.
See the K3s [Requirements](https://docs.k3s.io/installation/requirements) for hardware requirements, and [NVIDIA Container Runtime Support](https://docs.k3s.io/advanced#nvidia-container-runtime-support) for the steps to enable GPU support.

```bash
sudo apt install -y nvidia-container-runtime cuda-drivers-fabricmanager-515 nvidia-headless-515-server

curl -sfL https://get.k3s.io | sh -s - --flannel-backend host-gw --service-node-port-range 104-32767 --flannel-external-ip

# Copy default configuration
@@ -42,6 +44,11 @@ sudo chown $(id -u):$(id -g) $HOME/.kube/config
For detailed installation instructions with GPU support, see [cloud-native-stack](https://github.com/NVIDIA/cloud-native-stack/tree/master/install-guides).

```bash
# Disable swap
sudo nano /etc/fstab
# Add a # before all the lines that start with /swap and save the file.


sudo kubeadm init --pod-network-cidr=192.168.0.0/16
# Copy default configuration
mkdir -p $HOME/.kube
@@ -78,6 +85,11 @@ my-system Ready control-plane 73s v1.28.1

If modifying the port range is not an option, update the port numbers inside `values.yaml` to be in the range.

## Install NVIDIA Container Toolkit

Follow the [instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) to install and configure the NVIDIA Container Toolkit for your container runtime.


## Build & download MONAI Deploy dependencies

```bash
@@ -105,8 +117,10 @@ Use the following commands to install MONAI Deploy Helm charts and its dependenc
- Postgres - archives Argo jobs (can be disabled in `values.yaml` > `argo-workflows` > `controller` > `persistence` > `archive=false`)

```bash
helm upgrade -i monai-deploy . # default/current namespace
helm upgrade -i monai-deploy -n my-space . # install in namespace "my-space"
# default/current namespace
helm upgrade -i monai-deploy .
# install in namespace "my-space"
helm upgrade -i monai-deploy -n my-space .
```
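
Because this commit adds `nvidia-device-plugin` as a chart dependency, one quick check after installation is whether the nodes advertise the `nvidia.com/gpu` resource. The following is a minimal sketch and not part of the commit: the function name `gpu_capacity` is hypothetical, and it assumes `kubectl` is configured for the cluster where the chart was installed.

```bash
# Hypothetical helper (not part of this commit): list each node together with
# the number of GPUs it advertises as allocatable.
# Assumes kubectl points at the cluster where the chart was installed.
gpu_capacity() {
  kubectl get nodes \
    -o 'custom-columns=NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Calling `gpu_capacity` prints one row per node. A `<none>` in the GPU column suggests the device plugin has not registered any GPU on that node, for example because the NVIDIA container runtime is not configured.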

> **Note**
2 changes: 1 addition & 1 deletion deploy/helm-charts/docs/04.Uninstallation.md
@@ -29,6 +29,6 @@ sudo rm -rf ~/.kube
## Uninstall Tools

```bash
sudo apt-get purge -y kubeadm kubectl kubelet kubernetes-cni kube* helm
sudo apt-get purge -y kubeadm kubectl kubelet kubernetes-cni kube*
sudo apt-get autoremove -y
```
@@ -59,7 +59,7 @@ spec:
- /bin/sh
args:
- '-c'
- date -Ins && python3 -u /opt/monai/app/app.py && date -Ins
- date -Ins && time python3 -u /opt/monai/app/app.py && date -Ins
env:
- name: "MONAI_INPUTPATH"
value: "/var/monai/input/"
@@ -69,3 +69,6 @@ spec:
value: "/opt/monai/models/"
- name: "MONAI_WORKDIR"
value: "/var/monai/"
resources:
limits:
nvidia.com/gpu: 1
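
The `nvidia.com/gpu: 1` limit above can only be scheduled once the device plugin has registered a GPU on some node. As a sketch for checking this outside Argo, the hypothetical helper below submits a throwaway pod with the same limit. The function name, the pod name `gpu-smoke-test`, and the sample image (the one used in the NVIDIA device plugin README) are assumptions and not part of this commit.

```bash
# Hypothetical smoke test (not part of this commit): run a one-off pod that
# requests a single GPU, mirroring the limit added to the workflow template.
# Assumes kubectl targets the cluster and the image tag below is still published.
gpu_smoke_test() {
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}
```

After calling `gpu_smoke_test`, `kubectl logs gpu-smoke-test` shows the sample's output once the pod completes, and `kubectl delete pod gpu-smoke-test` cleans up. A pod that stays in `Pending` usually means no node is advertising `nvidia.com/gpu`.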
2 changes: 2 additions & 0 deletions deploy/helm-charts/values.yaml
@@ -262,6 +262,8 @@ tolerations: []

affinity: {}

nvidia-device-plugin:
allowDefaultNamespace: true

### Argo Workflow ###
argo-workflows: