From 2293f960c0e70c59c74248ecb9141d76f8b9f5a3 Mon Sep 17 00:00:00 2001
From: Vitaliy Emporopulo
Date: Thu, 7 Mar 2024 15:05:17 +0200
Subject: [PATCH] Document deploying DRA to OpenShift

* Document the differences on OpenShift
* Include useful setup scripts

Signed-off-by: Vitaliy Emporopulo
---
 README.md                                     |   2 +-
 demo/clusters/openshift/README.md             | 112 ++++++++++++++++++
 demo/clusters/openshift/dra-feature-gate.yml  |   6 +
 demo/clusters/openshift/enable-dra-profile.sh |   6 +
 4 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 demo/clusters/openshift/README.md
 create mode 100644 demo/clusters/openshift/dra-feature-gate.yml
 create mode 100755 demo/clusters/openshift/enable-dra-profile.sh

diff --git a/README.md b/README.md
index 70825e8c..aeca284e 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ A document and demo of the DRA support for GPUs provided by this repo can be fou
 
 ## Demo
 
-This section describes using `kind` to demo the functionality of the NVIDIA GPU DRA Driver.
+This section describes using `kind` to demo the functionality of the NVIDIA GPU DRA Driver. For Red Hat OpenShift, refer to [running the NVIDIA DRA driver on OpenShift](demo/clusters/openshift/README.md).
 
 First since we'll launch kind with GPU support, ensure that the following prerequisites are met:
 1. `kind` is installed. See the official documentation [here](https://kind.sigs.k8s.io/docs/user/quick-start/#installation).

diff --git a/demo/clusters/openshift/README.md b/demo/clusters/openshift/README.md
new file mode 100644
index 00000000..a82573c5
--- /dev/null
+++ b/demo/clusters/openshift/README.md
@@ -0,0 +1,112 @@
# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains how deploying the NVIDIA DRA driver on Red Hat OpenShift differs from deploying it on upstream Kubernetes and its flavors.

## Prerequisites

Install OpenShift 4.16 or later. You can use the Assisted Installer to install on bare metal, or obtain an IPI installer binary (`openshift-install`) from the [OpenShift clients page](https://mirror.openshift.com/pub/openshift-v4/clients/ocp/). Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.16/installing/index.html) for other installation methods.

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.16/nodes/clusters/nodes-cluster-enabling-features.html), either during the installation or post-install. The feature set includes the `DynamicResourceAllocation` feature gate.

Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```

## NVIDIA GPU Drivers

The easiest way to install NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator with the device plugin disabled. Follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html), and **_be careful to disable the device plugin so that it does not conflict with the DRA plugin_**:

```yaml
  devicePlugin:
    enabled: false
```
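If the GPU Operator is already installed, the device plugin can also be switched off after the fact by patching the operator's `ClusterPolicy` resource. A minimal sketch, assuming the default resource name `gpu-cluster-policy` created by the OpenShift console installation:

```console
$ oc patch clusterpolicy gpu-cluster-policy --type merge -p '{"spec": {"devicePlugin": {"enabled": false}}}'
```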
## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:

```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```

## OpenShift Security

OpenShift generally requires more stringent security settings than Kubernetes. If you see a warning about security context constraints when deploying the DRA plugin, pass the following to the Helm chart, either via a command-line `--set` option or a values file:

```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```

If you see security context constraints errors or warnings when deploying a sample workload, update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.16/operators/operator_sdk/osdk-complying-with-psa.html). For non-privileged workloads, applying the following `securityContext` definition at the pod or container level usually works:

```yaml
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
```

## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-Instance GPU (MIG) feature require MIG to be enabled on worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. A100.

First, stop any custom pods that might be using the GPU to avoid disruption when the new MIG configuration is applied.

Enable MIG via the MIG manager of the NVIDIA GPU Operator. **Do not configure MIG devices yourself; the DRA driver will create them automatically on the fly**:

```console
$ oc label node <node-name> nvidia.com/mig.config=all-enabled --overwrite
```

MIG will be automatically enabled on the labeled nodes. For additional information, see [MIG Support in OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/mig-ocp.html).

**Note:**
The `all-enabled` MIG configuration profile is available out of the box in the NVIDIA GPU Operator starting with v24.3. With an earlier version, you may need to [create a custom profile](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/mig-ocp.html#creating-and-applying-a-custom-mig-configuration).

You can verify the MIG status using the `nvidia-smi` command from a GPU driver pod:

```console
$ oc exec -ti nvidia-driver-daemonset- -n nvidia-gpu-operator -- nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:17:00.0 Off |                   On |
| N/A   35C    P0             45W /  300W |       0MiB /  81920MiB |      N/A     Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
```
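You can also follow the MIG manager's progress through the status labels it maintains on each node. A sketch, assuming the GPU Operator's standard `nvidia.com/mig.config.state` label; the state typically moves through `pending` (and possibly `rebooting`) to `success` once the configuration has been applied:

```console
$ oc get nodes -L nvidia.com/mig.config,nvidia.com/mig.config.state
```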
**Note:**
On some cloud service providers (CSPs), the CSP blocks GPU reset for GPUs passed into a VM. In this case, [ensure](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html#enabling-mig-during-installation) that the `WITH_REBOOT` environment variable is set to `true`:

```yaml
  migManager:
    ...
    env:
      - name: WITH_REBOOT
        value: 'true'
    ...
```

If the MIG settings cannot be fully applied, the MIG status is marked with an asterisk (e.g. `Enabled*`) and you will need to reboot the nodes manually.

See the [NVIDIA Multi-Instance GPU User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html) for more information about MIG.

diff --git a/demo/clusters/openshift/dra-feature-gate.yml b/demo/clusters/openshift/dra-feature-gate.yml
new file mode 100644
index 00000000..5b434565
--- /dev/null
+++ b/demo/clusters/openshift/dra-feature-gate.yml
@@ -0,0 +1,6 @@
+apiVersion: config.openshift.io/v1
+kind: FeatureGate
+metadata:
+  name: cluster
+spec:
+  featureSet: TechPreviewNoUpgrade

diff --git a/demo/clusters/openshift/enable-dra-profile.sh b/demo/clusters/openshift/enable-dra-profile.sh
new file mode 100755
index 00000000..222b74fe
--- /dev/null
+++ b/demo/clusters/openshift/enable-dra-profile.sh
@@ -0,0 +1,6 @@
+#!/usr/bin/env bash
+
+set -ex
+set -o pipefail
+
+oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
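As a usage sketch for the two helper files added above (assuming a logged-in `oc` session with cluster-admin rights, run from the repo root):

```console
$ oc apply -f demo/clusters/openshift/dra-feature-gate.yml
$ ./demo/clusters/openshift/enable-dra-profile.sh
```

Keep in mind that enabling the `TechPreviewNoUpgrade` feature set cannot be undone and prevents minor version cluster updates.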