Commit: Document deploying DRA to OpenShift
* Document the differences on OpenShift
* Include useful setup scripts

Signed-off-by: Vitaliy Emporopulo <[email protected]>
empovit committed Mar 13, 2024
1 parent ac31d61 commit c50be3a
Showing 5 changed files with 202 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ A document and demo of the DRA support for GPUs provided by this repo can be fou

## Demo

-This section describes using `kind` to demo the functionality of the NVIDIA GPU DRA Driver.
+This section describes using `kind` to demo the functionality of the NVIDIA GPU DRA Driver. For Red Hat OpenShift, refer to [running the NVIDIA DRA driver on OpenShift](demo/clusters/openshift/README.md).

First, since we'll launch kind with GPU support, ensure that the following prerequisites are met:
1. `kind` is installed. See the official documentation [here](https://kind.sigs.k8s.io/docs/user/quick-start/#installation).
144 changes: 144 additions & 0 deletions demo/clusters/openshift/README.md
@@ -0,0 +1,144 @@
# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains how deploying the NVIDIA DRA driver on Red Hat OpenShift differs from deploying it on upstream Kubernetes and its flavors.

## Prerequisites

Install a recent build of OpenShift 4.16 (e.g. 4.16.0-ec.4). You can use the Assisted Installer to install on bare metal, or obtain an IPI installer binary (`openshift-install`) from the [Release Status](https://amd64.ocp.releases.ci.openshift.org/) page. Note that a development version of OpenShift requires access to [an internal CI registry](https://docs.ci.openshift.org/docs/how-tos/use-registries-in-build-farm/) in the pull secret. Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/installing/index.html) for different installation methods.

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html), either during the installation or post-install. The feature set includes the `DynamicResourceAllocation` feature gate.
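Post-install, enabling the feature set amounts to setting `featureSet` on the cluster `FeatureGate` resource. A minimal sketch (note that `TechPreviewNoUpgrade` cannot be reverted and prevents cluster upgrades):

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
```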

Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```
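You can verify that the customization took effect with something like:

```console
$ oc get scheduler cluster -o jsonpath='{.spec.profileCustomizations}{"\n"}'
```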

## NVIDIA GPU Drivers

The easiest way to install NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator.

**Be careful to disable the device plugin so it does not conflict with the DRA plugin**:

```yaml
devicePlugin:
  enabled: false
```

Keep in mind that the NVIDIA GPU Operator is needed here only to install NVIDIA binaries on the cluster nodes, and should not be used for other purposes such as configuring GPUs.

The operator might not be available through the OperatorHub in a pre-production version of OpenShift. In that case, deploy the operator from a bundle or add a certified catalog index from an earlier version of OpenShift, e.g.:
```yaml
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
```
Then follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html).

## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:
```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```
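These values can be kept in a small values file and passed to `helm install`. A sketch, assuming a local checkout of the driver repository (the file name, release name, and chart path below are illustrative, not official):

```shell
# Write the RHCOS-specific overrides to a values file.
cat > openshift-values.yaml <<'EOF'
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
EOF

# Then install the chart with the overrides, e.g. (adjust the chart
# path and release name for your checkout):
#   helm install nvidia-dra-driver <path-to-chart> -f openshift-values.yaml
```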

## OpenShift Security

OpenShift generally requires more stringent security settings than Kubernetes. If you see a warning about security context constraints when deploying the DRA plugin, pass the following to the Helm chart, either via an in-line variable or a values file:
```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```
If you see security context constraints errors or warnings when deploying a sample workload, update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/operators/operator_sdk/osdk-complying-with-psa.html). Usually, applying the following `securityContext` definition at the pod or container level works for non-privileged workloads.

```yaml
securityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
```
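For illustration, a minimal DRA workload with that security context might look like the sketch below. The pod name, image, and the resource claim template name are placeholders, and the claim template itself is assumed to exist already:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  restartPolicy: Never
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-example-template  # assumed to exist
  containers:
  - name: ctr
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubi9  # placeholder image
    command: ["nvidia-smi", "-L"]
    resources:
      claims:
      - name: gpu
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
```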

If you see the following error when trying to deploy a workload:

```console
Warning FailedScheduling 21m default-scheduler running Reserve plugin "DynamicResources": podschedulingcontexts.resource.k8s.io "gpu-example" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>
```

apply the following RBAC configuration (this should be fixed in newer OpenShift builds):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
```

## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-instance GPU (MIG) feature require MIG to be [enabled](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#enable-mig-mode) on the worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. A100.

You can do this from the NVIDIA driver daemon set pod running on a GPU node, as follows (here the GPU ID is 0, i.e. `-i 0`):

```console
$ oc exec -ti nvidia-driver-daemonset-416.94.202402160025-0-g45bd -n nvidia-gpu-operator -- nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:0A:00.0
All done.
```

Make sure to stop everything that may hold the GPU before enabling MIG. For example, the DCGM and DCGM Exporter of the NVIDIA GPU Operator are likely to prevent the MIG setting from being applied; disable them in the operator's cluster policy if you plan to use MIG. Otherwise, you may see an error like this:

```console
Warning: MIG mode is in pending enable state for GPU 00000001:00:00.0:In use by another client
00000001:00:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using the device and retry the command or reboot the system to make MIG mode effective.
```
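In the GPU Operator's `ClusterPolicy`, the relevant fields are sketched below. The resource name `gpu-cluster-policy` is the usual default; check the schema of your operator version:

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  dcgm:
    enabled: false
  dcgmExporter:
    enabled: false
```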

If the MIG status is marked with an asterisk (i.e. `Enabled*`), it means that the setting could not be fully applied and you may need to reboot the node.
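To check the current and pending MIG mode across GPUs, you can query them from the driver pod, e.g. (the pod name is a placeholder):

```console
$ oc exec -ti <driver-daemonset-pod> -n nvidia-gpu-operator -- \
    nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv
```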

Do not pre-configure MIG devices on a GPU that is going to be used with DRA: the DRA driver configures MIG automatically on the fly.
21 changes: 21 additions & 0 deletions demo/clusters/openshift/add-certified-catalog-source.sh
@@ -0,0 +1,21 @@
#!/usr/bin/env bash

set -ex
set -o pipefail

oc create -f - <<EOF
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
EOF
6 changes: 6 additions & 0 deletions demo/clusters/openshift/enable-dra-profile.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

set -ex
set -o pipefail

oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
30 changes: 30 additions & 0 deletions demo/clusters/openshift/extend-kube-scheduler-rbac.sh
@@ -0,0 +1,30 @@
#!/usr/bin/env bash

set -ex
set -o pipefail

oc apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
EOF
