* Document the differences on OpenShift
* Include useful setup scripts

Signed-off-by: Vitaliy Emporopulo <[email protected]>
Showing 5 changed files with 200 additions and 1 deletion.
@@ -0,0 +1,142 @@
# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains how deploying the NVIDIA DRA driver on OpenShift differs from deploying it on upstream Kubernetes or its other flavors.

## Prerequisites

Install a recent build of OpenShift 4.16 (e.g. 4.16.0-ec.3). You can obtain an IPI installer binary (`openshift-install`) from the [Release Status](https://amd64.ocp.releases.ci.openshift.org/) page, or use the Assisted Installer to install on bare metal. Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/installing/index.html) for the available installation methods.

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html), either during the installation or post-install. The feature set includes the `DynamicResourceAllocation` feature gate.

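For the post-install route, the feature set can be switched on by patching the cluster-scoped `FeatureGate` resource. The following is a minimal sketch that is not part of the original commit; the function names and the `OC` override variable are illustrative assumptions. Keep in mind that enabling `TechPreviewNoUpgrade` cannot be undone and prevents minor version updates, so it is only suitable for non-production clusters.

```shell
#!/usr/bin/env bash
# Sketch: enable the TechPreviewNoUpgrade feature set after installation.
# OC is a hypothetical override so a different CLI binary (or a stub) can be used.

enable_tech_preview() {
  "${OC:-oc}" patch featuregate cluster --type merge \
    -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
}

# Print the configured feature set (empty output means the default feature set).
current_feature_set() {
  "${OC:-oc}" get featuregate cluster -o jsonpath='{.spec.featureSet}'
}
```

After calling `enable_tech_preview`, wait for the nodes to roll out the new configuration, then confirm the setting with `current_feature_set`.
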
Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```

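To confirm that the patch took effect, the scheduler configuration can be read back. This is a hedged sketch rather than part of the original commit: the function names and the `OC` override are assumptions, and the JSONPath simply mirrors the field set by the patch above.

```shell
#!/usr/bin/env bash
# Sketch: check whether the DRA scheduling plugin is enabled in the cluster scheduler config.
# OC is a hypothetical override for the CLI binary.

dra_scheduler_status() {
  "${OC:-oc}" get scheduler cluster \
    -o jsonpath='{.spec.profileCustomizations.dynamicResourceAllocation}'
}

# Succeeds (exit code 0) only when the field is set to "Enabled".
dra_scheduler_enabled() {
  [ "$(dra_scheduler_status)" = "Enabled" ]
}
```

For example, `dra_scheduler_enabled && echo "DRA scheduling plugin is on"` prints the message only when the customization is present.
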
## NVIDIA GPU Drivers

The easiest way to install NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator.

**Be careful to disable the device plugin so it does not conflict with the DRA plugin**. It is recommended to leave only the NVIDIA GPU driver and toolkit components enabled in the operator configuration, and disable everything else:

```yaml
<...>
devicePlugin:
  enabled: false
<...>
driver:
  enabled: true
<...>
toolkit:
  enabled: true
<...>
```

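If the operator is already installed, the device plugin can also be switched off by patching the existing cluster policy. The sketch below is an illustration only: it assumes the policy is named `gpu-cluster-policy` (pass a different name as the first argument if yours differs), and the function name and `OC` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: disable the device plugin in an existing GPU Operator ClusterPolicy.
# The default policy name below is an assumption; OC is a hypothetical CLI override.

disable_device_plugin() {
  local policy="${1:-gpu-cluster-policy}"
  "${OC:-oc}" patch clusterpolicy "$policy" --type merge \
    -p '{"spec":{"devicePlugin":{"enabled":false}}}'
}
```

A merge patch only touches the `devicePlugin` field, so the `driver` and `toolkit` settings stay as configured.
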
The NVIDIA GPU Operator might not be available through the OperatorHub in a pre-production version of OpenShift. In this case, deploy the operator from a bundle or add a certified catalog index from an earlier version of OpenShift, e.g.:

```yaml
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
```

Then follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html).

## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:

```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```

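The two values above can be passed in-line with `--set`. The following is a minimal sketch, not the project's documented install command: the chart location is supplied by the caller, and the release name, the namespace, and the `HELM` and `DRA_NAMESPACE` variables are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: install the DRA driver chart with the RHCOS-specific binary locations.
# Release name, namespace and the HELM / DRA_NAMESPACE overrides are assumptions.

install_dra_driver() {
  local chart="$1"                         # path or reference to the Helm chart
  local release="${2:-nvidia-dra-driver}"  # hypothetical release name
  "${HELM:-helm}" upgrade --install "$release" "$chart" \
    --namespace "${DRA_NAMESPACE:-nvidia-dra-driver}" --create-namespace \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set nvidiaCtkPath=/var/usrlocal/nvidia/toolkit/nvidia-ctk
}
```

Call it with the path to your checkout of the chart, e.g. `install_dra_driver ./path/to/chart`.
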
## OpenShift Security

OpenShift generally requires more stringent security settings than Kubernetes. If you see a warning about security context constraints when deploying the DRA plugin, pass the following to the Helm chart, either via an in-line variable or a values file:

```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```

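For the values-file route, the settings above can be written to a file and handed to Helm with `-f`. This is a small sketch under stated assumptions: the function name and the default file name `openshift-values.yaml` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write the kubelet plugin security settings to a Helm values file.
# The default file name is an assumption chosen for illustration.

write_scc_values() {
  local out="${1:-openshift-values.yaml}"
  cat > "$out" <<'EOF'
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
EOF
}
```

The resulting file can then be added to the Helm install command with `-f openshift-values.yaml`. The in-line equivalent uses dotted keys, e.g. `--set kubeletPlugin.containers.plugin.securityContext.privileged=true`.
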
If you see security context constraint errors or warnings when deploying a sample workload, update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/operators/operator_sdk/osdk-complying-with-psa.html). Applying the following `securityContext` definition usually works for non-privileged workloads. Note that `allowPrivilegeEscalation` and `capabilities` are container-level fields, so only `runAsNonRoot` and `seccompProfile` can also be set at the pod level.

```yaml
securityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
```

If you see the following error when trying to deploy a workload:

```console
Warning FailedScheduling 21m default-scheduler running Reserve plugin "DynamicResources": podschedulingcontexts.resource.k8s.io "gpu-example" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>
```

apply the following RBAC configuration (this should be fixed in newer OpenShift builds):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
```

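Whether the scheduler already has the permission granted by the RBAC configuration above can be checked by impersonating the `system:kube-scheduler` user. This is a hedged sketch: the function name and the `OC` override are assumptions, and running it requires that your own account is allowed to impersonate users.

```shell
#!/usr/bin/env bash
# Sketch: ask the API server whether the scheduler may update pod finalizers.
# Prints "yes" or "no"; OC is a hypothetical override for the CLI binary.

scheduler_can_update_pod_finalizers() {
  "${OC:-oc}" auth can-i update pods --subresource=finalizers \
    --as=system:kube-scheduler
}
```

If the check prints `no`, apply the ClusterRole and ClusterRoleBinding and run it again.
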
## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-Instance GPU (MIG) feature require MIG to be [enabled](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#enable-mig-mode) on the worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. A100.

You can enable MIG mode from the driver daemon set pod running on a GPU node as follows (here, the GPU ID is 0, i.e. `-i 0`):

```console
$ oc exec -ti nvidia-driver-daemonset-416.94.202402160025-0-g45bd -n nvidia-gpu-operator -- nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:0A:00.0
All done.
```

Make sure to stop everything that may hold the GPU before enabling MIG. Otherwise you will see a warning, and the MIG status will have an asterisk (i.e. `Enabled*`), meaning that the setting could not be applied.

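
One way to check whether the MIG setting was actually applied is to compare the current and pending MIG modes reported by `nvidia-smi`. The sketch below is an illustration, not part of the original commit: the function names and the `OC` override are hypothetical, and it assumes that a mismatch between the two modes corresponds to the asterisk state described above.

```shell
#!/usr/bin/env bash
# Sketch: query and interpret the MIG mode of a GPU via the driver daemon set pod.
# OC is a hypothetical override for the CLI binary.

# Print "current, pending" MIG mode for the given pod and GPU index (default 0).
query_mig_mode() {
  local pod="$1" gpu="${2:-0}"
  "${OC:-oc}" exec "$pod" -n nvidia-gpu-operator -- \
    nvidia-smi -i "$gpu" --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader
}

# Interpret a "current, pending" pair: print the mode if both agree, else "Pending".
mig_mode_state() {
  local current="${1%%,*}"   # text before the first comma
  local pending="${1#*,}"    # text after the first comma
  pending="${pending# }"     # drop the space that follows the comma
  if [ "$current" = "$pending" ]; then
    echo "$current"
  else
    echo "Pending"
  fi
}
```

For example, `mig_mode_state "$(query_mig_mode <driver-pod-name>)"` prints `Enabled` once the mode change has taken effect, and `Pending` while something still holds the GPU.
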
@@ -0,0 +1,21 @@
#!/usr/bin/env bash

set -ex
set -o pipefail

oc create -f - <<EOF
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
EOF
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

set -ex
set -o pipefail

oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
@@ -0,0 +1,30 @@
#!/usr/bin/env bash

set -ex
set -o pipefail

oc apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
EOF