Table of Contents
- Introduction
- Modes and Configuration Options
- Installation
- Verify Sidecar Functionality
- Use HTTPS with XPU Manager
Introduction
Intel GPUs can be interconnected via XeLink. For some workloads, using GPUs that are XeLinked together gives better performance. Intel XPU Manager exposes XeLink information through its metrics API. The XeLink sidecar retrieves this information from XPU Manager and stores it on the node under /etc/kubernetes/node-feature-discovery/features.d/ as a feature label file. Node Feature Discovery (NFD) reads this file and converts it to Kubernetes node labels. GPU Aware Scheduling (GAS) then uses these labels to make scheduling decisions for Pods.
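To make the hand-off to NFD more concrete, the shell sketch below writes one label in the `name=value` form that NFD accepts for local feature files. It is an illustration under assumptions, not the sidecar's actual implementation: the file name `xpum-sidecar-labels.txt` and the helper `write_label_file` are hypothetical, and a temporary directory stands in for the real features.d path so the script is safe to run anywhere.

```shell
#!/bin/sh
# Sketch only. Assumptions: the file name is hypothetical, and a temporary
# directory replaces /etc/kubernetes/node-feature-discovery/features.d/.

# write_label_file DIR NAMESPACE LINKS
# Writes a single "<namespace>/xe-links=<links>" line into DIR.
write_label_file() {
    dir=$1
    namespace=$2
    links=$3
    printf '%s/xe-links=%s\n' "$namespace" "$links" > "$dir/xpum-sidecar-labels.txt"
}

features_dir=$(mktemp -d)
write_label_file "$features_dir" gpu.intel.com 0.0-1.0_0.1-1.1

# Show what NFD would find in the directory.
cat "$features_dir/xpum-sidecar-labels.txt"
```

On a real node, NFD would pick up such a file on its next scan and publish the line as a node label.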
Modes and Configuration Options

Flag | Argument | Default | Meaning |
---|---|---|---|
-lane-count | int | 4 | Minimum lane count for an XeLink interconnect to be accepted |
-interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) |
-startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
-label-namespace | string | gpu.intel.com | Namespace or prefix for the labels, e.g. gpu.intel.com/xe-links |
-allow-subdeviceless-links | bool | false | Also include XeLinks for devices that do not have subdevices |
-use-https | bool | false | Use HTTPS protocol when connecting to XPU Manager |
The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options.
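The effect of a minimum lane count can be pictured with a small shell sketch. The input format here (one `link lane_count` pair per line) and the helper `filter_links` are hypothetical stand-ins chosen for illustration; they are not the XPU Manager metrics format, and the sidecar's own filtering logic may differ.

```shell
#!/bin/sh
# Sketch only. Hypothetical input: "<link> <lane_count>" per line.
# This is NOT the XPU Manager metrics format.

# filter_links MIN_LANES
# Reads link/lane pairs on stdin and prints the links whose lane count
# is at least MIN_LANES.
filter_links() {
    min_lanes=$1
    awk -v min="$min_lanes" '$2 >= min { print $1 }'
}

# Three made-up links; with a threshold of 4 the 2-lane link is dropped.
accepted=$(printf '%s\n' '0.0-1.0 4' '0.1-1.1 2' '1.0-2.0 8' | filter_links 4)
echo "$accepted"
```

With the default of 4, a link reported with fewer lanes would not contribute to the label under this reading of the flag.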
Installation
The following sections detail how to obtain, deploy and test the XPU Manager XeLink sidecar.
Pre-built images of this component are available on Docker Hub. These images are automatically built and uploaded from the latest main branch of this repository.
Release-tagged images of the components are also available on Docker Hub, tagged with their release version numbers in the format x.y.z, corresponding to the branches and releases in this repository.
Note: Replace <RELEASE_VERSION> with the desired release tag, or with main to get devel images.
See the development guide for details if you want to deploy a customized version of the plugin.
Install XPU Manager daemonset with the XeLink sidecar
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref=<RELEASE_VERSION>'
Please see XPU Manager Kubernetes files for additional info on installation.
Use kubectl patch to add the sidecar to an existing XPU Manager daemonset.
$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml?ref=<RELEASE_VERSION>'
NOTE: The sidecar patch will remove other resources from the XPU Manager container. If your XPU Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
Verify Sidecar Functionality
You can verify the sidecar's functionality by checking the node's xe-links label:
$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}"
master,0.0-1.0_0.1-1.1
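The label value in the output above can be unpacked with a few lines of shell. The sketch assumes that links are separated by `_`, the two endpoints of a link by `-`, and that each endpoint has the form `<GPU>.<subdevice>`; this reading and the helper name `describe_links` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only. Assumed label format: links joined by "_", endpoints joined
# by "-", each endpoint written as "<gpu>.<subdevice>".

# describe_links VALUE
# Prints one human-readable line per link found in the label value.
describe_links() {
    printf '%s\n' "$1" | tr '_' '\n' | while IFS=- read -r a b; do
        printf 'GPU %s subdevice %s <-> GPU %s subdevice %s\n' \
            "${a%%.*}" "${a#*.}" "${b%%.*}" "${b#*.}"
    done
}

links=$(describe_links '0.0-1.0_0.1-1.1')
echo "$links"
```

An empty or missing label on a node that is expected to have XeLinked GPUs is a hint to check the sidecar logs and the lane-count threshold.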
Use HTTPS with XPU Manager
XPU Manager can be configured to use HTTPS on the metrics interface. For the gunicorn container, the certificate and key files have to be added to the command:
- command:
  - gunicorn
  ...
  - --certfile=/certs/tls.crt
  - --keyfile=/certs/tls.key
  ...
  - xpum_rest_main:main()
The gunicorn container will also need the tls.crt and tls.key files within the container. For example:
containers:
- name: python-exporter
  volumeMounts:
  - mountPath: /certs
    name: certs
    readOnly: true
volumes:
- name: certs
  secret:
    defaultMode: 420
    secretName: xpum-server-cert
In this case, the secret providing the certificate and key is called xpum-server-cert.
The certificate and key can be added to a secret manually. Another way to create the secret is to use cert-manager.
Example for the Cert-manager objects
Cert-manager will create a self-signed certificate and its private key, and store them in a secret called xpum-server-cert.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert
spec:
  dnsNames:
  - xpum.svc
  - xpum.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: xpum-server-cert
For the XPU Manager sidecar, the -use-https flag has to be added to the arguments. The sidecar then uses HTTPS when connecting to the metrics interface.
args:
  - -v=2
  - -use-https