-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable NFD rule for GPU resource driver Helm chart #68
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,7 @@ spec: | |
rules: | ||
- name: "intel.gpu" | ||
labels: | ||
"intel.feature.node.kubernetes.io/gpu": "true" | ||
intel.feature.node.kubernetes.io/gpu: "true" | ||
matchFeatures: | ||
- feature: pci.device | ||
matchExpressions: | ||
|
@@ -21,32 +21,22 @@ spec: | |
- feature: kernel.enabledmodule | ||
matchExpressions: | ||
i915: {op: Exists} | ||
- matchFeatures: | ||
- feature: kernel.loadedmodule | ||
matchExpressions: | ||
xe: {op: Exists} | ||
- matchFeatures: | ||
- feature: kernel.enabledmodule | ||
matchExpressions: | ||
xe: {op: Exists} | ||
--- | ||
apiVersion: nfd.k8s-sigs.io/v1alpha1 | ||
kind: NodeFeatureRule | ||
metadata: | ||
name: intel-gpu-platform-labeling | ||
spec: | ||
rules: | ||
- extendedResources: | ||
gpu.intel.com/millicores: "@local.label.gpu.intel.com/millicores" | ||
gpu.intel.com/memory.max: "@local.label.gpu.intel.com/memory.max" | ||
gpu.intel.com/tiles: "@local.label.gpu.intel.com/tiles" | ||
matchFeatures: | ||
- matchFeatures: | ||
- feature: local.label | ||
matchExpressions: | ||
gpu.intel.com/millicores: {op: Exists} | ||
gpu.intel.com/memory.max: {op: Exists} | ||
gpu.intel.com/tiles: {op: Exists} | ||
name: intel.gpu.fractionalresources | ||
labels: | ||
intel.feature.node.kubernetes.io/gpu: "true" | ||
# generic rule for older and upcoming devices | ||
- labelsTemplate: | | ||
{{"{{"}} range .pci.device {{"}}"}}gpu.intel.com/device-id.{{"{{"}} .class {{"}}"}}-{{"{{"}} .device {{"}}"}}.present=true | ||
|
@@ -90,9 +80,7 @@ spec: | |
value: | ||
- "8086" | ||
name: intel.gpu.generic.count.380 | ||
- labels: | ||
gpu.intel.com/product: "Max_1100" | ||
labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
- labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did not notice this earlier, but there are 4 rules for different sets of GPU IDs, each producing the same While this is not a problem for GPU plugin because it does not support nodes with different types of Intel GPUs, DRA GPU driver is supposed to support support such use-case too. I think it's best just to remove these rules completely for now... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks! I removed these rules |
||
matchFeatures: | ||
- feature: pci.device | ||
matchExpressions: | ||
|
@@ -109,9 +97,7 @@ spec: | |
value: | ||
- "0bda" | ||
name: intel.gpu.max.1100 | ||
- labels: | ||
gpu.intel.com/product: "Max_1550" | ||
labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
- labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
matchFeatures: | ||
- feature: pci.device | ||
matchExpressions: | ||
|
@@ -152,10 +138,7 @@ spec: | |
- "0bd6" | ||
- "0bd0" | ||
name: intel.gpu.max.series | ||
- labels: | ||
gpu.intel.com/family: "Flex_Series" | ||
gpu.intel.com/product: "Flex_170" | ||
labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
- labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
matchFeatures: | ||
- feature: pci.device | ||
matchExpressions: | ||
|
@@ -172,10 +155,7 @@ spec: | |
value: | ||
- "56c0" | ||
name: intel.gpu.flex.170 | ||
- labels: | ||
gpu.intel.com/family: "Flex_Series" | ||
gpu.intel.com/product: "Flex_140" | ||
labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
- labelsTemplate: "gpu.intel.com/device.count={{"{{"}} len .pci.device {{"}}"}}" | ||
matchFeatures: | ||
- feature: pci.device | ||
matchExpressions: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As there are still
gpu.intel.com/family
label rules forA_Series
&Max_Series
, for constency there should be alsoFlex_Series
one, or Flex GPUs should be included toA_Series
, as they're also "Alchemist" variants.Btw. nowadays there would need to be also
B_Series
, but that sounds really bad for the latest & greatest Intel client platform (so such labeling would be very unlikely to be OKed by marketing). I.e. I think the family name of that should rather be e.g.Battlemage
, andA_Series
(which nobody's going to recognize) should rather beAlchemist
...PS. PCI IDs for these are documented here: https://dgpu-docs.intel.com/devices/hardware-table.html#gpus-with-supported-drivers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored nfd rules a bit. See the latest commit