KEP-5328: Node Capabilities #5347

pravk03 · 2025-05-28T00:45:56Z

One-line PR description: Add the initial KEP for KEP 5328: Node Capabilities

Issue link: Node Capabilities #5328

Other comments:

k8s-ci-robot · 2025-05-28T00:46:05Z

Welcome @pravk03!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-05-28T00:46:06Z

Hi @pravk03. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

wojtek-t

@dom4ha @sanposhiho @macsko - FYI


pravk03 · 2025-05-29T00:25:30Z

/cc @tallclair @yujuhong

sanposhiho · 2025-05-29T20:59:40Z

/sig scheduling


wojtek-t · 2025-06-17T11:40:13Z

keps/sig-node/5328-node-capabilities/README.md

+
+* Validate that the kube-scheduler plugin filters nodes based on `node.status.capabilities` when the feature is enabled, and ignores the field when the feature is disabled.
+* Validate that `node.status.capabilities` is correctly populated when the feature is enabled, and the field is cleared from the `Node` object when the feature is disabled.
+* Validate that the Admission Controller correctly fetches and validates requests against capabilities when the feature is enabled, and does not block requests if the feature is disabled.


Those are all good tests, but they are feature tests, not enablement/disablement tests.

An enablement/disablement test is one that (as stated in the comment in the template above) switches the feature gate in the middle of the test.
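A minimal Python sketch of that distinction follows. It assumes a hypothetical in-memory `FeatureGates` object, a hypothetical gate name `NodeCapabilities`, and a hypothetical `filter_nodes` helper standing in for the kube-scheduler plugin. None of these are real Kubernetes APIs, and a real test would drive the actual components. What matters is the shape: a single test flips the gate partway through and checks behavior on both sides of the switch, rather than running separate enabled-only and disabled-only cases.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureGates:
    """Hypothetical mutable feature-gate set; NOT the real k8s.io/component-base API."""
    enabled: set = field(default_factory=set)

    def set_enabled(self, name: str, value: bool) -> None:
        if value:
            self.enabled.add(name)
        else:
            self.enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self.enabled


# Hypothetical gate name, used only for this sketch.
GATE = "NodeCapabilities"


def filter_nodes(gates: FeatureGates, nodes: list, required: dict) -> list:
    """Stand-in for the scheduler plugin: keep nodes whose capabilities match
    `required` when the gate is on; ignore capabilities entirely when it is off."""
    if not gates.is_enabled(GATE):
        return [n["name"] for n in nodes]
    return [
        n["name"]
        for n in nodes
        if all(n.get("capabilities", {}).get(k) == v for k, v in required.items())
    ]


def test_feature_gate_switched_mid_test() -> None:
    gates = FeatureGates()
    nodes = [
        {"name": "new-node", "capabilities": {"example.io/sidecar": "true"}},
        {"name": "old-node", "capabilities": {}},
    ]
    required = {"example.io/sidecar": "true"}  # hypothetical capability key

    gates.set_enabled(GATE, True)
    assert filter_nodes(gates, nodes, required) == ["new-node"]

    # Disable the gate in the middle of the test: the same inputs must no longer be filtered.
    gates.set_enabled(GATE, False)
    assert filter_nodes(gates, nodes, required) == ["new-node", "old-node"]

    # Re-enable and check that filtering resumes exactly as before.
    gates.set_enabled(GATE, True)
    assert filter_nodes(gates, nodes, required) == ["new-node"]


test_feature_gate_switched_mid_test()
```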

Updated. Please take a look.

wojtek-t · 2025-06-17T11:40:53Z

keps/sig-node/5328-node-capabilities/README.md

+
+Yes. The size of the Node object is expected to increase as more capabilities are introduced. The number of capabilities exported will be limited by strategies such as:
+1. Automatically handling feature graduation, which includes ceasing to export a capability once it matures or is no longer needed.
+2. Exporting only configurations that are relevant to the control plane.


friendly ping

SergeyKanzhelev · 2025-06-17T20:33:35Z

We discussed this KEP yesterday on a call. Some notes from that call:

Many examples in this KEP of past enhancements that might have used or may use capabilities are not accurate.

For example, LMSConfig is unlikely to use capabilities as a way to understand which LMSConfig is enabled. One reason is that AppArmor will also want to know that the specific profile is installed, so some sort of LMSConfig object would be a better design. Also, many DaemonSets today use both profiles and the kubelet just picks whatever is applicable. This would be much harder if LMSConfig were introduced as a capability: each DaemonSet would need to be declared in three shapes, one with SELinux, one with AppArmor, and one without either.
Swap is not a good fit for capabilities as a way to discover that the Node has swap configured. Even though it may be an easy way to "avoid API review and introduce a capability", a capability is very limited in its functionality. For swap discoverability, swap-specific node status (allocatables?) is a better option.
Runtime handlers as a list of handlers is also not a good fit. The default handler (runc) is not specified in the pod spec, so it will not be used by the scheduler and by definition must not be added to capabilities. Non-default handlers may need more details on what they are. And a list of names may not fit within the value length limits. A special object representing the runtime is a better choice here.

Examples where capabilities are useful:

* Feature gates with a specific field in the Pod spec, like Sidecar, PodLevelResources, etc. Basically, discovering whether the new field will work on a given kubelet.
* A capability like feature.kubernetes.io/guaranteedQOSPodCPUResize, representing the fact that a feature gate had a certain limitation while in alpha or beta and that this limitation has since been lifted. Often it is lifted with a new FG, but not always.
* Container runtimes missing APIs, as with user namespaces support. This is also related to the UserNamespace capability. The k8s expectation is that soon ALL nodes will support UserNamespaces, so the capability has a lifetime bound to the FG.
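To make the first example concrete, here is a sketch of how a scheduler-side check could infer which capabilities a pod depends on from the Pod spec fields it sets. The pod fields themselves are real: `initContainers[].restartPolicy: Always` for sidecars, pod-level `resources`, and `hostUsers: false` for user namespaces. The capability keys and the `required_capabilities`/`node_fits` helpers are hypothetical, modeled loosely on the `feature.kubernetes.io/...` naming above.

```python
# Hypothetical capability keys; real names would go through API review.
SIDECAR = "feature.kubernetes.io/sidecarContainers"
POD_LEVEL_RESOURCES = "feature.kubernetes.io/podLevelResources"
USER_NAMESPACES = "feature.kubernetes.io/userNamespaces"


def required_capabilities(pod_spec: dict) -> set:
    """Infer which (hypothetical) capabilities a pod relies on from the fields it sets."""
    required = set()
    # Sidecar containers: an init container with restartPolicy: Always.
    if any(c.get("restartPolicy") == "Always" for c in pod_spec.get("initContainers", [])):
        required.add(SIDECAR)
    # Pod-level resources: `resources` set on the pod spec itself, not on a container.
    if "resources" in pod_spec:
        required.add(POD_LEVEL_RESOURCES)
    # User namespaces: hostUsers explicitly set to false.
    if pod_spec.get("hostUsers") is False:
        required.add(USER_NAMESPACES)
    return required


def node_fits(node_capabilities: dict, pod_spec: dict) -> bool:
    """A node fits only if it advertises every capability the pod relies on."""
    return all(node_capabilities.get(c) == "true" for c in required_capabilities(pod_spec))


pod = {
    "initContainers": [{"name": "proxy", "restartPolicy": "Always"}],
    "containers": [{"name": "app"}],
}
print(node_fits({SIDECAR: "true"}, pod))  # True
print(node_fits({}, pod))                 # False
```

A pod that uses none of these fields requires no capabilities, so it fits any node. That matches the idea that capabilities only gate the new fields.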

We discussed that the examples above may often be solved for individual vendors (which control the list of enabled FGs per node version) by introducing a semver-based node selector. But capabilities for sure provide a way better API for this.
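The difference between the two approaches can be sketched as follows. The `kubeletVersion` key mirrors the real `node.status.nodeInfo.kubeletVersion` field. The `capabilities` map, the `example.io/feature` key, and the assumption that node `b` runs a vendor build with the gate left off are all hypothetical.

```python
import re

FEATURE = "example.io/feature"  # hypothetical capability key


def parse_kubelet_version(version: str) -> tuple:
    """Parse 'v1.33.1' or a vendor build like 'v1.33.2-vendor.1' into (major, minor, patch)."""
    m = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in m.groups())


def semver_selector(nodes: list, min_version: str) -> list:
    """Semver-based alternative: trust that every kubelet >= min_version has the feature."""
    floor = parse_kubelet_version(min_version)
    return [n["name"] for n in nodes if parse_kubelet_version(n["kubeletVersion"]) >= floor]


def capability_selector(nodes: list, capability: str) -> list:
    """Capability-based selection: trust only what the node itself advertises."""
    return [n["name"] for n in nodes if n.get("capabilities", {}).get(capability) == "true"]


nodes = [
    {"name": "a", "kubeletVersion": "v1.33.1", "capabilities": {FEATURE: "true"}},
    {"name": "b", "kubeletVersion": "v1.33.2-vendor.1", "capabilities": {}},  # gate left off
    {"name": "c", "kubeletVersion": "v1.32.5", "capabilities": {}},
]
print(semver_selector(nodes, "v1.33.0"))    # ['a', 'b']
print(capability_selector(nodes, FEATURE))  # ['a']
```

The version floor accepts both `a` and `b`. Only the capability check notices that `b` does not have the feature enabled.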

I would suggest in this KEP:

* Remove any mention of non-FG-related capabilities from the README, unless there is a good example that can be articulated, with an explanation of why a capability is a good fit there.
* Add a note that the capability is part of the API and requires API review. In the k/k codebase we will need to protect the list of capabilities with the api-approvers OWNERS file.
* Unless we find good examples, let's state that capabilities have a lifetime and we do not expect any long-lived capability.
* If we are limiting capabilities to feature-gate-related features, maybe we should rename capabilities to featureGates to avoid it being reused for long-term capabilities.

We also discussed that capabilities must be applied to DaemonSets with no exceptions.

I also want to see something explaining how capabilities and Cluster Autoscaler will work together.

pravk03 · 2025-06-17T22:41:45Z

Thanks a lot @SergeyKanzhelev for the discussion and the feedback.

I am okay with most of the above suggestions, and I will address them in the KEP. I have some thoughts regarding naming.

If we are limiting capabilities to feature-gate-related features, maybe we should rename capabilities to featureGates to avoid it being reused for long-term capabilities.

While our initial examples used to demonstrate the functionality are tied to feature gates, renaming it to featureGates would be too restrictive for future use cases.
A capability should represent a logical use case, which could be enabled by a single feature gate, but could also be a combination of multiple feature gates plus specific configurations. featureGates wouldn't accurately represent such capabilities.

I am definitely open to naming suggestions, but I believe the name should be broad enough to accommodate future use-cases without requiring a new API field down the road.

ajaysundark · 2025-06-18T08:26:58Z

Swap is not a good fit for capabilities as a way to discover that the Node has swap configured. Even though it may be an easy way to "avoid API review and introduce a capability", a capability is very limited in its functionality. For swap discoverability, swap-specific node status (allocatables?) is a better option.

Referring to my earlier reply in this discussion comment:

For swap, a node capability is much needed for 'placement control', to ensure a latency-sensitive pod is never scheduled on a swap-enabled node.
Scheduler control for swap needs to answer two questions:

1. whether the workload needs swap (a new API for swap preference in the pod spec)
2. whether the node is swap-configured

A swap capability will provide the signal for (2), allowing for simple and clear scheduling rules.
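The two questions above can be sketched as a simple filter. Both the `SwapPreference` pod field (question 1) and the `example.io/swap` capability key (question 2) are hypothetical. Neither exists in the Pod or Node API today.

```python
from enum import Enum


class SwapPreference(Enum):
    """Hypothetical pod-level swap preference (question 1); no such field exists today."""
    NO_PREFERENCE = "NoPreference"
    REQUIRED = "Required"      # workload wants swap
    PROHIBITED = "Prohibited"  # latency-sensitive workload must never land on a swap node


SWAP_CAPABILITY = "example.io/swap"  # hypothetical capability key (question 2)


def swap_filter(pref: SwapPreference, node_capabilities: dict) -> bool:
    """Return True if a pod with `pref` may be placed on a node with these capabilities."""
    node_has_swap = node_capabilities.get(SWAP_CAPABILITY) == "true"
    if pref is SwapPreference.REQUIRED:
        return node_has_swap
    if pref is SwapPreference.PROHIBITED:
        return not node_has_swap
    return True  # no preference: swap configuration does not matter
```

With both signals available, the rule stays a pure function of pod preference and node capability. That is the "simple and clear scheduling rules" argument.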

Alternatives like NFD (Node Feature Discovery) exist for detecting swap on a node, but NFD is out-of-tree and not aware of the kubelet's specific swap configuration.


SergeyKanzhelev · 2025-06-18T16:40:15Z

I am definitely open to naming suggestions, but I believe the name should be broad enough to accommodate future use-cases without requiring a new API field down the road.

Can we have some examples listed that would justify this? Right now the KEP suggests using it for FG-related capabilities, while not giving good examples where it would be non-FG-related.


macsko · 2025-06-18T18:09:30Z

The scheduling part looks good for alpha
/approve as SIG Scheduling

k8s-ci-robot · 2025-06-18T18:09:44Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: macsko, pravk03
Once this PR has been reviewed and has the lgtm label, please ask for approval from wojtek-t and additionally assign dchen1107 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/prod-readiness/OWNERS
keps/sig-node/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pravk03 · 2025-06-18T18:53:42Z

Can we have some examples listed that would justify this? Right now the KEP suggests using it for FG-related capabilities, while not giving good examples where it would be non-FG-related.

The guaranteedQOSPodCPUResize example used in the KEP isn't purely a feature gate; it's a logical capability derived from a combination of feature gates and the Kubelet's cpuManagerPolicy configuration.

While this is still in early stages, this recent discussion about making the pod requirement for exclusive resources more explicit also indicates a need for non-FG capabilities. Shouldn't the API field itself be forward-looking enough to support such potential use cases?
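The following sketch shows what "derived from a combination of feature gates and configuration" could look like. `InPlacePodVerticalScaling` and the `none`/`static` values of `cpuManagerPolicy` are real kubelet settings. The derivation rule and the `ExampleExclusiveCPUResize` gate are hypothetical, chosen only to show why one capability need not map to one feature gate.

```python
CAPABILITY = "feature.kubernetes.io/guaranteedQOSPodCPUResize"


def derive_capabilities(feature_gates: set, cpu_manager_policy: str) -> dict:
    """Hypothetical derivation of one logical capability from several kubelet inputs.

    The rule is an illustrative assumption, not actual kubelet logic: resize must be
    enabled, and with the static CPU manager policy an extra (hypothetical) gate must
    also lift the exclusive-CPU restriction.
    """
    if "InPlacePodVerticalScaling" not in feature_gates:
        supported = False
    elif cpu_manager_policy == "static":
        supported = "ExampleExclusiveCPUResize" in feature_gates  # hypothetical gate
    else:
        supported = True
    return {CAPABILITY: "true"} if supported else {}
```

In this sketch the same set of feature gates yields different answers depending on `cpuManagerPolicy`, so a plain `featureGates` list would not capture it.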

SergeyKanzhelev · 2025-06-18T19:36:18Z

Can we have some examples listed that would justify this? Right now the KEP suggests using it for FG-related capabilities, while not giving good examples where it would be non-FG-related.

The guaranteedQOSPodCPUResize example used in the KEP isn't purely a feature gate; it's a logical capability derived from a combination of feature gates and the Kubelet's cpuManagerPolicy configuration.

While this is still in early stages, this recent discussion about making the pod requirement for exclusive resources more explicit also indicates a need for non-FG capabilities. Shouldn't the API field itself be forward-looking enough to support such potential use cases?

Those are all examples of FG-related capabilities, not generic long-term capabilities.

tallclair · 2025-06-18T23:35:00Z

It seems like most of the concerns with this are around the specific capabilities being added, but this KEP doesn't actually propose adding any capabilities. The examples given are hypothetical examples based on features currently in development, but no new features will be able to depend on capabilities until it goes to beta. This creates a bit of a chicken-and-egg situation, where it's hard to point to exactly how capabilities will be used until we have users lined up, but we can't line up users yet.

SergeyKanzhelev · 2025-06-18T23:41:44Z

It seems like most of the concerns with this are around the specific capabilities being added, but this KEP doesn't actually propose adding any capabilities. The examples given are hypothetical examples based on features currently in development, but no new features will be able to depend on capabilities until it goes to beta. This creates a bit of a chicken-and-egg situation, where it's hard to point to exactly how capabilities will be used until we have users lined up, but we can't line up users yet.

We kind of need to know what the expected use cases will be. Maybe past examples or hypothetical examples thought through end-to-end. Right now this KEP is limited to just a set of name/value pairs and a scenario of FG discoverability. But we are already thinking there MAY be a need to support capabilities for node selection, the ability to declare tolerations for capabilities, and the ability to have node-restricted capabilities. Knowing the scope would help us understand whether the proposed API is needed (versus the alternatives, if the set of use cases is limited) and, if it is needed, what shape it should have.

pravk03 · 2025-06-19T17:25:49Z

Maybe past examples or hypothetical examples thought through end-to-end

RuntimeClass was intended as a past example to illustrate non-FG-related runtime capabilities in the earlier version of the proposal. I agree that it had some missing details, and thanks for highlighting them in your comment.

Runtime handlers as a list of handlers is also not a good fit. The default handler (runc) is not specified in the pod spec, so it will not be used by the scheduler and by definition must not be added to capabilities. Non-default handlers may need more details on what they are. And a list of names may not fit within the value length limits. A special object representing the runtime is a better choice here.

I have tried to address these in the Case Study section.

k8s-ci-robot added the cncf-cla: yes and kind/kep labels May 28, 2025

k8s-ci-robot requested review from dchen1107 and derekwaynecarr May 28, 2025 00:46

k8s-ci-robot added the sig/node and needs-ok-to-test labels May 28, 2025

k8s-ci-robot added the size/XL label May 28, 2025

pravk03 marked this pull request as draft May 28, 2025 00:47

k8s-ci-robot added the do-not-merge/work-in-progress label May 28, 2025

pravk03 force-pushed the node-capabilities branch 2 times, most recently from 59e7e54 to 4719180 May 28, 2025 00:59

wojtek-t reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

keps/sig-node/5328-node-capabilities/README.md (outdated)

keps/sig-node/5328-node-capabilities/kep.yaml

dom4ha reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

dom4ha reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

dom4ha reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md

pravk03 force-pushed the node-capabilities branch 3 times, most recently from 4c11e06 to 9254f9b May 28, 2025 23:11

pravk03 changed the title from "KEP-5328: Node Capability Aware Scheduling" to "KEP-5328: Node Capabilities" May 28, 2025

pravk03 marked this pull request as ready for review May 28, 2025 23:14

k8s-ci-robot removed the do-not-merge/work-in-progress label May 28, 2025

k8s-ci-robot requested a review from mrunalp May 28, 2025 23:14

k8s-ci-robot requested review from tallclair and yujuhong May 29, 2025 00:25

pravk03 force-pushed the node-capabilities branch from 9254f9b to f8291a4 May 29, 2025 01:06

k8s-ci-robot added the sig/scheduling label May 29, 2025

pravk03 force-pushed the node-capabilities branch from 31f7ade to b90c0d0 June 17, 2025 06:57

wojtek-t reviewed Jun 17, 2025

pravk03 force-pushed the node-capabilities branch from b90c0d0 to 4beba06 June 17, 2025 18:50

pravk03 force-pushed the node-capabilities branch 3 times, most recently from ead37d7 to 419d78a June 18, 2025 02:59

macsko reviewed Jun 18, 2025

pravk03 force-pushed the node-capabilities branch from 419d78a to 5fb093d June 18, 2025 17:45

macsko reviewed Jun 18, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

pravk03 force-pushed the node-capabilities branch from 5fb093d to a3e1436 June 18, 2025 20:36

pravk03 force-pushed the node-capabilities branch from a3e1436 to f069f62 June 18, 2025 23:55

SergeyKanzhelev reviewed Jun 19, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

SergeyKanzhelev reviewed Jun 19, 2025

keps/sig-node/5328-node-capabilities/README.md

pravk03 force-pushed the node-capabilities branch 2 times, most recently from a3dd053 to 8d6230d June 19, 2025 07:35

KEP-5328: Introduce Node Capabilities KEP

cd6d67e

pravk03 force-pushed the node-capabilities branch from 8d6230d to cd6d67e June 19, 2025 17:18

k8s-ci-robot added the size/XXL label and removed the size/XL label Jun 19, 2025
