KEP-5328: Node Capabilities #5347
base: master
Welcome @pravk03! |
Hi @pravk03. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
59e7e54
to
4719180
Compare
@dom4ha @sanposhiho @macsko - FYI
4c11e06
to
9254f9b
Compare
/cc @tallclair @yujuhong |
9254f9b
to
f8291a4
Compare
/sig scheduling |
31f7ade
to
b90c0d0
Compare
* Validate that the kube-scheduler plugin filters nodes based on `node.status.capabilities` when the feature is enabled, and ignores the field when the feature is disabled.
* Validate that `node.status.capabilities` is correctly populated when the feature is enabled, and the field is cleared from the `Node` object when the feature is disabled.
* Validate that the Admission Controller correctly fetches and validates requests against capabilities when the feature is enabled, and does not block requests if the feature is disabled.
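The first check in the list above could be exercised against something like the following minimal sketch. It is not the KEP's actual scheduler plugin: `filter_nodes`, the dict-shaped node objects, the `example.io/feature-x` key, and the exact-match semantics are all assumptions made for illustration. Only the field path `node.status.capabilities` comes from the text.

```python
from typing import Dict, List


def filter_nodes(
    nodes: List[dict],
    required: Dict[str, str],
    feature_enabled: bool,
) -> List[str]:
    """Return the names of nodes that satisfy the required capabilities.

    Hypothetical helper: nodes are plain dicts shaped like a Node object,
    and a requirement is met only on an exact key/value match.
    """
    if not feature_enabled:
        # Gate off: the capabilities field is ignored, so nothing is filtered.
        return [n["metadata"]["name"] for n in nodes]

    feasible = []
    for node in nodes:
        # A node that never reported capabilities is treated as having none.
        caps = node.get("status", {}).get("capabilities", {})
        if all(caps.get(key) == value for key, value in required.items()):
            feasible.append(node["metadata"]["name"])
    return feasible


# Illustrative data only; the capability key is made up.
nodes = [
    {
        "metadata": {"name": "node-a"},
        "status": {"capabilities": {"example.io/feature-x": "true"}},
    },
    {"metadata": {"name": "node-b"}, "status": {}},
]
```

With the gate on, only `node-a` passes a requirement on `example.io/feature-x`; with the gate off, both nodes pass because the field is never consulted.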
Those are all good tests, but these are feature tests - not enablement/disablement.
Enablement/disablement is a test that (as stated in the comment in the template above) switches the feature gate in the middle of the test.
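To make the distinction concrete, here is a rough sketch of what "switching the gate in the middle of the test" might look like. `FakeKubelet` and `run_enablement_disablement_check` are hypothetical stand-ins, not real Kubernetes test utilities, and the capability key is made up. The point is only that a single test run observes the field across an on, off, on sequence.

```python
class FakeKubelet:
    """Hypothetical stand-in for the component that publishes capabilities."""

    def __init__(self, detected):
        self.detected = dict(detected)
        self.gate_enabled = False
        self.node = {"status": {}}

    def set_gate(self, enabled: bool) -> None:
        # Flip the gate and immediately reconcile the node status.
        self.gate_enabled = enabled
        self.sync()

    def sync(self) -> None:
        if self.gate_enabled:
            self.node["status"]["capabilities"] = dict(self.detected)
        else:
            # Disabling must clear the field, not leave stale data behind.
            self.node["status"].pop("capabilities", None)


def run_enablement_disablement_check(kubelet: FakeKubelet) -> list:
    """Flip the gate on, off, and on again, recording the field each time."""
    observed = []
    for enabled in (True, False, True):
        kubelet.set_gate(enabled)
        observed.append(kubelet.node["status"].get("capabilities"))
    return observed
```

A feature test would stop after the first step. The enablement/disablement test also asserts that the field disappears on disable and comes back intact on re-enable.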
Updated. Please take a look.
Yes. The size of the Node object is expected to increase as more capabilities are introduced. The number of capabilities exported will be limited by strategies such as:
1. Automatically handling feature graduation, which includes ceasing to export a capability once it matures or is no longer needed.
2. Exporting only configurations that are relevant to the control plane.
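The two limiting strategies could be sketched roughly as below. This is an illustration under stated assumptions, not the KEP's mechanism: the `Feature` record, its `stage` values, and the rule that "matured" means `"ga"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Feature:
    """Hypothetical description of a node-level feature."""

    name: str
    stage: str  # assumed values: "alpha", "beta", or "ga"
    control_plane_relevant: bool
    enabled: bool


def capabilities_to_export(features: List[Feature]) -> Dict[str, str]:
    """Pick the capabilities a node would publish under the two strategies."""
    exported = {}
    for feature in features:
        if feature.stage == "ga":
            # Strategy 1: stop exporting once the feature has matured.
            continue
        if not feature.control_plane_relevant:
            # Strategy 2: skip settings the control plane never acts on.
            continue
        exported[feature.name] = "true" if feature.enabled else "false"
    return exported
```

Under these assumptions the exported set shrinks as features graduate, which is what keeps the growth of the Node object bounded.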
friendly ping
b90c0d0
to
4beba06
Compare
We discussed this KEP yesterday on a call. Some notes from that call: Many examples in this KEP of past enhancements that might have used or may use capabilities are not accurate.
Examples where capabilities are useful are:
We discussed that the examples above can often be solved for individual vendors (which control the list of enabled FGs per node version) by introducing a semver-based node selector. But capabilities certainly provide a much better API for this. I would suggest in this KEP:
We also discussed that capabilities must be applied to DaemonSets with no exceptions. I also want to see something explaining how capabilities and Cluster Autoscaler will work together. |
Thanks a lot @SergeyKanzhelev for the discussion and the feedback. I am okay with most of the above suggestions and I will address them in the KEP. I have some thoughts regarding naming.
I am definitely open to naming suggestions, but I believe the name should be broad enough to accommodate future use-cases without requiring a new API field down the road. |
ead37d7
to
419d78a
Compare
Referring to my earlier reply on this discussion comment: for swap, a node capability is much needed for 'placement-control', to ensure that a latency-sensitive pod is never scheduled on a swap-enabled node.
A swap-capability will provide the signal for (2), allowing for simple and clear scheduling rules. Alternatives like 'NFD' exist for detecting swap on a node, but NFD is out-of-tree and not aware of the Kubelet's specific swap configuration.
Can we have some examples listed that will justify this? Right now the KEP suggests using it for FG-related capabilities, while not giving good examples of where it would be non-FG related.
419d78a
to
5fb093d
Compare
The scheduling part looks good for alpha |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: macsko, pravk03 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The While this is still in early stages, this recent discussion about making the pod requirement for exclusive resources more explicit also indicates a need for non-FG capabilities. The API field itself should be forward-facing enough to support such potential use-cases?
Those are all examples of FG-related capabilities. Not the generic long-term capabilities. |
5fb093d
to
a3e1436
Compare
It seems like most of the concerns with this are around the specific capabilities being added, but this KEP doesn't actually propose adding any capabilities. The examples given are hypothetical examples based on features currently in development, but no new features will be able to depend on capabilities until it goes to beta. This creates a bit of a chicken-and-egg situation, where it's hard to point to exactly how capabilities will be used until we have users lined up, but we can't line up users yet. |
We kind of need to know what the expected use cases will be. Maybe past examples, or hypothetical examples thought through end-to-end. Right now this KEP is limited to just a set of name/value pairs and a scenario of FG discoverability. But we are already thinking there MAY be a need to support capabilities for node selection, the ability to declare tolerations for capabilities, and the ability to have node-restricted capabilities. Knowing the scope would help us understand whether the proposed API is needed (among alternatives, if the set of use cases is limited) and, if it is needed, what shape it should have.
a3e1436
to
f069f62
Compare
a3dd053
to
8d6230d
Compare
8d6230d
to
cd6d67e
Compare
I have tried to address these in the Case Study section.