-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP 4770: EndpointSlice Controller Flexibility #4771
base: master
Are you sure you want to change the base?
Conversation
LionelJouin
commented
Jul 19, 2024
- One-line PR description: KEP-4770: Create KEP document (Generic reconciler and disabling controller on particular services)
- Issue link: EndpointSlice controller flexibility #4770
- Other comments:
Welcome @LionelJouin! |
Hi @LionelJouin. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: Lionel Jouin <[email protected]>
|
||
## Summary | ||
|
||
This proposal adds a new well-known label `service.kubernetes.io/endpoint-controller-name` to Kubernetes Services. This label disables the default Kubernetes EndpointSlice controller for the services where this label is applied and delegates the control of EndpointSlices to a custom EndpointSlice controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have the label managed-by
already on the endpointslice controller and the ipaddress, I think we should be consistent and use the same here
rep -r managed-by staging/src/k8s.io/api
staging/src/k8s.io/api/discovery/v1/well_known_labels.go: LabelManagedBy = "endpointslice.kubernetes.io/managed-by"
staging/src/k8s.io/api/discovery/v1beta1/well_known_labels.go: LabelManagedBy = "endpointslice.kubernetes.io/managed-by"
staging/src/k8s.io/api/networking/v1alpha1/well_known_labels.go: LabelManagedBy = "ipaddress.kubernetes.io/managed-by"
staging/src/k8s.io/api/networking/v1beta1/well_known_labels.go: LabelManagedBy = "ipaddress.kubernetes.io/managed-by"
Also, should we add this label to each Service (if it already does not have the label set) to indicate that is managed by the default endpoint slice controller? I kind of think this will be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any name is fine to me. For reference, the service.kubernetes.io/service-proxy-name
is not added to the service. As highlighted in #4771 (comment), a default controller name would have to be handled for the kubernetes controller manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I added this comment just for opening the debate, I'd like to know if there are more thoughts on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think endpointslice.kubernetes.io/managed-by
is probably fitting here. I'm not entirely sure whether we want this to apply to Endpoints as well though, if so, that could change what's best here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This KEP really does need to explain how it interacts with the existing label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what exactly has to be clarified with existing labels. This KEP proposes a label on the Service object, endpointslice.kubernetes.io/managed-by
is already explained in the Kubernetes documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that someone reading this KEP is clear on why endpointslice.kubernetes.io/managed-by
isn't right and service.kubernetes.io/service-proxy-name
isn't right (bear in mind that at the time of writing the docs say you can set service.kubernetes.io/service-proxy-name
on a Service and get a useful outcome).
If a KEP isn't clear:
- we shouldn't implement it (we should clarify, then implement)
- someone might implement the wrong thing through a misunderstanding
Making the KEP clear helps to manage that risk.
|
||
This proposal adds a new well-known label `service.kubernetes.io/endpoint-controller-name` to Kubernetes Services. This label disables the default Kubernetes EndpointSlice controller for the services where this label is applied and delegates the control of EndpointSlices to a custom EndpointSlice controller. | ||
|
||
Additionally, this KEP aims to give more flexibility for the EndpointSlice Reconciler module to allow users to reconcile EndpointSlices with any type of resources (Currently only Service/Pods is supported). For example, Endpoints could be supported for the Kubernetes EndpointSlice Mirroring Controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part I can not really understand it, maybe is explained later ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, "Endpoints could be supported for the Kubernetes EndpointSlice Mirroring Controller" is particularly confusing to me.
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
|
||
### Well-Known Label | ||
|
||
The kube-controller-manager should pass to the Endpoints, EndpointSlice and EndpointSlice Mirroring Controllers an informer selecting services that are not labelled with `service.kubernetes.io/endpoint-controller-name`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/kubernetes/enhancements/pull/4771/files#r1685502488 , I think there is 2 things to flesh out here:
- label name
- default behavior, do we add a label with the default controller name or we assume no-label means default controller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, kube-proxy has no default controller name, non-labelled services are managed by kube-proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah,, but those are decision from the past, if we have to do it again, will we do it the same?
I don't have much feedback from this feature beside using it to workaround cilium conformance tests :)
I think that least we should evaluate the different options and see pros and cons and decide
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
did a first pass, this KEP is missing some more explanation on the high level details, is too low level, it describe perfectly the changes you want to do in the reconciler logic but is not clear the motivations and how is this planned to be used, some more high level information for the reviewers to understand the changes will help ... an example of how these externals controllers plan to use it will be interesting .... an example of how an external endpoint slice controller consuming this library will be used per example |
Thank you for the review. Should I only add high level overview, or should I remove too low level details too? Do we have to provide a way to use the EndpointSlice Reconciler for external EndpointSlice Controller consumers? The example for the usage in the EndpointSlice and EndpointSlice Mirroring Controllers are there already, I was thinking maybe those are sufficient. Also, the usage might change in the future as written in the EndpointSlice Reconciler readme: |
I think is fine to keep it the way you have it, just expand , I'm trying to highlight that without understanding the motivations and the expectations of what your proposing is hard to fully understand all the details |
keps/sig-network/4770-endpointslice-controller-flexibility/README.md
Outdated
Show resolved
Hide resolved
/ok-to-test |
- Initial unit, integration and e2e tests completed and enabled. | ||
|
||
#### Beta | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has to be some e2e conformance of these out of tree controllers, since is an external controller adding an endpointslice, how do we validate these out of tree controllers are generating the endpointslices accordingly ? what if these controllers does not respect the selectors and generete the endpointslices anyway?
@robscott I really like to hear your thoughts on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting thought. I'd assumed that the purpose of this KEP was to build controllers that would intentionally have different behavior than the default in-tree EndpointSlice controller. In some cases that might be to select a different IP for a Pod, but in other cases that might be to select an entirely different type of resource, for example a custom backend type.
This does certainly make it more likely that unexpected variations of EndpointSlices will exist, which is certainly something to watch out for. Since the outset of the EndpointSlice API anyone has been able to create EndpointSlices, and sometimes unexpected variations have led to bugs in controllers consuming those EndpointSlices, but I don't think that's necessarily new with this KEP, just potentially a more common set of edge cases that we'll need to be sure that controller authors are aware of.
For example, nodeName
is optional in EndpointSlices, but some controllers likely assume it exists because the EndpointSlice controllers in k/k generally populate it. The same is largely true for zone and hostname (see podToEndpoint for a reference).
I think this risk is unavoidable with this KEP, so I'd like to see it called out + a corresponding commitment to improved documentation around this as a requirement for graduating past alpha. Unfortunately I can't think of a way to write conformance tests that would help us here though.
overall this LGTM, there may be some edges to flesh out, but this feature looks clear and scoped now |
/cc |
@maiqueb: GitHub didn't allow me to request PR reviews from the following users: maiqueb. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Thank you @aojea for the review. Some of your comments might be resolved, but I am not sure, could you mark them as resolved if they are, so it's more clear what I still need to answer and fix:
|
Rename feature-gate from EndpointControllerNameWellKnownLabel to ExternalEndpointController Signed-off-by: Lionel Jouin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm -1 on this KEP, because I don't see why it is better than the existing mechanism (empty selector).
spec: {} | ||
``` | ||
|
||
This alternative potentially leads to confusion among users and inconsistency in how services are managed as each implementation is using its own annotation (see the nokia/danm and projectcalico/vpp-dataplane examples), leading to a fragmented approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial reaction to this KEP is that we should not do it. This alternative is better for a bunch of reasons, but mostly because it already exists.
We have a way of expressing "don't generate endpoints for me" - leave the selector blank. Fortunately for you, it means exactly what you want.
The argument of inconsistency across controllers doesn't hold a lot of water for me. If you specify a pod selector, you're going to get pods as endpoints. If you don't specify it, you get something else. IMO, being able to specify that something else in your own schema is an advantage, not "fragmentation". Suppose you need an expression that is more flexible than the legacy selector-map (e.g. modern selectors can use "In" and "NotIn" and other operators).
This is not an accidental side-effect that I am trying to shoe-horn you into, it's literally why this mechanism was designed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty selector feature exists to cover use-cases that are not covered by the existing selector, i.e. selecting backends that are not selectable via labels. For example (from the Service documentation), selecting backends in a different namespace/cluster, selecting backends that are running outside of a Kubernetes, or as you mentioned, selecting backends with a more flexible expression covered with modern selectors. In these cases I agree the empty selector must be used and the users must define their own way to select backends, via, for example, annotations.
I realized that the goals and non-goals are not clear enough, so I fixed them in the latest commit. The goal is to disable the Kubernetes EndpointSlice controller and allow the user to re-use the selector field from the Service Spec. Custom EndpointSlice Controller should be able to replicate the usage of the selector, by selecting workloads by labels (labels specified in the selector field) but the difference with the Kubernetes EndpointSlice Controller is the compilation of these EndpointSlices, in this KEP the main use case is to select different IPs than the Pod.Status.PodIPs.
A well-known label would also provide an explicit and standard way to inform which component is managing the EndpointSlices for a particular service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still struggling with this one. I spoke with Rob a little, since he seemed "pro". The argument that carried the most weight for me was that there are (or might be?) cases where the alternative controller is doing exactly what the "real" controller does but extracting some different information (e.g. the multi-network case). In that case, you have to specify the network(s) somewhere right? Another example, albeit hypothetical, was something doing different topology hinting.
I'm not sure how I feel about this, still. It seems "simple" to implement, but I am worried about the tail of side-effects.
As a thought experiment - what if you could specify a function, e.g. CEL, which took a pod as argument and returned a list of IPs to use -- would that eliminate the need for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct, multi-network is my primary use case, I have personally no plan, at least for now, to do something different such as topology hinting. But, multi-network also means more than just picking the IP for the pods, it also means the network must be attached and up for the selected pods.
Yes, I need to specify which network(s) must be used for the selected pods. Having this label (service.kubernetes.io/endpoint-controller-name
) will give me freedom to choose the way to specify those networks. For my case, I am currently thinking of using Services together with Gateway API (I made a PoC about it: https://github.com/LionelJouin/l-3-4-gateway-api-poc). Gateway in Gateway API allows me to specify new Service-Proxies (Secondary Network Service (load-balancing) acts directly in the Gateway instance) and attach networks via the Gateway.Spec.Infrastructure field, this way I can also specify which networks must be selected. Here is an example on how the Service/Gateway could look like: https://github.com/LionelJouin/l-3-4-gateway-api-poc/blob/main/examples/kpng-gateway-api.yaml#L59-L62.
If I understand correctly, CEL would select a specific field based on what the user would specify via a new field in the Service. I think this would be more complex to maintain, more difficult to use, and less flexible. As of today, the most used solution is Multus, the secondary networks are attached and reported via pod annotation in a json format, I guess it would be quite challenging to match the IPs for a particular network. This could have worked with the Multi-Network KEP changes since the Secondary Network IPs would have been reported in Pod.Status. Now the discussion for multi-network is about DRA and ResourceClaim.Status to report the Secondary Network IPs, I am not sure how CEL could fit here.
For reference, at the beginning, another goal for this KEP was to improve the Kubernetes EndpointSlice Reconciler to make it generic enough so it can be re-used for Secondary Network IPs (No matter if Multus, KEP#3698 or DRA is used). This way, the custom EndpointSlice Controllers for secondary networks would be quite easy to implement. I kept it in the commit history (removed in this commit) and I am still planning to propose again these changes somehow in a better way in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, at the beginning, another goal for this KEP was to improve the Kubernetes EndpointSlice Reconciler to make it generic enough so it can be re-used for Secondary Network IPs (No matter if Multus, KEP#3698 or DRA is used). This way, the custom EndpointSlice Controllers for secondary networks would be quite easy to implement. I kept it in the commit history (removed in this commit) and I am still planning to propose again these changes somehow in a better way in the future.
This is a big no for me, if someone wants to create EndpointSlices controllers out of tree is ok to share some common types and logic but adding more load to the core maintainers just to remove from the custom endpointslice controllers maintainers is simply unfair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big no for me, if someone wants to create EndpointSlices controllers out of tree is ok to share some common types and logic but adding more load to the core maintainers just to remove from the custom endpointslice controllers maintainers is simply unfair.
I agree with this, the idea with this part (generic EndpointSlice Reconciler) of the KEP was to solve what is stated in the readme of EndpointSlice Reconciler: This EndpointSlice reconciler library is not sufficiently generic to be used by the EndpointSlice Mirroring controller...
. But also keeping what is written for compatibility: There are NO compatibility guarantees for this repository, yet. It is in direct support of Kubernetes, so branches will track Kubernetes and be compatible with that repo.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to proceed? Time is running for KEPs.
|
||
The existing behavior will be kept by default, and the Kubernetes EndpointSlice Controller will not manage the Services with the label. This ensures services without the label to continue to be managed as usual. | ||
|
||
This will have no effect on other EndpointSlice controller implementations since they will not be influenced by the presence of this label. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This counts as a point in favor of "use an empty selector". But also, that just seems broken. :)
spec: {} | ||
``` | ||
|
||
This alternative potentially leads to confusion among users and inconsistency in how services are managed as each implementation is using its own annotation (see the nokia/danm and projectcalico/vpp-dataplane examples), leading to a fragmented approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still struggling with this one. I spoke with Rob a little, since he seemed "pro". The argument that carried the most weight for me was that there are (or might be?) cases where the alternative controller is doing exactly what the "real" controller does but extracting some different information (e.g. the multi-network case). In that case, you have to specify the network(s) somewhere right? Another example, albeit hypothetical, was something doing different topology hinting.
I'm not sure how I feel about this, still. It seems "simple" to implement, but I am worried about the tail of side-effects.
As a thought experiment - what if you could specify a function, e.g. CEL, which took a pod as argument and returned a list of IPs to use -- would that eliminate the need for this?
I'm generally in favor of this one, since I see it as door opening for a lot of implementations that don't want to be a complete replacement, and I think this optionality is very good for users at a reasonably low maintenance cost. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: LionelJouin, shaneutt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Let's hold off on merge until everyone has a minute to think about it :) |
I'm not in favor of merging because we are constantly bitten by edge problems with Endpoints and EndpointSlices, more recently kubernetes/kubernetes#127370 , this just adds another permutation to maintain in an already complicated area and there are alternatives, like completely replacing the endpointslice controller on the kube-controller-manager or empty selector. However, in case we move forward, I expect beta criteria to have at least 2 implementations of these endpointslices controller out of tree in use, adding features is expensive, at least we need to be sure that the cost we are paying is worth it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LionelJouin this KEP needs to explain how it interacts with the similar, existing label endpointslice.kubernetes.io/managed-by
.
See https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#management
We could add an equivalent label for Endpoints and even automatically mirror the label when mirroring an Endpoints to a set of EndpointSlices.
@shaneutt please see #4771 (review) If you agree, please consider cancelling approval. |
|
||
## Motivation | ||
|
||
As of now, a service can be delegated to a custom Service-Proxy/Gateway if the label `service.kubernetes.io/service-proxy-name` is set. Introduced in [KEP-2447](https://github.com/kubernetes/enhancements/issues/2447), this allows custom Service-Proxies/Gateways to implement services in different ways to address different purposes / use-cases. However, the EndpointSlices attached to this service will still be reconciled in the same way as any other service. Addressing more purposes / use-cases, for example, different pod IP addresses, is therefore not natively possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? What if you define a Service without a selector and then do your own reconciliation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment, I clarified that it is not natively possible while using the Service Selector field (The KEP is about this, the fact to use the service as close as possible to how a user would define it for the Kubernetes default network).
About this topic, there are details in this conversation: #4771 (comment)
The list of controllers to enable in the Kube-Controller-Manager can be set using the `--controllers` flag ([kube-controller-manager documentation](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/)). The EndpointSlice can then be disabled in the Kube-Controller-Manager and implemented as an external one that will support the label feature described in this KEP. | ||
|
||
This alternative requires significant changes to the cluster management as the cluster level configuration must be modified and a new EndpointSlice controller for the primary network must be developed and deployed to replace the disabled one in the Kube-Controller-Manager. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should - I pretty much think must - document the "don't specify a selector, then do your own thing" option. It is an alternative and we shouldn't pretend otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the smallest number of changes we can make to allow merging this as a provisional KEP?