-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ WIP: Cluster provider and cluster-aware controllers #3019
base: main
Are you sure you want to change the base?
Conversation
9584ab6
to
9b3f243
Compare
// Name returns the name of the cluster. It identifies the cluster in the | ||
// manager if that is attached to a cluster provider. The value is usually | ||
// empty for the default cluster of a manager. | ||
Name() string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're internally working on a mechanism for dealing with multiple clusters, too. It's been helpful for the cluster identifier to be more flexible than a simple string. We've made it comparable.
Something like this:
type Cluster[ID comparable] interface {
Identifier() ID
...
}
I understand this suggestion would cause a much larger change requiring type parameters in many places. Future work could be done to migrate.
Mostly, I wanted to raise this as something to consider and that calling the method Identifier
would provide room to migrate in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely open to that, I think renaming it to Identifier()
should be easy enough! Maybe some maintainers can leave a note if they consider this a future endeavour and this would be a worthwhile change to prepare that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were there before, had a logical cluster abstraction, and then went back to string as it just simplifies everything.
@sprsquish Am curious why a string as identifier and some lookup table on the cluster provider side does not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, adding Name() string
to the interface has no impact on the consumers of the Cluster
type. But adding a type parameter forces them to pass e.g. string or whatever everywhere including the manager. I don't think that's a good idea. IMO, we should not do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am curious why a string as identifier and some lookup table on the cluster provider side does not work?
Strings work fine.. We like to work with more structure in our identifiers. For instance, working with multiple clusters across cloud providers it's nice to have a structure like this:
type ClusterID struct {
Provider string
Region string
Account string
Name string
}
It can be marshaled into a string, of course, it's just nice not to have to.
I understand the need for a simpler approach and appreciate that this has been discussed already. I think there's a way to migrate over time that follows what's been happening with handlers by introducing TypedCluster[ID comparable]
and an alias for the common use case type Cluster = TypedCluster[string]
(Manager would need the same treatment).
The only change to the current proposal would be to use the more generic Identifier()
rather than Name()
. That leaves the door open for future discussions regarding generic identifiers without having to do all the work up front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your use-case correctly, this pattern should help you elegantly: https://goplay.space/#wRWyxv-GUrj
Or in other words: the type only matters in code that is ClusterID
aware. Turning the string back into the rich format is one type conversion away.
9b3f243
to
f87db1f
Compare
f87db1f
to
95a2c06
Compare
95a2c06
to
5c285f6
Compare
type Provider interface { | ||
// Get returns a cluster for the given identifying cluster name. Get | ||
// returns an existing cluster if it has been created before. | ||
Get(ctx context.Context, clusterName string) (Cluster, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the default cluster a thing here already? I.e. can I always pass empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we specify what is returned for an unknown cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default cluster is not a thing here, that is in Manager.GetCluster
. The cluster provider doesn't have a concept of a default cluster.
Should we specify what is returned for an unknown cluster?
Probably. I can add a sentence to clarify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note on what should happen if the cluster is unknown.
pkg/cluster/multicluster.go
Outdated
// Provider defines methods to retrieve clusters by name. The provider is | ||
// responsible for discovering and managing the lifecycle of each cluster. | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"defines methods" is redundant for an interface.
// Provider defines methods to retrieve clusters by name. The provider is | |
// responsible for discovering and managing the lifecycle of each cluster. | |
// | |
// Provider allows to retrieve clusters by name. The provider is | |
// responsible for discovering and managing the lifecycle of each cluster. | |
// |
It's kind of unclear what managing means in a getter interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe be concrete saying:
// Provider defines methods to retrieve clusters by name. The provider is | |
// responsible for discovering and managing the lifecycle of each cluster. | |
// | |
// Provider allows to retrieve clusters by name, e.g. to reconcilers. | |
// The provider is responsible for discovering and managing the lifecycle | |
// of each cluster, calling `Engage` and `Disengage` on the manager | |
// it is run against, whenever a new cluster is discovered or a cluster | |
// is unregistered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I've added that.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: embik The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
83360c4
to
1dcba5e
Compare
b604946
to
a362992
Compare
On-behalf-of: SAP [email protected] Co-authored-by: Vince Prignano <[email protected]> Co-authored-by: Dr. Stefan Schimanski <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP [email protected] Co-authored-by: Vince Prignano <[email protected]> Co-authored-by: Dr. Stefan Schimanski <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP [email protected] Co-authored-by: Vince Prignano <[email protected]> Co-authored-by: Dr. Stefan Schimanski <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP [email protected] Co-authored-by: Vince Prignano <[email protected]> Co-authored-by: Dr. Stefan Schimanski <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP [email protected] Co-authored-by: Vince Prignano <[email protected]> Co-authored-by: Dr. Stefan Schimanski <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP [email protected] Co-authored-by: Vince Prignano <[email protected]> Co-authored-by: Dr. Stefan Schimanski <[email protected]> Co-authored-by: Iván Álvarez <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP [email protected] Co-authored-by: Vince Prignano <[email protected]> Co-authored-by: Dr. Stefan Schimanski <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP <[email protected]> Signed-off-by: Marvin Beckers <[email protected]>
a362992
to
e93979e
Compare
@embik: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the process of reviewing the PR in detail, for now most of my comments are either nits or questions.
This is a really helpful feature in controller-runtime, that can benefit other internal kubernetes-sigs projects.
cc: @sbueringer @joelanford Could we have some eyes on this and the proposal please. Thank you!
return fmt.Errorf("For() can only be used with reconcile.Request, got %T", *new(request)) | ||
var hdler handler.TypedEventHandler[client.Object, request] | ||
switch reflect.TypeFor[request]() { | ||
case reflect.TypeOf(reconcile.Request{}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it matters, I'll leave it up to you - the former is comparing parameter type at runtime and the other is checking for its type at during compile time. Isn't it better to be consistent?
case reflect.TypeOf(reconcile.Request{}): | |
case reflect.TypeFor[reconcile.Request](): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but I tried to just adapt the previous if condition. I'm not opposed to changing it, I just wanted to not fiddle with the condition checked.
case reflect.TypeOf(reconcile.Request{}): | ||
reflect.ValueOf(&hdler).Elem().Set(reflect.ValueOf(handler.WithLowPriorityWhenUnchanged(&handler.EnqueueRequestForObject{}))) | ||
case reflect.TypeOf(reconcile.ClusterAwareRequest{}): | ||
reflect.ValueOf(&hdler).Elem().Set(reflect.ValueOf(handler.WithLowPriorityWhenUnchanged(&handler.EnqueueClusterAwareRequestForObject{ClusterName: cl.Name()}))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious on why we need to use reflect and set the value instead of doing a direct assignment? If we are using reflect.TypeOf
previously, aren't we sure of the type at runtime?
Just to make reading of the code easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've asked myself the same question when I stumbled over this part of the code (see below, it's how the code today sets the handler implementation as well). When I try to change it to something like:
case reflect.TypeOf(reconcile.ClusterAwareRequest{}):
hdler = handler.WithLowPriorityWhenUnchanged(&handler.EnqueueClusterAwareRequestForObject{ClusterName: cl.Name()})
it fails with:
../../pkg/builder/controller.go:335:12: cannot use handler.WithLowPriorityWhenUnchanged(&handler.EnqueueClusterAwareRequestForObject{…}) (value of type handler.TypedEventHandler[client.Object, reconcile.ClusterAwareRequest]) as handler.TypedEventHandler[client.Object, request] value in assignment: handler.TypedEventHandler[client.Object, reconcile.ClusterAwareRequest] does not implement handler.TypedEventHandler[client.Object, request] (wrong type for method Create)
have Create(context.Context, event.TypedCreateEvent[client.Object], workqueue.TypedRateLimitingInterface[reconcile.ClusterAwareRequest])
want Create(context.Context, event.TypedCreateEvent[client.Object], workqueue.TypedRateLimitingInterface[request])
My understanding of generics here is a bit limited, but I think this is because we expect TypedRateLimitingInterface
to take whatever generic request
we give into the function, but we are setting it to a concrete type. I suppose this only works with the reflect magic we currently have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for explaining @embik!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem! This is at least how I understand this line. It's a bit unclear to me too.
|
||
// engagedCluster is a map of engaged clusters. The can come and go as the manager is running. | ||
engagedClustersLock sync.RWMutex | ||
engagedClusters map[string]cluster.Cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be super helpful if we can expose the set of engagedClusters from the manager. In Kueue for a multi cluster implementation, we have a list of active and missing clusters based on available quota for admitting workloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, you'd be looking for a list of the keys for that map, is that correct (i.e. the cluster identifiers, not the cluster.Cluster
objects directly)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a unique identifier for the cluster, so that we can fetch the relevant client and submit workloads in that cluster.
|
||
var errs []error | ||
for _, r := range cm.clusterAwareRunnables { | ||
if err := r.Disengage(ctx, cl); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct me if I'm wrong - While calling Engage
we are starting a SyncingSource from multicluster cache (
controller-runtime/pkg/builder/controller.go
Line 397 in e93979e
src := &ctxBoundedSyncingSource[request]{ctx: ctx, src: source.TypedKind(cl.GetCache(), projected, handler.WithLowPriorityWhenUnchanged(h), allPredicates...)} |
func (c *typedMultiClusterController[request]) Disengage(ctx context.Context, cl cluster.Cluster) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this works (but I wonder if it's confusing and should be changed to be honest) is that the implementation expects the cluster provider to cancel the clusterCtx
used for Engage
(and subsequently, the ctxBoundedSyncingSource
) when shutting down. Compare the fleet
example here canceling the stored context when figuring out that a cluster has to be removed: https://github.com/embik/controller-runtime/blob/e93979ec5cdd656809f5fbb91b858665979cc040/examples/fleet/main.go#L280
// disengage manager
if mgr != nil {
if err := mgr.Disengage(ctx, k.clusters[name]); err != nil {
k.log.Error(err, "failed to disengage manager")
}
}
// stop and forget
k.lock.Lock()
k.cancelFns[name]() // <- HERE
delete(k.clusters, name)
delete(k.cancelFns, name)
k.lock.Unlock()
The ctx
passed to Disengage
is a different context, one that is currently unused but could be used when disengaging should be cancelable.
I'm open to input here though. I was wondering if the manager should even have a Disengage
(and instead clean up its internal cluster list when the respective context gets canceled), but then the manager itself wouldn't be a cluster.Aware
.
// This is an experimental feature and is subject to change. | ||
EngageWithDefaultCluster *bool | ||
|
||
// EngageWithProvidedClusters indicates whether the controller should engage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// EngageWithProvidedClusters indicates whether the controller should engage | |
// EngageWithProviderClusters indicates whether the controller should engage |
This is a continuation of #2726 and #2207, a prototype for a multi-cluster support in controller-runtime. A design proposal is at #2746 which will need updating depending on the feedback for this PR. I've reorganized code changes into different commits than the previous attempts, but tried to keep authorship of changes faithful to the commits this work is based on.
The main change from the previous PR (#2726) is the update to typed generics support and adjustment to have a BYO request type. The implications of that is a) that you will need to bring your own typed EventHandler and b) that a wrapper function is needed to inject information (mostly the cluster name) when establishing a handler.
Apart from that, the core design of this is the cluster provider that can be plugged into a manager. This unlocks different ways to discover the Kubernetes "fleet" that is being reconciled.