Skip to content

KEP 5339 - ClusterProfile Credentials external providers #5338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

corentone
Copy link

  • Introduce a way to out-of-controller-tree load and use credentials when using ClusterProfile
  • Issue link: N/A

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 23, 2025
@k8s-ci-robot k8s-ci-robot requested review from JeremyOT and skitt May 23, 2025 20:41
@corentone corentone changed the title KEP XXX - ClusterProfile Credentials external providers KEP 5338 - ClusterProfile Credentials external providers May 23, 2025
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 23, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: corentone
Once this PR has been reviewed and has the lgtm label, please assign jeremyot for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 23, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @corentone. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 23, 2025
@corentone corentone changed the title KEP 5338 - ClusterProfile Credentials external providers KEP 5339 - ClusterProfile Credentials external providers May 23, 2025

In order to populate the Cluster object that the exec provider requires, we standardize the following properties.

Property: `multicluster.sigs.io/execprovider-cluster`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi corentone! This information would be exposed to all readers of the Cluster Profile object, which I fear might incur some security concerns?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another part is that different plugins might require different cluster-specific information -> would this be a free-form property?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each plugin would also need some application specific information to make the authentication work. Those obviously are not ideal to be exposed directly on Cluster Profile; I am wondering if we could/should formulate a way where application specific information can be discovered as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding privacy of the endpoint, I'm wondering if it is really something we should hide. I don't think we should tbh. ClusterProfiles would still be protected by RBAC so I think the endpoint is not a sensitive thing to store.

Regarding the cluster-specific info, could you share more details? I do suggest the ClusterManager provides the full data struct that would contain all the fields that are in the Cluster object defined above. Do you think we should allow modifications to it?

My assumptions is that the plugin would be installed or setup in such a way that it would be able to use the pod's volumes, service accounts etc. So they would have access to the KSA especially and other envvars that may help identify the controller.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is mostly that, Cluster Profile objects are something that many different agents would access; and a lot of such agents do not use the API for authentication (they could be doing scheduling work, observability work, etc.). RBAC can help, but the thing is, with the current RBAC rules we only have object-level granularity so if the user grants an agent access to Cluster Profiles, it will have access to the auth-related data, even if the agent does not need it. Using object references, on the other hand, would allow users to set RBAC rules for the auth data specifically. Though I do agree too that data such as API server address/CA bundle are not as sensitive as tokens.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the plugin provides is not auth credentials its information on HOW to obtain auth credentials. So I don't think it has to be hidden. It can simply be ignored by readers that dont need (or can't get) credentials to the remote cluster.

Object references would drastically increase the complexity of ClusterProfile and the permission setup so I wouldn't recommend it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the cluster-specific and application-specific data, we might have gone over this brief on the Slack channel; basically, depending on how the authn workflow is implemented (there are obviously a variety of options), each plugin would need its own set of inputs.

For example:

  • for a plugin that, let's just assume can magically (and securely) acquires a service account token out of thin air, it would need the following items to complete the auth flow:

    • (cluster-specific info.): API server address and CA bundle
    • (application-specific info.): the service account name that represents the application (that invokes the plugin)
  • for a plugin that uses federated identities, it would need:

    • (cluster-specific info.): API server address, CA bundle, expected audience for token exchange, and some platform specific information (on AKS -> tenant ID, on GKE -> workload identity provider full ID)
    • (application-specific info.): (on AKS -> client ID)
  • for a plugin that uses SPIRE, it would need:

    • (cluster-specific info.): API server address, CA bundle, expected audience
    • (application-specific info.): spiffe ID

Considering flexibility reasons, we might need to use something free-form I assume? Of course we could just assume that everything is read from local setup (mounted volumes, configMaps, etc.). but it kinds of defeats the benefits of having cluster profiles (esp. for the cluster-specific information part). If we could have application-specific information linked with Cluster Profiles objects (this one must be done via object references, for obvious reasons), it could further simplify the flow I guess.

Copy link

@michaelawyu michaelawyu May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more relevant to another comment I posted, but I am wondering if we could have something like (modifying the Cluster Profile API):

...
spec:
    authProviders:
    - type: aksoidc
       clusterInfo:
       # we could do either inline (arbitrary map)...
           caData: ...
           server: ...
           audience: ...
           tenantID: ...
       # or use object refs
           objectRef: ....
       targetApplications:
       - name: argo
          applicationInfo:
              objectRef: ...
       - name: multi-kueue
          applicationInfo:
              objectRef: ...

This way each type of plugins could work (exec typed or libraries, not suggesting that this KEP should touch the latter, though), and it saves the trouble of users having to mount a bunch of information manually just for auth purposes, esp. considering that the info would be cluster-specific. RBAC could be applied to the ref'd objects for better security (and less cluttering?).

Of course the example is just an initial thought for demonstration purposes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

application specific info are provided by the controller when setting up the plugin, never by cluster profile. ClusterProfile is not aware of the applications using it. Cluster profile says what "plugin types" work with it.
Now yes the application will have to provide that data to the plugin. That will likely be done via volume, configmaps, envvars etc. Given the plugin is exec'ed, it has a access to the same data as the controller itself (volumes/FS, environment, KSA...).
IMO having application-specific information is part of what doomed the previous KEP on credentials and we can't make that mistake again. It would require ClusterProfile to know about all the applications that can use it and how they are setup. I don't see that approach working.

The clusterprofile needs to provide all the cluster-specific information (endpoints and connectivity as well as plugin types and pointers to what authN mechanism is recommended).

Let me do another pass at the KEP to provide a new ClusterProfile example as well as a Controller example.
Fwiw, for GKE, it would leverage the federated identity, which can be obtained just by calling the metadata server, so clusterprofile doesn't need to provide us with anything.

Copy link

@michaelawyu michaelawyu May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the application specific info via object refs could help simplify the process, but I would understand if it's something that we want to leave for the application itself to set up, i.e., taking an opinionated approach on the API part and making sure that any sensitive info cannot be added). To me personally, the reason why the last push wasn't successful is mostly that we were trying to encourage a practice (passing tokens across cluster boundaries) that has security concerns (even though it is commonly used) 😞.

Looking forward to seeing the new pass. It would be really, really nice if we could:

  • on the API level having neutral names so that no limits are implicitly imposed on how the flow is implemented (of course for the KEP, and the current efforts in the WG, I totally understand that we are taking a exec plugin focused approach, but on the API side we probably shouldn't enforce this?)
  • we would have a field that allows placement of arbitrary cluster-specific authentication info as different authn workflows might have different requirements (though some fields might be very commonly used). I fear that not all platforms would handle the flow as GKE does (metadata server), especially for those who have a hybrid environment.

The plugin is designated by the clusterprofile, which designates the type of provider that is expected to be called.
The clusterprofile has a property to designate the plugin.

Property: `multicluster.sigs.io/execprovider-name`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi corentone! I assume that here we implicitly require that one cluster could have only one provider?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but the controller could decide what plugin to use. Which also means they could use a different binary for a given provider name.

For ex, the cluster could be GKE, but as a controller I could use the official-gke-fleet plugin or official-gke-as-user plugin or i could write my own: gke-custom-plugin mechanism.
Any cluster seen as GKE would use the chosen plugin.

@michaelawyu
Copy link

I am wondering if we should, since the provider might not necessarily be exec plugin based, use more neutral names for the flow. It also feels a bit that the feature might warrant its own field in the API rather than living as an extensible properties...

@corentone
Copy link
Author

I am wondering if we should, since the provider might not necessarily be exec plugin based, use more neutral names for the flow. It also feels a bit that the feature might warrant its own field in the API rather than living as an extensible properties...

I do think it should be only exec-based, that way we have a single path for now and directly start out of tree.
I do agree on the dedicated field, thats a good point. Let's see what others lean towards and Ill make the changes


In order to populate the Cluster object that the exec provider requires, we standardize the following properties.

Property: `multicluster.sigs.io/execprovider-cluster`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why execprovider (which feels implementation-derived) rather than something aligned with the feature provided, such as credprovider?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, is the intent to match the exec section in kubeconfig?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't planning on reusing the full exec section of the kubeconfig but instead only leveraging the protocol for now. But maybe we could? kubeconfig's exec is a bit more specific on how to call the plugin that I was thinking of being.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is clear or not, but probably worth clarifying - you're not currently using any part of the exec config from the kubeconfig format.
As far as I can tell, you're only providing a piece of the request that is sent between kubectl and the exec plugin in some scenarios. That might be good enough to specify the endpoint though.

The other part of the ExecCredentialSpec that you're using specifies whether the request is interactive or not. There is probably no reason to include that if this API is meant for controllers.


In order to call the plugin, the library execs the plugin defined in the configuration. It passes the Cluster information that
was obtained from the ClusterProfile.
The library then calls the plugin following the protocol defined in [KEP 541](https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/541-external-credential-providers/README.md). The library provided in https://github.com/kubernetes-sigs/cluster-inventory-api can leverage the original code that is kept in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clientcmd?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think thats the one. I may have left this sentence hanging looking for the code. oops.
will add it!


### Converting endpoint properties to cluster object

The Cluster structure for the exec defined in KEP 541 assumes the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have an additional free form "data" in the clusterProfile API so that each "plugin" can utilize during the authentication process. i.e. Azure plugin would need the cluster's tenant and subscription ID.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Config field of the Cluster struct could be used for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that information already be a part of the controllers exec plugins config? See examples at https://github.com/int128/kubelogin

### External credentials Provider plugin mechanism

In order to call the plugin, the library execs the plugin defined in the configuration. It passes the Cluster information that
was obtained from the ClusterProfile.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where we need an extra field/status in the ClusterProfile. We can also make this as a standard property if a field feels a bit too rigid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field sounds fine to me. I think there was another comment about it. I'll move all my things to fields.


- Gather feedback from developers and surveys
- Complete features A, B, C
- Additional tests are in Testgrid and linked in KEP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a Testgrid?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't touch the template. but at google testgrid is a grid with all test results across date/testruns.

* Provide a library for controllers to obtain credentials for a cluster represented by a ClusterProfile
* Allow cluster managers to provide a method to obtain credentials that doesn't require to be embedded into the controller code
and recompiling.
* Be a secure mechanism for credential obtention and storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would not hurt to get a buy-in from the sig Auth

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, could you invite a couple people to this KEP? I don't remember POCs.
Given how similar it is to the kubeconfig exec, I do expect good inputs from them.

Copy link
Member

@stlaz stlaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me how the proposed library retrieves configuration for the exec credential plugins it is supposed to use. Is the expectation for the plugins to be configured in the controller's default kubeconfig?

There are likely to be subtle differences for each controller (e.g. client ID/secret in OIDC) but the base would be the same.


## Proposal

The proposed approach to issuing credentials is to leverage plugins for credentials, where a controller could leverage a library running local code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not going to be issuing the credentials yourself, correct?

Suggested change
The proposed approach to issuing credentials is to leverage plugins for credentials, where a controller could leverage a library running local code
The proposed approach to consuming credentials is to leverage plugins, where a controller could leverage a library running local code


## Proposal

The proposed approach to issuing credentials is to leverage plugins for credentials, where a controller could leverage a library running local code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand the current phrasing, is it this?

Suggested change
The proposed approach to issuing credentials is to leverage plugins for credentials, where a controller could leverage a library running local code
The proposed approach to issuing credentials is to leverage plugins for credentials, where a controller could use a library to run a local executable

## Proposal

The proposed approach to issuing credentials is to leverage plugins for credentials, where a controller could leverage a library running local code
which would retrieve the credentials for the current controller and for the clusterprofile of their choice. It is expected that plugins would
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
which would retrieve the credentials for the current controller and for the clusterprofile of their choice. It is expected that plugins would
which would retrieve the credentials for a given clusterprofile. It is expected that plugins would


The proposed approach to issuing credentials is to leverage plugins for credentials, where a controller could leverage a library running local code
which would retrieve the credentials for the current controller and for the clusterprofile of their choice. It is expected that plugins would
leverage the identity of the controller (for example, the Kubernetes Service Account when running in Kubernetes) to retrieve credentials that are valid on other clusters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, the Kubernetes Service Account when running in Kubernetes

What other modes are you planning to support? Which other identities would be available?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KEP does not seem to directly mention using the controller's identity anywhere else but in this paragraph. I am not sure workload identity is currently typical for use with exec plugins but I may be wrong.

Comment on lines +159 to +160
Plugins would be exec'ed by the controller so that they don't need be built-in the binary, allowing cluster managers to
write their own credentials and still leveraging multicluster controllers written by the community.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a word missing

Suggested change
Plugins would be exec'ed by the controller so that they don't need be built-in the binary, allowing cluster managers to
write their own credentials and still leveraging multicluster controllers written by the community.
Plugins would be exec'ed by the controller so that they don't need be built-in the binary, allowing cluster managers to
write their own credential plugins and still leveraging multicluster controllers written by the community.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a plugin is meant to be executed by the controller, does that mean that the plugin has to be present in the container? What would be the delivery mechanism?


The Cluster structure for the exec defined in KEP 541 assumes the following:
```
type Cluster struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


In order to populate the Cluster object that the exec provider requires, we standardize the following properties.

Property: `multicluster.sigs.io/execprovider-cluster`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is clear or not, but probably worth clarifying - you're not currently using any part of the exec config from the kubeconfig format.
As far as I can tell, you're only providing a piece of the request that is sent between kubectl and the exec plugin in some scenarios. That might be good enough to specify the endpoint though.

The other part of the ExecCredentialSpec that you're using specifies whether the request is interactive or not. There is probably no reason to include that if this API is meant for controllers.


### Configuring plugins in the controller

Plugins are designated by a hardcoded string, for example, "gke" for GKE Clusters, which allows the controller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, maybe?

Suggested change
Plugins are designated by a hardcoded string, for example, "gke" for GKE Clusters, which allows the controller
Plugins are selected by a hardcoded string, for example, "gke" for GKE Clusters, which allows the controller

Comment on lines +259 to +260
to attach a different binary name or path for the the binary. It is expected that the library will have a mapping from
its supported plugin names to the expected binary to call. It is expected that the mapping is provided via a flag on the controller.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't mapping binaries make this vendor/implementation specific? It's probably not useful to just hardcode the binary if you don't know how it's configured, right? I.e. hardcoding a binary only specifies the Command part of the ExecConfig.


### Selecting the right plugin

The plugin is designated by the clusterprofile, which designates the type of provider that is expected to be called.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The plugin is designated by the clusterprofile, which designates the type of provider that is expected to be called.
The plugin is specified in the clusterprofile, which designates the type of provider that is expected to be called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants