KEP-5313: Placement Decision API for multicluster scheduling #5314

Open · wants to merge 1 commit into master
Conversation

mikeshng
Contributor

  • One-line PR description: Add a new KEP to introduce the Placement Decision API for multicluster scheduling
  • Other comments:

/sig multicluster

@k8s-ci-robot k8s-ci-robot added the sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster. label May 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mikeshng
Once this PR has been reviewed and has the lgtm label, please assign skitt for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 17, 2025
@k8s-ci-robot k8s-ci-robot requested review from JeremyOT and skitt May 17, 2025 23:48
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 17, 2025
@k8s-ci-robot
Contributor

Hi @mikeshng. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 17, 2025
@mikeshng
Contributor Author

@k8s-ci-robot
Contributor

@mikeshng: GitHub didn't allow me to assign the following users: zhiying-lin.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @deads2k @RainbowMango @zhiying-lin

CC @corentone @elgnay @haoqing0110 @jnpacker @qiujian16 @ryanzhang-oss

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@iholder101
Contributor

/cc @awels
FYI

@k8s-ci-robot
Contributor

@iholder101: GitHub didn't allow me to request PR reviews from the following users: awels.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @awels
FYI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mikeshng mikeshng force-pushed the placement-decision-api branch from 9ca10ab to 3406d3d (May 19, 2025 16:02)
@corentone corentone left a comment

Trying to simplify it a bit.

At the same time, will try to suggest sharing our MCO one as the placement.

* Support continuous rescheduling: decision list may be updated.
* Guarantee that every `clusterName` entry matches a `ClusterProfile.metadata.name` in the same inventory.
* Guarantee that every `clusterName` entry is in the same namespace as `PlacementDecision.metadata.namespace`.
* Provide label conventions so consumers can retrieve all slices of one placement.


I wonder if we actually need this added complexity yet? How many clusters would a placement REALLY need to target? If we get to more than 100, maybe we're not using the right grouping/abstraction?

Member

Yeah, it sounds a bit over-designed here.
Mike, can you give an example of a workload that needs to be placed on more than 100 clusters?

Contributor Author

There are fleets running across thousands of clusters, especially in edge and telco cases. While not an everyday case, we have customers who push configs to all of those clusters at once.
The core issue with dumping large lists into a single object is that every change results in a large write. K8s API authors already designed APIs like EndpointSlice to handle watch/write churn and work around etcd limits, so it makes sense to follow established conventions. Expecting users to manually shard their Placement objects to avoid etcd limits or expensive writes feels like a step backward in API design. CC @deads2k

Contributor

I think a "daemonset" type of workload could be placed on a large number of clusters, e.g. something like NetworkPolicy/FlowSchema.

forcing downstream tools such as a GitOps engine, workload orchestrator, progressive rollout controller,
or AI/ML pipeline to understand a scheduler-specific API.

This KEP introduces a vendor-neutral `PlacementDecision` API that standardizes


Let's actually call it Placement? That gives us a chance to align the Spec later.

Contributor Author

We are using the name PlacementDecision on purpose to show it is responsible only for the scheduler's data-only answer to "which clusters should be used?" It's different from any future standard Placement API (or current vendor-specific Placement/Scheduling APIs) that defines the request/spec driving that decision.


### Non-Goals

* Describing how a scheduler made its choice (Placement API spec).


One of our plans for MCO is to publish events on the why; so while the placement itself shouldn't care, the end user may care (for debugging purposes).

Contributor Author

Added the Reason field for end users.


* Describing how a scheduler made its choice (Placement API spec).
* Describing how consumers access selected clusters.
* Embedding orchestration logic or consumer feedback in `PlacementDecision`.


For orchestration, we realized in MCO that one state that was really interesting was "drain". E.g. we want to get out of a cluster but slowly.

what did you have in mind for consumer feedback?

Member

I remember Liqian asked a question at the last community meeting: "How do we know the status, i.e. whether the decision has been consumed?"

Even though we haven't seriously talked about the Placement API, the Placement API should reference which workload goes where, and its status. If there is no status on PlacementDecision, where do we get the status?

Contributor

AFAIK, the overall design has another layer, a "syncer", which takes the placement decision combined with the workloads and executes it. I guess the status would be on that syncer.
That said, while this design is flexible, I feel the e2e UX may not be ideal, since users need to jump from one place to another again and again.

Contributor

just to elaborate more, the flow seems to be like this:

  1. the user creates a Placement API object that somehow contains the placement policy reflecting what the workload needs
  2. the placement controller emits a PlacementDecision object
  3. the user then feeds the decision and the workload definition to a syncer API
  4. the user then monitors the output of the syncer object and adjusts the placement policy accordingly

Contributor Author

PlacementDecision intentionally omits any scheduling/orchestration spec. Added a "Consumer Feedback" section to clarify that feedback is out of scope for this resource and should be handled by a separate mechanism.

metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

// Up to 100 ClusterDecisions per object (slice) to stay well below the etcd limit.


Should we remove this limitation of 100 clusters? That way we could avoid the whole idea of having to compose multiple PlacementDecision CRs together.

Contributor

I am not sure we can get away from multiple PlacementDecision CRs given the etcd limit.

Contributor Author

I agree with Ryan. See #5314 (comment)

@mikeshng mikeshng force-pushed the placement-decision-api branch from 3406d3d to 881956f (May 25, 2025 15:34)
@mikeshng mikeshng force-pushed the placement-decision-api branch 3 times, most recently from 11ecf7b to 280c3b3 (May 27, 2025 20:49)
@mikeshng mikeshng force-pushed the placement-decision-api branch from 280c3b3 to 971facb (May 27, 2025 21:56)