Then there's also the issue of co-scheduling resources that depend on each other -- resources don't exist in a vacuum. For example, a Deployment might depend on a ServiceAccount and a Namespace with the "as-needed" strategy, but also on a PVC with an "any" strategy that has already been scheduled to some cluster. This means the Deployment isn't actually "split"; it's "split among wherever my dependencies are available" -- which might end up being just one cluster. To handle this we need to be able to detect those dependencies, ideally without encoding any knowledge about what a "Deployment" is or depends on, and obviously without modifying the type. We could have some hacky heuristics, like searching for fields with reference-like names.

When we detect a dependency, we should persist and report it, probably in an annotation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  annotations:
    kcp.dev/dependencies: |
      [{
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "name": "my-pvc",
        "detected": true
      }]
```

Resource authors should also be able to explicitly describe an object's dependencies:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  annotations:
    # put this Pod in the same cluster as another one
    kcp.dev/dependencies: |
      [{
        "apiVersion": "v1",
        "kind": "Pod",
        "name": "pod-friend",
        "detected": false
      }]
```

...which could tie in to non-cluster resource dependencies:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  annotations:
    kcp.dev/dependencies: |
      [{
        "apiVersion": "crossplane.io/v1alpha1",
        "kind": "Database",
        "name": "db-prod",
        "detected": false
      }]
```
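For illustration, here's a minimal Go sketch of how a controller might read and append entries in that annotation. The `Dependency` struct mirrors the fields shown above; the function names (`parseDependencies`, `recordDependency`) are made up for this sketch, not existing kcp APIs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dependency mirrors one entry of the kcp.dev/dependencies annotation above.
type Dependency struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	// Detected is true when the dependency was inferred by the scheduler,
	// false when it was declared explicitly by the resource author.
	Detected bool `json:"detected"`
}

const dependenciesAnnotation = "kcp.dev/dependencies"

// parseDependencies reads the annotation value (a JSON array) into structs.
func parseDependencies(annotations map[string]string) ([]Dependency, error) {
	raw, ok := annotations[dependenciesAnnotation]
	if !ok {
		return nil, nil
	}
	var deps []Dependency
	if err := json.Unmarshal([]byte(raw), &deps); err != nil {
		return nil, fmt.Errorf("parsing %s: %w", dependenciesAnnotation, err)
	}
	return deps, nil
}

// recordDependency appends a dependency (e.g. one found by a heuristic) and
// writes the merged list back to the annotation, skipping duplicates.
func recordDependency(annotations map[string]string, dep Dependency) error {
	deps, err := parseDependencies(annotations)
	if err != nil {
		return err
	}
	for _, d := range deps {
		if d.APIVersion == dep.APIVersion && d.Kind == dep.Kind && d.Name == dep.Name {
			return nil // already recorded
		}
	}
	out, err := json.Marshal(append(deps, dep))
	if err != nil {
		return err
	}
	annotations[dependenciesAnnotation] = string(out)
	return nil
}

func main() {
	annotations := map[string]string{}
	_ = recordDependency(annotations, Dependency{
		APIVersion: "v1", Kind: "PersistentVolumeClaim", Name: "my-pvc", Detected: true,
	})
	fmt.Println(annotations[dependenciesAnnotation])
}
```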
Discussed in #91
A generalized, type-unaware splitter/scheduler will need some general set of strategies it can use to schedule a resource to underlying clusters.
### Example strategies
- For the Deployment splitter as it is today, that strategy is "split", based on `.spec.replicas` -- i.e., when given a Deployment, create N other Deployments (where N is # of clusters), where each gets `.spec.replicas / N`. This ignores scheduling constraints for now. (A sketch of the arithmetic follows this list.)
- For a DaemonSet scheduler, where we want to run one replica on each node of each cluster, the strategy might be "copy" -- when given a DaemonSet, create N copies (N = # of clusters), where each is an exact copy of the original DaemonSet.
- For a Pod scheduler, the strategy might be "any" -- when given a Pod, select a cluster (at random, maybe) and label the Pod to be synced to that cluster.
- For a Namespace or ServiceAccount, the strategy could be "as-needed" -- when they're created in a kcp, don't do anything, but before syncing something that needs them down to a cluster (e.g., a Pod with `namespace: foo` and `serviceAccountName: bot`), ensure those resources are also created. This also implies cleaning them up when the dependent object is deleted.

This is an incomplete list (suggest more!), and CRD authors will inevitably want to define their own, but we can start with some common ones to get to 85% of real use cases.
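As a rough sketch of the "split" arithmetic -- the `splitReplicas` helper below is hypothetical, not the actual splitter code -- dividing replicas across N clusters and handing out the remainder so the per-cluster counts sum to the original:

```go
package main

import "fmt"

// splitReplicas divides a Deployment's .spec.replicas across n clusters.
// Clusters listed first receive the remainder, so the per-cluster counts
// always sum to the original total.
func splitReplicas(replicas int32, n int) []int32 {
	out := make([]int32, n)
	if n == 0 {
		return out
	}
	base := replicas / int32(n)
	rem := replicas % int32(n)
	for i := range out {
		out[i] = base
		if int32(i) < rem {
			out[i]++
		}
	}
	return out
}

func main() {
	// 10 replicas across 3 clusters -> [4 3 3]
	fmt.Println(splitReplicas(10, 3))
}
```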
CRD authors (and K8s built-in types, which are CRDs now) should be able to choose the kcp scheduling strategy for their type, and we should try to find some sane default that won't surprise people too much, if possible.
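One possible shape for that, sketched in Go -- the registry, the hard-coded entries, and the default below are illustrative assumptions for this sketch, not an existing kcp API:

```go
package main

import "fmt"

// Strategy names match the examples above.
type Strategy string

const (
	StrategySplit    Strategy = "split"
	StrategyCopy     Strategy = "copy"
	StrategyAny      Strategy = "any"
	StrategyAsNeeded Strategy = "as-needed"
)

// strategyRegistry maps a type (group/kind) to its scheduling strategy.
// In practice this could be populated per type by its author; hard-coded here.
var strategyRegistry = map[string]Strategy{
	"apps/Deployment":     StrategySplit,
	"apps/DaemonSet":      StrategyCopy,
	"core/Pod":            StrategyAny,
	"core/Namespace":      StrategyAsNeeded,
	"core/ServiceAccount": StrategyAsNeeded,
}

// strategyFor returns the registered strategy, falling back to a default
// ("copy" is an arbitrary choice for this sketch) for unknown types.
func strategyFor(groupKind string) Strategy {
	if s, ok := strategyRegistry[groupKind]; ok {
		return s
	}
	return StrategyCopy
}

func main() {
	fmt.Println(strategyFor("apps/Deployment"))    // split
	fmt.Println(strategyFor("example.com/Widget")) // copy (default)
}
```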
### Syncing / aggregating status
Splitting objects/specs is only half of the story; once the syncer updates the status of the split/copied/whatever resource, the scheduler will also need to know how it should aggregate/summarize that status back to the original object in kcp. For Pods ("any" strategy) and anything without status, that's pretty trivial.
For DaemonSets ("copy" strategy) and Deployments, the status needs to aggregate, for example,
.status.numberReady
,.status.readyReplicas
, by adding each cluster's observed ready replicas. The name of the field(s) that need to be aggregated, and how, needs to be described by the type author.@smarterclayton
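A minimal sketch of that aggregation, with the types trimmed to just the field being summed (nothing below is actual kcp or Kubernetes API code):

```go
package main

import "fmt"

// deploymentStatus is trimmed to the one field this sketch aggregates.
type deploymentStatus struct {
	ReadyReplicas int32
}

// aggregateReadyReplicas sums each cluster's observed ready replicas so the
// original object in kcp reports the total across all clusters.
func aggregateReadyReplicas(perCluster map[string]deploymentStatus) deploymentStatus {
	var total deploymentStatus
	for _, s := range perCluster {
		total.ReadyReplicas += s.ReadyReplicas
	}
	return total
}

func main() {
	statuses := map[string]deploymentStatus{
		"cluster-east": {ReadyReplicas: 4},
		"cluster-west": {ReadyReplicas: 3},
	}
	fmt.Println(aggregateReadyReplicas(statuses).ReadyReplicas) // 7
}
```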