Custom Scheduler Replica Propagation #6045

LavredisG · 2025-01-14T05:59:04Z

Since ReplicaScheduling step is not customisable on the Scheduling Process, what is the way to go if I were to do that? If for example I score some clusters based on custom metrics and I would like to assign replicas to them based on these scores, is it possible (if it's not, where would cluster scores be used, because I am struggling to come up with a use case other than replica propagation)? Or would it require something like custom controllers to do that?


chaosi-zju · 2025-01-14T07:28:26Z

There are two scenarios:

1. The first scheduling is based on custom metrics, and later changes in the metrics do not affect the replica allocation.
2. The first scheduling is based on custom metrics, but if the metrics change significantly, the replica allocation should be adjusted accordingly.

Which scenario applies to you?

LavredisG · 2025-01-14T18:45:59Z

I would expect to have a Deployment as Input {CPU, RAM, Replicas, DelayThreshold}, that would have to be scheduled on a multi-cluster environment based on dynamic metrics such as expected incoming traffic, network delays, resource consumption etc. So, at its simplest form I would be interested in the 1st case where only replica propagation matters, but if you can dive a bit deeper and also explain how this would expand to adjust replica allocation dynamically and not only statically when first scheduling, I'd be grateful, since I am not 100% sure which use case we will end up following.

chaosi-zju · 2025-01-15T03:04:06Z

For the 1st case, you can try implementing a custom scheduler-estimator.

As you know, for the dynamic weight scheduling strategy, the scheduler-estimator calculates MaxAvailableReplicas for each member cluster, and the scheduler then divides the replicas among the clusters, using each cluster's MaxAvailableReplicas as its weight.
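To make the weighting step concrete, here is a minimal Go sketch of dividing replicas in proportion to MaxAvailableReplicas. It is not Karmada's actual code: the type `clusterEstimate` and the function `divideByWeight` are hypothetical names, the sketch assumes the estimates are non-negative, and the rule for handing out leftover replicas (largest remainder first) is one reasonable choice rather than the scheduler's confirmed behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// clusterEstimate is a hypothetical pairing of a cluster name with the
// MaxAvailableReplicas value an estimator reported for it.
type clusterEstimate struct {
	Name     string
	MaxAvail int64 // assumed to be non-negative
}

// divideByWeight splits total replicas across clusters in proportion to
// MaxAvail. Replicas left over after integer division go to the clusters
// with the largest remainders; ties are broken by name for determinism.
func divideByWeight(total int64, estimates []clusterEstimate) map[string]int64 {
	result := make(map[string]int64, len(estimates))
	var sum int64
	for _, e := range estimates {
		sum += e.MaxAvail
	}
	if sum <= 0 || total <= 0 {
		return result // nothing to divide
	}

	type remainder struct {
		name string
		rem  int64
	}
	rems := make([]remainder, 0, len(estimates))
	var assigned int64
	for _, e := range estimates {
		share := total * e.MaxAvail / sum
		result[e.Name] = share
		assigned += share
		rems = append(rems, remainder{e.Name, total * e.MaxAvail % sum})
	}

	sort.Slice(rems, func(i, j int) bool {
		if rems[i].rem != rems[j].rem {
			return rems[i].rem > rems[j].rem
		}
		return rems[i].name < rems[j].name
	})
	// The leftover count is always smaller than the number of clusters.
	for i := int64(0); i < total-assigned; i++ {
		result[rems[i].name]++
	}
	return result
}

func main() {
	split := divideByWeight(10, []clusterEstimate{
		{Name: "member1", MaxAvail: 30},
		{Name: "member2", MaxAvail: 10},
		{Name: "member3", MaxAvail: 20},
	})
	fmt.Println(split["member1"], split["member2"], split["member3"])
}
```

With these example numbers, the weights 30:10:20 give 5, 1 and 3 replicas by integer division, and the single leftover replica goes to member2, which has the largest remainder.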

Currently, the scheduler-estimator determines MaxAvailableReplicas based solely on the cluster's available CPU and memory and the pod's resource requirements. You might consider customizing a scheduler-estimator to evaluate MaxAvailableReplicas for each cluster from your own metrics, which would give a more accurate estimate of reasonable allocation ratios across clusters.
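As an illustration of what such a custom estimate might compute, here is a small Go sketch. Everything in it is an assumption made for the example: the types `clusterMetrics` and `replicaRequirement`, the function `estimateMaxAvailable`, and the rule that a cluster whose network delay exceeds the workload's threshold gets zero replicas. It does not implement the scheduler-estimator's real gRPC interface; it only shows the kind of per-cluster logic that could sit behind it.

```go
package main

import "fmt"

// clusterMetrics is a hypothetical snapshot of custom metrics for one cluster.
type clusterMetrics struct {
	FreeCPUMilli   int64 // unallocated CPU, in millicores
	FreeMemoryMiB  int64 // unallocated memory, in MiB
	NetworkDelayMs int64 // measured delay to the cluster, in milliseconds
}

// replicaRequirement is a hypothetical per-replica requirement, loosely
// modeled on the {CPU, RAM, DelayThreshold} input described in this thread.
type replicaRequirement struct {
	CPUMilli         int64
	MemoryMiB        int64
	DelayThresholdMs int64
}

// estimateMaxAvailable returns how many replicas the cluster could host.
// A cluster that violates the delay threshold is reported as having no room.
func estimateMaxAvailable(m clusterMetrics, req replicaRequirement) int64 {
	if req.CPUMilli <= 0 || req.MemoryMiB <= 0 {
		return 0 // invalid request; refuse to estimate
	}
	if m.NetworkDelayMs > req.DelayThresholdMs {
		return 0
	}
	byCPU := m.FreeCPUMilli / req.CPUMilli
	byMem := m.FreeMemoryMiB / req.MemoryMiB
	n := byCPU
	if byMem < n {
		n = byMem // the scarcer resource is the limit
	}
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	req := replicaRequirement{CPUMilli: 500, MemoryMiB: 2048, DelayThresholdMs: 50}
	near := clusterMetrics{FreeCPUMilli: 4000, FreeMemoryMiB: 8192, NetworkDelayMs: 20}
	far := clusterMetrics{FreeCPUMilli: 4000, FreeMemoryMiB: 8192, NetworkDelayMs: 80}
	fmt.Println(estimateMaxAvailable(near, req), estimateMaxAvailable(far, req))
}
```

In this example the nearby cluster is limited by memory (8192 / 2048 = 4 replicas), while the distant cluster is excluded because its delay exceeds the threshold.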

LavredisG · 2025-01-20T15:11:53Z

According to this, the current implementation uses scheduler-estimator only when the Type is Divided and Preference is Aggregated as per my understanding, is that correct? I am thinking that I would probably need Divided/Weighted/DynamicWeight for my case, so should I go for a custom scheduler-estimator or a custom factor for the DynamicWeight?

Regarding Cluster Resource Modeling, since Customised Cluster Resource Modeling is used since Karmada 1.4 as default, is the General Cluster Modeling useless if you use any version after v1.4?

chaosi-zju · 2025-01-21T03:10:05Z

According to this, the current implementation uses scheduler-estimator only when the Type is Divided and Preference is Aggregated as per my understanding, is that correct?

No, the scheduler-estimator serves both Divided/Aggregated and Divided/DynamicWeight; the document is a bit ambiguous.

is the General Cluster Modeling useless if you use any version after v1.4?

I didn't quite understand your question. In fact, multiple estimators can work in the scheduler at the same time (General Cluster Modeling is a general estimator, and scheduler-estimator is an accurate estimator).

The relationship between different estimators is:

karmada/pkg/scheduler/core/util.go

Lines 72 to 92 in 820fd06

    // Get the minimum value of MaxAvailableReplicas in terms of all estimators.
    estimators := estimatorclient.GetReplicaEstimators()
    ctx := context.WithValue(context.TODO(), util.ContextKeyObject,
    	fmt.Sprintf("kind=%s, name=%s/%s", spec.Resource.Kind, spec.Resource.Namespace, spec.Resource.Name))
    for name, estimator := range estimators {
    	res, err := estimator.MaxAvailableReplicas(ctx, clusters, spec.ReplicaRequirements)
    	if err != nil {
    		klog.Errorf("Max cluster available replicas error: %v", err)
    		continue
    	}
    	klog.V(4).Infof("Invoked MaxAvailableReplicas of estimator %s for workload(%s, kind=%s, %s): %v", name,
    		spec.Resource.APIVersion, spec.Resource.Kind, namespacedKey, res)
    	for i := range res {
    		if res[i].Replicas == estimatorclient.UnauthenticReplica {
    			continue
    		}
    		if availableTargetClusters[i].Name == res[i].Name && availableTargetClusters[i].Replicas > res[i].Replicas {
    			availableTargetClusters[i].Replicas = res[i].Replicas
    		}
    	}
    }
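
The excerpt above depends on Karmada internals, so here is a self-contained Go sketch of the same "take the minimum across estimators" idea. The names `targetCluster`, `mergeEstimates` and `unauthenticReplica` are stand-ins invented for this example, and the sentinel value -1 and the large starting value are assumptions of the sketch rather than something the excerpt confirms.

```go
package main

import (
	"fmt"
	"math"
)

// unauthenticReplica is an assumed sentinel meaning "this estimator has no
// trustworthy answer for this cluster", standing in for
// estimatorclient.UnauthenticReplica in the excerpt.
const unauthenticReplica int32 = -1

// targetCluster is a simplified stand-in for the per-cluster result type.
type targetCluster struct {
	Name     string
	Replicas int32
}

// mergeEstimates returns, for each cluster, the smallest valid estimate
// reported by any estimator. Sentinel values are skipped, so a cluster keeps
// its starting value if no estimator gives a usable answer for it.
func mergeEstimates(initial []targetCluster, results map[string][]targetCluster) []targetCluster {
	merged := append([]targetCluster(nil), initial...) // copy; leave input untouched
	for _, res := range results {
		for i := range res {
			if i >= len(merged) {
				break // guard against a result longer than the cluster list
			}
			if res[i].Replicas == unauthenticReplica {
				continue
			}
			if merged[i].Name == res[i].Name && merged[i].Replicas > res[i].Replicas {
				merged[i].Replicas = res[i].Replicas
			}
		}
	}
	return merged
}

func main() {
	// Assumption of this sketch: every cluster starts from a very large value.
	initial := []targetCluster{
		{Name: "member1", Replicas: math.MaxInt32},
		{Name: "member2", Replicas: math.MaxInt32},
	}
	results := map[string][]targetCluster{
		"general":  {{Name: "member1", Replicas: 10}, {Name: "member2", Replicas: 7}},
		"accurate": {{Name: "member1", Replicas: 6}, {Name: "member2", Replicas: unauthenticReplica}},
	}
	for _, c := range mergeEstimates(initial, results) {
		fmt.Println(c.Name, c.Replicas)
	}
}
```

In this example member1 ends up with 6 (the accurate estimator is stricter), while member2 keeps the general estimator's 7 because the accurate estimator returned the sentinel for it. Because the merge is a minimum, the order in which estimators are visited does not matter.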

LavredisG · 2025-01-21T23:11:39Z

I didn't get your doubts, in fact, there can be multiple estimators working in the scheduler (General Cluster Modeling is a general estimator and scheduler-estimator is a accurate estimator).

I mean to say that since the scheduler-estimator was created to "fix" the problems that the general estimator had, is there any use case for the general estimator anymore?

LavredisG · 2025-01-24T01:59:15Z

Is it normal that the scheduler-estimator was working even without using the hack/deploy-scheduler-estimator.sh script? I mean to say that propagating a resource with either AvailableReplicas or Aggregated the distribution would be correct, as if the scheduler-estimator was already there. Is that expected?

chaosi-zju · 2025-01-24T02:47:02Z

I mean to say that since the scheduler-estimator was created to "fix" the problems that the general estimator had, is there any use case for the general estimator anymore?

Sorry, this got buried in my notifications.

The general estimator is still useful. It serves as a default fallback estimator with lower overhead:

The scheduler-estimator needs extra components and gRPC access, which some users see as too costly, so they prefer not to install it.
When the scheduler-estimator fails, the general estimator is available as a fallback.

chaosi-zju · 2025-01-24T02:58:35Z

Is it normal that the scheduler-estimator was working even without using the hack/deploy-scheduler-estimator.sh script?

I still don't understand what you mean.

The script is just one way to install the component. There are many installation methods, and we only care about whether the component exists.

LavredisG · 2025-01-24T03:23:39Z

Sorry, this got buried in my notifications

No problem, all good!

Ok I will try to explain it better. I was propagating a deployment using either Aggregate or Weighted/Dynamic/AvailableReplicas for replica propagation, but without having deployed the scheduler-estimator with the script provided (I had set up karmada and joined member clusters but there were no scheduler-estimator pods running for the members). Both of these worked without the scheduler-estimator as they would if I had the scheduler estimator, meaning that they correctly assigned the pods to each cluster. Is the default ClusterResourceModel used in that case when we haven't deployed the estimator?

LavredisG added the kind/question Indicates an issue that is a support question. label Jan 14, 2025
