This guide describes how the sharding mechanism implemented in this repository is evaluated and outlines the key results of the evaluation performed in the associated Master's thesis. Please refer to the thesis' evaluation section for more details.
The evaluation setup builds upon the Development and Testing Setup but adds a few more components. To demonstrate and evaluate the implemented sharding mechanisms using a fully functioning controller, a dedicated example operator was developed: the webhosting-operator. While the webhosting-operator is developed in the same repository, it only serves as an example.
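The webhosting-operator manages `Website` objects in the `webhosting.timebertt.dev` API group. For orientation, a `Website` might look roughly like the following sketch (the API version and spec fields are assumptions based on the sample output shown further below, not an authoritative schema):

```yaml
# hypothetical sketch of a Website object; the API version and the spec fields
# are assumptions based on the sample output further below
apiVersion: webhosting.timebertt.dev/v1alpha1
kind: Website
metadata:
  name: homepage
  namespace: project-foo
spec:
  theme: exciting
```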
When deploying the sharding components using `make deploy` or `make up`, the webhosting-operator is automatically deployed along with the other evaluation components.
Assuming you're in the repository's root directory, you can deploy the webhosting-operator using:
```bash
# deploy the webhosting-operator using pre-built images
make deploy SKAFFOLD_MODULE=webhosting-operator TAG=latest

# alternatively, build and deploy fresh images
make up SKAFFOLD_MODULE=webhosting-operator
```
To perform a quick test of the webhosting-operator, create some example `Website` objects:
```bash
$ kubectl apply -k webhosting-operator/config/samples
...
$ kubectl -n project-foo get website,deploy,svc,ing -L shard.alpha.sharding.timebertt.dev/clusterring-ef3d63cd-webhosting-operator
NAME                                        THEME      PHASE   AGE   CLUSTERRING-EF3D63CD-WEBHOSTING-OPERATOR
website.webhosting.timebertt.dev/homepage   exciting   Ready   58s   webhosting-operator-5d8d548cb9-qmwc7
website.webhosting.timebertt.dev/official   lame       Ready   58s   webhosting-operator-5d8d548cb9-qq549

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE   CLUSTERRING-EF3D63CD-WEBHOSTING-OPERATOR
deployment.apps/homepage-c1160b   1/1     1            1           57s   webhosting-operator-5d8d548cb9-qmwc7
deployment.apps/official-97b754   1/1     1            1           57s   webhosting-operator-5d8d548cb9-qq549

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   CLUSTERRING-EF3D63CD-WEBHOSTING-OPERATOR
service/homepage-c1160b   ClusterIP   10.96.83.180    <none>        8080/TCP   58s   webhosting-operator-5d8d548cb9-qmwc7
service/official-97b754   ClusterIP   10.96.193.214   <none>        8080/TCP   58s   webhosting-operator-5d8d548cb9-qq549

NAME                                        CLASS   HOSTS   ADDRESS   PORTS   AGE   CLUSTERRING-EF3D63CD-WEBHOSTING-OPERATOR
ingress.networking.k8s.io/homepage-c1160b   nginx   *                 80      58s   webhosting-operator-5d8d548cb9-qmwc7
ingress.networking.k8s.io/official-97b754   nginx   *                 80      58s   webhosting-operator-5d8d548cb9-qq549
```
You can now visit the created websites at http://localhost:8088/project-foo/homepage and http://localhost:8088/project-foo/official. You can also visit your local webhosting dashboard after forwarding the Grafana port:
```bash
kubectl -n monitoring port-forward svc/grafana 3000
```
This dashboard uses metrics exported by webhosting-operator about its API objects, i.e., `kube_website_*` and `kube_theme_*`.
There is also a dashboard about the sharding of websites.
In addition to creating the preconfigured websites, you can also generate some more random websites using the samples-generator:
```bash
# create a random number of websites per project namespace (up to 50 each)
$ go run ./webhosting-operator/cmd/samples-generator
created 32 Websites in project "project-foo"
```
The `experiment` tool can execute different scenarios for load testing the webhosting-operator, which are used for evaluating the sharding mechanism:
```bash
$ go run ./webhosting-operator/cmd/experiment -h
Usage:
  experiment [command]

Available Scenarios
  basic       Basic load test scenario (15m) that creates roughly 9k websites
  scale-out   Scenario for testing scale-out with high churn rate
  ...
```
A load test scenario can be executed using one of these commands:
```bash
# run the basic scenario from your development machine (not recommended)
go run ./cmd/experiment basic

# build the experiment image and run the basic scenario as a Job on the cluster
make up SKAFFOLD_MODULE=experiment EXPERIMENT_SCENARIO=basic

# use a pre-built experiment image to run the basic scenario as a Job on the cluster
make deploy SKAFFOLD_MODULE=experiment EXPERIMENT_SCENARIO=basic TAG=latest
```
All scenarios put load on the webhosting-operator by creating and mutating a large number of `Website` objects. However, actually running so many websites would waste immense compute power just to serve thousands of dummy pages. Hence, during load tests the webhosting-operator creates the `Deployments` backing `Websites` with `spec.replicas=0`. It also doesn't expose `Websites` created in load tests via the ingress controller but sets `spec.ingressClassName=fake` on the corresponding `Ingress` objects. Otherwise, the load test would overload the ingress controller, which is not the component the experiment is supposed to stress.
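For illustration, the objects created for a `Website` that belongs to a load test might look roughly like the following sketch; only `replicas: 0` and `ingressClassName: fake` are taken from the description above, while all names, labels, and remaining fields are assumptions:

```yaml
# hypothetical sketch of the objects webhosting-operator creates for a Website
# belonging to a load test; only replicas and ingressClassName are taken from
# the description above, everything else is assumed
apiVersion: apps/v1
kind: Deployment
metadata:
  name: homepage-c1160b
  namespace: project-foo
spec:
  replicas: 0                  # don't actually run dummy website pods
  selector:
    matchLabels:
      app: homepage-c1160b
  template:
    metadata:
      labels:
        app: homepage-c1160b
    spec:
      containers:
      - name: website
        image: nginx           # assumed placeholder image
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: homepage-c1160b
  namespace: project-foo
spec:
  ingressClassName: fake       # don't overload the real ingress controller
  rules:
  - http:
      paths:
      - path: /project-foo/homepage
        pathType: Prefix
        backend:
          service:
            name: homepage-c1160b
            port:
              number: 8080
```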
When running load test experiments on the cluster, a `ServiceMonitor` is created to instruct Prometheus to scrape the `experiment` tool. As the tool is based on controller-runtime as well, the usual controller-runtime metrics can be used to visualize the load test scenario and to verify that the tool generates the desired load.
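A minimal sketch of what such a `ServiceMonitor` could look like (the selector labels, port name, and scrape interval are assumptions):

```yaml
# hypothetical sketch of the ServiceMonitor for the experiment Job; selector
# labels, port name, and interval are assumptions
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: experiment
spec:
  selector:
    matchLabels:
      app: experiment
  endpoints:
  - port: metrics
    interval: 15s
```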
As a local kind cluster cannot handle such high load, a remote cluster is used to perform the load test experiments.
For this, a Gardener installation on STACKIT is used to create a cluster based on the sample manifest.
external-dns is used for publicly exposing the monitoring and continuous profiling endpoints, as well as `Websites` created outside of load test experiments.
```bash
# gardenctl target --garden ...
kubectl apply -f hack/config/shoot.yaml
# gardenctl target --shoot ...
kubectl apply --server-side -k hack/config/external-dns
kubectl -n external-dns create secret generic google-clouddns-timebertt-dev --from-literal project=$PROJECT_NAME --from-file service-account.json=$SERVICE_ACCOUNT_FILE
# gardenctl target --control-plane
kubectl apply --server-side -k hack/config/policy/controlplane
```
In addition to the described components, kyverno is deployed to the cluster itself (shoot cluster) and to the control plane (seed cluster).
In the cluster itself, kyverno policies are used for scheduling the sharder and webhosting-operator to the dedicated `sharding` worker pool and the experiment tool to the dedicated `experiment` worker pool.
This ensures that these components run on machines isolated from other system components and don't contend for compute resources during load tests.
Furthermore, kyverno policies are added to the control plane to enforce a static size of etcd, kube-apiserver, and kube-controller-manager (requests equal to limits for guaranteed resources, vertical autoscaling disabled, 4 kube-apiserver replicas to disable horizontal autoscaling) and to schedule them to a dedicated worker pool using a non-overcommit flavor with more CPU cores per machine. This makes load test experiments more stable and their results more reproducible.
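As a sketch, a policy that schedules the sharding components to the dedicated worker pool could look roughly like the following; the policy name, namespace, node label, and toleration are assumptions rather than the actual policy used in the evaluation:

```yaml
# hypothetical sketch of a kyverno policy that pins pods of the sharding
# components to the dedicated "sharding" worker pool; all names, namespaces,
# and label keys are assumptions
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: schedule-to-sharding-pool
spec:
  rules:
  - name: add-node-selector-and-toleration
    match:
      any:
      - resources:
          kinds:
          - Pod
          namespaces:
          - sharding-system        # assumed namespace of sharder and webhosting-operator
    mutate:
      patchStrategicMerge:
        spec:
          nodeSelector:
            worker.gardener.cloud/pool: sharding
          tolerations:
          - key: dedicated-for
            operator: Equal
            value: sharding
            effect: NoSchedule
```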
After executing a load test experiment, the measure tool is used for retrieving the key metrics from Prometheus. It takes a configurable set of measurements in the form of Prometheus queries and stores them in CSV-formatted files for further analysis (with numpy/pandas) and visualization (with matplotlib). Please see the results directory in the Master's thesis' repository for the exact measurements taken.
The scale of the controller setup is measured in two dimensions:
- The number of API objects that the controller watches and reconciles.
- The churn rate of API objects, i.e., the rate of object creations, updates, and deletions.
To consider a controller setup as performing adequately, the following SLOs need to be satisfied:
- The time of enqueuing object keys for reconciliation for every controller, measured as the 99th percentile per cluster-day, is at maximum 1 second.
- The latency of realizing the desired state of objects for every controller, excluding reconciliation time of controlled objects, until observed by a watch request, measured as the 99th percentile per cluster-day, is at maximum x, where x depends on the controller.
In the case of the `Website` controller, x = 5 seconds is chosen. These SLOs are verified using the following queries of the measure tool:
```yaml
queries:
- name: latency-queue # SLO 1
  type: instant
  slo: 1
  query: |
    histogram_quantile(0.99, sum by (le) (rate(
      workqueue_queue_duration_seconds_bucket{
        job="webhosting-operator", name="website"
      }[$__range]
    )))
- name: latency-reconciliation # SLO 2
  type: instant
  slo: 5
  query: |
    histogram_quantile(0.99, sum by (le) (rate(
      experiment_website_reconciliation_duration_seconds_bucket{
        job="experiment"
      }[$__range]
    )))
```
The following graphs show the generated load and compare the resulting CPU, memory, and network usage of the components in three different setups when running the `basic` experiment scenario (~9k websites created over 15m):
- external sharder: 3 webhosting-operator pods (shards) + 2 sharder pods (the new approach implemented in this repository, second iteration for the Master's thesis)
- internal sharder: 3 webhosting-operator pods (3 shards, 1 acts as the sharder) (the old approach, first iteration for the study project)
- singleton: 1 webhosting-operator pod (traditional leader election setup without sharding)
The new external sharding approach proves to scale best. The individual shards consume about a third of the singleton controller's usage (close to the optimum). The sharder pods themselves consume only a small, static amount of resources. Most importantly, the sharder's resource usage is independent of the number of sharded objects.
To evaluate the horizontal scalability of the sharding mechanism (external sharder), the maximum load capacity is determined for different numbers of instances (1, 2, 3, 4, 5). While the generated load increases, cumulative SLIs are calculated from the start of the experiment. As soon as a cumulative SLI grows above its SLO, the object count and churn rate reached at that point are recorded as the maximum load capacity. As shown in the last plot, the system's capacity increases almost linearly with the number of added instances.