docs: Finish docs on all routers

stackabletech · Jan 16, 2024 · 72a4a70 · 72a4a70
1 parent f6d92a4
commit 72a4a70
Show file tree

Hide file tree

Showing 8 changed files with 116 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -5,7 +5,9 @@ A highly available load balancer with support for queueing, routing and auto-sca
 * [Design](./docs/design.md)
 * [Example setups](./docs/example-setups.md)
 * [Routing](./docs/routing/index.md)
+  * [TrinoRoutingGroupHeaderRouter](./docs/routing/TrinoRoutingGroupHeaderRouter.md)
   * [PythonScriptRouter](./docs/routing/PythonScriptRouter.md)
+  * [ExplainCostsRouter](./docs/routing/ExplainCostsRouter.md)
   * [ClientTagsRouter](./docs/routing/ClientTagsRouter.md)
 * [Persistence](./docs/persistence/index.md)
   * [In-memory](./docs/persistence/in-memory.md)

diff --git a/docs/design.md b/docs/design.md
@@ -46,10 +46,10 @@ In case any router returns a cluster group that does not exist the decision will
 
 Currently the following routers are implemented:
 
-1. TrinoRoutingGroupHeaderRouter: This router looks at a specific HTTP header (`X-Trino-Routing-Group` by default) and returns the container cluster group in case the header is set.
+1. [TrinoRoutingGroupHeaderRouter](./routing/TrinoRoutingGroupHeaderRouter.md): This router looks at a specific HTTP header (`X-Trino-Routing-Group` by default) and returns the container cluster group in case the header is set.
 2. [PythonScriptRouter](./routing/PythonScriptRouter.md): A user-configured python script is called and can do arbitrary calculations in Python the determine the target cluster group by looking at the query and headers passed.
 This is the most flexible way of defining routing rules.
-3. ExplainCostsRouter: This router executes an `explain {query}` [EXPLAIN](https://trino.io/docs/current/sql/explain.html?highlight=explain) query for every incoming query.
+3. [ExplainCostsRouter](./routing/ExplainCostsRouter.md): This router executes an `explain {query}` [EXPLAIN](https://trino.io/docs/current/sql/explain.html?highlight=explain) query for every incoming query.
 Trino will respond with an resource estimation the query will consume.
 Please note that this functional heavily depends on [Table statistics](https://trino.io/docs/current/optimizer/statistics.html) being present for the access tables to get meaningful estimations.
 4. [ClientTagsRouter](./routing/ClientTagsRouter.md): Route queries based on client tags send in the `X-Trino-Client-Tags` header.

diff --git a/docs/routing/ClientTagsRouter.md b/docs/routing/ClientTagsRouter.md
@@ -3,7 +3,9 @@
 This router routes queries based on client tags send in the `X-Trino-Client-Tags` header.
 It supports routing a query based on the presence of one tag from a given list OR on the presence of all tags in the list
 
-## One of a list of tags
+## Configuration
+
+### One of a list of tags
 
 Let's imagine you want all queries with the tag `etl`, `etl-airflow` **or** `etc-special` to end up the the cluster group `etl`.
 
@@ -16,7 +18,7 @@ routers:
       trinoClusterGroup: etl
 ```
 
-## All of a list of tags
+### All of a list of tags
 
 A different scenario is that you want to route all queries that have all the required tags, let's say they need the tag `etl` and `system=foo`, as this system executes very very large queries.
 

diff --git a/docs/routing/ExplainCostsRouter.md b/docs/routing/ExplainCostsRouter.md
@@ -0,0 +1,46 @@
+# ExplainCostsRouter
+
+This router executes an `explain {query}` [EXPLAIN](https://trino.io/docs/current/sql/explain.html?highlight=explain) query for every incoming query.
+Trino will respond with an resource estimation the query will consume.
+Please note that this functional heavily depends on [Table statistics](https://trino.io/docs/current/optimizer/statistics.html) being present for the access tables to get meaningful estimations.
+
+For this to work trino-lb executes `explain (format json) {query}` and sums up all the resource estimations of the children stages.
+This is a very simplistic model and will likely be improved in the future - we are happy about any suggestions on how the estimates should be calculated the best [in the tracking issue](https://github.com/stackabletech/trino-lb/issues/11).
+
+After trino-lb got the query estimation, it walks a list of resource buckets you can specify top to bottom and picks the first one that fulfills all resource requirements (CPU, memory, Network traffic etc.).
+If no bucket matches the router will not a make a decision and let the routers further down the chain decide.
+
+> [!WARN]
+> Please keep in mind that trino-lb will determine the target cluster group for every incoming query instantly (otherwise it does not know if it should queued or hand over the query).
+> In case a Trino client submits many queries at once this will result in the same number of `explain` queries on the Trino used for the query estimations.
+> There is [a issue to address this](https://github.com/stackabletech/trino-lb/issues/10), however until this is resolved it is recommended to have the `ExplainCostsRouter` near the end of the chain and try to classify the queries with a different router.
+> However, this is only important if you have more queries/s than a Trino cluster can run `explain` queries for (which hopefully is pretty much).
+
+# Configuration
+
+With this words of caution, lets jump into the configuration.
+
+You need to specify the cluster that executes the `explain` queries. Currently only password based authentication is supported.
+
+```yaml
+routers:
+  - explainCosts:
+      trinoClusterToRunExplainQuery:
+        endpoint: https://trino-coordinator-default.default.svc.cluster.local:8443
+        ignoreCert: true # optional, defaults to false
+        username: admin
+        password: adminadmin
+      targets:
+        - cpuCost: 5E+9
+          memoryCost: 5E9 # 5GB
+          networkCost: 10E9 # 10GB
+          outputRowCount: 1E6
+          outputSizeInBytes: 1E9 # 1GB
+          trinoClusterGroup: s
+        - cpuCost: 5E+12
+          memoryCost: 5E12 # 5TB
+          networkCost: 5E12 # 5TB
+          outputRowCount: 1E9
+          outputSizeInBytes: 5E12 # 5TB
+          trinoClusterGroup: m
+```
diff --git a/docs/routing/TrinoRoutingGroupHeaderRouter.md b/docs/routing/TrinoRoutingGroupHeaderRouter.md
@@ -0,0 +1,21 @@
+# TrinoRoutingGroupHeaderRouter
+
+This very simple router looks at the `X-Trino-Routing-Group` header (name of the header is configurable) and send the query to whatever cluster group was put in the header (e.g. `X-Trino-Routing-Group: foo` would go to the cluster group `foo`).
+In case the specified cluster group does not exist the router will not make a decision and therefore let the next router in the chain decide.
+
+## Configuration
+
+The configuration of the Router is very simple, as it does not need any properties:
+
+```yaml
+routers:
+  - trinoRoutingGroupHeader: {}
+```
+
+Additionally you can configure the name of the HTTP header in case it differs from `X-Trino-Routing-Group`:
+
+```yaml
+routers:
+  - trinoRoutingGroupHeader:
+      header: X-My-Custom-Routing-Header # optional, defaults to X-Trino-Routing-Group
+```
diff --git a/docs/routing/index.md b/docs/routing/index.md
@@ -7,7 +7,7 @@ E.g. a router based on [client tags](https://trino.io/docs/current/develop/clien
 
 Currently the following routers are implemented:
 
-1. TrinoRoutingGroupHeaderRouter
+1. [TrinoRoutingGroupHeaderRouter](./TrinoRoutingGroupHeaderRouter.md)
 2. [PythonScriptRouter](./PythonScriptRouter.md)
-3. ExplainCostsRouter
+3. [ExplainCostsRouter](./ExplainCostsRouter.md)
 4. [ClientTagsRouter](./ClientTagsRouter.md)
diff --git a/docs/scaling/index.md b/docs/scaling/index.md
@@ -10,6 +10,10 @@ Scaling implementation therefore only need to implement functions to turn cluste
 Routing is implemented in a generic fashion by exposing the trait `trino_lb::scaling::ScalerImplementation` (think of like an interface).
 Different scaling engines can be implemented using this trait, please feel free to open an issue or pull request!
 
+You can additionally configure the minimum number of clusters per cluster group that should be running per any given time interval.
+This allows you to e.g. have a higher minimum number of clusters during work days.
+An alternative use-case is to scale up the `etl` cluster group just before 02:00 at night, as a client will submit many queries at this given timestamp and scale down at 03:00 again.
+
 Currently the following autoscalers are implemented:
 
 1. [Stackable](./stackable.md)
diff --git a/docs/scaling/stackable.md b/docs/scaling/stackable.md
@@ -54,4 +54,38 @@ clusterAutoscaler:
       trino-m-2:
         name: trino-m-2
         namespace: default
-```
+```
+
+## Kubernetes requirements
+
+trino-lb needs access to the Kubernetes cluster the Stackable Data platform is running on.
+The easiest way to achieve this is by running trino-lb within Kubernetes, e.g. as a Deployment with multiple replicas.
+
+Also the ServiceAccount trino-lb is running with needs at least the following permissions to inspect and start/stop running Trino clusters:
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: {{ .Release.Name }}
+  labels:
+    app.kubernetes.io/name: trino-lb
+    app.kubernetes.io/instance: {{ .Release.Name }}
+rules:
+  - apiGroups:
+      - trino.stackable.tech
+    resources:
+      - trinoclusters
+    verbs:
+      - get
+      - list
+      - watch
+      - patch
+  - apiGroups:
+      - trino.stackable.tech
+    resources:
+      - trinoclusters/status
+    verbs:
+      - get
+
+```