Skip to content

Commit

Permalink
docs: Finish docs on all routers
Browse files Browse the repository at this point in the history
  • Loading branch information
sbernauer committed Jan 16, 2024
1 parent f6d92a4 commit 72a4a70
Show file tree
Hide file tree
Showing 8 changed files with 116 additions and 7 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,9 @@ A highly available load balancer with support for queueing, routing and auto-sca
* [Design](./docs/design.md)
* [Example setups](./docs/example-setups.md)
* [Routing](./docs/routing/index.md)
* [TrinoRoutingGroupHeaderRouter](./docs/routing/TrinoRoutingGroupHeaderRouter.md)
* [PythonScriptRouter](./docs/routing/PythonScriptRouter.md)
* [ExplainCostsRouter](./docs/routing/ExplainCostsRouter.md)
* [ClientTagsRouter](./docs/routing/ClientTagsRouter.md)
* [Persistence](./docs/persistence/index.md)
* [In-memory](./docs/persistence/in-memory.md)
Expand Down
4 changes: 2 additions & 2 deletions docs/design.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,10 +46,10 @@ In case any router returns a cluster group that does not exist the decision will

Currently the following routers are implemented:

1. TrinoRoutingGroupHeaderRouter: This router looks at a specific HTTP header (`X-Trino-Routing-Group` by default) and returns the container cluster group in case the header is set.
1. [TrinoRoutingGroupHeaderRouter](./routing/TrinoRoutingGroupHeaderRouter.md): This router looks at a specific HTTP header (`X-Trino-Routing-Group` by default) and returns the container cluster group in case the header is set.
2. [PythonScriptRouter](./routing/PythonScriptRouter.md): A user-configured python script is called and can do arbitrary calculations in Python the determine the target cluster group by looking at the query and headers passed.
This is the most flexible way of defining routing rules.
3. ExplainCostsRouter: This router executes an `explain {query}` [EXPLAIN](https://trino.io/docs/current/sql/explain.html?highlight=explain) query for every incoming query.
3. [ExplainCostsRouter](./routing/ExplainCostsRouter.md): This router executes an `explain {query}` [EXPLAIN](https://trino.io/docs/current/sql/explain.html?highlight=explain) query for every incoming query.
Trino will respond with an resource estimation the query will consume.
Please note that this functional heavily depends on [Table statistics](https://trino.io/docs/current/optimizer/statistics.html) being present for the access tables to get meaningful estimations.
4. [ClientTagsRouter](./routing/ClientTagsRouter.md): Route queries based on client tags send in the `X-Trino-Client-Tags` header.
Expand Down
6 changes: 4 additions & 2 deletions docs/routing/ClientTagsRouter.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@
This router routes queries based on client tags send in the `X-Trino-Client-Tags` header.
It supports routing a query based on the presence of one tag from a given list OR on the presence of all tags in the list

## One of a list of tags
## Configuration

### One of a list of tags

Let's imagine you want all queries with the tag `etl`, `etl-airflow` **or** `etc-special` to end up the the cluster group `etl`.

Expand All @@ -16,7 +18,7 @@ routers:
trinoClusterGroup: etl
```
## All of a list of tags
### All of a list of tags
A different scenario is that you want to route all queries that have all the required tags, let's say they need the tag `etl` and `system=foo`, as this system executes very very large queries.

Expand Down
46 changes: 46 additions & 0 deletions docs/routing/ExplainCostsRouter.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# ExplainCostsRouter

This router executes an `explain {query}` [EXPLAIN](https://trino.io/docs/current/sql/explain.html?highlight=explain) query for every incoming query.
Trino will respond with an resource estimation the query will consume.
Please note that this functional heavily depends on [Table statistics](https://trino.io/docs/current/optimizer/statistics.html) being present for the access tables to get meaningful estimations.

For this to work trino-lb executes `explain (format json) {query}` and sums up all the resource estimations of the children stages.
This is a very simplistic model and will likely be improved in the future - we are happy about any suggestions on how the estimates should be calculated the best [in the tracking issue](https://github.com/stackabletech/trino-lb/issues/11).

After trino-lb got the query estimation, it walks a list of resource buckets you can specify top to bottom and picks the first one that fulfills all resource requirements (CPU, memory, Network traffic etc.).
If no bucket matches the router will not a make a decision and let the routers further down the chain decide.

> [!WARN]
> Please keep in mind that trino-lb will determine the target cluster group for every incoming query instantly (otherwise it does not know if it should queued or hand over the query).
> In case a Trino client submits many queries at once this will result in the same number of `explain` queries on the Trino used for the query estimations.
> There is [a issue to address this](https://github.com/stackabletech/trino-lb/issues/10), however until this is resolved it is recommended to have the `ExplainCostsRouter` near the end of the chain and try to classify the queries with a different router.
> However, this is only important if you have more queries/s than a Trino cluster can run `explain` queries for (which hopefully is pretty much).
# Configuration

With this words of caution, lets jump into the configuration.

You need to specify the cluster that executes the `explain` queries. Currently only password based authentication is supported.

```yaml
routers:
- explainCosts:
trinoClusterToRunExplainQuery:
endpoint: https://trino-coordinator-default.default.svc.cluster.local:8443
ignoreCert: true # optional, defaults to false
username: admin
password: adminadmin
targets:
- cpuCost: 5E+9
memoryCost: 5E9 # 5GB
networkCost: 10E9 # 10GB
outputRowCount: 1E6
outputSizeInBytes: 1E9 # 1GB
trinoClusterGroup: s
- cpuCost: 5E+12
memoryCost: 5E12 # 5TB
networkCost: 5E12 # 5TB
outputRowCount: 1E9
outputSizeInBytes: 5E12 # 5TB
trinoClusterGroup: m
```
21 changes: 21 additions & 0 deletions docs/routing/TrinoRoutingGroupHeaderRouter.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# TrinoRoutingGroupHeaderRouter

This very simple router looks at the `X-Trino-Routing-Group` header (name of the header is configurable) and send the query to whatever cluster group was put in the header (e.g. `X-Trino-Routing-Group: foo` would go to the cluster group `foo`).
In case the specified cluster group does not exist the router will not make a decision and therefore let the next router in the chain decide.

## Configuration

The configuration of the Router is very simple, as it does not need any properties:

```yaml
routers:
- trinoRoutingGroupHeader: {}
```
Additionally you can configure the name of the HTTP header in case it differs from `X-Trino-Routing-Group`:

```yaml
routers:
- trinoRoutingGroupHeader:
header: X-My-Custom-Routing-Header # optional, defaults to X-Trino-Routing-Group
```
4 changes: 2 additions & 2 deletions docs/routing/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ E.g. a router based on [client tags](https://trino.io/docs/current/develop/clien

Currently the following routers are implemented:

1. TrinoRoutingGroupHeaderRouter
1. [TrinoRoutingGroupHeaderRouter](./TrinoRoutingGroupHeaderRouter.md)
2. [PythonScriptRouter](./PythonScriptRouter.md)
3. ExplainCostsRouter
3. [ExplainCostsRouter](./ExplainCostsRouter.md)
4. [ClientTagsRouter](./ClientTagsRouter.md)
4 changes: 4 additions & 0 deletions docs/scaling/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,10 @@ Scaling implementation therefore only need to implement functions to turn cluste
Routing is implemented in a generic fashion by exposing the trait `trino_lb::scaling::ScalerImplementation` (think of like an interface).
Different scaling engines can be implemented using this trait, please feel free to open an issue or pull request!

You can additionally configure the minimum number of clusters per cluster group that should be running per any given time interval.
This allows you to e.g. have a higher minimum number of clusters during work days.
An alternative use-case is to scale up the `etl` cluster group just before 02:00 at night, as a client will submit many queries at this given timestamp and scale down at 03:00 again.

Currently the following autoscalers are implemented:

1. [Stackable](./stackable.md)
36 changes: 35 additions & 1 deletion docs/scaling/stackable.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,4 +54,38 @@ clusterAutoscaler:
trino-m-2:
name: trino-m-2
namespace: default
```
```

## Kubernetes requirements

trino-lb needs access to the Kubernetes cluster the Stackable Data platform is running on.
The easiest way to achieve this is by running trino-lb within Kubernetes, e.g. as a Deployment with multiple replicas.

Also the ServiceAccount trino-lb is running with needs at least the following permissions to inspect and start/stop running Trino clusters:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ .Release.Name }}
labels:
app.kubernetes.io/name: trino-lb
app.kubernetes.io/instance: {{ .Release.Name }}
rules:
- apiGroups:
- trino.stackable.tech
resources:
- trinoclusters
verbs:
- get
- list
- watch
- patch
- apiGroups:
- trino.stackable.tech
resources:
- trinoclusters/status
verbs:
- get
```

0 comments on commit 72a4a70

Please sign in to comment.