# AEP 4: ArmoniK Load Balancer

|                   | ArmoniK Enhancement Proposal |
| ----------------: | :--------------------------- |
| **AEP**           | 4 |
| **Title**         | ArmoniK Load Balancer |
| **Author**        | Florian Lemaitre <<[email protected]>> |
| **Status**        | Draft |
| **Type**          | Standard |
| **Creation Date** | 2024-04-12 |

# Abstract

This AEP proposes an ArmoniK-native Load Balancer, which can be used to scale ArmoniK across multiple clusters.

# Motivation

ArmoniK is built around a centralized database, queue, and object storage, making it hard to scale.
While it is fairly easy to have a scalable (distributed) object storage, it is much more complex to set up a scalable database or queue.

# Rationale

A simple alternative is to have multiple isolated ArmoniK clusters, each with its own database, queue, and object storage.
To make such a design transparent and easy to work with, an ArmoniK Load Balancer would be needed.

Such a Load Balancer would understand the ArmoniK concepts and forward user requests based on ArmoniK data, such as the session ID.

# Specification

## Solution overview

An ArmoniK Load Balancer would be an external service that serves the ArmoniK gRPC API and redirects requests to a cluster.
Upon session creation, a single cluster is assigned to the session, chosen by the implementation.
Afterward, all requests targeting the session are redirected to its assigned cluster.
The assigned cluster cannot be changed after the session has been created.
This binding makes it easy to keep track of the tasks and data within a session, even if more tasks and data are created from the workers.

No additional API is required, and only a few infrastructure adjustments are needed.
The ingress in front of each cluster must either accept authentication headers set by the Load Balancer itself, or the Load Balancer must be an authenticated user of the cluster that is able to impersonate any user of the cluster.

The Load Balancer could scale out to many instances without issue.

## Implementation

The implementation of such a Load Balancer would consist of a gRPC server and a client.
The server would take ArmoniK requests from the user and forward them to the right cluster using the ArmoniK API.
It would then forward the response back to the user.

In particular, it is transparent to both ends, the client and the control plane.
No specific API is required.
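
As a rough illustration, a minimal forwarding handler could look like the sketch below, assuming hypothetical Python stubs generated from the ArmoniK protos (`sessions_pb2_grpc`) and a pre-built routing table; the real service, RPC, and module names may differ.

```python
import grpc
# Hypothetical module generated from the ArmoniK protos; the name is an assumption.
from armonik_api import sessions_pb2_grpc


class SessionsProxy(sessions_pb2_grpc.SessionsServicer):
    """Serves the Sessions API and forwards each call to the bound cluster."""

    def __init__(self, session_to_channel: dict[str, grpc.Channel]):
        # Maps a session ID to the gRPC channel of its assigned cluster.
        self._routes = session_to_channel

    def GetSession(self, request, context):
        # Look up the cluster bound to this session and forward the call verbatim,
        # passing the incoming metadata along so identity headers are preserved.
        channel = self._routes[request.session_id]
        stub = sessions_pb2_grpc.SessionsStub(channel)
        return stub.GetSession(request, metadata=context.invocation_metadata())
```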

The choice of a cluster can be random, or follow some business rules like selecting the cluster with the least load, the cheapest one, or the cluster with the fastest hardware.
It could also be possible to select a specific cluster explicitly at session creation, using the session's default `TaskOptions` and a well-known field if required.
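
A minimal selection policy could look like the following sketch; the `__cluster` option key is purely hypothetical and is not an existing ArmoniK field.

```python
import random


def choose_cluster(clusters: list[str], task_options: dict[str, str]) -> str:
    """Pick the cluster for a new session from its default TaskOptions."""
    requested = task_options.get("__cluster")  # hypothetical convention, not a real field
    if requested is not None:
        if requested not in clusters:
            raise ValueError(f"unknown cluster {requested!r}")
        return requested
    # Fallback policy: uniform random choice; a least-load or cheapest-first
    # policy would plug in here instead.
    return random.choice(clusters)
```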

### Caching and cluster discovery

When a new session is created, the link between the session ID and the cluster is recorded either in a database or in a cache (possibly within the process itself).
This enables the Load Balancer to know directly which cluster future requests for this session should be redirected to.
There is no cache-coherency or cache-expiration concern between Load Balancers, as the binding to a cluster can never change.

For the RPCs that do not have the session ID in their requests (like `GetTaskRequest`), a discovery among the clusters is required.
Upon such a request, the Load Balancer would forward the request to all clusters, and return only the response of the cluster containing the requested task or result.
The link between the task/result and the cluster can also be cached to reduce the number of discoveries.

To further reduce the number of discoveries, the link between tasks/results and clusters can be cached upon task/result creation when the cluster actually returns the new ID to the Load Balancer.
By adding the session ID to all RPCs, we would be able to remove all discoveries during normal execution, even when workers create new tasks and results.

This principle of cluster discovery can also be used to handle Load Balancer restarts and make them stateless.
On startup, the cache would be empty, and all requests would trigger a discovery, even when the session ID is included in the request.
Using a database instead of (or in addition to) a cache would avoid too many discoveries on startup, but could limit the scaling to what the database can handle.
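
The lookup logic could be as simple as the following sketch; `has_task` stands for whichever cluster RPC is used to probe for the task (for instance a get that is treated as a miss on NOT_FOUND) and is an assumption, not an existing client method.

```python
def cluster_for_task(task_id: str, cache: dict[str, str], clusters: dict[str, object]) -> str:
    """Return the name of the cluster owning `task_id`, discovering it when unknown."""
    if task_id in cache:                   # fast path: binding already known
        return cache[task_id]
    for name, client in clusters.items():  # discovery: probe every cluster
        if client.has_task(task_id):       # hypothetical probe wrapping a lookup RPC
            cache[task_id] = name          # remember the binding; it never changes
            return name
    raise KeyError(f"task {task_id!r} not found in any cluster")
```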

It would also be possible to enable communication between Load Balancers in order to reduce the number of discoveries even further, at the price of a much more complicated Load Balancer.

### Authentication

There are many ways to handle authentication in the presence of load balancing.
Whatever solution is chosen, the TLS connection must be terminated at the Load Balancer to let it interpret the queries.

One possibility would be to perform authentication and authorization at the Load Balancer level and forward the request to the cluster only if the user is permitted to perform the action.
Authentication would be disabled on the clusters, or at least would allow the Load Balancer to perform any action.
This would greatly complicate the Load Balancer itself.

Another possibility would be to let the Load Balancer impersonate users to the cluster using the standard impersonate feature of the authentication system.
This would force the Load Balancer to have a dedicated user on the clusters with the permission to impersonate any user.
As it is currently not possible to chain impersonations, it would also prevent end users from performing impersonation themselves, even if they have the permission to do so.

The last possibility would be to make the Load Balancer forward the identity headers to the clusters, in the same way the nginx ingress currently transmits the identity to the control plane.
Extra configuration at the cluster ingress would be required to allow the Load Balancer to set those sensitive headers.
This last solution is the preferred one as it is simple to implement, secure, and featureful.
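
A sketch of that forwarding, assuming certificate-derived header names similar to those the ingress already sets (the exact names depend on the cluster's authentication configuration):

```python
# Header names are placeholders; use whatever the cluster's authenticator is configured to read.
IDENTITY_HEADERS = ("x-certificate-client-cn", "x-certificate-client-fingerprint")


def identity_metadata(context) -> list[tuple[str, str]]:
    """Copy the identity headers of the inbound call onto the outbound call's metadata."""
    incoming = dict(context.invocation_metadata())
    return [(header, incoming[header]) for header in IDENTITY_HEADERS if header in incoming]
```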

### Partitions

As a single session cannot span multiple clusters, all the partitions requested by the session must exist on the target cluster.
The Load Balancer would select a cluster that supports all the requested partitions, if such a cluster exists.
It must never select a cluster that lacks one of the requested partitions, and if no cluster supports all of them, the Load Balancer would raise an error.

As an example, consider a cluster A with partitions p1 and p2, and a cluster B with partitions p2 and p3.
If a session targets p2, it can go on either A or B.
If a session targets p1 and p2, it can only go on A.
If a session targets p3, it can only go on B.
If a session targets p1 and p3, it cannot go anywhere, and the Load Balancer raises an error.
If a session targets p4, the Load Balancer raises an error.
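
This selection rule boils down to a superset check, as in the following sketch (names are illustrative only):

```python
def eligible_clusters(requested: set[str], partitions_by_cluster: dict[str, set[str]]) -> list[str]:
    """Clusters whose partitions form a superset of the session's requested partitions."""
    matches = [name for name, partitions in partitions_by_cluster.items() if requested <= partitions]
    if not matches:
        raise ValueError(f"no cluster provides all of {sorted(requested)}")
    return matches


# With the example above: A has p1 and p2, B has p2 and p3.
clusters = {"A": {"p1", "p2"}, "B": {"p2", "p3"}}
assert eligible_clusters({"p2"}, clusters) == ["A", "B"]
assert eligible_clusters({"p1", "p2"}, clusters) == ["A"]
# eligible_clusters({"p1", "p3"}, clusters) raises ValueError
```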

### Cluster health

If there is an error when the session is created, the Load Balancer can retry creating the session on another cluster, or raise an error if no cluster is available.
Once a session is bound to a cluster, all requests for this session will raise an error if the cluster becomes unreachable.

For monitoring requests that span multiple clusters (if allowed), it is not clear how to indicate that one of the clusters is unresponsive, while still returning the response from other clusters.
The simplest solution would be to silently ignore cluster failures and return the response from all the responsive clusters without any indication of cluster failure.

### Monitoring

The presence of Load Balancers would make monitoring harder, because listing spans multiple clusters.

The recommended approach would be to require task and result listings to be scoped to a session, and to adapt the GUI to match that requirement.
To keep a global overview, it would be necessary to add some session-wide statistics, such as the number of tasks completed and in error, so that a session-level view can be provided without querying all the clusters.
The aggregation could be performed at regular intervals on all open sessions to reduce the number of sessions to query, as a closed session cannot have tasks added to it.

### Dynamic clusters

The design imposes no constraint on the addition of new clusters during operation.
It would be a simple matter of allowing a new cluster to be chosen at session creation.

The removal of a cluster is trickier.
This would be possible only once all the sessions of the cluster are closed, to avoid losing computations.

If the monitoring of removed clusters is required, a dedicated database would be needed to store all the metadata of the removed clusters.
When an RPC targets a removed cluster, this backup database could be queried instead.

## Further considerations

### Global monitoring

Restricting monitoring to a per-session scope is simple enough, but not very useful.
A better approach would be to enable monitoring that spans all clusters, and to allow listing tasks from all clusters.

The main issue here is the paginated listing of tasks or results.
Indeed, it is not possible to know which pages from which clusters are needed to build the n-th page of the global view.

One way to resolve this would be to introduce a gRPC stream for listing.
The Load Balancer could then request all the clusters and merge their streams according to the specified sort order.
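
Since each cluster would return its own listing already sorted, the merge is a classic k-way merge, sketched below with a hypothetical streaming `ListTasks` response:

```python
import heapq
from operator import attrgetter


def merged_listing(cluster_streams, sort_field="created_at"):
    """Lazily merge per-cluster listing streams into one globally sorted stream.

    Each element of `cluster_streams` is an iterator over task summaries already
    sorted by `sort_field` on its own cluster; heapq.merge interleaves them
    without ever materializing the full listing in the Load Balancer.
    """
    yield from heapq.merge(*cluster_streams, key=attrgetter(sort_field))
```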

It is not clear how this could be integrated into the Web GUI where the pages are mandatory for UX purposes.
Either the GUI could materialize the whole stream when getting the pages, or a dedicated service could perform this materialization task.
The latter approach would be more complex, especially to avoid system overloads in the presence of many clients, but would be entirely compatible with the current API.

### On-demand clusters

An extension of the Load Balancer could be to provision clusters on the fly.
When a session is created, the Load Balancer could deploy a fresh cluster that would be destroyed when the session is deleted.

The design of such a solution is more complex and could be considered later on.

### Cluster field

An alternative to load balancing on the session ID would be to add a cluster field to all RPCs and use it to route the requests.
It could also be used for monitoring, instead of imposing a filter on the session.

This change would be more intrusive API-wise.

# Rejected idea

## Deterministic binding

The first idea for a stateless Load Balancer was to avoid discovery altogether and have a deterministic binding between session and cluster.

The session ID would have been hashed to select which cluster it goes to, in a similar way to a (distributed) hash table.
With such a design, the Load Balancer would know without any state to which cluster a session is bound.
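
A sketch of this rejected scheme, for reference; the cluster names and the hash choice are illustrative only:

```python
import hashlib


def cluster_for_session(session_id: str, clusters: list[str]) -> str:
    """Derive the owning cluster from the session ID alone, with no shared state."""
    digest = hashlib.sha256(session_id.encode()).digest()
    # Any Load Balancer computes the same binding for the same session ID.
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]
```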

There are four main problems with such an approach.
First, the session ID is known only after the session has been created on a cluster.
It would have been necessary to introduce a new RPC used only between the Load Balancer and the cluster to create a new session with a given UUID.

Second, some RPCs currently do not have the session ID in their requests.
Without discovery or a database, it is impossible to know which cluster to query.

Third, it would have been impossible to dynamically choose the cluster according to some criteria like the available partitions, the current load of the cluster, or the price of the workers.

And fourth, adding clusters later on would have been very difficult to implement.
It would have required rebalancing, but it is not clear how ongoing sessions could be rebalanced and migrated to another cluster.

# Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
