Elements Service Load Balancer #2552
Hi @luoluoyuyu, thank you very much for your proposal! This is really outstanding and shows a lot of dedication ❤️

I was wondering how the pipeline separation is supposed to work within StreamPipes. Should user-defined pipelines be separated into "technical" pipelines under the hood? Do you already have an idea of how this could look? If not, that's fine; we'll find out 🙂

Do you already have a rough idea of how we can plan the implementation effort? Where do we want to start? Do we want to address one use case first (e.g., pipeline allocation) and then focus on the remaining ones?

With respect to the actual implementation: do we want/need to build everything from scratch? Are there libraries/frameworks we could adopt? Does it make sense to have a look at how other tools handle load balancing of DAG execution, maybe something like Airflow? And how does pipeline separation work?
Proposal: Elements Service Load Balancer
Motivation
Since Apache StreamPipes is an IoT data stream processing system, robust computational capability is essential. The extensions service, which handles most of the data stream processing, may require scalable support to optimize its performance. I therefore propose adding an extensible load balancer to StreamPipes.
Goals
- User-Facing Goals
  - Logging/Metrics
  - Configuration
- Internal Implementation Goals
  - Logic
  - Logging/Metrics
  - Configuration
  - Testing
Load Balancer Implementation
Pipeline Allocation
Pipelines are allocated at the granularity of continuous pipelines (which can also be continuous pipeline branches; both are referred to as pipelines below). If message transmission is optimized in the future, for example to better use memory as the transmission medium, this granularity should also make it easier to support the affinity feature.
Affinity
To better support edge-computing scenarios, labels can be set on extension services and pipelines. Pipelines are preferentially allocated to extension services whose labels match the pipeline's labels; note that affinity can also cause resource skew.
For example, if Service A has the label [Asia], Service B has the label [North America], and a pipeline has the label [Asia], then the pipeline is allocated to Service A first.
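As a minimal sketch of this matching rule, assuming label sets per service (the function name `pick_service` and the plain-dict model are illustrative, not StreamPipes APIs):

```python
# Hypothetical sketch of label-based affinity matching.
def pick_service(services: dict[str, set[str]], pipeline_labels: set[str]) -> str:
    """Prefer the service whose labels overlap most with the pipeline's labels;
    when no labels match, any service may be chosen (overlap is 0 everywhere)."""
    return max(services, key=lambda s: len(services[s] & pipeline_labels))

services = {"Service A": {"Asia"}, "Service B": {"North America"}}
print(pick_service(services, {"Asia"}))  # -> Service A
```

A real allocator would break ties among equally matching services with the load-based strategy described below.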
Data Model
ServiceLoadData
For example {cpu, memory, io, msgIn/Out, ...}
Stored in metadata storage.
PipelineLoadData
{PipelineName, msgIn/Out, ...}
Stored in metadata storage.
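The two records could be modeled roughly as follows; the field names are assumptions based on the examples above, not a final schema:

```python
from dataclasses import dataclass

# Illustrative shapes for the two records kept in metadata storage.
@dataclass
class ServiceLoadData:
    service_id: str
    cpu: float      # CPU usage in [0, 1]
    memory: float   # memory usage in [0, 1]
    io: float       # I/O utilization in [0, 1]
    msg_in: int     # events received per second
    msg_out: int    # events emitted per second

@dataclass
class PipelineLoadData:
    pipeline_name: str
    msg_in: int
    msg_out: int
```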
Data Read/Write Flow
ServiceLoadData
Write:
Core periodically collects ServiceLoadData from each extension service and puts it into metadata storage.
Read:
When load balancing needs to be determined, current data and historical data need to be read.
PipelineLoadData
Write:
Core periodically collects PipelineLoadData and puts it into metadata storage.
Read:
When pipeline offloading needs to be determined, current data needs to be read.
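The write path for both records is a periodic collect-and-store loop; a minimal sketch, assuming a callable metrics source and a list-like metadata store (both are stand-ins for the real components):

```python
import time

# Minimal sketch of the periodic collection loop described above.
def run_collector(collect, storage, interval_s: float, rounds: int) -> None:
    """Periodically pull load data and append it to metadata storage,
    keeping history so migrators can read both current and past samples."""
    for _ in range(rounds):
        storage.append(collect())
        time.sleep(interval_s)

history: list[dict] = []
run_collector(lambda: {"cpu": 0.4}, history, interval_s=0.0, rounds=3)
print(len(history))  # -> 3
```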
Pipeline Allocation
Pipeline allocation needs to consider how to select available extension services, because each extension service may support a different set of elements. Pipelines therefore need to be split into branches so that they can be allocated to extension services more effectively; this is what pipeline segmentation does. Ideally, every element in a branch is runnable by the same set of extension services, which facilitates allocation on a per-branch basis.
Pipeline Segmentation
Depth-first traversal is adopted: if the number of extension services able to run the branch decreases when the traversal reaches the next element, the current branch is closed and treated as a unit of allocation.
As shown in the diagram above, the adapter is allocated as a separate pipeline. Below it, the pipeline is segmented with [A, B] as one branch and [C] as another.
The [A, B] branch can be allocated to 2 extension services, but once element C is added, the pipeline can only be allocated to one extension service, reducing the number of available services. The pipeline is therefore segmented into the allocation units [A, B] and [C].
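The segmentation rule can be sketched on a linear pipeline as follows, assuming a map from each element to the set of extension services able to run it (all names are illustrative):

```python
# Sketch of the segmentation rule on a linear pipeline.
def segment(elements: list[str], supported: dict[str, set[str]]) -> list[list[str]]:
    """Cut the pipeline whenever adding the next element would shrink the set
    of services that can run the current branch."""
    branches: list[list[str]] = []
    branch: list[str] = []
    capable: set[str] = set()
    for el in elements:
        if not branch:
            branch, capable = [el], set(supported[el])
            continue
        narrowed = capable & supported[el]
        if len(narrowed) < len(capable):
            branches.append(branch)  # close the current branch
            branch, capable = [el], set(supported[el])
        else:
            branch.append(el)
            capable = narrowed
    if branch:
        branches.append(branch)
    return branches

supported = {"A": {"s1", "s2"}, "B": {"s1", "s2"}, "C": {"s1"}}
print(segment(["A", "B", "C"], supported))  # -> [['A', 'B'], ['C']]
```

With `A` and `B` runnable on two services and `C` on only one, this reproduces the [A, B], [C] split from the example; a DAG-shaped pipeline would apply the same check along each depth-first path.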
Allocation Algorithms
Pipeline branches need to be allocated to extension services, and the following algorithms can be provided.
Affinity
For pipelines with affinity labels, allocation is first restricted to the matching services; within that set, the default load-balancing strategy is used.
Pipeline Separation
Calculate the throughput rate of the pipeline. When the throughput rate of all elements exceeds a threshold, the pipeline is separated. Pipeline branches with no affinity are separated first.
Separation Algorithm
The pipeline is divided into two branches with equal throughput or message rates, and one of the branches is reallocated to another extension service.
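One way to realize the equal-throughput split, assuming per-element message rates are available, is to pick the cut point that minimizes the difference between the two halves (a sketch under that assumption, not the proposed implementation):

```python
# Find the cut index that makes the two branches' throughput as equal as possible.
def split_point(rates: list[float]) -> int:
    """Return the index i that minimizes |sum(rates[:i]) - sum(rates[i:])|."""
    total = sum(rates)
    best_i, best_diff = 1, float("inf")
    prefix = 0.0
    for i in range(1, len(rates)):
        prefix += rates[i - 1]
        diff = abs(prefix - (total - prefix))
        if diff < best_diff:
            best_i, best_diff = i, diff
    return best_i

rates = [10.0, 30.0, 20.0, 20.0]
i = split_point(rates)
print(rates[:i], rates[i:])  # -> [10.0, 30.0] [20.0, 20.0]
```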
Pipeline Migrator
Calculate the load of the services; the following strategies are provided. Because extension services differ, for example in their runnable elements or physical resources, the concrete situation must be analyzed to decide which migration trigger to use. When a pipeline needs to be migrated, it is pre-allocated to the target extension service before the original pipeline branch is stopped. Note that recently migrated pipeline branches will not be migrated again, and non-affinity pipelines are prioritized for offloading.
ThresholdMigrator
ThresholdMigrator uses a historical scoring algorithm to compute a score for each extension service, which smooths out performance fluctuations. The usage rate of each extension service and the average usage rate are calculated; if a service's usage rate exceeds the average usage rate plus ThresholdMigratorPercentage, the load balancer removes enough pipeline branches from it so that its usage rate falls below avgUsage + ThresholdMigratorPercentage.
For example, if the usage rates of three extension services are 90%, 50%, and 10% respectively, and the average is 50%, with ThresholdMigratorPercentage being 30%, balancing is required because 90% > 50% + 30%.
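The trigger condition from this example can be sketched as follows (usage rates as fractions; `needs_balancing` is a hypothetical helper, not part of the proposal's API):

```python
# Sketch of the ThresholdMigrator trigger.
def needs_balancing(usages: list[float], threshold: float) -> list[int]:
    """Return indices of services whose usage exceeds avg + threshold."""
    avg = sum(usages) / len(usages)
    return [i for i, u in enumerate(usages) if u > avg + threshold]

# 90% > 50% (avg) + 30% (ThresholdMigratorPercentage), so service 0 must shed load.
print(needs_balancing([0.90, 0.50, 0.10], threshold=0.30))  # -> [0]
```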
Advantages
Disadvantages
Configuration
OverloadMigrator
Sets thresholds for CPU, network, and memory usage rates. If any of these thresholds are reached in an extension service, pipeline offloading is triggered.
Disadvantages
If the threshold is set to 90 and a service's current load is (80, 0, 0), no load balancing occurs even though CPU usage is already high.
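A sketch of the OverloadMigrator trigger, assuming load is reported as a (cpu, network, memory) tuple of fractions, in the order used by the example above:

```python
# Sketch of the OverloadMigrator trigger: any single resource crossing
# its threshold is enough to trigger pipeline offloading.
def overloaded(load: tuple[float, float, float], threshold: float = 0.90) -> bool:
    return any(v >= threshold for v in load)

print(overloaded((0.80, 0.0, 0.0)))  # -> False (matches the caveat above)
print(overloaded((0.95, 0.0, 0.0)))  # -> True
```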
Configuration
UniformLoadMigrator
Calculates the message event rate of all extension services. If (max - min)/min > Threshold, pipeline migration is triggered; pipelines are migrated until the imbalance (max - min)/min across nodes no longer exceeds the threshold.
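A sketch of the UniformLoadMigrator trigger on per-service message event rates (`uniform_trigger` is a hypothetical helper name):

```python
# Sketch of the UniformLoadMigrator trigger.
def uniform_trigger(rates: list[float], threshold: float) -> bool:
    """Trigger migration when (max - min) / min exceeds the threshold."""
    lo, hi = min(rates), max(rates)
    return lo > 0 and (hi - lo) / lo > threshold

print(uniform_trigger([1000.0, 400.0], threshold=1.0))  # -> True  (600/400 = 1.5)
print(uniform_trigger([1000.0, 800.0], threshold=1.0))  # -> False (200/800 = 0.25)
```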
Configuration
TransferMigrator
The TransferMigrator strategy offloads pipelines from the highest-loaded extension service to the lowest-loaded extension service until all of the following conditions are met:
After rebalancing according to TransferMigrator as shown above, 40% of the load is migrated to the least-loaded service, and the second-most-loaded service hands 25% of its load to the second-least-loaded one.
Configuration