[feat][doc] add overview for broker load balancing #621

Merged 2 commits on Jun 28, 2023
7 changes: 7 additions & 0 deletions docs/concepts-broker-load-balancing-concepts.md
@@ -0,0 +1,7 @@
---
id: concepts-broker-load-balancing-concepts
title: Concepts
sidebar_label: "Concepts"
---

WIP. Stay tuned!
Member Author

This is created on purpose, to let users know we're working on it and will publish it soon.

32 changes: 32 additions & 0 deletions docs/concepts-broker-load-balancing-overview.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
---
id: concepts-broker-load-balancing-overview
title: Overview
sidebar_label: "Overview"
---


## Challenges of load balancing in distributed streaming systems

As in other distributed systems, load balancing is important in messaging and streaming systems. Without it, load imbalance can create hot-spot brokers, resulting in performance degradation, cluster unavailability, and wasted broker resources.

Because topic volume is unpredictable and distributed brokers can be physically far apart, it is not easy to dynamically distribute message loads among brokers. The system must continuously monitor message loads and reroute them as conditions change, without compromising system performance. For example:

- When topics receive high traffic and exhaust CPU or memory resources on particular brokers, the cluster must offload topics from the overloaded brokers and redistribute them to other brokers.

- When brokers experience low traffic, become idle, or are added or removed, the cluster must rebalance the load to avoid wasting resources.

- When topics are redistributed to other brokers, the cluster must keep the topics available to clients with minimal interruption. The topics must continue to provide the same guarantees, such as persistence, [ordering](./concepts-messaging.md#ordering-guarantee), [deduplication](./concepts-messaging.md#message-deduplication), and [subscription type](./concepts-messaging.md#subscription-types).
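
The first scenario above can be illustrated with a minimal sketch of threshold-based load shedding. This is not Pulsar's actual algorithm: the `Broker` class, the `shed_overloaded` function, the 85% threshold, and the "heaviest topic first" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical broker model; load is the sum of its topics' costs."""
    name: str
    # topic name -> share of this broker's capacity, in percent (assumed unit)
    topics: dict = field(default_factory=dict)

    @property
    def load(self) -> float:
        return sum(self.topics.values())


def shed_overloaded(brokers, threshold=85.0):
    """Move topics off brokers whose load exceeds `threshold`.

    Returns the list of (topic, source, target) moves that were made.
    """
    moves = []
    for source in brokers:
        # Consider the heaviest topics first (an assumed policy).
        for topic, cost in sorted(source.topics.items(), key=lambda kv: -kv[1]):
            if source.load <= threshold:
                break  # this broker is no longer overloaded
            # Pick the least-loaded other broker as the target.
            target = min(
                (b for b in brokers if b is not source),
                key=lambda b: b.load,
                default=None,
            )
            # Skip moves that would only shift the hot spot elsewhere.
            if target is None or target.load + cost >= source.load:
                continue
            del source.topics[topic]
            target.topics[topic] = cost
            moves.append((topic, source.name, target.name))
    return moves


a = Broker("A", {"t1": 50.0, "t2": 40.0, "t3": 10.0})
b = Broker("B", {"t4": 20.0})
c = Broker("C")
moves = shed_overloaded([a, b, c])
```

In this sketch, broker `A` starts above the threshold, so its heaviest topic is handed to the idle broker `C`. A production balancer would also weigh factors such as bandwidth, memory, and how disruptive each move is to clients.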

## Load balancing in Pulsar

Because Pulsar uses a [segment-centric architecture](./concepts-architecture-overview.md) and separates the message serving layer from the storage layer, its design lends itself to load balancing.

- At the persistence layer ([BookKeeper](https://bookkeeper.apache.org/)), message segments in topics are balanced across all the bookies in the cluster. When an individual bookie runs out of storage capacity, the remaining segments are written to the other available bookies.

- At the serving layer ([broker](./concepts-architecture-overview.md#brokers)), topic rebalancing is seamless. Brokers do not need to copy messages from one broker to another when rebalancing topics. Instead, the current owner broker temporarily closes the topic and its client sessions and transfers ownership to the selected broker. The selected broker then takes ownership of the topic and reopens the topic sessions to the clients.
Contributor

nit: [broker] -> [Broker]

Member Author

Thanks! Here we should use "broker" rather than "Broker" since it's a common noun.


Pulsar uses automatic broker load balancing: it monitors each broker's load internally and dynamically redistributes topic sessions across all available brokers as evenly and flexibly as possible. Consequently, it improves performance, availability, and resource utilization.
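
The serving-layer handoff described above (close, transfer ownership, reopen) can be sketched as follows. This is a simplified illustration under stated assumptions, not Pulsar's implementation: the `Topic` class, the `transfer_ownership` function, and the broker and client names are hypothetical.

```python
class Topic:
    """Hypothetical serving-layer state; messages live in the storage layer."""

    def __init__(self, name, owner, segments):
        self.name = name
        self.owner = owner        # broker currently serving the topic
        self.segments = segments  # reference to stored segments, never copied
        self.is_open = True


def transfer_ownership(topic, new_owner, clients):
    """Hand `topic` to `new_owner` without moving any stored messages."""
    events = []
    # 1. The current owner closes the topic and its client sessions.
    topic.is_open = False
    events.append(f"{topic.owner} closed {topic.name}")
    # 2. Only ownership metadata changes; the segments stay where they are.
    topic.owner = new_owner
    events.append(f"{new_owner} owns {topic.name}")
    # 3. The new owner reopens the topic and clients reconnect to it.
    topic.is_open = True
    for client in clients:
        events.append(f"{client} reconnected to {new_owner}")
    return events


stored_segments = ["segment-0", "segment-1"]
topic = Topic("orders", "broker-1", stored_segments)
events = transfer_ownership(topic, "broker-2", ["producer-a", "consumer-b"])
```

The point of the sketch is that `topic.segments` still refers to the same stored data after the transfer, which is why rebalancing at the serving layer involves no message copying.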

## Related topics

- To learn essential fundamentals, see [broker load balancing | concepts](./concepts-broker-load-balancing-concepts.md).
Contributor

broker load balancing concepts?

Member Author

I use | on purpose since there will be broker load balancing | use cases, broker load balancing | benefits, broker load balancing | workflow... later.

8 changes: 8 additions & 0 deletions sidebars.json
@@ -26,6 +26,14 @@
"concepts-messaging",
"concepts-architecture-overview",
"concepts-clients",
{
  "type": "category",
  "label": "Broker load balancing",
  "items": [
    "concepts-broker-load-balancing-overview",
    "concepts-broker-load-balancing-concepts"
  ]
},
"concepts-replication",
"concepts-cluster-level-failover",
"concepts-multi-tenancy",