[improve][doc] SEO for Concepts and Architecture except Overview and Messaging #674

Merged
merged 8 commits into from
Aug 16, 2023
15 changes: 8 additions & 7 deletions docs/concepts-architecture-overview.md
@@ -2,11 +2,12 @@
id: concepts-architecture-overview
title: Architecture Overview
sidebar_label: "Architecture"
description: Get a comprehensive understanding of the architecture of Apache Pulsar
---

At the highest level, a Pulsar instance is composed of one or more Pulsar clusters. Clusters within an instance can [replicate](concepts-replication.md) data amongst themselves.

In a Pulsar cluster:
A Pulsar cluster consists of the following components:

* One or more brokers handle and [load balance](administration-load-balance.md) incoming messages from producers, dispatch messages to consumers, communicate with the Pulsar configuration store to handle various coordination tasks, store messages in BookKeeper instances (aka bookies), rely on a cluster-specific ZooKeeper cluster for certain tasks, and more.
* A BookKeeper cluster consisting of one or more bookies handles [persistent storage](#persistent-storage) of messages.
@@ -56,7 +57,7 @@ In a Pulsar instance:

## Configuration store

The configuration store maintains all the configurations of a Pulsar instance, such as clusters, tenants, namespaces, partitioned topic-related configurations, and so on. A Pulsar instance can have a single local cluster, multiple local clusters, or multiple cross-region clusters. Consequently, the configuration store can share the configurations across multiple clusters under a Pulsar instance. The configuration store can be deployed on a separate ZooKeeper cluster or deployed on an existing ZooKeeper cluster.
The configuration store is a ZooKeeper quorum that is used for configuration-specific tasks and it maintains all the configurations of a Pulsar instance, such as clusters, tenants, namespaces, partitioned topic-related configurations, and so on. A Pulsar instance can have a single local cluster, multiple local clusters, or multiple cross-region clusters. Consequently, the configuration store can share the configurations across multiple clusters under a Pulsar instance. The configuration store can be deployed on a separate ZooKeeper cluster or deployed on an existing ZooKeeper cluster.

## Persistent storage

@@ -75,20 +76,20 @@ Pulsar uses a system called [Apache BookKeeper](http://bookkeeper.apache.org/) f
* It's horizontally scalable in both capacity and throughput. Capacity can be immediately increased by adding more bookies to a cluster.
* Bookies are designed to handle thousands of ledgers with concurrent reads and writes. By using multiple disk devices (one for the journal and another for general storage), bookies can isolate the effects of read operations from the latency of ongoing write operations.

In addition to message data, *cursors* are also persistently stored in BookKeeper. Cursors are [subscription](reference-terminology.md#subscription) positions for [consumers](reference-terminology.md#consumer). BookKeeper enables Pulsar to store consumer position in a scalable fashion.
In addition to message data, *cursors* are also persistently stored in BookKeeper. Cursors are [subscription](concepts-messaging.md#subscriptions) positions for [consumers](concepts-clients.md#consumer). BookKeeper enables Pulsar to store consumer position in a scalable fashion.

At the moment, Pulsar supports persistent message storage. This accounts for the `persistent` in all topic names. Here's an example:

```http
persistent://my-tenant/my-namespace/my-topic
```
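As a rough illustration of how such a name breaks down, the sketch below splits a topic name into its domain, tenant, namespace, and topic. The helper `parse_topic_name` and the `TopicName` type are hypothetical names made up for this example; they are not part of any Pulsar client library.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    """Hypothetical container for the four parts of a topic name."""
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name: str) -> TopicName:
    """Split a fully qualified topic name into its four parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported or missing domain in {name!r}")
    # Split at most twice so that the final part keeps any remaining slashes.
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


parsed = parse_topic_name("persistent://my-tenant/my-namespace/my-topic")
print(parsed.domain, parsed.tenant, parsed.namespace, parsed.topic)
```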

> Pulsar also supports ephemeral ([non-persistent](concepts-messaging.md#non-persistent-topics) message storage.
> Pulsar also supports ephemeral [non-persistent](concepts-messaging.md#non-persistent-topics) message storage.


You can see an illustration of how brokers and bookies interact in the diagram below:

![Brokers and bookies](/assets/broker-bookie.png)
![Brokers and bookies in a Pulsar cluster](/assets/broker-bookie.png)


### Ledgers
@@ -144,13 +145,13 @@ Some important things to know about the Pulsar proxy:

## Service discovery

[Clients](concepts-clients.md) connecting to Pulsar brokers need to be able to communicate with an entire Pulsar instance using a single URL.
Service discovery is a mechanism that enables connecting [clients](concepts-clients.md) to use just a single URL to interact with an entire Pulsar instance.

You can use your own service discovery system if you'd like. If you use your own system, there is just one requirement: when a client performs an HTTP request to an endpoint, such as `http://pulsar.us-west.example.com:8080`, the client needs to be redirected to *some* active broker in the desired cluster, whether via DNS, an HTTP or IP redirect, or some other means.

The diagram below illustrates Pulsar service discovery:

![alt-text](/assets/pulsar-service-discovery.png)
![Service discovery in Pulsar](/assets/pulsar-service-discovery.png)

In this diagram, the Pulsar cluster is addressable via a single DNS name: `pulsar-cluster.acme.com`. A [Python client](client-libraries-python.md), for example, could access this Pulsar cluster like this:

1 change: 1 addition & 0 deletions docs/concepts-authentication.md
@@ -2,6 +2,7 @@
id: concepts-authentication
title: Authentication and Authorization
sidebar_label: "Authentication and Authorization"
description: Get a high-level understanding of authentication and authorization in Pulsar.
---

Pulsar supports a pluggable [authentication](security-overview.md) mechanism which can be configured at the proxy and/or the broker. Pulsar also supports a pluggable [authorization](security-authorization.md) mechanism. These mechanisms work together to identify the client and its access rights on topics, namespaces and tenants.
26 changes: 16 additions & 10 deletions docs/concepts-clients.md
@@ -2,6 +2,7 @@
id: concepts-clients
title: Pulsar Clients
sidebar_label: "Clients"
description: Get a comprehensive understanding of client APIs with language bindings for Java, C++, Go, Python, Node.js and C# in Pulsar.
---

Pulsar exposes a client API with language bindings for [Java](client-libraries-java.md), [C++](client-libraries-cpp.md), [Go](client-libraries-go.md), [Python](client-libraries-python.md), [Node.js](client-libraries-node.md) and [C#](client-libraries-dotnet.md). The client API optimizes and encapsulates Pulsar's client-broker communication protocol and exposes a simple and intuitive API for use by applications.
@@ -12,18 +13,23 @@ Pulsar client libraries support transparent reconnection and/or connection failo

Before an application creates a producer/consumer, the Pulsar client library needs to initiate a setup phase including two steps:

1. The client attempts to determine the owner of the topic by sending an HTTP lookup request to the broker. The request could reach one of the active brokers which, by looking at the (cached) zookeeper metadata knows who is serving the topic or, in case nobody is serving it, tries to assign it to the least loaded broker.
2. Once the client library has the broker address, it creates a TCP connection (or reuses an existing connection from the pool) and authenticates it. Within this connection, the client and broker exchange binary commands from a custom protocol. At this point, the client sends a command to create producer/consumer to the broker, which will comply after having validated the authorization policy.
1. The client attempts to determine the owner of the topic by sending an HTTP lookup request to the broker.

The request could reach one of the active brokers which, by looking at the (cached) Zookeeper metadata knows who is serving the topic or, in case nobody is serving it, tries to assign it to the least loaded broker.

2. Once the client library has the broker address, it creates a TCP connection (or reuses an existing connection from the pool) and authenticates it.

Within this connection, the client and broker exchange binary commands from a custom protocol. At this point, the client sends a command to create producer/consumer to the broker, which will comply after having validated the authorization policy.
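The lookup in step 1 can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: `lookup_owner` is a hypothetical function, the ownership map stands in for the cached metadata, and broker load is reduced to a plain topic count, which is much simpler than whatever a real broker measures.

```python
from typing import Dict


def lookup_owner(topic: str, owners: Dict[str, str], load: Dict[str, int]) -> str:
    """Return the broker serving `topic`, assigning one if nobody serves it yet."""
    owner = owners.get(topic)
    if owner is None:
        # Nobody serves the topic yet: assign it to the least loaded broker.
        owner = min(load, key=lambda broker: load[broker])
        owners[topic] = owner
        load[owner] += 1
    return owner


# Hypothetical cluster state: one topic already owned, two brokers with different load.
owners = {"persistent://my-tenant/my-namespace/t1": "broker-a"}
load = {"broker-a": 5, "broker-b": 2}

print(lookup_owner("persistent://my-tenant/my-namespace/t1", owners, load))  # existing owner
print(lookup_owner("persistent://my-tenant/my-namespace/t2", owners, load))  # newly assigned
```

Once the owner is known, step 2 (connecting and creating the producer or consumer) happens against that broker's address.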

Whenever the TCP connection breaks, the client immediately re-initiates this setup phase and keeps trying with exponential backoff to re-establish the producer or consumer until the operation succeeds.
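A minimal sketch of this retry behaviour follows. The function names, the 0.1-second initial delay, the doubling factor, and the 60-second cap are all assumptions chosen for illustration; the text above does not state the values the client libraries actually use.

```python
import time
from typing import Callable, Iterator


def backoff_delays(initial: float = 0.1, factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield an endless series of delays that grow by `factor` up to `cap` (assumed values)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def reconnect(try_connect: Callable[[], bool],
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry `try_connect` until it succeeds and return the number of attempts."""
    delays = backoff_delays()
    attempts = 1
    # The first attempt happens immediately; each failure waits a longer delay.
    while not try_connect():
        sleep(next(delays))
        attempts += 1
    return attempts


# Simulate a broker that becomes reachable on the fourth attempt.
outcomes = iter([False, False, False, True])
waits = []  # record the delays instead of actually sleeping
attempts = reconnect(lambda: next(outcomes), sleep=waits.append)
print(attempts, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production client would also typically add jitter, which is omitted here.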

## Producer

A producer is a process that attaches to a topic and publishes messages to a Pulsar [broker](reference-terminology.md#broker). The Pulsar broker processes the messages.
A producer is a process that attaches to a topic and publishes messages to a Pulsar [broker](concepts-architecture-overview.md#broker). The Pulsar broker processes the messages.

### Send mode

Producers send messages to brokers synchronously (sync) or asynchronously (async).
Send mode is a mechanism determining whether producers send messages to brokers synchronously (sync) or asynchronously (async).

| Mode | Description |
|:-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -32,7 +38,7 @@ Producers send messages to brokers synchronously (sync) or asynchronously (async

### Access mode

You can have different types of access modes on topics for producers.
Access mode is a mechanism determining the permissions of producers on topics.

| Access mode | Description |
|:-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -55,13 +61,13 @@ You can set producer access mode through [Java Client API](/api/client/). For mo

A consumer is a process that attaches to a topic via a subscription and then receives messages.

![Consumer](/assets/consumer.svg)
![Message processing workflow of a consumer in Pulsar](/assets/consumer.svg)

A consumer sends a [flow permit request](developing-binary-protocol.md#flow-control) to a broker to get messages. There is a queue at the consumer side to receive messages pushed from the broker. You can configure the queue size with the [`receiverQueueSize`](pathname:///reference/#/@pulsar:version_reference@/client/client-configuration-consumer?id=receiverqueuesize) parameter. The default size is `1000`. Each time `consumer.receive()` is called, a message is dequeued from the buffer.
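The interplay between permits and the receiver queue can be sketched with a toy model. `ToyConsumer` and its methods are hypothetical and only mimic the idea described above: the consumer grants permits for the free slots in its queue, the broker pushes at most that many messages, and `receive()` dequeues one. The real client's permit accounting and blocking behaviour are more involved.

```python
from collections import deque


class ToyConsumer:
    """Toy model of a consumer-side receiver queue (not the real client)."""

    def __init__(self, receiver_queue_size: int = 1000) -> None:
        self.receiver_queue_size = receiver_queue_size
        self._queue = deque()
        self._granted = 0  # permits handed to the broker and not yet used

    def flow_permits(self) -> int:
        """Grant the broker permits for the currently free queue slots."""
        permits = self.receiver_queue_size - len(self._queue) - self._granted
        self._granted += permits
        return permits

    def on_message(self, message) -> None:
        """Called when the broker pushes a message; each push uses one permit."""
        if self._granted == 0:
            raise RuntimeError("broker pushed a message without a permit")
        self._granted -= 1
        self._queue.append(message)

    def receive(self):
        """Dequeue the oldest buffered message (raises IndexError if empty;
        a real client would wait instead)."""
        return self._queue.popleft()


consumer = ToyConsumer(receiver_queue_size=3)
permits = consumer.flow_permits()      # the consumer asks for as many messages as fit
for i in range(permits):               # the broker pushes that many messages
    consumer.on_message(f"msg-{i}")
print(consumer.receive())              # one slot is freed
print(consumer.flow_permits())         # so one more permit can be granted
```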

### Receive mode

Messages are received from [brokers](reference-terminology.md#broker) either synchronously (sync) or asynchronously (async).
Receive mode is a mechanism determining whether messages are received from [brokers](concepts-architecture-overview.md#brokers) synchronously (sync) or asynchronously (async).

| Mode | Description |
|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -74,7 +80,7 @@ Client libraries provide listener implementation for consumers. For example, the

## Reader

In Pulsar, the "standard" [consumer interface](#consumer) involves using consumers to listen on [topics](reference-terminology.md#topic), process incoming messages, and finally acknowledge those messages when they are processed. Whenever a new subscription is created, it is initially positioned at the end of the topic (by default), and consumers associated with that subscription begin reading with the first message created afterward. Whenever a consumer connects to a topic using a pre-existing subscription, it begins reading from the earliest message un-acked within that subscription. In summary, with the consumer interface, subscription cursors are automatically managed by Pulsar in response to [message acknowledgments](concepts-messaging.md#acknowledgment).
In Pulsar, the "standard" [consumer interface](#consumer) involves using consumers to listen on [topics](concepts-messaging.md#topics), process incoming messages, and finally acknowledge those messages when they are processed. Whenever a new subscription is created, it is initially positioned at the end of the topic (by default), and consumers associated with that subscription begin reading with the first message created afterward. Whenever a consumer connects to a topic using a pre-existing subscription, it begins reading from the earliest message un-acked within that subscription. In summary, with the consumer interface, subscription cursors are automatically managed by Pulsar in response to [message acknowledgments](concepts-messaging.md#acknowledgment).
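The cursor behaviour described above can be illustrated with a small in-memory model. `ToyTopic` is a hypothetical class written for this sketch; it assumes messages are acknowledged strictly in order, so a single index per subscription is enough, which is a simplification of real acknowledgment handling.

```python
from typing import Dict, List, Optional


class ToyTopic:
    """Toy topic that tracks one cursor (an index) per subscription."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.cursors: Dict[str, int] = {}  # subscription -> earliest unacked index

    def publish(self, message: str) -> None:
        self.messages.append(message)

    def subscribe(self, subscription: str) -> None:
        # A new subscription starts at the end of the topic;
        # a pre-existing one keeps its stored cursor.
        self.cursors.setdefault(subscription, len(self.messages))

    def receive(self, subscription: str) -> Optional[str]:
        """Return the earliest unacknowledged message, or None if caught up."""
        position = self.cursors[subscription]
        return self.messages[position] if position < len(self.messages) else None

    def acknowledge(self, subscription: str) -> None:
        """Acknowledge the message at the cursor, which moves the cursor forward."""
        if self.cursors[subscription] < len(self.messages):
            self.cursors[subscription] += 1


topic = ToyTopic()
topic.publish("m0")
topic.subscribe("my-sub")          # new subscription: positioned after m0
topic.publish("m1")
topic.publish("m2")
print(topic.receive("my-sub"))     # first message created after subscribing
topic.acknowledge("my-sub")
topic.subscribe("my-sub")          # reconnecting with the same subscription
print(topic.receive("my-sub"))     # resumes at the earliest unacked message
```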

The **reader interface** for Pulsar enables applications to manually manage cursors. When you use a reader to connect to a topic---rather than a consumer---you need to specify *which* message the reader begins reading from when it connects to a topic. When connecting to a topic, the reader interface enables you to begin with:

@@ -94,7 +100,7 @@ Please also note that a reader can have a "backlog", but the metric is only used

:::

![The Pulsar consumer and reader interfaces](/assets/pulsar-reader-consumer-interfaces.png)
![Consumer and reader interfaces in Pulsar](/assets/pulsar-reader-consumer-interfaces.png)

## TableView

@@ -110,4 +116,4 @@ Each TableView uses one Reader instance per partition, and reads the topic start

The following figure illustrates the dynamic construction of a TableView updated with newer values of each key.

![TableView](/assets/tableview.png)
![Dynamic construction of a TableView in Pulsar](/assets/tableview.png)
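The idea of a view that keeps only the newest value per key can be sketched in a few lines. `build_table_view` is a hypothetical helper that folds a list of key-value pairs into a dictionary; it is not the TableView API, and it ignores partitions and live updates.

```python
from typing import Dict, Iterable, Tuple


def build_table_view(messages: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Fold (key, value) messages, read in order, into a latest-value-per-key view."""
    view: Dict[str, str] = {}
    for key, value in messages:
        view[key] = value  # a newer message for the same key replaces the older value
    return view


# Messages in the order they were published to the topic.
messages = [("k1", "v1"), ("k2", "v2"), ("k1", "v3")]
print(build_table_view(messages))  # {'k1': 'v3', 'k2': 'v2'}
```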