From 8a9492403c6b50aa0c7b43d5f6156c67d3c63ca1 Mon Sep 17 00:00:00 2001
From: Alvaro <102966649+AlvaroStream@users.noreply.github.com>
Date: Fri, 5 Jul 2024 20:07:22 +0200
Subject: [PATCH] Add a note about increases in backlog when replicated subscription is activated (#916)

* Solves apache/pulsar/issues/22926

Tries to clarify a possible increase in backlog and from where it is coming

* Update version 3.3
---
 docs/administration-geo.md                          | 6 ++++++
 versioned_docs/version-2.11.x/administration-geo.md | 6 ++++++
 versioned_docs/version-3.0.x/administration-geo.md  | 6 ++++++
 versioned_docs/version-3.2.x/administration-geo.md  | 6 ++++++
 versioned_docs/version-3.3.x/administration-geo.md  | 6 ++++++
 5 files changed, 30 insertions(+)

diff --git a/docs/administration-geo.md b/docs/administration-geo.md
index 575d92610c2c..edef835d1fce 100644
--- a/docs/administration-geo.md
+++ b/docs/administration-geo.md
@@ -241,6 +241,12 @@ The limitations of replicated subscription are as follows.
 * When you enable replicated subscriptions, you're creating a consistent distributed snapshot to establish an association between message ids from different clusters. The snapshots are taken periodically. The default value is `1 second`. It means that a consumer failing over to a different cluster can potentially receive 1 second of duplicates. You can also configure the frequency of the snapshot in the `broker.conf` file.
 * Only the base line cursor position is synced in replicated subscriptions while the individual acknowledgments are not synced. This means the messages acknowledged out-of-order could end up getting delivered again, in the case of a cluster failover.
 
+:::note
+
+* When replicated subscriptions are enabled, the broker writes a special marker message containing a [snapshot](https://github.com/apache/pulsar/wiki/PIP-33:-Replicated-subscriptions#storing-snapshots) of the subscription to the topic at each snapshot interval (every second by default). As a result, if the topic has inactive subscriptions, their backlog can grow in both the source and destination clusters.
+
+:::
+
 ## Migrate data between clusters using geo-replication
 
 Using geo-replication to migrate data between clusters is a special use case of the [active-active replication pattern](concepts-replication.md#active-active-replication) when you don't have a large amount of data.
diff --git a/versioned_docs/version-2.11.x/administration-geo.md b/versioned_docs/version-2.11.x/administration-geo.md
index eeebca307fda..dc74694c3f50 100644
--- a/versioned_docs/version-2.11.x/administration-geo.md
+++ b/versioned_docs/version-2.11.x/administration-geo.md
@@ -240,6 +240,12 @@ If you want to use replicated subscriptions in Pulsar:
 * When you enable replicated subscriptions, you're creating a consistent distributed snapshot to establish an association between message ids from different clusters. The snapshots are taken periodically. The default value is `1 second`. It means that a consumer failing over to a different cluster can potentially receive 1 second of duplicates. You can also configure the frequency of the snapshot in the `broker.conf` file.
 * Only the base line cursor position is synced in replicated subscriptions while the individual acknowledgments are not synced. This means the messages acknowledged out-of-order could end up getting delivered again, in the case of a cluster failover.
 
+:::note
+
+* When replicated subscriptions are enabled, the broker writes a special marker message containing a [snapshot](https://github.com/apache/pulsar/wiki/PIP-33:-Replicated-subscriptions#storing-snapshots) of the subscription to the topic at each snapshot interval (every second by default). As a result, if the topic has inactive subscriptions, their backlog can grow in both the source and destination clusters.
+
+:::
+
 ## Migrate data between clusters using geo-replication
 
 Using geo-replication to migrate data between clusters is a special use case of the [active-active replication pattern](concepts-replication.md#active-active-replication) when you don't have a large amount of data.
diff --git a/versioned_docs/version-3.0.x/administration-geo.md b/versioned_docs/version-3.0.x/administration-geo.md
index 53043ba68b16..90f2b8a5045c 100644
--- a/versioned_docs/version-3.0.x/administration-geo.md
+++ b/versioned_docs/version-3.0.x/administration-geo.md
@@ -246,6 +246,12 @@ If you want to use replicated subscriptions in Pulsar:
 * When you enable replicated subscriptions, you're creating a consistent distributed snapshot to establish an association between message ids from different clusters. The snapshots are taken periodically. The default value is `1 second`. It means that a consumer failing over to a different cluster can potentially receive 1 second of duplicates. You can also configure the frequency of the snapshot in the `broker.conf` file.
 * Only the base line cursor position is synced in replicated subscriptions while the individual acknowledgments are not synced. This means the messages acknowledged out-of-order could end up getting delivered again, in the case of a cluster failover.
 
+:::note
+
+* When replicated subscriptions are enabled, the broker writes a special marker message containing a [snapshot](https://github.com/apache/pulsar/wiki/PIP-33:-Replicated-subscriptions#storing-snapshots) of the subscription to the topic at each snapshot interval (every second by default). As a result, if the topic has inactive subscriptions, their backlog can grow in both the source and destination clusters.
+
+:::
+
 ## Migrate data between clusters using geo-replication
 
 
diff --git a/versioned_docs/version-3.2.x/administration-geo.md b/versioned_docs/version-3.2.x/administration-geo.md
index 575d92610c2c..edef835d1fce 100644
--- a/versioned_docs/version-3.2.x/administration-geo.md
+++ b/versioned_docs/version-3.2.x/administration-geo.md
@@ -241,6 +241,12 @@ The limitations of replicated subscription are as follows.
 * When you enable replicated subscriptions, you're creating a consistent distributed snapshot to establish an association between message ids from different clusters. The snapshots are taken periodically. The default value is `1 second`. It means that a consumer failing over to a different cluster can potentially receive 1 second of duplicates. You can also configure the frequency of the snapshot in the `broker.conf` file.
 * Only the base line cursor position is synced in replicated subscriptions while the individual acknowledgments are not synced. This means the messages acknowledged out-of-order could end up getting delivered again, in the case of a cluster failover.
 
+:::note
+
+* When replicated subscriptions are enabled, the broker writes a special marker message containing a [snapshot](https://github.com/apache/pulsar/wiki/PIP-33:-Replicated-subscriptions#storing-snapshots) of the subscription to the topic at each snapshot interval (every second by default). As a result, if the topic has inactive subscriptions, their backlog can grow in both the source and destination clusters.
+
+:::
+
 ## Migrate data between clusters using geo-replication
 
 Using geo-replication to migrate data between clusters is a special use case of the [active-active replication pattern](concepts-replication.md#active-active-replication) when you don't have a large amount of data.
diff --git a/versioned_docs/version-3.3.x/administration-geo.md b/versioned_docs/version-3.3.x/administration-geo.md
index 575d92610c2c..edef835d1fce 100644
--- a/versioned_docs/version-3.3.x/administration-geo.md
+++ b/versioned_docs/version-3.3.x/administration-geo.md
@@ -241,6 +241,12 @@ The limitations of replicated subscription are as follows.
 * When you enable replicated subscriptions, you're creating a consistent distributed snapshot to establish an association between message ids from different clusters. The snapshots are taken periodically. The default value is `1 second`. It means that a consumer failing over to a different cluster can potentially receive 1 second of duplicates. You can also configure the frequency of the snapshot in the `broker.conf` file.
 * Only the base line cursor position is synced in replicated subscriptions while the individual acknowledgments are not synced. This means the messages acknowledged out-of-order could end up getting delivered again, in the case of a cluster failover.
 
+:::note
+
+* When replicated subscriptions are enabled, the broker writes a special marker message containing a [snapshot](https://github.com/apache/pulsar/wiki/PIP-33:-Replicated-subscriptions#storing-snapshots) of the subscription to the topic at each snapshot interval (every second by default). As a result, if the topic has inactive subscriptions, their backlog can grow in both the source and destination clusters.
+
+:::
+
 ## Migrate data between clusters using geo-replication
 
 Using geo-replication to migrate data between clusters is a special use case of the [active-active replication pattern](concepts-replication.md#active-active-replication) when you don't have a large amount of data.
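
To give a rough sense of scale for the backlog growth described in the note, the following sketch estimates how many snapshot marker messages an idle subscription could accumulate. It assumes exactly one marker per snapshot interval, as the note states; real counts may differ by broker version and topic traffic. The function name `marker_backlog` is hypothetical, and its interval parameter stands in for the snapshot frequency configured in `broker.conf` (named `replicatedSubscriptionsSnapshotFrequencyMillis` in recent releases; check the file shipped with your version).

```python
# Illustrative estimate only (hypothetical helper, not a Pulsar API).
# Assumes one snapshot marker message is written per snapshot interval and
# that an inactive subscription acknowledges none of them.


def marker_backlog(idle_seconds: float, snapshot_frequency_ms: int = 1000) -> int:
    """Estimate marker messages piled up by a subscription that acks nothing.

    idle_seconds: how long the subscription has been inactive.
    snapshot_frequency_ms: snapshot interval in milliseconds; 1000 ms matches
        the documented default of 1 second.
    """
    if snapshot_frequency_ms <= 0:
        raise ValueError("snapshot_frequency_ms must be positive")
    # Count one marker per completed interval.
    return int(idle_seconds * 1000 // snapshot_frequency_ms)


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    for freq_ms in (1000, 10_000, 60_000):
        print(f"{freq_ms:>6} ms interval -> {marker_backlog(one_day, freq_ms):>6} markers/day")
```

Under these assumptions, the 1-second default means an idle subscription could see on the order of 86,400 extra backlog entries per day. A longer interval reduces that growth, at the cost of a wider window of possible duplicates after failover, as the first limitation in the list above describes.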