Skip to content

[fix][broker] Allow recreation of partitioned topic after metadata loss #24225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 14, 2025

Conversation

nodece
Copy link
Member

@nodece nodece commented Apr 28, 2025

Motivation

In Pulsar, it's possible for actual partition topics (e.g., public/default/test-partitioned-topic-partition-1) to exist even though the corresponding partitioned topic metadata is missing. This inconsistency can occur due to human error (e.g., accidental deletion of metadata) or implicit topic creation via auto-topic creation settings.

Specifically, if allowAutoTopicCreation=true and allowAutoTopicCreationType=non-partitioned, then producing or consuming directly from a partition topic name like test-partitioned-topic-partition-1 will cause Pulsar to auto-create a non-partitioned topic. This leads to an inconsistent state: the topic appears in topics list, but not in topics list-partitioned-topics. From the user's perspective, they expect to see a partitioned topic, but instead, list-partitioned-topics returns nothing, leading to confusion and potential automation failures.

Steps to reproduce:

$ docker run -it --name pulsar-partitioned-topic-test --rm apachepulsar/pulsar:4.0.3 bin/pulsar standalone -nfw
$ docker exec -it pulsar-partitioned-topic-test bash

# Produce to a partition name directly
$ bin/pulsar-client produce public/default/test-partitioned-topic-partition-1 -m "hello"
# Output: 1 messages successfully produced

# Listed non-partitioned topics
$ bin/pulsar-admin topics list public/default
persistent://public/default/test-partitioned-topic-partition-1

# Listed partitioned topics (none exists)
$ bin/pulsar-admin topics list-partitioned-topics public/default

Once such a partition is mistakenly created as a non-partitioned topic, it is no longer possible to convert it into a partitioned topic, because Pulsar does not allow overwriting existing topics. This results in effective metadata loss. A similar issue can occur in geo-replication scenarios, where only the individual partition topics are replicated, but the partitioned topic metadata is not.

Moreover, if the partitioned topic metadata is lost, running pulsar-admin topics create-partitioned-topic fails to recreate it:

> pulsar-admin topics create-partitioned-topic test-partitioned-topic -p 4
2025-04-25T19:45:14,373+0800 [AsyncHttpClient-7-2] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/test-partitioned-topic/partitions?createLocalTopicOnly=false] Failed to perform http put request: javax.ws.rs.ClientErrorException: HTTP 409 {"reason":"This topic already exists"}
This topic already exists

In this case, the partition still exist, but the partitioned topic metadata is missing—causing a 409 Conflict when attempting to recreate it.

Modifications

  • Use pulsar().getNamespaceService().checkTopicExists(topicName) to verify the existence of the topic or its metadata. If the topic or its metadata does not exist, the system proceeds with the creation of the partitioned topic, allowing its recreation.
  • When validating consistency with non-partitioned topics, ensure that the number of partitions is greater than or equal to the max partition index found in the non-partitioned topic list.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@nodece
Copy link
Member Author

nodece commented Apr 28, 2025

This PR depends on the checkTopicExists introduced in #24118 and #24154, please be cautious when cherry-picking it to branch-3.0 and branch-4.0.

@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

This PR depends on the checkTopicExists introduced in #24118 and #24154, please be cautious when cherry-picking it to branch-3.0 and branch-4.0.

@nodece Could we remove the release labels in that case?

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

@nodece Please check the test failures whether those are valid ones

@nodece
Copy link
Member Author

nodece commented Apr 28, 2025

This PR depends on the checkTopicExists introduced in #24118 and #24154, please be cautious when cherry-picking it to branch-3.0 and branch-4.0.

@nodece Could we remove the release labels in that case?

This is a bugfix. I suggest we cherry-pick this PR to the branch-3.0 and branch-4.0.

@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

This PR depends on the checkTopicExists introduced in #24118 and #24154, please be cautious when cherry-picking it to branch-3.0 and branch-4.0.

@nodece Could we remove the release labels in that case?

This is a bugfix. I suggest we cherry-pick this PR to the branch-3.0 and branch-4.0.

@nodece what does "This PR depends on the checkTopicExists introduced in #24118 and #24154, please be cautious when cherry-picking it to branch-3.0 and branch-4.0." mean?

@nodece
Copy link
Member Author

nodece commented Apr 28, 2025

org.apache.pulsar.broker.namespace.NamespaceService#checkTopicExists was enhanced in #24118 and #24154.
This PR depends on those improvements, as those branches may not include the necessary changes yet.

@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

org.apache.pulsar.broker.namespace.NamespaceService#checkTopicExists was enhanced in #24118 and #24154. This PR depends on those improvements, as those branches may not include the necessary changes yet.

In that case, it's better to remove the release labels until the condition has been addressed. The reason for this is that release labels are used to track cherry-picking of individual PRs. This PR cannot be cherry-picked and therefore it shouldn't contain the label until it's ready. I use some scripts to track cherry-picking across branches using this assumption. This PR would pop up in the cherry-picking queue each time if it's labeled.

@codecov-commenter
Copy link

codecov-commenter commented Apr 29, 2025

Codecov Report

Attention: Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.26%. Comparing base (bbc6224) to head (98a08b4).
Report is 1057 commits behind head on master.

Files with missing lines Patch % Lines
.../org/apache/pulsar/broker/admin/AdminResource.java 92.85% 0 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24225      +/-   ##
============================================
+ Coverage     73.57%   74.26%   +0.69%     
+ Complexity    32624    32129     -495     
============================================
  Files          1877     1866      -11     
  Lines        139502   144802    +5300     
  Branches      15299    16540    +1241     
============================================
+ Hits         102638   107542    +4904     
+ Misses        28908    28771     -137     
- Partials       7956     8489     +533     
Flag Coverage Δ
inttests 26.73% <57.57%> (+2.15%) ⬆️
systests 23.27% <30.30%> (-1.06%) ⬇️
unittests 73.74% <93.93%> (+0.90%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...pulsar/broker/admin/impl/PersistentTopicsBase.java 70.09% <100.00%> (+4.64%) ⬆️
...ache/pulsar/broker/namespace/NamespaceService.java 73.82% <100.00%> (+1.58%) ⬆️
.../org/apache/pulsar/broker/admin/AdminResource.java 76.64% <92.85%> (-0.99%) ⬇️

... and 1081 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@nodece nodece requested a review from lhotari April 29, 2025 11:31
@wangchao316
Copy link
Member

hello , I am a new developer for pulsar, our product start to use pulsar, which replace kafka in serverless env. I can learn your pr. I have a little question. why does the actual topic partitions may still exist, but the partitioned metadata is missing? Human error deletion or does not full delete?

@nodece
Copy link
Member Author

nodece commented Apr 30, 2025

why does the actual topic partitions may still exist, but the partitioned metadata is missing? Human error deletion or does not full delete?

Good catch! This PR description has been updated, please check.

@wangchao316
Copy link
Member

why does the actual topic partitions may still exist, but the partitioned metadata is missing? Human error deletion or does not full delete?

Good catch! This PR description has been updated, please check.

thanks, I understand.

Copy link
Member

@wangchao316 wangchao316 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nodece nodece requested a review from poorbarcode April 30, 2025 07:58
@nodece nodece force-pushed the fix-recreate-partitioned-topic branch from 98a08b4 to e4c0f25 Compare May 13, 2025 15:57
Signed-off-by: Zixuan Liu <[email protected]>
@nodece nodece merged commit 2dd1ef6 into apache:master May 14, 2025
52 of 53 checks passed
@nodece nodece added this to the 4.1.0 milestone May 14, 2025
nodece added a commit to ascentstream/pulsar that referenced this pull request May 14, 2025
…ss (apache#24225)

Signed-off-by: Zixuan Liu <[email protected]>
(cherry picked from commit 2dd1ef6)
Signed-off-by: Zixuan Liu <[email protected]>

# Conflicts:
#	pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/AdminResource.java
#	pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/admin/AdminApiTest.java
nodece added a commit to ascentstream/pulsar that referenced this pull request May 28, 2025
…ss (apache#24225)

Signed-off-by: Zixuan Liu <[email protected]>

(cherry picked from commit 2dd1ef6)
Signed-off-by: Zixuan Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants