Skip to content

[feat][admin] PIP-415: Support getting message ID by index #24222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

liangyepianzhou
Copy link
Contributor

PIP:#24220

Motivation

we can now obtain the offset of a message by its message id:

  1. Get the message by id using get-message-by-id cmd
  2. Get the index of the message using Message.getIndex()

But we cannot obtain the message id by offset. Then we need to add a new API to get the message id by offset.

Modifications

Add a new http API to retrieve the message ID by offset.
We propose to add a new API to retrieve the message ID by offset, enabling us to cache the mapping between message ID and offset.
This will allow us to use offsets for seek and acknowledgment operations when consuming messages through the standardized API.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 27, 2025
@liangyepianzhou liangyepianzhou self-assigned this Apr 27, 2025
@liangyepianzhou liangyepianzhou marked this pull request as ready for review April 27, 2025 12:35
xiangying added 2 commits May 14, 2025 10:41
Copy link
Member

@StevenLuMT StevenLuMT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#24220 PIP has a lot of adjustments.
@liangyepianzhou please ping me after you modify the code and I will review it again.

@liangyepianzhou liangyepianzhou changed the title [feat][admin] PIP-415: Support getting message ID by offset [feat][admin] PIP-415: Support getting message ID by index May 29, 2025
@AuroraTwinkle
Copy link
Contributor

Good jobs

@BewareMyPower
Copy link
Contributor

Out of the scope of this PR, I think it would be better to add a method to ManagedLedger for it. Because the customized ML implementation (e.g. StreamNative's Ursa engine) might not depend on the broker entry metadata to get the message's index. It could be another PIP.

[2] rename getMessageIDByIndexAndPartitionID to getMessageIDByIndex
@liangyepianzhou
Copy link
Contributor Author

Out of the scope of this PR, I think it would be better to add a method to ManagedLedger for it. Because the customized ML implementation (e.g. StreamNative's Ursa engine) might not depend on the broker entry metadata to get the message's index. It could be another PIP.

Yes, we should add a new manager ledger interface to support customized ML implementations.

Additionally, the current message index retrieval solution has suboptimal performance. We could improve this by adding a field in LedgerInfo to record the first entry's index, then compute subsequent indexes directly from message IDs.

I've created an issue to track this enhancement.

@dao-jun
Copy link
Member

dao-jun commented Jun 3, 2025

Overall LGTM, but the current impl will scan all entire ML, maybe you can optimize it via PIP-404

@liangyepianzhou
Copy link
Contributor Author

liangyepianzhou commented Jun 3, 2025

Overall LGTM, but the current impl will scan all entire ML, maybe you can optimize it via PIP-404

Interesting—I originally proposed directly adding the first entry's index in LedgerInfo to optimize this issue. While the properties field you added could also serve the same purpose, it currently doesn’t record the first entry's index. We can address this optimization in a future PIP, as mentioned in issue.

@nodece nodece requested a review from Copilot June 4, 2025 07:49
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new HTTP API to retrieve a Pulsar message ID given its index, enabling caching between message IDs and offsets and providing new CLI and admin functionalities.

  • Adds a new CLI command and corresponding API methods in the client admin components.
  • Implements REST endpoint and internal broker logic to support retrieving message IDs by index.
  • Adds tests to verify the new functionality for both partitioned and non-partitioned topics.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdTopics.java Introduces the new CLI command "get-message-id-by-index".
pulsar-client-tools-test/src/test/java/org/apache/pulsar/admin/cli/PulsarAdminToolTest.java Adds tests verifying the new CLI command functionality.
pulsar-client-admin/src/main/java/org/apache/pulsar/client/admin/internal/TopicsImpl.java Implements synchronous and asynchronous API methods for getMessageIdByIndex.
pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java Updates the Topics interface with new method signatures and documentation.
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerEntryMetadataE2ETest.java Adds integration test scenarios for the new message ID by index functionality.
pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/v2/PersistentTopics.java Implements new REST endpoint to support retrieving message ID by index.
pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java Provides the internal broker logic for the getMessageIdByIndex API.
Comments suppressed due to low confidence (1)

pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/v2/PersistentTopics.java:5033

  • The variable 'topicName' is not defined in this scope. It appears you intended to log the resolved topic name (perhaps 'encodedTopic' or a combination of tenant, namespace, and encodedTopic).
log.error("[{}] Failed to get message id by index for topic {}, partition id {}, index {}", clientAppId(), topicName, index, ex);

Copy link
Member

@dao-jun dao-jun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small comments, Overall LGTM.
By the way, if you rely on the AppendMessageIndexInterceptor, I'd suggest you need to check whether the interceptor configured before start the broker.

PulsarAdminException.class, NotFoundException.class);
}

private void assertThrowsWithCause(Executable executable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void assertThrowsWithCause(Executable executable,
private void assertThrowsWithCause(Runnable executable,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run method of Runnable does not throw checked exceptions, while our code throws PulsarAdminException (a checked exception).

Therefore, the code will fail at compile time because we are throwing a checked exception in the run method of Runnable (inside a lambda expression), whereas the run method of Runnable does not declare any exceptions.

Copy link
Member

@nodece nodece Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got, but we should not use the junit API.

Copy link
Member

@nodece nodece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants