[feat] Add topic stats and metrics for observing message replay behavior and Key_Shared filtering/blocking behavior #23205

Closed
1 of 2 tasks
lhotari opened this issue Aug 20, 2024 · 6 comments
Assignees
Labels
release/blocker Indicate the PR or issue that should block the release until it gets resolved type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@lhotari
Member

lhotari commented Aug 20, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Currently, it's very challenging to investigate issues related to message replay (the "message redelivery controller").

Solution

Add topic stats and metrics for observing message replay and related Key_Shared filtering (hash blocking) behavior.

Specific Metrics to Consider

  1. Number of messages in redelivery (replay)
  2. For Key_Shared subscriptions: Ways to observe internal state related to blocked hashes
  3. Counter for delayed delivery messages being added to redelivery (replay)

Implementation Requirements

  • It should be possible to detect replays in topic stats (or internal stats) and also in aggregated metrics
  • The aggregated metrics should be usable in monitoring tools (e.g., Grafana dashboards)
  • The specific types of metrics (counters, gauges) to be used will be determined in the detailed design phase
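The difference between the two metric types matters for replay: a gauge would report how many messages are in replay right now, while a counter would report how many have ever been added. A minimal sketch of that distinction follows. The class and method names are hypothetical and are not taken from Pulsar; the actual design was left to the detailed design phase.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical tracker for illustration only; not Pulsar's implementation.
final class ReplayMetricsSketch {
    // Gauge: number of messages currently waiting in replay; rises and falls.
    private final AtomicLong messagesInReplay = new AtomicLong();
    // Counter: total number of messages ever added to replay; only increases.
    private final LongAdder addedToReplayTotal = new LongAdder();

    // Called when messages are scheduled for replay.
    void onAddedToReplay(int count) {
        messagesInReplay.addAndGet(count);
        addedToReplayTotal.add(count);
    }

    // Called when replayed messages have been dispatched or acknowledged.
    void onRemovedFromReplay(int count) {
        messagesInReplay.addAndGet(-count);
    }

    long messagesInReplay() {
        return messagesInReplay.get();
    }

    long addedToReplayTotal() {
        return addedToReplayTotal.sum();
    }
}
```

A dashboard would typically plot the gauge directly and apply a rate function to the counter, which is why both kinds can be useful for the same behavior.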

Expected Benefits

  • Improved observability for message replay and Key_Shared behavior
  • Easier troubleshooting of related issues
  • Enhanced monitoring capabilities for Pulsar clusters

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari lhotari added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Aug 20, 2024
@lhotari
Member Author

lhotari commented Aug 22, 2024

It seems that PIP-282 added some subscription stats in #21953 that improve observability of Key_Shared.

@poorbarcode poorbarcode self-assigned this Aug 22, 2024
@lhotari
Member Author

lhotari commented Aug 22, 2024

There's already a counter for message redelivery:

    public long getMessageRedeliverCounter() {
        return msgRedeliverCounter.sum();
    }

However, this isn't currently exposed in the subscription stats.
This counter was added as part of the OpenTelemetry (OTel) changes in #22693.
An ack counter was added as well:

    public long getMessageAckCounter() {
        return messageAckCounter.sum();
    }

I think exposing these in stats would be a non-breaking change, so it wouldn't necessarily require a PIP.
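As a rough sketch of what exposing the two counters could look like: the getters below mirror the ones quoted above, but the surrounding classes (`SubscriptionCountersSketch`, `SubscriptionStatsSketch`), the `record*` methods, and the stats field names are assumptions made for illustration, not Pulsar's actual classes or stats schema.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the object that owns the counters; not Pulsar's class.
final class SubscriptionCountersSketch {
    private final LongAdder msgRedeliverCounter = new LongAdder();
    private final LongAdder messageAckCounter = new LongAdder();

    // Hypothetical hooks that the dispatch and ack paths would call.
    void recordRedelivery(long count) {
        msgRedeliverCounter.add(count);
    }

    void recordAck(long count) {
        messageAckCounter.add(count);
    }

    // Same shape as the getters quoted in the comment above.
    public long getMessageRedeliverCounter() {
        return msgRedeliverCounter.sum();
    }

    public long getMessageAckCounter() {
        return messageAckCounter.sum();
    }

    // Copies the current counter values into an immutable stats snapshot.
    SubscriptionStatsSketch toStats() {
        return new SubscriptionStatsSketch(getMessageRedeliverCounter(), getMessageAckCounter());
    }
}

// Hypothetical stats snapshot; the field names are assumptions.
final class SubscriptionStatsSketch {
    final long messageRedeliverCounter;
    final long messageAckCounter;

    SubscriptionStatsSketch(long messageRedeliverCounter, long messageAckCounter) {
        this.messageRedeliverCounter = messageRedeliverCounter;
        this.messageAckCounter = messageAckCounter;
    }
}
```

Adding fields to a stats snapshot in this way only extends the output, which is the sense in which the change would be non-breaking for existing consumers of the stats.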

@lhotari
Member Author

lhotari commented Oct 4, 2024

@lhotari
Member Author

lhotari commented Oct 14, 2024

#23224 implemented msgInReplay / pulsar_subscription_in_replay.

@lhotari
Member Author

lhotari commented Oct 14, 2024

#23429 adds observability for the PIP-379 Key_Shared implementation.
It adds the stats drainingHashesCount, drainingHashesClearedTotal, drainingHashesUnackedMessages and drainingHashes.

@lhotari
Member Author

lhotari commented Oct 14, 2024

Closing this as resolved with #23224 and #23429 in the PIP-379 implementation.

2 participants