[fix][broker] Support large number of unack message store for cursor recovery #9292
/pulsarbot run-failure-checks
@rdhabalia I think this only improves cases that do not contain very large ack holes, for example 10000 unacked messages inside a single hole. In other words, this optimization increases the persisted size in scenarios with big ack holes, right?
For example, if the messages at index 0 and index 10000 are acked but [1-9999] are not, I think the new approach needs 157 long values to be persisted to the bookie.
If I understand correctly, we should add a flag to enable or disable this enhancement.
Please correct me if I'm missing some context here.
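The cost asymmetry in this example can be checked with a small sketch. It uses `java.util.BitSet` as a stand-in for the cursor's bitset (an assumption; Pulsar has its own range-set classes), and the class and method names are hypothetical. It counts the 64-bit words a bitset needs versus the number of contiguous acked ranges a range-based encoding would store.

```java
import java.util.BitSet;

/**
 * Hypothetical helper (not Pulsar code) for the "big ack hole" case:
 * entries of one ledger are tracked one bit per entry, packed into longs.
 */
public class AckHoleCost {

    /** Number of 64-bit words needed to persist the bitset. */
    static int bitsetWords(BitSet acked) {
        // toLongArray() drops trailing all-zero words, so its length is the packed size.
        return acked.toLongArray().length;
    }

    /** Number of contiguous acked ranges, i.e. what a range-based encoding stores. */
    static int ackedRanges(BitSet acked) {
        int ranges = 0;
        int i = acked.nextSetBit(0);
        while (i >= 0) {
            ranges++;
            int end = acked.nextClearBit(i); // first unacked entry after this range
            i = acked.nextSetBit(end);       // start of the next acked range, or -1
        }
        return ranges;
    }

    public static void main(String[] args) {
        BitSet acked = new BitSet();
        acked.set(0);     // entry 0 acked
        acked.set(10000); // entry 10000 acked; entries 1..9999 stay unacked
        System.out.println("bitset words: " + bitsetWords(acked)); // 157
        System.out.println("acked ranges: " + ackedRanges(acked)); // 2
    }
}
```

The bitset pays for every entry up to the highest acked one (10001 bits, rounded up to 157 longs), while a range encoding only pays per range, which is why a single large hole favors the existing format.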
@codelipenghui
Sure, let me add the flag.
Awesome work.
What happens if I upgrade my broker to a new version and then roll back to the previous version in case of problems?
IIUC, we are going to write data that the old version would not understand.
Does that lead to a bad state in the system, such as consumers receiving the same messages again?
managed-ledger/src/test/java/org/apache/bookkeeper/mledger/impl/ManagedCursorTest.java
pulsar-common/src/main/java/org/apache/pulsar/common/util/collections/LongPairRangeSet.java
@rdhabalia This is a great change. However, I share the same concern as Penghui and Lin. It is an improvement, but it doesn't solve all the problems. I would like to see these concerns addressed before merging this.
Also, Penghui suggested adding a flag to enable/disable this feature. I think we should go one step further and abstract the "unack range management" into an interface, so we can load the implementation from a class instead of using a flag to turn it on/off.
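A minimal sketch of the interface-plus-class-loading idea, assuming the broker would read a fully qualified class name from configuration. Every name here (`UnackRangeManager`, `SimpleRangeManager`, `load`) is hypothetical and does not exist in Pulsar; the naive implementation only makes the sketch runnable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

public class UnackRangeManagerLoader {

    /** Hypothetical contract for tracking and persisting individually acked entries. */
    public interface UnackRangeManager {
        void markAcked(long ledgerId, long entryId);

        boolean isAcked(long ledgerId, long entryId);

        /** Serialized form to be stored in the cursor ledger entry. */
        byte[] serialize();
    }

    /** Naive illustrative implementation: a sorted set of "ledgerId:entryId" keys. */
    public static class SimpleRangeManager implements UnackRangeManager {
        private final Set<String> acked = new TreeSet<>();

        @Override
        public void markAcked(long ledgerId, long entryId) {
            acked.add(ledgerId + ":" + entryId);
        }

        @Override
        public boolean isAcked(long ledgerId, long entryId) {
            return acked.contains(ledgerId + ":" + entryId);
        }

        @Override
        public byte[] serialize() {
            return String.join(",", acked).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Loads an implementation reflectively; it must have a public no-arg constructor. */
    static UnackRangeManager load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (UnackRangeManager) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        UnackRangeManager manager = load(SimpleRangeManager.class.getName());
        manager.markAcked(3L, 7L);
        System.out.println(manager.isAcked(3L, 7L)); // true
    }
}
```

Compared with a boolean flag, this lets a bitset-based, range-based, or third-party implementation be swapped in without touching the cursor code.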
No, it will not cause a bad state in the broker. The broker understands both formats, as it keeps a separate bucket for each. The broker recovers unack message ranges from whichever bucket has ranges and afterwards persists ranges only in the new bucket. After a rollback, the old broker will not be able to recover unack messages from the new bucket; in that case it will redeliver messages newer than markDeleteMessageOffset.
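The compatibility rule in this reply can be modeled roughly as follows. This is a sketch under stated assumptions, not the PR's code: `legacyRanges` and `bitsetRanges` are hypothetical stand-ins for the two persisted buckets, the new bucket is preferred when both are populated, and entry IDs are assumed to fit in an `int` so they can index a `java.util.BitSet`.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CursorRecoverySketch {

    /** Legacy acked range [fromEntry, toEntry] (inclusive) within one ledger. */
    record LegacyRange(long ledgerId, long fromEntry, long toEntry) {}

    /** Persisted cursor info with one hypothetical bucket per format. */
    record CursorInfo(List<LegacyRange> legacyRanges, Map<Long, long[]> bitsetRanges) {}

    /** Recovers acked entries per ledger from whichever bucket is populated. */
    static Map<Long, BitSet> recover(CursorInfo info) {
        Map<Long, BitSet> acked = new TreeMap<>();
        if (!info.bitsetRanges().isEmpty()) {
            // New format: one packed long[] per ledger, 64 entries per word.
            info.bitsetRanges().forEach((ledger, words) -> acked.put(ledger, BitSet.valueOf(words)));
        } else {
            // Old format: expand each inclusive range into bits (assumes int-sized entry IDs).
            for (LegacyRange r : info.legacyRanges()) {
                acked.computeIfAbsent(r.ledgerId(), k -> new BitSet())
                        .set((int) r.fromEntry(), (int) r.toEntry() + 1);
            }
        }
        return acked;
    }

    /** A new broker writes only the new bucket; an old broker would find no legacy ranges. */
    static CursorInfo persist(Map<Long, BitSet> acked) {
        Map<Long, long[]> words = new TreeMap<>();
        acked.forEach((ledger, bits) -> words.put(ledger, bits.toLongArray()));
        return new CursorInfo(List.of(), words);
    }
}
```

Because `persist` leaves the legacy bucket empty, a rolled-back broker would see no individually acked ranges and, as described above, would fall back to redelivering everything after the mark-delete position.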
Yes, I shared my point of view on unack message management in the email. However, regardless of the abstraction changes, I think we can proceed with this PR, as it can be used with the existing default unack message implementation.
I would like to have a consensus on providing an abstraction before merging this implementation. Otherwise, it becomes really hard to allow people to implement other algorithms.
Can you add a backward compatibility integration test?
The PR had no activity for 30 days; marking it with the Stale label.
+1
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #9292 +/- ##
============================================
+ Coverage 73.57% 74.54% +0.97%
+ Complexity 32624 2758 -29866
============================================
Files 1877 1934 +57
Lines 139502 145222 +5720
Branches 15299 15875 +576
============================================
+ Hits 102638 108258 +5620
+ Misses 28908 28665 -243
- Partials 7956 8299 +343
Flags with carried forward coverage won't be shown.
This PR is very useful if I understand it correctly. Since individual acks get encoded as a long[] array, it will compress the information a lot. I guess a single long entry in the array holds 64 bits, i.e. 64 individual acks. In theory, 2^20 (about 1 million) individual ack bits can be held in 128 KiB of memory (1024 Kbit / 8). @rdhabalia Is this the correct understanding?
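The arithmetic in this comment can be reproduced with a short sketch, again using `java.util.BitSet` as an assumed stand-in for the cursor's bitset; `AckDensity` and its methods are hypothetical. It acks every other entry, the pattern in which a range-based encoding needs one range per acked entry.

```java
import java.util.BitSet;

public class AckDensity {

    /** Bytes needed to pack `entries` ack flags into 64-bit words (ceiling division). */
    static long bytesForEntries(long entries) {
        long words = (entries + Long.SIZE - 1) / Long.SIZE; // 64 acks per long
        return words * Long.BYTES;                          // 8 bytes per long
    }

    /** Every other entry acked: 1 bit per entry here, but one range per acked entry for range encodings. */
    static BitSet everyOtherAcked(int entries) {
        BitSet acked = new BitSet(entries);
        for (int i = 0; i < entries; i += 2) {
            acked.set(i);
        }
        return acked;
    }

    public static void main(String[] args) {
        int entries = 1 << 20; // 1,048,576 entries
        BitSet acked = everyOtherAcked(entries);
        long persistedBytes = (long) acked.toLongArray().length * Long.BYTES;
        System.out.println(persistedBytes + " bytes, estimate " + bytesForEntries(entries)); // 131072 bytes, estimate 131072
    }
}
```

Here 2^20 entries pack into 16,384 longs (128 KiB), even though the same pattern would need 524,288 separate ranges in a range-based encoding.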
I have already replied in the email but also updating the same reply here.
Yes, that's correct. It's like serializing a Position object to a single bit, which is probably the smallest serialized size any ser/des approach could achieve. In the past, there were two fundamental challenges in serving a large set of unack messages: (a) memory pressure and long GC pauses due to the large number of Position objects, and (b) serializing that many objects to store in the bookie ledger for topic recovery. (a) was handled by #3819, which replaced a Position object with a bit and allows brokers to run with a large number of unack messages per topic. But it still has a limit for large-scale multi-tenant systems, where a broker serves a large number of topics and serving several million unack messages per topic can create memory pressure on the broker. Therefore, even if we solve (b) and store billions of unack messages for topic recovery, the broker might not run stably beyond several million unack messages.
Talking about PIP-381, we might be able to store > 10M-100M unack messages, but the question is: if a broker really has that many unack messages, will it run under such huge memory pressure, and will it really serve the large-scale use cases Pulsar was built for? I am sure it might be useful for small use cases, and clients can use it if needed, but it might not be useful for most use cases.
Motivation
Right now, the managed cursor serializes individually deleted message ranges and persists them in a bookie ledger entry for recovery. The managed cursor can persist at most 150K ranges due to the limited bookie entry size (5MB). 150K ranges might not be enough for some use cases when recovering the cursor, so we need a mechanism to persist a couple of million ranges to support them.
Modification
With #3818 and #3819, the managed cursor manages individually deleted messages in a bitset with OpenRangeSet. Serializing the OpenRangeSet allows the managed cursor to store 10M ranges within 5MB of data.
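A back-of-envelope check of the two limits quoted in this description, assuming "5MB" means 5 MiB (5 × 1024 × 1024 bytes); the per-range figures are derived from the quoted numbers, not measured, and `RangeBudget` is a hypothetical name.

```java
public class RangeBudget {

    /** Assumed maximum cursor entry size: 5 MiB. */
    static final long ENTRY_BUDGET_BYTES = 5L * 1024 * 1024;

    /** Average bytes each persisted range may use if `ranges` must fit in one entry. */
    static double bytesPerRange(long ranges) {
        return (double) ENTRY_BUDGET_BYTES / ranges;
    }

    /** Average bits available per range if the same entry holds a bitset instead. */
    static double bitsPerRange(long ranges) {
        return ENTRY_BUDGET_BYTES * 8.0 / ranges;
    }

    public static void main(String[] args) {
        System.out.printf("range format: %.1f bytes per range at 150K ranges%n", bytesPerRange(150_000));
        System.out.printf("bitset format: %.1f bits per range at 10M ranges%n", bitsPerRange(10_000_000));
    }
}
```

The existing format works out to roughly 35 bytes per range, while 10M ranges leave about 4.2 bits each in a bitset. Since each acked range needs at least one set bit plus one clear bit separating it from the next, 10M ranges fit in that budget, ignoring per-ledger overhead.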
Result
Use cases that require a large number of individually deleted messages can be supported with this change.
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: