
[fix][broker]Fix repeatedly acquire pending reads quota #23869

Open

poorbarcode wants to merge 1 commit into base: master

Conversation

poorbarcode
Contributor

Motivation

Background

  • The broker limits pending reads to bookies when managedLedgerMaxReadsInFlightSize is enabled.
  • [managed-ledger] Do not send duplicate reads to BK/offloaders #17241 introduced a mechanism that merges BK read requests when possible, for example:
    • Request-1: Read entries 50~59
    • Request-2: Read entries 50~69
    • Request-2 waits for Request-1 and only sends a real request for entries 60~69 after Request-1 finishes.

Issue

  • Request-2 above acquires quota repeatedly:
    • It acquires quota for the range 50~70 when it is created.
    • It acquires quota again for the range 61~70 after Request-1 finishes.
  • Expected: Request-2 acquires 20 quota.
  • Actual: Request-2 acquires 40 quota.
  • You can run testPreciseLimitation to reproduce the issue; a conceptual sketch of the double acquisition follows below.
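For illustration only, a minimal sketch of the double acquisition described above; the class, method names, and values are placeholders, not the actual broker or managed-ledger code:

class PendingReadsQuotaSketch {
    long acquired = 0;

    void acquire(long quota) {
        acquired += quota;
    }

    void mergedReadRequest2() {
        // Request-2 acquires quota for its full range when it is created ...
        acquire(20);
        // ... then waits for Request-1, which already covers part of that range.
        // Bug: after Request-1 finishes, Request-2 acquires quota again for the
        // read it actually sends, instead of reusing the quota it already holds.
        acquire(20);
        // acquired == 40, while the expected total for Request-2 is 20.
    }
}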

Modifications

  • Fix the issue.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added this to the 4.1.0 milestone Jan 20, 2025
@poorbarcode poorbarcode self-assigned this Jan 20, 2025
@poorbarcode poorbarcode changed the title [fix][broker]Fix repeated acquire pending reads quota [fix][broker]Fix repeatedly acquire pending reads quota Jan 20, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 20, 2025
Comment on lines +103 to +138
LedgerHandle currentLedger = ml.currentLedger;
LedgerHandle spyCurrentLedger = Mockito.spy(currentLedger);
ml.currentLedger = spyCurrentLedger;
Answer answer = invocation -> {
    long firstEntry = (long) invocation.getArguments()[0];
    log.info("reading entry: {}", firstEntry);
    if (firstEntry == start1) {
        // Signal that the first read has started, then block until the test
        // allows it to complete.
        firstReadingStarted.countDown();
        readCompleteSignal1.await();
        Object res = invocation.callRealMethod();
        return res;
    } else if (secondReadEntries.contains(firstEntry)) {
        final CompletableFuture<LedgerEntries> res = new CompletableFuture<>();
        threadFactory.newThread(() -> {
            try {
                readCompleteSignal2.await();
                CompletableFuture<LedgerEntries> future =
                        (CompletableFuture<LedgerEntries>) invocation.callRealMethod();
                future.thenAccept(v -> {
                    res.complete(v);
                }).exceptionally(ex -> {
                    res.completeExceptionally(ex);
                    return null;
                });
            } catch (Throwable ex) {
                res.completeExceptionally(ex);
            }
        }).start();
        return res;
    } else {
        return invocation.callRealMethod();
    }
};
doAnswer(answer).when(spyCurrentLedger).readAsync(anyLong(), anyLong());
doAnswer(answer).when(spyCurrentLedger).readUnconfirmedAsync(anyLong(), anyLong());
Member

I wonder if the PulsarMockBookKeeper read handle interceptor (added in #23875) could be used here to avoid the use of Mockito?

Contributor Author


I agree, but the current mock is compatible with both the mocked Bookie client and the real Bookie client.

Member


That's true, but this test uses PulsarMockBookKeeper. The problem with Mockito is that it requires injecting changes deeply into the implementation. Incorrect use of Mockito could result in thread-safety issues and other problems, causing flaky tests.
btw. When there are a large number of operations, Mockito spies that record invocations should only be used when the invocations are actually verified. We have a helper for spies that don't record invocations (recording them could cause OOME issues in tests):

/**
 * Create a Mockito spy that is stub-only which does not record method invocations,
 * thus saving memory but disallowing verification of invocations.
 *
 * @param object to spy on
 * @return a spy of the real object
 * @param <T> type of object
 */
public static <T> T spyWithoutRecordingInvocations(T object) {
    return Mockito.mock((Class<T>) object.getClass(), Mockito.withSettings()
            .spiedInstance(object)
            .defaultAnswer(Mockito.CALLS_REAL_METHODS)
            .stubOnly());
}
It's not a problem in this test since the invocation count isn't large. The point is that with the read handle interceptor support in PulsarMockBookKeeper, this type of issue could be avoided in the first place.
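For illustration, a minimal sketch of how that helper could replace the plain Mockito.spy call in the test above (assuming it is statically imported from the test utilities; this is not part of the PR):

LedgerHandle currentLedger = ml.currentLedger;
// Stub-only spy: calls real methods but does not record invocations for verification.
LedgerHandle spyCurrentLedger = spyWithoutRecordingInvocations(currentLedger);
ml.currentLedger = spyCurrentLedger;
// Stubbing still works on a stub-only spy.
doAnswer(answer).when(spyCurrentLedger).readAsync(anyLong(), anyLong());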

@lhotari
Member

lhotari commented Jan 23, 2025

I agree that this issue exists; however, there are broader issues in the managedLedgerMaxReadsInFlightSizeInMB solution.

Sharing some context about the issues I have found and what I currently have in progress:

I have reported issues #23482, #23504, #23505 and #23506.
These issues are related to the broker cache and the managedLedgerMaxReadsInFlightSizeInMB and dispatcherMaxReadSizeBytes limits.

I already have changes to address these issues in an experimental branch, pending the submission of individual PRs. Addressing the lack of caching for replay queue messages requires broader changes to the broker cache, and those will be covered by a new PIP.

Some earlier details were shared in this comment: #23524 (comment).

Due to issues #23482 and #23506, I don't think this PR alone would resolve the problems. Regarding broker OOMEs, issue #23504 would also need to be resolved. It's possible that dispatcherDispatchMessagesInSubscriptionThread=false and managedLedgerReadEntryTimeoutSeconds=15 (or some other reasonable value) could mitigate some of the issues.
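For reference, this is how those mitigation settings would look in broker.conf; the timeout value of 15 is just the example from this comment, not a tuned recommendation:

# Do not dispatch messages in a separate subscription thread
dispatcherDispatchMessagesInSubscriptionThread=false
# Managed-ledger read-entry operation timeout in seconds (example value)
managedLedgerReadEntryTimeoutSeconds=15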

@poorbarcode
Contributor Author

@lhotari

I already have changes to address these issues in an experimental branch, pending the submission of individual PRs. Addressing the lack of caching for replay queue messages requires broader changes to the broker cache, and those will be covered by a new PIP.

Some earlier details were shared in this comment: #23524 (comment).

Due to issues #23482 and #23506, I don't think this PR alone would resolve the problems. Regarding broker OOMEs, issue #23504 would also need to be resolved. It's possible that dispatcherDispatchMessagesInSubscriptionThread=false and managedLedgerReadEntryTimeoutSeconds=15 (or some other reasonable value) could mitigate some of the issues.

Agreed. Let's fix them one by one, which is easier to review, and add tests for each case.

@poorbarcode poorbarcode requested a review from lhotari January 23, 2025 08:30
@heesung-sn
Contributor

I think we should use AsyncTokenBucket or Guava's RateLimiter.tryAcquire for this rate limiter.

IMHO, the current logic, which requires the caller to release an acquired permit, is error-prone when the caller forgets to release it. Instead, I think we should use a token-bucket-based limiter, which refills the bucket automatically.
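For illustration, a minimal sketch of the non-blocking tryAcquire pattern with Guava's RateLimiter; the rate and permit sizes below are arbitrary placeholders, not values from the broker:

import com.google.common.util.concurrent.RateLimiter;

public class PendingReadsRateLimiterSketch {
    // Placeholder rate: permits refill automatically at the configured rate,
    // so there is nothing for the caller to release afterwards.
    private final RateLimiter limiter = RateLimiter.create(50 * 1024 * 1024);

    public boolean tryRead(int estimatedReadSizeBytes) {
        // Non-blocking: returns false immediately if the permits are not available.
        return limiter.tryAcquire(estimatedReadSizeBytes);
    }
}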

@lhotari
Member

lhotari commented Jan 24, 2025

I think we should use AsyncTokenBucket or Guava's RateLimiter.tryAcquire for this rate limiter.

IMHO, the current logic, which requires the caller to release an acquired permit, is error-prone when the caller forgets to release it. Instead, I think we should use a token-bucket-based limiter, which refills the bucket automatically.

@heesung-sn I agree, the current solution is problematic. However, a token bucket isn't optimal for this use case. I have a redesigned managedLedgerMaxReadsInFlightSize solution in my experimental branch (example). There are a lot of different changes in that branch, so it might be hard to see what it's about. I'll extract it into a clear PR when the time comes.
First, #23892 and #23894 need reviews so that I can make further progress. Do you have a chance to review them?
