
Trigger maybeIncrementLeaderHW in the alterISR request callback #477

Merged
7 commits merged into 3.0-li from 20230901_fix_high_produce_latency on Sep 11, 2023

Conversation

@CCisGG commented Sep 1, 2023

This PR should fix the super high Produce latency during uncontrolled broker death.

Description

Today, maybeIncrementLeaderHW is called after shrinkISR() is called:

maybeIncrementLeaderHW(leaderLog)

The problem is that shrinkISR() internally sends the AlterISR request to the controller asynchronously. By the time maybeIncrementLeaderHW is called, the ISR state on the current broker has likely not been updated yet.

This PR invokes maybeIncrementLeaderHW in the callback of the AlterISR request, ensuring the ISR state has already been updated when we try to increment the leader HW. It also calls tryCompleteDelayedRequests after the HW is incremented. To avoid deadlock, tryCompleteDelayedRequests is called outside the leaderIsrUpdateLock, with the help of a CompletableFuture. A sketch of this flow is shown below.
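Below is a minimal Scala sketch of the intended flow. maybeIncrementLeaderHW, tryCompleteDelayedRequests, and leaderIsrUpdateLock mirror names in Partition.scala, but the signatures, the onAlterIsrResponse/updateIsrAndMaybeIncrementHw entry points, and the lock handling are simplified stand-ins rather than the actual implementation:

```scala
import java.util.concurrent.CompletableFuture
import java.util.concurrent.locks.ReentrantReadWriteLock

// Illustrative sketch only; the real logic lives in kafka.cluster.Partition.
class PartitionSketch {
  private val leaderIsrUpdateLock = new ReentrantReadWriteLock()

  // Stand-in: returns true if the leader high watermark advanced.
  private def maybeIncrementLeaderHW(): Boolean = true

  // Stand-in: re-checks DelayedProduce/DelayedFetch in purgatory.
  // Must not run while leaderIsrUpdateLock is held.
  private def tryCompleteDelayedRequests(): Unit = ()

  // Invoked when the controller's AlterISR response arrives, i.e. once the ISR
  // shrink is committed, instead of right after shrinkISR() enqueues the request.
  def onAlterIsrResponse(): Unit = {
    val hwIncremented = new CompletableFuture[Boolean]()
    updateIsrAndMaybeIncrementHw(hwIncremented)
    // The locked section has returned by this point, so completing the delayed
    // requests here cannot deadlock on leaderIsrUpdateLock.
    if (hwIncremented.join()) tryCompleteDelayedRequests()
  }

  private def updateIsrAndMaybeIncrementHw(hwIncremented: CompletableFuture[Boolean]): Unit = {
    val writeLock = leaderIsrUpdateLock.writeLock()
    writeLock.lock()
    try {
      // The local ISR state now reflects the shrink, so the HW can safely advance.
      hwIncremented.complete(maybeIncrementLeaderHW())
    } finally {
      writeLock.unlock()
    }
  }
}
```

In the actual change, the CompletableFuture is what lets the code running under leaderIsrUpdateLock hand the "HW advanced" result back to a caller that completes the delayed requests only after the lock is released; the sketch above only mirrors that shape.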

With this change, the Produce delay should be capped at the ShrinkISR duration (potentially plus the controller queue delay), compared to the previously unbounded wait time.

Testing
In the cert-candidate cluster, before the change, many produce requests timed out on broker hard-kill:
[Screenshot 2023-09-07 at 10 06 14 PM]

After the change, the produce delay is capped at replicaLagMaxMs * 1.5 = 15 seconds

[Screenshot 2023-09-07 at 7 51 59 PM]

@groelofs left a comment


LGTM modulo the possible deadlock issue--if you could just double-check that to be sure?

Can't argue with the results, though--2x speedup in cert, potentially 8x in regular clusters? Nice.

core/src/main/scala/kafka/cluster/Partition.scala (outdated review thread, resolved)
@CCisGG (author) commented Sep 1, 2023

> Can't argue with the results, though--2x speedup in cert, potentially 8x in regular clusters? Nice.

@groelofs Exactly. 8X is what I would expect from regular clusters.

@lmr3796 commented Sep 2, 2023

I did some research:

Upstream tried a similar fix for KAFKA-13091:
apache#11245

and ended up with a deadlock issue, KAFKA-13254:
apache#11289

Though the code differs a lot, we may still want to check whether we run into a similar case.

Also, we should include the relevant EXIT_CRITERIA here^.

@groelofs commented Sep 2, 2023

> I did some research:
>
> Upstream tried a similar fix for KAFKA-13091: apache#11245
>
> and ended up with a deadlock issue, KAFKA-13254: apache#11289
>
> Though the code differs a lot, we may still want to check whether we run into a similar case.
>
> Also, we should include the relevant EXIT_CRITERIA here^.

Nice find! Unfortunately, the upstream fix looks sufficiently lengthy and complex that I'm not sure I'd trust it to carry over even if the code were more similar to ours. But I do like the extra validation that Hao's fix is needed, even if the implementation needs a bit more finesse.

@CCisGG (author) commented Sep 5, 2023

> Upstream tried a similar fix for KAFKA-13091:
> apache#11245

Oh wow... I didn't realize Kafka has a ticket system. When I investigated the issue, I tried searching GitHub issues and KIPs but failed to find anything related. If I had known this earlier, it might have saved me two days! Thanks very much for bringing up the reference. Let me take a look at the upstream fix.

@CCisGG (author) commented Sep 5, 2023

> Nice find! Unfortunately, the upstream fix looks sufficiently lengthy and complex that I'm not sure I'd trust it to carry over even if the code were more similar to ours. But I do like the extra validation that Hao's fix is needed, even if the implementation needs a bit more finesse.

Actually, the only functional difference is that upstream triggers tryCompleteDelayedRequests after incrementing the HW, while I'm relying on other code paths to trigger tryCompleteDelayedRequests. However, on a more recent look, I couldn't find any path that would trigger tryCompleteDelayedRequests if there is no new Produce request. In that case, I don't even know how my fix actually solved the issue. Another mystery... That said, I may end up bringing in the upstream changes. The upside is that it takes us one step closer to upstream, which is potentially beneficial if we decide to merge upstream for KRaft in the future.

Let me think about it more and test it a little bit more.

@CCisGG (author) commented Sep 5, 2023

Tested with Nurse disabled (broker not brought back immediately after hard-kill):

[Screenshot 2023-09-05 at 12 47 13 PM]

It turns out the produce latency can still be high. The reason should be that the DelayedProduce is never re-checked for completion, exactly as Huilin mentioned. So essentially maybeIncrementLeaderHW should always be followed by tryCompleteDelayedRequests; otherwise the new HWM may not help complete the DelayedProduce (a sketch of this pairing is shown below).
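A minimal Scala sketch of that pairing; LeaderPartition and advanceHwAndUnblockProducers are hypothetical stand-ins for the real Partition / purgatory code, not the actual API:

```scala
// Illustrative only: the invariant that a HW advance must be followed by a
// purgatory re-check, otherwise DelayedProduce stays parked.
object HwInvariantSketch {
  trait LeaderPartition {
    def maybeIncrementLeaderHW(): Boolean   // true if the high watermark advanced
    def tryCompleteDelayedRequests(): Unit  // re-checks DelayedProduce in purgatory
  }

  def advanceHwAndUnblockProducers(partition: LeaderPartition): Unit = {
    if (partition.maybeIncrementLeaderHW()) {
      // Without this call, acks=all producers whose DelayedProduce waits on the
      // old HW sit in purgatory until some unrelated event (e.g. a new Produce
      // request) re-checks them or the request times out.
      partition.tryCompleteDelayedRequests()
    }
  }
}
```

Running the purgatory check every time the HW actually moves should keep the produce latency bounded by the ISR shrink time rather than the request timeout.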

Will consider opening another PR to bring in the open-source change.

@CCisGG (author) commented Sep 8, 2023

With the new updates (tryCompleteDelayedRequests after incrementing the HW), the max latency is now capped at ~15 seconds even if Nurse does not bring the dead broker back immediately.

[Screenshot 2023-09-07 at 7 45 01 PM]

@CCisGG force-pushed the 20230901_fix_high_produce_latency branch 5 times, most recently from 1f1b63b to 74630f7 on September 8, 2023 19:04
@groelofs left a comment


One cosmetic nit, but the fix looks solid. Thanks!

@CCisGG force-pushed the 20230901_fix_high_produce_latency branch from 74630f7 to ede787a on September 8, 2023 23:00
@CCisGG merged commit d144fda into 3.0-li on Sep 11, 2023
25 checks passed
@CCisGG deleted the 20230901_fix_high_produce_latency branch on September 11, 2023 17:47