This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Do not throw recursive update exception when producer state recovery failed #1982

Draft: wants to merge 2 commits into base: master

BewareMyPower (Collaborator)


### Motivation

When transactions are enabled, `PartitionLog#initialise` recovers the
producer state from the local snapshot. This is an asynchronous operation
that can fail. When it does, a "Recursive update" `IllegalStateException`
is thrown, which is unexpected.

```
Suppressed: java.lang.IllegalStateException: Recursive update
    at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1167) ~[?:?]
    at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1552) ~[?:?]
    at io.streamnative.pulsar.handlers.kop.storage.PartitionLogManager.lambda$getLog$0(PartitionLogManager.java:88) ~[?:?]
```

The cause is that `PartitionLogManager#getLog` calls `logMap.remove` in a
`whenComplete` callback. If the future has already completed when the
callback is registered, the callback runs synchronously in the calling
thread, so `remove` ends up being invoked inside the mapping function
passed as the second argument of `computeIfAbsent`.

https://github.com/streamnative/kop/blob/3602c9e826d903d97091af1cc608b9d88c1b8cf3/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/storage/PartitionLogManager.java#L88
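
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the KoP code: `RecursiveUpdateDemo`, `logMap`, `initialise`, and `getLog` are hypothetical stand-ins, and the future is completed exceptionally up front so that the `whenComplete` callback is guaranteed to run in the calling thread. It assumes Java 9 or later, where `ConcurrentHashMap` detects the nested modification and `CompletableFuture` attaches the callback's exception as a suppressed exception.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class RecursiveUpdateDemo {
    // Hypothetical stand-in for the map held by PartitionLogManager.
    static final Map<String, CompletableFuture<String>> logMap = new ConcurrentHashMap<>();

    // Hypothetical stand-in for PartitionLog#initialise: recovery has already
    // failed by the time the future is returned.
    static CompletableFuture<String> initialise(RuntimeException failure) {
        CompletableFuture<String> future = new CompletableFuture<>();
        future.completeExceptionally(failure);
        return future;
    }

    static CompletableFuture<String> getLog(String topic, RuntimeException failure) {
        return logMap.computeIfAbsent(topic, key -> {
            CompletableFuture<String> future = initialise(failure);
            // The future is already complete, so this callback runs right here,
            // in the caller's thread, while computeIfAbsent is still in progress.
            future.whenComplete((log, error) -> {
                if (error != null) {
                    // Modifies the map from inside its own mapping function.
                    logMap.remove(key);
                }
            });
            return future;
        });
    }

    public static void main(String[] args) {
        RuntimeException failure = new RuntimeException("recovery failed");
        getLog("my-topic-0", failure);
        // The exception thrown by remove() is caught by whenComplete and
        // attached to the original failure as a suppressed exception.
        for (Throwable suppressed : failure.getSuppressed()) {
            System.out.println("Suppressed: " + suppressed);
        }
        System.out.println("Failed future still cached: " + logMap.containsKey("my-topic-0"));
    }
}
```

Because the exception from `remove` is swallowed by `whenComplete`, the failed future is expected to stay in the map in this sketch, which is consistent with the `Suppressed:` prefix in the stack trace above.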

### Modifications

Store the future of `PartitionLog` in `PartitionLogManager`, and move the
`remove` call out of `computeIfAbsent` into the `exceptionally`
callback of `ReplicaManager#getPartitionLog`.
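
The general shape of this change might look like the sketch below. It is an illustration under assumptions, not the code in this PR: the class `DeferredRemovalSketch` and its methods are hypothetical stand-ins for `PartitionLogManager#getLog` and `ReplicaManager#getPartitionLog`, the log is modelled as a plain `String`, and the two-argument `remove(key, value)` is an extra precaution added here rather than something the description mentions.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

public class DeferredRemovalSketch {
    // Hypothetical stand-in for the future map stored in PartitionLogManager.
    static final Map<String, CompletableFuture<String>> logMap = new ConcurrentHashMap<>();

    // Hypothetical stand-in for PartitionLog#initialise; always fails here.
    static CompletableFuture<String> initialise(String topic) {
        CompletableFuture<String> future = new CompletableFuture<>();
        future.completeExceptionally(new RuntimeException("recovery failed for " + topic));
        return future;
    }

    // Stand-in for PartitionLogManager#getLog: the mapping function only
    // creates the future and never touches the map itself.
    static CompletableFuture<String> getLog(String topic) {
        return logMap.computeIfAbsent(topic, DeferredRemovalSketch::initialise);
    }

    // Stand-in for ReplicaManager#getPartitionLog: the failed entry is removed
    // after computeIfAbsent has returned, so no recursive update can occur.
    static CompletableFuture<String> getPartitionLog(String topic) {
        CompletableFuture<String> future = getLog(topic);
        return future.exceptionally(error -> {
            // Remove only if the key is still mapped to this exact future.
            logMap.remove(topic, future);
            throw new CompletionException(error);
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = getPartitionLog("my-topic-0");
        System.out.println("Completed exceptionally: " + result.isCompletedExceptionally());
        System.out.println("Failed future still cached: " + logMap.containsKey("my-topic-0"));
    }
}
```

Rethrowing inside `exceptionally` keeps the returned future failed, so callers still observe the error while the cache entry is cleared for a later retry.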
@github-actions

@BewareMyPower: Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

github-actions bot added the `doc-info-missing` label (This PR needs to mark a document option in description) on Jul 27, 2023
codecov bot commented Jul 27, 2023

Codecov Report

Merging #1982 (e648861) into master (e931b6d) will increase coverage by 0.03%.
Report is 2 commits behind head on master.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master    #1982      +/-   ##
============================================
+ Coverage     17.72%   17.75%   +0.03%     
- Complexity      751      752       +1     
============================================
  Files           195      195              
  Lines         14156    14146      -10     
  Branches       1322     1319       -3     
============================================
+ Hits           2509     2512       +3     
+ Misses        11464    11452      -12     
+ Partials        183      182       -1     
| Files Changed | Coverage Δ |
|---|---|
| ...ative/pulsar/handlers/kop/KafkaRequestHandler.java | 1.06% <ø> (+<0.01%) ⬆️ |
| ...sar/handlers/kop/storage/AppendRecordsContext.java | 0.00% <0.00%> (ø) |
| ...tive/pulsar/handlers/kop/storage/PartitionLog.java | 7.93% <ø> (+0.03%) ⬆️ |
| ...lsar/handlers/kop/storage/PartitionLogManager.java | 0.00% <0.00%> (ø) |
| ...ve/pulsar/handlers/kop/storage/ReplicaManager.java | 0.00% <0.00%> (ø) |

... and 8 files with indirect coverage changes

@BewareMyPower BewareMyPower marked this pull request as draft July 27, 2023 13:27
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} catch (ExecutionException e) {
log.error("Failed to get PartitionLog for {} under {}", topicPartition, namespacePrefix, e.getCause());
@gaoran10 (Contributor), Jul 28, 2023:

Maybe we need to remove the failed future.

@BewareMyPower (Collaborator, Author):

It will be removed in the `exceptionally` callback of the returned future.

@BewareMyPower (Collaborator, Author):

This PR has a serious problem, so it is a draft for now.

Labels: doc-info-missing, release/2.10.4, release/2.11, release/3.0, type/bug