Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RATIS-2129. Low replication performance because of lock contention on RaftLog #1127

Closed
wants to merge 3 commits into from

Conversation

szetszwo
Copy link
Contributor

See RATIS-2129

(Temporally targeting to the release-3.1.0 branch. Will submit a pr for the master once this has passed all the tests.)

Copy link
Contributor

@OneSizeFitsQuorum OneSizeFitsQuorum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome design!

It seems that through this solution, we can convert the read lock of SegmentedRaftLog to the read lock of SegmentedRaftLogCache. This change means that the read-write conflicts caused by each write operation from the writing thread are now limited to when switching openSegment in SegmentedRaftLogCache. This approach appears to significantly reduce the overhead of synchronization locks.

Looking forward to future test result!

@OneSizeFitsQuorum
Copy link
Contributor

OneSizeFitsQuorum commented Aug 19, 2024

Hi, @szetszwo
We have tested this pull request, the write throughput is improved by about 25%.

Before:
image
After:
image

We analyzed the flame graph and found that 15% of the Raftlog read locks had disappeared, which made acquiring raftlog locks faster in the write path, resulting in better write performance.This indicates that the write throughput improvement from this PR is still substantial.

Before:
image
After:
image

In addition, we found that UpdateCommitIndex also takes a lot of CPU in the latest optimized scenario. It should be beneficial to continue to optimize the locking logic here.

@szetszwo
Copy link
Contributor Author

@OneSizeFitsQuorum , thanks a lot for testing this!

@szetszwo szetszwo changed the title RATIS-2129. Low replication performance because GrpcLogAppender is often blocked by RaftLog's readLock. RATIS-2129. Low replication performance because of lock contention on RaftLog Aug 29, 2024
@szetszwo
Copy link
Contributor Author

Submitted #1141 for the master branch.

@OneSizeFitsQuorum
Copy link
Contributor

Hi, @szetszwo , we can close this PR because the release-3.1.x will not contain this PR and we can include this work in the future 3.2.0.

@szetszwo
Copy link
Contributor Author

szetszwo commented Sep 4, 2024

Sure, let's close this.

@szetszwo szetszwo closed this Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants