RATIS-2132. Revert RATIS-2099 due to its performance regression #1126
@duongkame, instead of reverting this, how about replacing the guava Cache with a
diff --git a/ratis-server-api/src/main/java/org/apache/ratis/server/protocol/TermIndex.java b/ratis-server-api/src/main/java/org/apache/ratis/server/protocol/TermIndex.java
index a8aa670613..e3b8475fe0 100644
--- a/ratis-server-api/src/main/java/org/apache/ratis/server/protocol/TermIndex.java
+++ b/ratis-server-api/src/main/java/org/apache/ratis/server/protocol/TermIndex.java
@@ -19,23 +19,23 @@ package org.apache.ratis.server.protocol;
import org.apache.ratis.proto.RaftProtos.LogEntryProto;
import org.apache.ratis.proto.RaftProtos.TermIndexProto;
-import org.apache.ratis.thirdparty.com.google.common.cache.Cache;
-import org.apache.ratis.thirdparty.com.google.common.cache.CacheBuilder;
+import java.lang.ref.WeakReference;
import java.util.Comparator;
import java.util.Optional;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.TimeUnit;
+import java.util.WeakHashMap;
/** The term and the log index defined in the Raft consensus algorithm. */
public interface TermIndex extends Comparable<TermIndex> {
class Util {
/** An LRU Cache for {@link TermIndex} instances */
- private static final Cache<TermIndex, TermIndex> CACHE = CacheBuilder.newBuilder()
- .maximumSize(1 << 16)
- .expireAfterAccess(1, TimeUnit.MINUTES)
- .build();
+ private static final WeakHashMap<TermIndex, WeakReference<TermIndex>> CACHE = new WeakHashMap<>();
+
+ private static synchronized TermIndex putIfAbsent(TermIndex termIndex) {
+ return CACHE.computeIfAbsent(termIndex, WeakReference::new).get();
+ }
}
+
TermIndex[] EMPTY_ARRAY = {};
/** @return the term. */
@@ -109,10 +109,6 @@ public interface TermIndex extends Comparable<TermIndex> {
return String.format("(t:%s, i:%s)", longToString(term), longToString(index));
}
};
- try {
- return Util.CACHE.get(key, () -> key);
- } catch (ExecutionException e) {
- throw new IllegalStateException("Failed to valueOf(" + term + ", " + index + "), key=" + key, e);
- }
+ return Util.putIfAbsent(key);
}
}
\ No newline at end of file
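For readers following the diff, the weak-interning idea it proposes can be sketched in isolation. This is a minimal illustration, not the Ratis code: `TermIndexSketch` is a hypothetical stand-in for `TermIndex`, and the null fallback after `get()` is a defensive assumption added here in case the referent is cleared between the lookup and the dereference.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for TermIndex; illustrates weak interning only. */
final class TermIndexSketch {
  // Keys are held weakly by the map; values must also be weak, otherwise
  // the value would strongly reference its own key and pin the entry forever.
  private static final Map<TermIndexSketch, WeakReference<TermIndexSketch>> CACHE =
      new WeakHashMap<>();

  private final long term;
  private final long index;

  private TermIndexSketch(long term, long index) {
    this.term = term;
    this.index = index;
  }

  /** Returns a canonical instance for (term, index) while one is still reachable. */
  static synchronized TermIndexSketch valueOf(long term, long index) {
    // A candidate is allocated on every call, even on a cache hit.
    final TermIndexSketch key = new TermIndexSketch(term, index);
    final TermIndexSketch cached = CACHE.computeIfAbsent(key, WeakReference::new).get();
    // Assumption: fall back to the fresh instance if the referent was cleared.
    return cached != null ? cached : key;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof TermIndexSketch)) {
      return false;
    }
    final TermIndexSketch that = (TermIndexSketch) obj;
    return term == that.term && index == that.index;
  }

  @Override
  public int hashCode() {
    return 31 * Long.hashCode(term) + Long.hashCode(index);
  }
}
```

Compared with the guava `Cache`, this variant has no size bound or expiry timer; entries disappear only when the garbage collector finds the canonical instance unreachable. The single `synchronized` method also serializes all callers, which may matter on a hot path.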
I don't think we need any cache here, @szetszwo. You already create the TermIndex instance and then throw it away in favor of the one in the cache. How does that help?
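The objection above can be made concrete with a small sketch. It assumes a plain `HashMap` as the intern table and a hypothetical `Key` class with an allocation counter; none of these names come from Ratis. It shows that interning by lookup still allocates one candidate object per call, so it can only reduce the number of retained instances, not the number of allocations.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo: interning by lookup still allocates a candidate per call. */
final class InternAllocationDemo {
  /** Counts every Key constructed, whether or not it ends up in the cache. */
  static int allocations = 0;

  static final class Key {
    final long term;
    final long index;

    Key(long term, long index) {
      this.term = term;
      this.index = index;
      allocations++;
    }

    @Override
    public boolean equals(Object obj) {
      if (!(obj instanceof Key)) {
        return false;
      }
      final Key that = (Key) obj;
      return term == that.term && index == that.index;
    }

    @Override
    public int hashCode() {
      return 31 * Long.hashCode(term) + Long.hashCode(index);
    }
  }

  private static final Map<Key, Key> CACHE = new HashMap<>();

  static synchronized Key valueOf(long term, long index) {
    final Key candidate = new Key(term, index); // allocated on every call
    final Key existing = CACHE.putIfAbsent(candidate, candidate);
    // On a hit the candidate is discarded and becomes short-lived garbage.
    return existing != null ? existing : candidate;
  }
}
```

Whether the discarded candidates matter depends on where the long-lived instances actually come from, which is the question the rest of the thread tries to answer.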
As reported by @jojochuang in RATIS-2099, there were a lot of
Since the instance is short-lived, it will be GC'ed in the young generation instead of being promoted to the old generation.
I think the problem in the first place is that there are too many anonymous class instances not being GC'ed. I am not sure whether it would be better to use a normal class instead of an anonymous class.
It won't help. They are the same as far as GC is concerned. Anonymous classes just don't have a name.
We need to identify the source of them. I believe the TermIndex instances we cache are in the RaftLog cache, and in that cache the TermIndex (term, index) instances are unique. How does deduplication/interning help to reduce the number of cached TermIndex instances?
@jojochuang, would you be able to reproduce it? It would be great if we could test the deduplication.
I tried to analyze a local heap, but failed to reproduce the result mentioned by @jojochuang.
I think the following points are interesting:
Tested by running "ozone freon ommg --operation CREATE_FILE -n 30000 -t 1 --size=100 --bucket=bucket-1 --type RATIS --replication THREE" on the master branch of Ozone, with the heap analyzed from a datanode.
Let's just revert the change with this PR and reexamine the problem with TermIndex instances somewhere else. The change was made prematurely, without due diligence. For such memory, we must.
Without doing the above 2 things, we will likely introduce a new problem without solving the original one, just like we did. Guessing, fixing, and not verifying never turns out well.
+1 to revert. IMO the initial problem in the original jira wasn't as big an issue as the problem the fix introduces.
@jojochuang, thanks for the confirmation. Let's revert it. @duongkame, let's use the
Reverted.
What changes were proposed in this pull request?
See RATIS-2132
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-2132
How was this patch tested?
CI.