-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[test] HeapSizeEstimator #1281
base: main
Are you sure you want to change the base?
[test] HeapSizeEstimator #1281
Conversation
Have you got a chance to take a look at |
/** | ||
* Utility class to help in implementing {@link Measurable#getSize()}. A couple of important points: | ||
* | ||
* 1. This utility class does not "measure" the heap size, but rather attempts to "predict" it, based on knowledge of | ||
* the internals of the JVM. If any of the assumptions are wrong, then of course the results will be inaccurate. | ||
* 2. This utility class assumes we are using the HotSpot JVM. | ||
*/ | ||
public class HeapSizeEstimator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting a thread to respond to @gaojieliu's general comment:
Have you got a chance to take a look at openjdk/jol implementation?
https://github.com/openjdk/jol
It might carry some issues, but I guess we can analyze the strategy this lib is using to see whether we can borrow some or not.
I posted this work on social media and received a lot of interesting examples from folks about other open source projects with the same or similar purpose. I'll maintain a list of these here:
- https://github.com/openjdk/jol
- https://github.com/ehcache/sizeof
- https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java
For now, this work looks pretty similar to (at least some parts of) the above, but I think there is still potential to have a good implementation as part of Venice. In particular, I want a solution in which there is no hot path reflection. For now, this tradeoff is not apparent because I have not yet included examples of how we can leverage this utility in the rest of our code, but I will add it soon when the basic functionality is stabilized...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reflection
mentioned here is mainly about performance, right?
Have you tried these alternatives and run it with JMH?
I tried jol
and the initialization is slow, but after that, it is fairly fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a small test against jol
and it seems it can work properly with Oracle OpenJDK8 release, but not MSFT OpenJDK11 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alex-dubrouski
Do you know why MSFT-jdk11 won't work well with attached agent to measure the object size?
I tried to add this jvm arg, but it still doesn't work:
-Djdk.attach.allowAttachSelf=true
And both jol
and sizeof
are suffering from the same issue with JDK11 as they are using the similar technologies..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check a bit later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turned out that msft-jdk11.0.8/11.0.13 has a bug, which triggered the crash while using 'jol' or 'sizeof' lib and JDK17 apparently fixed this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, Felix, that's why we should probably use JOL :D ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But where is the learning value in that, @alex-dubrouski :D ?
Besides, this memory estimation code needs to run in DVC, which is one of our client libs, meaning we cannot tightly control which JVM it's going to run in. Ideally, it should not depend on agents or finicky Unsafe
code...
That being said, using JOL in unit test to validate the accuracy of this utility might be a good idea 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW... you did not answer my question @alex-dubrouski 😅 but I found out that the hashIsZero
was added 5 years ago: openjdk/jdk@89a267c
According to JDK-8221836, this change got released in Java 13, which is weird because I thought I was looking at the Java 17 source code in my IDE, but apparently not; it was probably hanging on to the source of an older Java version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And indeed, I can get it to show me the proper source and see that it's there now.
94ff779
to
626a47d
Compare
|
||
/** Deal with alignment by rounding up to the nearest alignment boundary. */ | ||
private static int roundUpToNearestAlignment(int size) { | ||
int partialAlignmentWindowUsage = size % ALIGNMENT_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alignment could be modified with command line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that it can be configured at start time. But are you saying it can be modified at run time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it can't be changed at runtime, but you defined static alignment which might not be accurate. That's an edge case of course, I was just saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see... you mean that alignment can be overridden by a JVM option, in which case it wouldn't decide just based on whether it's a 32 bits or 64 bits VM?
Is it via -XX:ObjectAlignmentInBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is set statically at bootstrap, but code does not take this into account. Feel free to ignore that, since this is very rare edge case.
* {@link Class#getDeclaredFields()} returns the fields (of all visibility, from private to public and everything | ||
* in between) of the class, but not of its parent class. | ||
*/ | ||
for (Field f: c.getDeclaredFields()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the fields are not visible. For example there are native references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you point me to a class (perhaps from the JDK) that has a native reference in it? I'd like to test that scenario and see what happens...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java.lang.Object :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I'm definitely not taking it into account 😅 ... where can I learn more about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alex Shipilev explained that in one of the videos (video is not in English...)
internal/venice-common/src/main/java/com/linkedin/venice/memory/HeapSizeEstimator.java
Outdated
Show resolved
Hide resolved
c3fda54
to
16ae593
Compare
Introduced a new HeapSizeEstimator utility to make our on-heap memory usage assessment more accurate, and easier to maintain as class hierarchies evolve. For now it is only used in a unit test, not yet in the main code.
… measures the allocated memory.
16ae593
to
c39e456
Compare
* @throws {@link StackOverflowError} It should be noted that this function is recursive in nature and so any class | ||
* which contains a loop in its class graph will cause a stack overflow. | ||
*/ | ||
public static <T> int getClassOverhead(Class<T> c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function is mainly calculating the class overhead, not the deep size, right?
If the object contains a nested hashmap, this class will only measure the overhead of the class, and the user needs to manually add the class overhead on top of the hashmap
cost, which needs to be calculated manually?
I think it will be useful to have a function to measure both class overhead and the actual data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above requirement is mainly about measuring the deep size of the object.
If we do want to implement such feature, it will be great if we can have some filtering logic to ignore certain types of fields, which can be referring to some common references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this function does not measure the deep size of an object, and that is also the reason why it takes a Class
param and not an Object
. Essentially, my idea is to hand-code the "variable" part of the measurement of the classes of interest, and the function in this utility is just to make it easier to have the "base cost" (the non-variable part). This utility is not intended to be called in the hot path.
I will integrate the usage of this utility into the MemoryBoundBlockingQueue
measurement code. I have not done so yet because I want to be extra sure the utility is as accurate as possible (which it may or may not be, as I am not sure if I have chased all of the bugs properly yet, but at least it should be pretty close...).
Introduced a new HeapSizeEstimator utility to make our on-heap memory usage assessment more accurate, and easier to maintain as class hierarchies evolve.
For now it is only used in a unit test, not yet in the main code.
How was this PR tested?
New unit tests.
Does this PR introduce any user-facing changes?