Reference

Books

JVM Performance Engineering: Inside OpenJDK and the HotSpot Java Virtual Machine (Monica Beckwith, 2024)

Java Performance Companion (Charlie Hunt, Monica Beckwith, Poonam Parhar, Bengt Rutisson, 2016)

Java Performance (Charlie Hunt, Binu John, 2011)

Websites

Java - On Stack Replacement (OSR)

Episode 1: “The Evolution” — Java JIT Hotspot & C2 compilers

Java Garbage Collectors, their working and comparisons

Glossary

  • AOT : Ahead of Time compilation. Also known as static compilation: native code is generated before execution (e.g. GraalVM).
  • CMS : Concurrent Mark and Sweep, a GC algorithm
  • JIT : Just in Time compiler. Also known as dynamic compilation: native code is generated at runtime, based on hot methods and loop-back branch counts.
  • OSR : On-stack replacement. Switches a method that is already running (typically inside a long loop) from interpreted to compiled code on the fly, without waiting for its next invocation.
  • STW : Stop The World, a GC pause during which all application threads are suspended

History

  • Java 1.2 : new JIT compiler. JVM renamed to Java HotSpot VM.
  • Java 1.6 : C1 and C2 compiler options
  • Java 1.7 : introduction of tiered compilation, G1 (experimental)
  • Java 9 : segmented code cache with separately sized regions (NonMethodCodeHeapSize, NonProfiledCodeHeapSize, ProfiledCodeHeapSize)
  • Java 11 : ZGC (experimental)
  • Java 12 : Shenandoah GC (experimental)

JIT compilation

Adaptive optimization :

  • interpretation (slowest form of bytecode execution)
  • adaptive JIT compilation

Bytecode on hot code paths is compiled to native code at runtime. It can also be deoptimized if needed to reclaim

Tiered compilation

Introduced in Java 7, tiered compilation provides multiple levels of optimized compilation, ranging from T0 to T4 :

  • T0: Interpreted code, devoid of compilation. This is where the code starts and then moves on to the T1, T2, or T3 level.
  • T1–T3: Client-compiled (C1) mode. T1 is fully optimized C1 code without any profiling. T2 adds only the method invocation counters and loop-back branch counters (limited profiling). T3 adds full profiling, collecting the data used for profile-guided optimization; it may be familiar to readers who are conversant in static compiler optimizations.
  • T4: The highest level of optimization provided by the HotSpot VM’s server compiler.

Tiered compilation has been enabled by default since Java 8.
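To check which JIT compiler(s) a running JVM uses and how much time it has spent compiling, you can query the standard java.lang.management API. Below is a minimal sketch; the class name is hypothetical. On a 64-bit HotSpot JVM with tiered compilation, the reported name typically mentions "Tiered Compilers", but the exact string is JVM-specific.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitInfo {
    /** Returns a one-line summary of the JIT compilation system in use. */
    static String describeJit() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // Javadoc: null means the JVM has no compilation system
            return "No JIT compilation system reported";
        }
        // getTotalCompilationTime() throws if monitoring is unsupported, so check first
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms"
                : "n/a";
        return jit.getName() + ", total compilation time: " + time;
    }

    public static void main(String[] args) {
        System.out.println(describeJit());
    }
}
```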

C1 and C2 compilers

https://faun.pub/episode-1-the-evolution-java-jit-hotspot-c2-compilers-building-super-optimum-containers-f0db19e6f19a

The HotSpot VM provides two flavors of compilers: the fast client compiler (also known as the C1 compiler) and the server compiler (also known as the C2 compiler).

  • In Java 6, either C1 or C2 is used, selected with a command-line argument: -client (C1) or -server (C2)
  • In Java 7, both can be used together through tiered compilation (-XX:+TieredCompilation)
  • In Java 8, tiered compilation became the default

Code cache

Storage area for native code generated by the JIT compiler or the interpreter.
The template table and profiling measures gathered by adaptive compilation are stored there too.

-XX:ReservedCodeCacheSize sets the maximum code cache size (48 MB by default, 240 MB when tiered compilation is enabled, which is the default since Java 8)

Java 9 introduced a segmented code cache (JEP 197), split into regions that can be sized separately :

  • Non-method code heap : -XX:NonMethodCodeHeapSize
  • Non-profiled nmethod code heap : -XX:NonProfiledCodeHeapSize
  • Profiled nmethod code heap : -XX:ProfiledCodeHeapSize
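HotSpot exposes the code cache as non-heap memory pools, so an application can check its occupancy from inside. Below is a sketch that assumes HotSpot pool naming. With the segmented cache, the pools are usually named CodeHeap 'non-nmethods', CodeHeap 'profiled nmethods' and CodeHeap 'non-profiled nmethods'. Without segmentation, a single code cache pool is reported instead. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCacheReport {
    /** Collects "name: used/max" lines for every non-heap pool whose name starts with "Code". */
    static List<String> codeCachePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.isValid() && pool.getType() == MemoryType.NON_HEAP
                    && pool.getName().startsWith("Code")) {
                MemoryUsage usage = pool.getUsage();
                // getMax() returns -1 when the maximum is undefined
                lines.add(pool.getName() + ": used=" + usage.getUsed() / 1024 + " KB, max="
                        + (usage.getMax() < 0 ? "undefined" : usage.getMax() / 1024 + " KB"));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        codeCachePools().forEach(System.out::println);
    }
}
```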

Template table

Used by the interpreter to look up the native code sequence for each bytecode.

Hotspot VM command line options

-XX:+PrintCompilation

Prints a line each time the JIT compiles a method, which helps to better understand adaptive optimization.

Here are a few examples of the output of the -XX:+PrintCompilation option:

    567 693 % ! 3 org.h2.command.dml.Insert::insertRows @ 76 (513 bytes)
    656 797 n 0 java.lang.Object::clone (native)
    779 835 s 4 java.lang.StringBuffer::append (13 bytes)

Columns description :

  1. Timestamp in milliseconds since JVM start.
  2. Unique ID of the compilation task.
  3. Flags indicating certain properties of the method being compiled, such as whether it’s an OSR method (%), whether it’s synchronized (s), whether it has an exception handler (!), whether it’s blocking (b), or whether it’s native (n).
  4. The tiered compilation level, indicating the level of optimization applied to this method.
  5. The fully qualified name of the method being compiled.
  6. For OSR methods, the bytecode index where the compilation started. This is usually the start of a loop.
  7. The size of the method's bytecode, in bytes.
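A small parser makes the column layout concrete. This sketch assumes whitespace-separated lines like the three samples: timestamp, ID, flag characters, tiered level, method, an optional "@ bci", and the size. Real output pads columns and may append text such as "made not entrant", which this parser ignores. The class and record names are hypothetical, and the record requires Java 16+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrintCompilationLine {
    // Flag characters from the column description: % OSR, s synchronized,
    // ! exception handler, b blocking, n native (several may share one token)
    private static final Pattern FLAGS = Pattern.compile("[%sb!n]+");
    private static final Pattern SIZE = Pattern.compile("\\((\\d+) bytes\\)");

    /** One parsed line; level, osrBci and bytecodeSize are -1 when absent. */
    record Entry(long timestampMs, int compileId, String flags, int level,
                 String method, int osrBci, int bytecodeSize) {}

    static Entry parse(String line) {
        String[] t = line.trim().split("\\s+");
        int i = 0;
        long timestamp = Long.parseLong(t[i++]);   // column 1
        int id = Integer.parseInt(t[i++]);         // column 2
        StringBuilder flags = new StringBuilder(); // column 3 (may be empty)
        while (i < t.length && FLAGS.matcher(t[i]).matches()) {
            flags.append(t[i++]);
        }
        int level = -1;                            // column 4, treated as optional here
        if (i < t.length && t[i].matches("\\d")) {
            level = Integer.parseInt(t[i++]);
        }
        String method = t[i++];                    // column 5
        int osrBci = -1;                           // column 6, OSR compilations only
        if (i + 1 < t.length && t[i].equals("@")) {
            osrBci = Integer.parseInt(t[i + 1]);
        }
        Matcher m = SIZE.matcher(line);            // column 7, "(native)" has no size
        int size = m.find() ? Integer.parseInt(m.group(1)) : -1;
        return new Entry(timestamp, id, flags.toString(), level, method, osrBci, size);
    }

    public static void main(String[] args) {
        String[] samples = {
            "567 693 % ! 3 org.h2.command.dml.Insert::insertRows @ 76 (513 bytes)",
            "656 797 n 0 java.lang.Object::clone (native)",
            "779 835 s 4 java.lang.StringBuffer::append (13 bytes)"
        };
        for (String s : samples) {
            System.out.println(parse(s));
        }
    }
}
```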

Garbage collectors

Algorithms

  • Serial
  • Parallel
  • Concurrent Mark and Sweep
  • G1 (experimental in Java 7) : Designed to offer superior performance and predictability compared to its predecessors
  • Epsilon GC (Java 11) : an experimental no-op GC that handles allocation but never reclaims memory, designed to test the performance of applications with minimal GC interference
  • ZGC (Java 11) : A low-latency, scalable GC designed to handle large heaps with minimal pause times
  • Shenandoah (Java 12) : A low-latency, scalable GC designed to handle large heaps with minimal pause times

Characteristics

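  • Stop the World : Pauses all application threads while collecting, which can result in longer pause times.
  • Concurrent : Does most of the collection work concurrently with the application threads to minimize pause times (e.g. CMS).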
  • Incremental compacting algorithms : Introduced with G1 to deal with the fragmentation issue found in CMS.

Three criteria :

  • Responsiveness : refers to the time taken to receive a response from the system after sending a stimulus
  • Throughput : measures the number of operations that can be performed per second on a given system
  • Footprint : can be defined in two ways: as optimizing the amount of data or objects that can fit into the available space, or as removing redundant information to save space
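One rough in-process proxy for the throughput criterion is the share of JVM uptime spent in garbage collection. The sketch below sums GarbageCollectorMXBean.getCollectionTime() across all collectors; the class and method names are hypothetical. For concurrent collectors, some beans may report concurrent cycle time rather than pause time, so treat the figure as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    /** Percentage of elapsed time spent in GC; a pure helper so it can be checked in isolation. */
    static double gcOverheadPercent(long gcTimeMs, long uptimeMs) {
        if (uptimeMs <= 0) {
            return 0.0;
        }
        return 100.0 * gcTimeMs / uptimeMs;
    }

    /** Sums the accumulated collection time of every collector, skipping undefined (-1) values. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead: %.2f%% of %d ms uptime%n",
                gcOverheadPercent(totalGcTimeMs(), uptime), uptime);
    }
}
```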

Logging

JVM options

Up to Java 8 : -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<filepath>

From Java 9 : -Xlog:gc*:file=<filepath>

-XX:+DisableExplicitGC : Disables System.gc() globally in the JVM.
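Besides writing GC logs with the options above, an application can also observe GC events in-process through the HotSpot-specific com.sun.management.GarbageCollectionNotificationInfo API. Below is a minimal sketch; the class name is hypothetical, and the instanceof pattern requires Java 16+. It complements -Xlog:gc* output rather than replacing it.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class GcEventLogger {
    /** Registers a listener on each collector bean; returns how many beans accepted it. */
    static int install() {
        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter emitter) {
                emitter.addNotificationListener((notification, handback) -> {
                    if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                            .equals(notification.getType())) {
                        return;
                    }
                    GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                            .from((CompositeData) notification.getUserData());
                    // e.g. collector name, action (minor/major), cause, duration of this collection
                    System.out.printf("%s / %s / cause=%s / %d ms%n",
                            info.getGcName(), info.getGcAction(), info.getGcCause(),
                            info.getGcInfo().getDuration());
                }, null, null);
                registered++;
            }
        }
        return registered;
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // Allocate short-lived garbage so that a few young collections are likely to happen
        for (int i = 0; i < 1_000; i++) {
            byte[] chunk = new byte[1 << 20];
            chunk[0] = 1;
        }
        Thread.sleep(500); // notifications are delivered asynchronously
    }
}
```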

Analysis tools

GCeasy : https://gceasy.io

GC Viewer : https://github.com/chewiebug/GCViewer

IBM GC & memory visualizer : https://developer.ibm.com/javasdk/tools

HP Jmeter : https://support.hpe.com/hpesc/public/docDisplay?docId=c02905388&docLocale=en_US

Google garbage cat : https://github.com/doctau/garbagecat

Monitoring and troubleshooting

GC Tuning & Troubleshooting Crash Course by Ram Lakshmanan : https://www.youtube.com/watch?v=6G0E4O5yxks

His products and company's blog :

Micrometrics to forecast application performance: https://blog.gceasy.io/micrometrics-to-forecast-application-performance/

Long GC pauses tips

  • System time greater than user time : not a healthy sign
  • Process swapping : lack of memory

    script to show all processes that are being swapped : https://blog.gceasy.io/reduce-long-gc-pauses/

  • Real time > User time + Sys time : the process is waiting for resources (I/O, CPU)!
  • Background I/O traffic

    How to monitor I/O activity? The sar (System Activity Report) command sar -d -p 1 reports disk read/write activity every second.

  • Fewer GC threads : too many GC threads consume more CPU and take resources away from the application
  • Wrong ergonomics settings : try measuring performance without any JVM flags
  • Disable explicit GC (i.e. System.gc() calls) globally in the JVM : -XX:+DisableExplicitGC

About real time vs sys time vs user time :
https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1/
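The two time-based tips above can be checked mechanically against a GC log. This sketch assumes the Java 8 -XX:+PrintGCDetails suffix format "[Times: user=... sys=..., real=... secs]". Java 9+ unified logging prints CPU times differently and would need another pattern. The class name is hypothetical. The rules mirror the tips: sys > user, and real > user + sys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcTimesCheck {
    // Java 8 -XX:+PrintGCDetails suffix, e.g. "[Times: user=0.20 sys=0.01, real=0.03 secs]"
    private static final Pattern TIMES =
            Pattern.compile("user=([\\d.]+) sys=([\\d.]+), real=([\\d.]+) secs");

    /** Applies the two heuristics from the tips above; an empty list means no warning. */
    static List<String> check(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no [Times: ...] section in: " + logLine);
        }
        double user = Double.parseDouble(m.group(1));
        double sys = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        List<String> warnings = new ArrayList<>();
        if (sys > user) {
            warnings.add("sys > user: kernel-heavy pause (swapping? page faults?)");
        }
        if (real > user + sys) {
            warnings.add("real > user + sys: GC threads were waiting (I/O? CPU starvation?)");
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println(check("[Times: user=0.20 sys=0.01, real=0.03 secs]"));
        System.out.println(check("[Times: user=0.02 sys=0.15, real=1.40 secs]"));
    }
}
```

With a multi-threaded collector, real is normally well below user + sys, which is why the opposite is a warning sign.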

Options

-XX:ParallelGCThreads=<n>

Sets the number of GC worker threads used for parallel work, mostly during stop-the-world phases.

Has an impact on tuning generational GCs like the Parallel GC and G1 GC.
Recent additions like Shenandoah and ZGC also use multiple GC worker threads and perform garbage collection concurrently with the application threads to minimize pause times.

-XX:ConcGCThreads=<n>

Specifies the number of concurrent GC threads for GC algorithms that have concurrent collection phases.
This flag is particularly useful for tuning GCs like G1, which performs concurrent work during marking, and Shenandoah and ZGC, which aim to minimize STW pauses by executing concurrent marking, relocation, and compaction.
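When neither flag is set, the JVM derives both values from the CPU count. The sketch below approximates HotSpot's usual ergonomics as an assumption: one ParallelGCThreads thread per CPU up to 8 CPUs, then 5/8 of a thread per additional CPU, and G1's ConcGCThreads at about a quarter of that. The actual defaults vary with JDK version, platform, container limits and collector, since ZGC and Shenandoah use their own heuristics. Verify with java -XX:+PrintFlagsFinal -version. The class and method names are hypothetical.

```java
public class GcThreadDefaults {
    /** Assumed approximation of HotSpot's default -XX:ParallelGCThreads. */
    static int defaultParallelGcThreads(int cpus) {
        // one thread per CPU up to 8, then 5/8 of a thread for each CPU beyond 8
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    /** Assumed approximation of G1's default -XX:ConcGCThreads: ~1/4 of the parallel threads, at least 1. */
    static int defaultG1ConcGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = defaultParallelGcThreads(cpus);
        System.out.printf("cpus=%d, ParallelGCThreads~%d, G1 ConcGCThreads~%d%n",
                cpus, parallel, defaultG1ConcGcThreads(parallel));
    }
}
```

For example, on 16 CPUs the formula gives 8 + 8 * 5 / 8 = 13 parallel threads and (13 + 2) / 4 = 3 concurrent threads.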

Weak and soft references

Brian Goetz - IBM articles : Plugging memory leaks with weak references (Wayback Machine 2012/03/03)

Brian Goetz - IBM articles : Plugging memory leaks with soft references (Wayback Machine 2012/03/03)
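A SoftReference lets the GC clear a cached value under memory pressure. The JVM guarantees that soft references are cleared before it throws an OutOfMemoryError, so they suit memory-sensitive caches. Weak references, for example through java.util.WeakHashMap, are cleared as soon as the referent is no longer strongly reachable. Below is a minimal soft-reference cache sketch. It is not the code from the articles, and the class name is hypothetical. Cleared entries are only replaced on the next lookup; a ReferenceQueue could be used to purge them eagerly.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;

    public SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if the GC has cleared the soft reference. */
    public synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            // Only the map holds this soft reference; the value stays collectable
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

Usage: new SoftCache<String, byte[]>(k -> loadFromDisk(k)), where loadFromDisk is a hypothetical loader. As long as a caller holds the returned value strongly, the GC cannot clear its soft reference.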

Native memory

Native memory is the OS memory used by the JVM process. This blog article explains well how native memory and heap space relate inside the JVM : http://www.trevorsimonton.com/blog/2020/09/09/java-native-memory.html

Metaspace (Java 8+) lives in native memory : https://www.infoq.com/articles/Java-PERMGEN-Removed

PermGen space (before Java 8) was contiguous with the heap space.

To enable native memory tracking (NMT) inside the JVM, add -XX:NativeMemoryTracking=summary to the command line.
To get a native memory snapshot, run jcmd <PID> VM.native_memory summary.
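Two native memory consumers can also be observed from inside the application: direct ByteBuffers, which are allocated outside the Java heap, and Metaspace. The sketch below assumes HotSpot naming, a "direct" buffer pool and a "Metaspace" memory pool. It also prints the matching jcmd command for the current process, which only produces NMT output if the JVM was started with -XX:NativeMemoryTracking. Java 9+ is needed for ProcessHandle and reachabilityFence. The class name is hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.ref.Reference;
import java.nio.ByteBuffer;

public class NativeMemoryPeek {
    /** Total capacity in bytes of live direct ByteBuffers, or -1 if no "direct" pool exists. */
    static long directBufferCapacity() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    /** Bytes currently used by the Metaspace pool, or -1 if no such pool exists. */
    static long metaspaceUsed() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.isValid() && pool.getName().equals("Metaspace")) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MiB off-heap
        System.out.println("direct buffers: " + directBufferCapacity() + " bytes");
        System.out.println("metaspace used: " + metaspaceUsed() + " bytes");
        System.out.println("jcmd " + ProcessHandle.current().pid() + " VM.native_memory summary");
        Reference.reachabilityFence(buffer); // keep the buffer alive until here
    }
}
```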