
Garbage collection (GC) performs dynamic memory management in many modern programming languages. For developers, sophisticated garbage collection lightens the load of worrying about memory management. This article is the third in a four-part series that explains how to improve Java performance by choosing and tuning a garbage collector.

Part 1 explained the basics of garbage collection and how to monitor the stages and levels of garbage collection. Part 2 delved into memory usage by the Java Virtual Machine (JVM) and the JVM options that control it. This article compares Java garbage collectors and explains how to use your application's throughput, latency, and footprint requirements to choose the right one for your needs.

Choosing a garbage collector

Applications allocate and free memory dynamically as they create and discard objects. In Java, the JVM obtains memory from the operating system and hands it to the application whenever the application requests a new object. Garbage collection, running in one or more background threads, determines which parts of memory are still referenced by the application and reclaims unreferenced memory for application reuse.

Java offers many garbage collectors to meet different application needs. Choosing the right garbage collector for your application has a major impact on its performance. The essential criteria are:

  • Throughput: The percentage of total time spent in useful application activity versus memory allocation and garbage collection. For example, if your throughput is 95%, that means the application code is running 95% of the time and garbage collection is running 5% of the time. You want higher throughput for any high-load business application.
  • Latency: Application responsiveness, which is affected by garbage collection pauses. In any application interacting with a human or some active process (such as a valve in a factory), you want the lowest possible latency.
  • Footprint: The working set of a process, measured in pages and cache lines.
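To make the throughput definition concrete, here is a minimal sketch that estimates throughput for the running JVM from the standard java.lang.management beans. The class and method names (GcThroughput, throughputPercent, totalGcMillis) are made up for this illustration. Treat the result as a rough estimate: the collection time that a bean reports is approximate, and for concurrent collectors it can include work that ran alongside the application rather than pausing it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; not part of any JDK API.
final class GcThroughput {

    // Throughput as defined above: the share of elapsed time NOT spent in garbage collection.
    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            throw new IllegalArgumentException("uptimeMillis must be positive");
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    // Sums the approximate accumulated collection time reported by every collector bean.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = Math.max(ManagementFactory.getRuntimeMXBean().getUptime(), 1);
        System.out.printf("Estimated GC throughput so far: %.2f%%%n",
                throughputPercent(totalGcMillis(), uptime));
    }
}
```

With 50 ms of collection time over 1,000 ms of uptime, throughputPercent returns 95, which matches the 95% example in the list above.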

Different users and applications have different requirements. Some want higher throughput and can bear longer latencies in exchange, whereas others need low latency because even very short pause times would negatively impact their user experience. On systems with limited physical memory or many processes, the footprint might dictate scalability. In the next sections, we'll use these application requirements to discuss and compare the following garbage collectors:

  • Serial collector
  • Parallel collector
  • Garbage-first (G1) collector
  • Z collector
  • Shenandoah collector
  • Concurrent Mark Sweep (CMS) collector (deprecated)

Serial collector

This garbage collector performs all its work on a single thread. Using a single thread can improve efficiency because there is no communication overhead between multiple threads.

The serial collector is best suited for single-processor machines because it cannot take advantage of the additional processors on multiprocessor hardware. It can still be a good choice on multiprocessor machines for applications with small data sets. This collector may be the best choice for applications that can tolerate pauses and that create very small heaps.

The serial collector is a generational garbage collector. As explained in Part 1 of this series, a generation is a set of objects of a similar age. A generational garbage collector divides the set of all objects into generations and collects all the objects in one or more generations in a single pass.

Enabling the serial collector: -XX:+UseSerialGC

The serial collector is selected by default on certain hardware and operating system configurations, and you can explicitly enable the collector with the -XX:+UseSerialGC command-line option.
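Because the default collector depends on the machine, it can be useful to confirm which collector the JVM actually picked. The following sketch lists the names of the registered garbage collector beans; the class name ActiveCollectors is invented for this example. The exact names are implementation details of the JVM. On HotSpot, the serial collector typically shows up as "Copy" and "MarkSweepCompact", but don't rely on those strings in production code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any JDK API.
final class ActiveCollectors {

    // Returns the names of the collectors that this JVM registered at startup.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors in use: " + collectorNames());
    }
}
```

Running the class once with -XX:+UseSerialGC and once without it shows whether the flag changed the selection on your machine.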

Parallel collector

The parallel collector is also known as the throughput collector because it is often the best choice when throughput is more important than latency. You can use the parallel collector when long pauses are acceptable, such as in bulk data processing or batch jobs.

The parallel collector, like the serial collector, is a generational garbage collector. The main difference between them is that the parallel collector runs multiple threads to speed up garbage collection.

If the application requirement is to achieve the highest throughput and if pauses of one second or more are acceptable, a parallel collector might be appropriate. The parallel collector can be used for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded machines.

Enabling the parallel collector: -XX:+UseParallelGC

Use the -XX:+UseParallelGC option to enable this collector. The parallel collector also lets you configure several of its parameters through additional command-line options:

  • -XX:ParallelGCThreads=n specifies the number of garbage collector threads.
  • -XX:MaxGCPauseMillis=n specifies the goal for the maximum pause time in milliseconds. By default, there is no limit on pause time, but with this option, pause times of n or fewer milliseconds are expected.
  • -XX:GCTimeRatio=n helps achieve the application's throughput goal. This option sets the amount of time devoted to garbage collection in a 1/(1+n) ratio. For instance, -XX:GCTimeRatio=24 sets a goal of 1/25, so 4% of the total time is spent in garbage collection. The default value is 99, which results in 1% time spent in garbage collection.
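The arithmetic behind -XX:GCTimeRatio is easy to get backward, so here is a small sketch of the 1/(1+n) relationship in both directions. The class and method names are hypothetical; only the formula comes from the option's description above.

```java
// Hypothetical helper for illustration; it only restates the 1/(1+n) formula.
final class GcTimeRatio {

    // Fraction of total time targeted for garbage collection when GCTimeRatio is n.
    static double gcTimeFraction(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return 1.0 / (1.0 + n);
    }

    // Inverse direction: the ratio n that corresponds to a target GC share in percent.
    static long ratioForGcPercent(double gcPercent) {
        if (gcPercent <= 0.0 || gcPercent > 100.0) {
            throw new IllegalArgumentException("gcPercent must be in (0, 100]");
        }
        return Math.round(100.0 / gcPercent - 1.0);
    }

    public static void main(String[] args) {
        System.out.printf("GCTimeRatio=24 targets %.1f%% GC time%n", 100 * gcTimeFraction(24));
        System.out.printf("GCTimeRatio=99 targets %.1f%% GC time%n", 100 * gcTimeFraction(99));
        System.out.println("A 4% GC budget needs GCTimeRatio=" + ratioForGcPercent(4.0));
    }
}
```

Keep in mind that the option sets a goal, not a guarantee: the collector tries to stay within the budget but may not achieve it.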

See the Java documentation for more details about the parallel collector.

Garbage-first (G1) collector

G1 is a server-style collector designed for multiprocessor machines with a large amount of memory. The collector tries to achieve high throughput along with short pause times, while requiring very little tuning. G1 is selected by default on certain hardware and operating systems, and can be explicitly enabled through the -XX:+UseG1GC option.

G1 is called a mostly concurrent collector because it performs some expensive work concurrently with the application. G1 is also a regionalized and generational garbage collector, which means that the heap is divided into a number of equally sized regions. Upon startup, the JVM sets the region size, which can vary from 1MB to 32MB depending on the heap size. The goal is to have no more than 2048 regions. The Eden, survivor, and old generations (described in Part 1 of this series) are logical sets of these regions and are not contiguous.
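The relationship between heap size, region size, and the 2048-region goal can be sketched as follows. This is a simplified illustration of the rule described above, not HotSpot's actual sizing code, which differs between JDK releases. The class name, the doubling strategy, and the use of power-of-two sizes are assumptions made for this example.

```java
// Simplified sketch of the sizing rule described in the text; NOT HotSpot's implementation.
final class G1RegionSizeSketch {

    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = 1 * MB;   // lower bound from the text
    static final long MAX_REGION_BYTES = 32 * MB;  // upper bound from the text
    static final long TARGET_REGIONS = 2048;       // goal from the text

    // Picks the smallest size (doubling from 1MB, an assumption) that keeps the
    // region count at or below the target, or the 32MB cap if no size does.
    static long regionSizeBytes(long heapBytes) {
        if (heapBytes <= 0) {
            throw new IllegalArgumentException("heapBytes must be positive");
        }
        long size = MIN_REGION_BYTES;
        while (size < MAX_REGION_BYTES && heapBytes / size > TARGET_REGIONS) {
            size *= 2;
        }
        return size;
    }

    public static void main(String[] args) {
        long[] heapsInGb = {1, 8, 12, 128};
        for (long gb : heapsInGb) {
            long region = regionSizeBytes(gb * 1024 * MB);
            System.out.println(gb + "GB heap -> " + (region / MB) + "MB regions");
        }
    }
}
```

The sketch shows why very large heaps end up with more than 2048 regions: once the region size reaches the 32MB cap, only the region count can grow.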

The G1 collector can achieve high throughput and low latency for applications that meet one or more of the following criteria:

  • Large heap size: Specifically, more than 6GB where more than 50% is occupied by live objects.
  • Rates of allocation and promotion between garbage collection generations that may vary significantly during the application's run.
  • A large amount of fragmentation in the heap.
  • The need to limit pauses to a few hundred milliseconds.

String deduplication: -XX:+UseStringDeduplication

Starting with JDK 8 update 20, the G1 collector provides another optimization through string deduplication, which could decrease the application's heap use by about 10%. The -XX:+UseStringDeduplication option causes the G1 collector to find String objects with identical contents during garbage collection and make them share a single underlying character array, so that the duplicate arrays can be reclaimed. When the feature was introduced, G1 was the only Java garbage collector that supported string deduplication; JDK 18 and later extend it to other collectors.

I suggest that you run your application with these options in a test environment to see whether they achieve a reduction in memory usage, and then enable the options in production.
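A test run like that needs a workload that actually contains duplicate strings. The sketch below builds one: many distinct String objects with the same contents, each with its own backing array. The class name and the sample value are invented for this illustration. Deduplication itself is not visible from Java code, so compare heap usage in the GC logs of a run with -XX:+UseStringDeduplication against a run without it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical workload for illustration; it only creates candidates for deduplication.
final class DuplicateStrings {

    // Builds `count` distinct String objects that all hold the same characters.
    static List<String> makeDuplicates(String value, int count) {
        List<String> copies = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Copying through a fresh char[] gives every String its own backing array;
            // new String(value) would share the original's array instead.
            copies.add(new String(value.toCharArray()));
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> copies = makeDuplicates("order-status:SHIPPED", 100_000);
        System.out.println(copies.size() + " equal strings are being held in memory");
    }
}
```

Real applications usually produce such duplicates indirectly, for example by parsing the same field values out of many requests or database rows.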

Additional G1 options

Here is a summary of options associated with the G1 collector:

  • -XX:+UseG1GC enables the G1 garbage collector.
  • -XX:+UseStringDeduplication enables string deduplication.
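  • -XX:+PrintStringDeduplicationStatistics prints detailed deduplication statistics, if run with the previous option. This flag applies to JDK 8; JDK 9 and later report the same information through unified logging instead.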
  • -XX:StringDeduplicationAgeThreshold=n causes string objects reaching the age of n garbage collection cycles to be considered candidates for deduplication. The default value is 3.

For more about the G1 garbage collector, please refer to Introduction to the G1 Garbage Collector and Collecting and reading G1 garbage collector logs. You can also read G1 Collector tuning for G1 performance improvement recommendations.

Z Garbage Collector (ZGC)

ZGC is a low-latency garbage collector that works well with very large (multi-terabyte) heaps. Like G1, ZGC works concurrently with the application. ZGC is concurrent, single-generation, region-based, NUMA-aware, and compacting. It does not stop the execution of application threads for more than 10ms.

This collector is suitable for applications with very large amounts of memory that require very short pause times. ZGC was introduced as an experimental feature, which is enabled with the -XX:+UnlockExperimentalVMOptions -XX:+UseZGC command-line options. Since JDK 15, ZGC is a production feature and -XX:+UseZGC alone is enough.

Setting a maximum heap size is very important when using ZGC, because the collector's behavior depends on allocation rate variance and how much of the data set is live. ZGC works better with a larger heap, but wasting unnecessary memory is also inefficient, so you need to tune your balance between memory usage and the resources available for garbage collection.

Concurrent GC threads in ZGC

The number of concurrent garbage collection threads is also an important value to tune with ZGC. You can set the number of concurrent GC threads through the -XX:ConcGCThreads=n option. This parameter determines how much CPU time is given to the garbage collector. By default, ZGC automatically selects how many threads to run, which works for some applications but needs to be tuned for others. Specifying too many threads takes a lot of CPU time away from the application, whereas specifying too few threads lets the application create garbage faster than it can be collected.
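Before changing the thread count, it helps to know what the JVM is working with. The following sketch prints the processor count and maximum heap that the JVM sees, and reads the current ConcGCThreads value through HotSpotDiagnosticMXBean. That bean is HotSpot-specific, so the sketch returns an empty result on JVMs that don't provide it. The class and method names are assumptions for this example, and the value reported for a flag that was not set explicitly depends on the JVM and collector.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Hypothetical helper for illustration; relies on the HotSpot-specific diagnostic bean.
final class GcThreadSettings {

    // Looks up a VM option by name; empty if the flag or the bean is unavailable.
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (RuntimeException e) {
            // Unknown flag (IllegalArgumentException) or no diagnostic bean on this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Available processors: " + rt.availableProcessors());
        System.out.println("Max heap (bytes):     " + rt.maxMemory());
        System.out.println("ConcGCThreads:        " + vmOption("ConcGCThreads").orElse("unavailable"));
    }
}
```

Comparing the output with and without an explicit -XX:ConcGCThreads=n setting shows whether your setting took effect.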

Shenandoah collector

Shenandoah is another garbage collector with very short pause times. It reduces pause times by performing more garbage collection work concurrently with the application, including concurrent compaction. Shenandoah's pause time is independent of the heap size. Garbage collecting a 2GB heap or a 200GB heap should have a similar short pause behavior.

Shenandoah is best suited to an application that needs responsiveness and short pause times, irrespective of heap size requirements. You can enable this collector through the -XX:+UseShenandoahGC command-line option.

Concurrent Mark Sweep collector (deprecated)

The Concurrent Mark Sweep (CMS) collector was deprecated in JDK 9 (JEP 291) and removed in JDK 14 (JEP 363), with the recommendation to use the G1 collector instead.

The CMS collector has been preferred in applications that require short garbage collection pause times and that can share the processor resources with the garbage collector while the application is running. This collector offers more benefit when the application has a large set of long-lived data (a large tenured generation) and is running on a machine with two or more available processors. The CMS collector can be enabled with the -XX:+UseConcMarkSweepGC command-line option.

CMS is a generational garbage collector that collects the tenured generation. By performing most of its garbage collection work (notably, the mark-and-sweep operations) concurrently with the application threads, it keeps pause times in the application short. However, if the CMS collector is unable to clear the unreferenced objects before the old generation fills up, or if object allocation cannot be satisfied with available space in the old generation, CMS stops all the application threads to perform the garbage collection. The state in which the CMS garbage collector could not complete garbage collection concurrently is known as concurrent mode failure and indicates the importance of tuning the parameters for the CMS collector.

When CMS throws an OutOfMemoryError

If more than 98% of the application's total time is spent in garbage collection and less than 2% of the heap is recovered during five consecutive garbage collection cycles, CMS throws an OutOfMemoryError. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If needed, you can disable the error by adding the option -XX:-UseGCOverheadLimit to the command line.
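The overhead rule combines two thresholds with a streak length, which is easier to see in code. The sketch below restates the rule as described in this section; it is not HotSpot's implementation, and the class name, method name, and per-cycle input arrays are assumptions for the illustration.

```java
// Illustration of the overhead rule as described in the text; NOT HotSpot's implementation.
final class GcOverheadLimitSketch {

    static final double GC_TIME_LIMIT = 0.98;   // more than 98% of time spent in GC
    static final double RECOVERED_LIMIT = 0.02; // less than 2% of the heap recovered
    static final int CONSECUTIVE_CYCLES = 5;    // for five cycles in a row

    // Each index describes one GC cycle: the fraction of time spent collecting
    // and the fraction of the heap that the cycle recovered.
    static boolean limitExceeded(double[] gcTimeFraction, double[] recoveredFraction) {
        if (gcTimeFraction.length != recoveredFraction.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int streak = 0;
        for (int i = 0; i < gcTimeFraction.length; i++) {
            boolean unproductive = gcTimeFraction[i] > GC_TIME_LIMIT
                    && recoveredFraction[i] < RECOVERED_LIMIT;
            streak = unproductive ? streak + 1 : 0; // one productive cycle resets the count
            if (streak >= CONSECUTIVE_CYCLES) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[] time = {0.99, 0.99, 0.99, 0.99, 0.99};
        double[] recovered = {0.01, 0.01, 0.01, 0.01, 0.01};
        System.out.println("Limit exceeded: " + limitExceeded(time, recovered));
    }
}
```

Note that both conditions must hold in every one of the five cycles, so a single cycle that recovers a meaningful amount of memory restarts the count.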

CMS was deprecated to accelerate the development of other garbage collectors in HotSpot. Eliminating CMS reduces the maintenance burden of the GC code base and frees up effort for new development. On JDK 9 through JDK 13, the -XX:+UseConcMarkSweepGC option still works but prints a deprecation warning. On releases from which CMS has been removed (JDK 14 and later), the option is ignored and results in the following warning message:

Java HotSpot(TM) 64-Bit Server VM warning: Ignoring option UseConcMarkSweepGC; support was removed in <version>

The G1 garbage collector is intended, in the long term, to replace most uses of the Concurrent Mark Sweep collector. The newer Z and Shenandoah collectors can also be used with the latest JDK instead of CMS. If none of these collectors meets your application's requirements, you can still use Concurrent Mark Sweep on earlier JDK releases that include it.

Conclusion

Choosing the right garbage collector depends heavily on your application’s requirements and its behavior. This article has contrasted six Java garbage collectors based on throughput, latency, and footprint. You can use this information to choose the garbage collector best suited for your applications. There are many garbage collectors available, so you need to make a choice after proper testing with your expected production load.
