Application performance is at the forefront of our minds, and garbage collection optimization is a good place to make small but meaningful gains.
Automated garbage collection (along with the JIT HotSpot Compiler) is one of the most advanced and most valued components of the JVM, but many developers and engineers are far less familiar with Garbage Collection (GC), how it works and how it impacts application performance.
First, what is GC even for? Garbage collection is the memory management process for objects in the heap. As objects are allocated to the heap, they run through a few collection phases – usually rather quickly as the majority of objects in the heap have short lifespans.
Garbage collection events contain three phases – marking, deletion and copy/compaction. In the first phase, the GC runs through the heap and marks everything as either live (referenced) objects, unreferenced objects or available memory space. Unreferenced objects are then deleted, and the remaining objects are compacted. In generational garbage collection, objects “age” and are promoted through 3 spaces in their lives – Eden, Survivor space and Tenured (Old) space. This shifting between spaces also occurs as part of the copy/compaction phase.
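To make the mark and delete phases concrete, here is a deliberately simplified toy model. It is a sketch for illustration only: the `ToyHeap` and `Node` names are invented, and real JVM collectors work on raw memory with far more sophisticated algorithms.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model for illustration only; not how HotSpot is implemented.
final class ToyHeap {
    // A hypothetical heap object that may reference other objects.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // everything allocated so far
    final List<Node> roots = new ArrayList<>();   // stand-in for stack and static references

    Node allocate(String name) {
        Node node = new Node(name);
        objects.add(node);
        return node;
    }

    // Marks everything reachable from the roots, then drops the rest.
    // Returns the number of objects reclaimed.
    int collect() {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (live.add(node)) {          // first visit: mark and follow references
                pending.addAll(node.refs);
            }
        }
        int before = objects.size();
        objects.retainAll(live);           // delete unmarked objects; survivors stay packed in the list
        return before - objects.size();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Node a = heap.allocate("a");
        Node b = heap.allocate("b");
        heap.allocate("c");                // never referenced from a root
        a.refs.add(b);
        heap.roots.add(a);
        System.out.println("reclaimed: " + heap.collect());
    }
}
```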
But enough about that, let’s get to the fun part!
Getting to Know Garbage Collection (GC) in Java
One of the great things about automated GC is that developers don’t really need to understand how it works. Unfortunately, that means that many developers DON’T understand how it works. Understanding garbage collection and the many available GCs is somewhat like knowing Linux CLI commands. You don’t technically need to use them, but knowing and becoming comfortable using them can have a significant impact on your productivity.
Just like with CLI commands, there are the absolute basics: the ls command to list the contents of a directory, mv to move a file from one location to another, etc. In GC, the equivalent basics are knowing that there is more than one GC to choose from, and that GC can cause performance concerns. Of course, there is so much more to learn (about using the Linux CLI AND about garbage collection).
The purpose of learning about Java’s garbage collection process isn’t just to collect gratuitous (and boring) conversation starters; it is to learn how to effectively implement and maintain the right GC with optimal performance for your specific environment. Knowing that garbage collection affects application performance is basic, and there are many advanced techniques for enhancing GC performance and reducing its impact on application reliability.
GC Performance Concerns
1. Memory Leaks –
With knowledge of heap structure and how garbage collection is performed, we know that the memory usage gradually increases until a garbage collection event occurs and the usage drops back down. Heap utilization for referenced objects usually remains steady so the drop should be to more or less the same volume.
With a memory leak, each GC event clears a smaller portion of heap objects (although many of the objects left behind are not in use), so heap utilization continues to increase until the heap is full and an OutOfMemoryError is thrown. The cause is that the GC only marks unreferenced objects for deletion. So, even if a referenced object is no longer in use, it won’t be cleared from the heap. There are some helpful coding tricks for preventing this that we’ll cover a bit later.
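As a minimal sketch of how such a leak typically arises, consider a long-lived collection that keeps references to objects the application has finished with. The `SessionRegistry` class and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry: a static field stays reachable for as long as the class is loaded,
// so everything it references counts as "live" during marking.
final class SessionRegistry {
    private static final List<byte[]> SESSIONS = new ArrayList<>();

    static void open(int payloadBytes) {
        SESSIONS.add(new byte[payloadBytes]);
    }

    // If this is never called, the payloads stay referenced and cannot be collected,
    // even though nothing will ever read them again.
    static void closeAll() {
        SESSIONS.clear();
    }

    static int retained() {
        return SESSIONS.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            open(1024);
        }
        System.out.println("retained after use: " + retained());
        closeAll(); // dropping the references makes the payloads eligible for collection
        System.out.println("retained after cleanup: " + retained());
    }
}
```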
2. Continuous “Stop the World” Events –
In some scenarios, garbage collection triggers what is called a Stop the World event: when it occurs, all threads in the JVM (and thus the application that’s running on it) are stopped to allow the GC to execute. In healthy applications, GC execution time is relatively low and doesn’t have a large effect on application performance.
In suboptimal situations, however, Stop the World events can greatly impact the performance and reliability of an application. If a GC event requires a Stop the World pause and takes 2 seconds to execute, the end-user of that application will experience a 2 second delay as the threads running the application are stopped to allow GC.
When memory leaks occur, continuous Stop the World events are also problematic. As less heap memory space is purged with every execution of the GC, it takes less time for the remaining memory to fill up. When the memory is full, the JVM triggers another GC event. Eventually, the JVM will be running repeated Stop the World events causing major performance concerns.
3. CPU Usage –
And it all comes down to CPU usage. A major symptom of continuous GC / Stop the World events is a spike in CPU usage. GC is a computationally heavy operation, and so can take more than its fair share of CPU power. For GCs that run concurrent threads, CPU usage can be even higher. Choosing the right GC for your application will have the biggest impact on CPU usage, but there are also other ways to optimize for better performance in this area.
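One way to check how much collection work a running JVM has actually done is the standard `java.lang.management` API. The sketch below is one possible starting point; the `GcStats` class name is an assumption, and the collector names it prints depend on which GC the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // Total number of collections across all collectors in this JVM.
    // A collector may report -1 when the value is undefined; that is treated as 0 here.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount());
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these totals periodically and comparing the deltas gives a rough picture of whether GC frequency or duration is climbing over time.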
We can understand from these performance concerns surrounding garbage collection that however advanced GCs get (and they’re getting pretty advanced), their Achilles’ heel remains the same: redundant and unpredictable object allocations. To improve application performance, choosing the right GC isn’t enough. We need to know how the process works, and we need to optimize our code so that our GCs don’t pull excessive resources or cause excessive pauses in our application.
Before we dive into the different Java GCs and their performance impact, it’s important to understand the basics of generational garbage collection. The basic concept of generational GC is based on the idea that the longer a reference exists to an object in the heap, the less likely it is to be marked for deletion. By tagging objects with a figurative “age,” they could be separated into different storage spaces to be marked for deletion by the GC less frequently.
When an object is allocated to the heap, it’s placed in what’s called the Eden space. That’s where the objects start out, and in most cases that’s where they are marked for deletion. Objects that survive that stage “celebrate a birthday” and are copied to the Survivor space. This process is shown below:
The Eden and Survivor spaces make up what’s called the Young Generation. This is where the bulk of the action occurs. When (or if) an object in the Young Generation reaches a certain age, it is promoted to the Tenured (also called Old) space. The benefit of dividing heap memory by object age is that the GC can operate at different levels.
A Minor GC is a collection that focuses only on the Young Generation, effectively ignoring the Tenured space altogether. Generally, the majority of Objects in the Young Generation are marked for deletion and a Major or Full GC (including the Old Generation) isn’t necessary to free memory on the heap. Of course a Major or Full GC will be triggered when necessary.
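The aging and promotion rules described above can be modeled in a few lines. This is a toy sketch, not the JVM's implementation: `ToyGenerations`, `Obj` and the `tenuringThreshold` parameter are hypothetical names, and the Eden and Survivor spaces are folded into a single `young` list for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of generational aging; names and structure are illustrative only.
final class ToyGenerations {
    static final class Obj {
        int age;                   // number of minor collections survived
        boolean referenced = true; // whether the application still holds a reference
    }

    final int tenuringThreshold;   // age at which an object is promoted
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    ToyGenerations(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate() {
        Obj obj = new Obj();
        young.add(obj);            // new objects always start in the young generation
        return obj;
    }

    // Minor GC: only the young generation is examined; the old list is never touched.
    void minorGc() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj obj = it.next();
            if (!obj.referenced) {
                it.remove();                          // unreferenced: reclaimed
            } else if (++obj.age >= tenuringThreshold) {
                it.remove();                          // old enough: promoted
                old.add(obj);
            }
        }
    }

    public static void main(String[] args) {
        ToyGenerations heap = new ToyGenerations(2);
        heap.allocate();                      // stays referenced
        heap.allocate().referenced = false;   // short-lived
        heap.minorGc();
        heap.minorGc();
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());
    }
}
```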
One quick trick for optimizing GC operation based on this is to adjust the sizes of the heap areas to best fit your application’s needs.
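On HotSpot, generation sizes are commonly influenced with options such as `-Xmn`, `-XX:NewRatio` and `-XX:SurvivorRatio`; which values are appropriate depends on the application and the collector in use. Before changing anything, it helps to see how the heap is currently divided. The sketch below (the `HeapPools` class name is an assumption) lists the heap memory pools through the standard `MemoryPoolMXBean` API; the pool names it prints vary by collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class HeapPools {
    // Number of memory pools that belong to the Java heap (as opposed to non-heap areas).
    static int heapPoolCount() {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (usage != null) {
                // getMax() returns -1 when no maximum is defined for the pool
                System.out.printf("%s: used=%d bytes, max=%d bytes%n",
                        pool.getName(), usage.getUsed(), usage.getMax());
            }
        }
    }
}
```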
There are many GCs to choose from. Although G1 became the default GC in Java 9, it was originally intended to replace the CMS collector, which is a low-pause collector, so applications running with throughput collectors may be better off staying with their current collector. Understanding the operational differences between Java garbage collectors, and the differences in their performance impact, is still important.
The Serial collector is the simplest one, and the one you’re least likely to be using, as it performs collection on a single thread and is mainly designed for single-processor environments (e.g. 32-bit Windows machines) and for small heaps. This collector can vertically scale memory usage in the JVM but requires several Major/Full GCs to release unused heap resources. This causes frequent Stop the World pauses, which disqualifies it for all intents and purposes from being used in user-facing environments.
As its name suggests, the Parallel collector uses multiple threads running in parallel to scan through and compact the heap. Although the Parallel GC uses multiple threads for garbage collection, it still pauses all application threads while running. The Parallel collector is best suited for apps that need to be optimized for best throughput (hence its other name, the throughput collector) and can tolerate higher latency in exchange.
Low Pause Collectors
Most user-facing applications will require a low pause GC, so that user experience isn’t affected by long or frequent pauses. These GCs are all about optimizing for responsiveness (time/event) and strong short-term performance.
Concurrent Mark Sweep (CMS) –
Similar to the Parallel collector, the Concurrent Mark Sweep (CMS) collector utilizes multiple threads to mark and sweep (remove) unreferenced objects. However, this GC initiates Stop the World events in only two specific instances:
(1) during the initial marking of roots (objects in the old generation that are reachable from thread entry points or static variables, any references from the main() method, and a few more)
(2) when the application has changed the state of the heap while the algorithm was running concurrently, forcing it to go back and do some final touches to make sure it has the right objects marked
The Garbage First collector (commonly known as G1) utilizes multiple background threads to scan through the heap, which it divides into regions. It works by scanning the regions that contain the most garbage objects first so that it can reclaim them first, giving it its name (Garbage First).
This strategy reduces the chance of the heap being depleted before background threads have finished scanning for unused objects, in which case the collector would have to stop the application. Another advantage for the G1 collector is that it compacts the heap on-the-go, something the CMS collector only does during full Stop the World collections.
Improving GC Performance
Application performance is directly impacted by the frequency and duration of garbage collections, meaning that optimization of the GC process is done by reducing those metrics. There are two major ways to do this: first, by adjusting the heap sizes of the young and old generations, and second, by reducing the rate of object allocation and promotion.
In terms of adjusting heap sizes, it’s not as straightforward as one might expect. The logical conclusion would be that increasing the heap size would decrease GC frequency while increasing duration, and decreasing the heap size would decrease GC duration while increasing frequency.
The fact of the matter, though, is that the duration of a Minor GC depends not on the size of the heap, but on the number of objects that survive the collection. That means that for applications that mostly create short-lived objects, increasing the size of the young generation can actually reduce both GC duration and frequency. However, if increasing the size of the young generation leads to a significant increase in the number of objects that need to be copied between survivor spaces, GC pauses will take longer, leading to increased latency.
3 Tips for Writing GC-Efficient Code
Tip #1: Predict Collection Capacities –
All standard Java collections, as well as most custom and extended implementations (such as Trove and Google’s Guava), use underlying arrays (either primitive- or object-based). Since arrays are immutable in size once allocated, adding items to a collection may in many cases cause an old underlying array to be dropped in favor of a larger newly-allocated array.
Most collection implementations try to optimize this reallocation process and keep it to an amortized minimum, even if the expected size of the collection is not provided. However, the best results can be achieved by providing the collection with its expected size upon construction.
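A short sketch of presizing with the standard `ArrayList` and `HashMap` constructors follows. The `PresizedCollections` class and its method names are illustrative assumptions; the 0.75 figure is `HashMap`'s documented default load factor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PresizedCollections {
    // The backing array is allocated once at the requested capacity,
    // so adding n elements does not trigger any intermediate reallocation.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    // HashMap resizes once its size exceeds capacity * load factor (0.75 by default),
    // so the requested capacity is n / 0.75, rounded up.
    static Map<Integer, String> labels(int n) {
        int capacity = (int) Math.ceil(n / 0.75);
        Map<Integer, String> result = new HashMap<>(capacity);
        for (int i = 0; i < n; i++) {
            result.put(i, "item-" + i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squares(5));
        System.out.println(labels(3).size());
    }
}
```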
Tip #2: Process Streams Directly –
When processing streams of data, such as data read from files or downloaded over the network, it’s very common to see the entire input read into a single byte array before anything else is done with it.
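A minimal sketch of that read-everything pattern, assuming the input is a local file and using a hypothetical `readEverything` helper built on the JDK's `Files.readAllBytes`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class WholeFileRead {
    // Loads the complete file into one array, so heap usage grows with the file size.
    static byte[] readEverything(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The path is supplied by the caller; no particular file is assumed here.
        byte[] data = readEverything(Paths.get(args[0]));
        System.out.println("read " + data.length + " bytes");
    }
}
```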
The resulting byte array could then be parsed into an XML document, JSON object or Protocol Buffer message, to name a few popular options.
When dealing with large files or ones of unpredictable size, this is obviously a bad idea, as it exposes us to OutOfMemoryErrors in case the JVM can’t actually allocate a buffer the size of the whole file.
A better way to approach this is to use the appropriate InputStream (FileInputStream in this case) and feed it directly into the parser, without first reading the whole thing into a byte array. All major libraries expose APIs to parse streams directly.
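As one hedged illustration, the JDK's own StAX parser (`XMLInputFactory` and `XMLStreamReader`) can consume an `InputStream` directly. The `StreamingParse` class and `countElements` method are hypothetical names, and counting elements merely stands in for whatever processing the application really needs.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingParse {
    // Pulls parse events from the stream one at a time;
    // the document is never held in memory as a single byte array.
    static int countElements(InputStream in) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close(); // closes the reader, not the underlying stream
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The file name is supplied by the caller; try-with-resources closes the stream.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println("elements: " + countElements(in));
        }
    }
}
```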
Tip #3: Use Immutable Objects –
Immutability has many advantages. One that’s rarely given the attention it deserves is its effect on garbage collection.
An immutable object is an object whose fields (and specifically non-primitive fields in our case) cannot be modified after the object has been constructed.
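A small sketch of such a class, under the assumption of a hypothetical `ImmutableOrder` type: all fields are final, the list is defensively copied and wrapped so it cannot be modified, and "changes" produce a new object rather than altering the existing one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical example type; the names are illustrative only.
final class ImmutableOrder {
    private final String id;
    private final List<String> items;

    ImmutableOrder(String id, List<String> items) {
        this.id = id;
        // Defensive copy: later changes to the caller's list cannot leak in,
        // and the unmodifiable wrapper rejects changes through the getter.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    String id() {
        return id;
    }

    List<String> items() {
        return items;
    }

    // Returns a new order instead of mutating this one.
    ImmutableOrder withItem(String item) {
        List<String> extended = new ArrayList<>(items);
        extended.add(item);
        return new ImmutableOrder(id, extended);
    }

    public static void main(String[] args) {
        ImmutableOrder first = new ImmutableOrder("o-1", Arrays.asList("apple"));
        ImmutableOrder second = first.withItem("pear");
        System.out.println(first.items() + " -> " + second.items());
    }
}
```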
Immutability implies that all objects referenced by an immutable container have been created before the construction of the container completes. In GC terms: The container is at least as young as the youngest reference it holds. This means that when performing garbage collection cycles on young generations, the GC can skip immutable objects that lie in older generations, since it knows for sure they cannot reference anything in the generation that’s being collected.
Fewer objects to scan mean fewer memory pages to scan, and fewer memory pages to scan mean shorter GC cycles, which mean shorter GC pauses and better overall throughput.
One caveat: immutable objects in and of themselves are not going to help you with GC. If an application constantly has to create new immutable objects, it can end up running longer GC cycles because of the constant object creation. Be mindful of how the application uses immutable objects; the benefit described above applies to those that live long enough to be promoted to the Tenured heap space.