Escape Analysis & Stack Allocated Objects

In 1999, the paper Escape Analysis For Java appeared at the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). It outlined an algorithm to detect whether an object, O, escapes the currently executing method, M, or the current thread, T. This paper laid the groundwork for the HotSpot JVM escape analysis optimization introduced in JDK6u14. As of JDK6u23, escape analysis is enabled by default for the C2 compiler.

Should you choose to peruse the Internet, you will find that escape analysis (enabled before JDK6u23 with the flag -XX:+DoEscapeAnalysis) tends to improve overall system performance quite substantially. The 1999 paper cites a 2% to 23% overall "performance improvement" across several studied systems. Numerous websites show similar (and sometimes far greater) improvements. While I expect escape analysis is already enabled in your production environment (and probably to good effect), I would like to explain why you, as a developer, should care.

A Simple Explanation

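In prose, the algorithm assigns each object one of 3 possible escape states:

NoEscape: the object never leaves the method that created it.
ArgEscape: the object is passed as an argument to another method, or is referenced by an argument, but does not escape the current thread.
GlobalEscape: the object escapes the method and the thread, for example by being stored in a static field or returned from the method.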

Objects characterised as NoEscape may be allocated on the stack instead of the heap (HotSpot achieves this through scalar replacement, keeping the object's fields in registers or stack slots instead of allocating a whole object). Specifically, this means that the object requires no garbage collection and disappears once the method's stack frame is popped. While this applies to any object characterised as NoEscape, it most notably reduces the overhead of instantiating high-churn helper objects such as EqualsBuilder, HashCodeBuilder, and StringBuilder, and similar temporary objects such as iterators and collections. Note, however, that objects requiring finalizer execution are considered GlobalEscape and will not be stack-bound.

Objects characterised as ArgEscape cannot be allocated on the stack; however, because they are accessible only by the currently executing thread, monitors on these objects may be eliminated. Because the JVM guarantees sequentially consistent execution within a single thread, all memory synchronization semantics are upheld.
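As a minimal sketch of the monitor case (the class and method names are hypothetical, chosen only for illustration): StringBuffer's methods are synchronized, yet the buffer below is only ever reachable from the thread that created it, which makes its monitors candidates for elimination. Whether a given JVM actually removes the locks depends on its version and flags; the program's result is the same either way.

```java
// Hypothetical example: names are illustrative and not taken from the paper.
final class LockElisionSketch {

    // The buffer arrives as an argument, so from the caller's point of view it
    // has been handed to another method. It is never stored in a field or
    // shared with another thread.
    static void appendRange(final StringBuffer buffer, final int from, final int to) {
        for (int i = from; i < to; i++) {
            // Both append calls are synchronized methods on StringBuffer.
            buffer.append(i).append(',');
        }
    }

    static String joinRange(final int from, final int to) {
        final StringBuffer buffer = new StringBuffer();
        appendRange(buffer, from, to);
        return buffer.toString();
    }

    public static void main(final String[] args) {
        System.out.println(joinRange(0, 5)); // prints 0,1,2,3,4,
    }
}
```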

Finally, objects identified as GlobalEscape objects are not subject to either stack-based allocation or monitor elimination. GlobalEscape objects remain heap-bound.
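The three states can be illustrated with a small sketch. The Point class, the REGISTRY field, and the method names are assumptions made up for this example; the comments describe how each object would plausibly be classified before any inlining takes place, not what a particular JVM is guaranteed to do.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeStatesSketch {

    // Hypothetical value holder used only for illustration.
    static final class Point {
        final int x;
        final int y;

        Point(final int x, final int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Reachable from a static field, and therefore from any thread.
    static final List<Point> REGISTRY = new ArrayList<>();

    // NoEscape candidate: p is created, read, and dropped inside this method.
    static int sumLocal(final int x, final int y) {
        final Point p = new Point(x, y);
        return p.x + p.y;
    }

    // ArgEscape candidate: p is passed to another method but is not stored
    // anywhere that outlives the call.
    static int sumViaHelper(final int x, final int y) {
        final Point p = new Point(x, y);
        return add(p);
    }

    static int add(final Point p) {
        return p.x + p.y;
    }

    // GlobalEscape: p is stored in a collection held by a static field.
    static int sumAndRegister(final int x, final int y) {
        final Point p = new Point(x, y);
        REGISTRY.add(p);
        return p.x + p.y;
    }

    public static void main(final String[] args) {
        System.out.println(sumLocal(3, 4));
        System.out.println(sumViaHelper(3, 4));
        System.out.println(sumAndRegister(3, 4));
    }
}
```

All three methods return the same sum; only the lifetime of the Point differs, and that is exactly what the analysis inspects.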

A Demonstration

While I expect you, the discerning reader, to performance test your own systems and form your own conclusions as to the impact of escape analysis, I have provided a brief example demonstrating the effects of enabling or disabling escape analysis. Consider an empty class:

public final class EscapeTest
{
    public static void main(final String[] args)
    {
    }
}

Execution of this class, both with and without escape analysis, will provide some baseline numbers of interest:

% java -Xms1G -Xmx1G -XX:+PrintGCDetails -XX:+UseSerialGC -XX:NewRatio=1 -XX:SurvivorRatio=8 -verbose:gc -XX:+DoEscapeAnalysis EscapeTest
Heap
    def new generation   total 471872K, used 16778K
        eden space 419456K,   4% used
        from space 52416K,   0% used
        to   space 52416K,   0% used
    tenured generation   total 524288K, used 0K
        the space 524288K,   0% used
% java -Xms1G -Xmx1G -XX:+PrintGCDetails -XX:+UseSerialGC -XX:NewRatio=1 -XX:SurvivorRatio=8 -verbose:gc -XX:-DoEscapeAnalysis EscapeTest
Heap
    def new generation   total 471872K, used 16778K
        eden space 419456K,   4% used
        from space 52416K,   0% used
        to   space 52416K,   0% used
    tenured generation   total 524288K, used 0K
        the space 524288K,   0% used

The heap size in this example is artificially large in an attempt to suppress garbage collection. I have done this for demonstrative purposes only. Judging by the garbage collector details, the empty program behaves identically whether or not escape analysis is enabled. The next example, however, highlights the effect of escape analysis when creating 15 million objects:

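import java.util.Random;

import org.apache.commons.lang3.builder.EqualsBuilder;

// Requires Apache Commons Lang 3 on the classpath for EqualsBuilder.
public final class EscapeTest
{
    private final int a;
    private final int b;

    EscapeTest(final int a, final int b)
    {
        this.a = a;
        this.b = b;
    }

    // Allocates a short-lived EqualsBuilder on every call.
    // The unchecked cast and the missing hashCode() keep the demonstration short.
    @Override
    public boolean equals(final Object obj)
    {
        final EscapeTest other = (EscapeTest)obj;
        return new EqualsBuilder()
            .append(this.a, other.a)
            .append(this.b, other.b)
            .isEquals();
    }

    public static void main(final String[] args)
    {
        final Random random = new Random();
        // Each iteration creates three objects: t1, t2, and an EqualsBuilder.
        for(int i = 0; i < 5_000_000; i++){
            final EscapeTest t1 = new EscapeTest(random.nextInt(), random.nextInt());
            final EscapeTest t2 = new EscapeTest(random.nextInt(), random.nextInt());
            if(t1.equals(t2)){
                System.out.println("Prevent anything from being optimized out.");
            }
        }
    }
}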

Traditionally, the EqualsBuilder and both EscapeTest instances would be allocated on the heap during each of the 5 million iterations. With escape analysis enabled, however, we can reason that t1 and the EqualsBuilder, both NoEscape objects, should be stack allocated, while t2, which is passed as an argument to equals, is an ArgEscape object and should remain on the heap:

% java -Xms1G -Xmx1G -XX:+PrintGCDetails -XX:+UseSerialGC -XX:NewRatio=1 -XX:SurvivorRatio=8 -verbose:gc -XX:+DoEscapeAnalysis EscapeTest
Heap
    def new generation   total 471872K, used 16778K
        eden space 419456K,   4% used
        from space 52416K,   0% used
        to   space 52416K,   0% used
    tenured generation   total 524288K, used 0K
        the space 524288K,   0% used

Notice that with escape analysis enabled, the heap footprint is identical to that of the empty class: only 4% of eden is used. Running the same code without escape analysis paints a different picture:

% java -Xms1G -Xmx1G -XX:+PrintGCDetails -XX:+UseSerialGC -XX:NewRatio=1 -XX:SurvivorRatio=8 -verbose:gc -XX:-DoEscapeAnalysis EscapeTest
Heap
    def new generation   total 471872K, used 327176K
        eden space 419456K,  78% used
        from space 52416K,   0% used
        to   space 52416K,   0% used
    tenured generation   total 524288K, used 0K
        the space 524288K,   0% used

Eden sits at 78% utilisation (note that no collections were observed due to the extremely large heap size), suggesting that all 3 objects were heap allocated.
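The same comparison can also be made in-process rather than by reading the GC logs. The following is a rough sketch, assuming a HotSpot-based JVM that exposes com.sun.management.ThreadMXBean; the AllocationProbe and Pair names are hypothetical, and Pair merely stands in for the EscapeTest and EqualsBuilder combination so that the example needs no third-party library. Run it once with -XX:+DoEscapeAnalysis and once with -XX:-DoEscapeAnalysis and compare the reported byte counts; the exact figures will vary by JVM version and platform.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class AllocationProbe {

    // Hypothetical two-field value object used only for this sketch.
    static final class Pair {
        final int a;
        final int b;

        Pair(final int a, final int b) {
            this.a = a;
            this.b = b;
        }

        boolean sameAs(final Pair other) {
            return a == other.a && b == other.b;
        }
    }

    // Bytes allocated so far by the current thread, or -1 if the JVM does not
    // expose the HotSpot-specific extension of ThreadMXBean.
    static long allocatedBytes() {
        final ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean) {
            final com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) bean;
            return hotspot.getThreadAllocatedBytes(Thread.currentThread().getId());
        }
        return -1L;
    }

    // Creates two short-lived Pair objects per iteration and compares them.
    static int countMatches(final int iterations) {
        int matches = 0;
        for (int i = 0; i < iterations; i++) {
            final Pair left = new Pair(i, i + 1);
            final Pair right = new Pair(i, i + 1);
            if (left.sameAs(right)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(final String[] args) {
        countMatches(100_000); // warm-up, giving the JIT a chance to compile the loop
        final long before = allocatedBytes();
        final int matches = countMatches(5_000_000);
        final long after = allocatedBytes();
        System.out.println("matches=" + matches + ", allocated bytes=" + (after - before));
    }
}
```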

Inlining

Interestingly, creating 15 million objects with escape analysis enabled produced exactly the same memory profile as running the empty class. While we reasoned that t1 and the EqualsBuilder were both stack-allocated NoEscape objects, we also reasoned that t2 was a heap-allocated ArgEscape object. The memory profile suggests that this reasoning is incorrect and that t2 is, in fact, stack allocated. Printing the JIT compiler's inlining decisions (-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining) helps explain these observations:

74    1 %           EscapeTest::main @ 10 (73 bytes)
    ....
                        @ 44   EscapeTest::<init> (15 bytes)   inline (hot)
                          @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                        @ 52   EscapeTest::equals (38 bytes)   inline (hot)
                          @ 9   org.apache.commons.lang3.builder.EqualsBuilder::<init> (10 bytes)   inline (hot)
                            @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                          @ 20   org.apache.commons.lang3.builder.EqualsBuilder::append (25 bytes)   inline (hot)
                          @ 31   org.apache.commons.lang3.builder.EqualsBuilder::append (25 bytes)   inline (hot)
                          @ 34   org.apache.commons.lang3.builder.EqualsBuilder::isEquals (5 bytes)   inline (hot)

Notice that, due to JVM inlining, t2 does not actually escape the method of its creation (byte code index 44) after all. Specifically, the body of t1.equals(Object) is inlined into the loop. Indeed, escape analysis considers the escapability of an object after method inlining has been performed. This means that t2, like t1, is really categorised as NoEscape in this case, and hence may be stack allocated. This behaviour is noted in section 8 of Escape Analysis For Java.
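One way to probe this interaction is to forbid the JIT from inlining the callee and watch the argument object return to the heap. The sketch below uses hypothetical names (InliningSketch, Pair, same) and relies on HotSpot's -XX:CompileCommand=dontinline option; treat the run commands in the comments as suggestions to try with the GC flags used earlier, not as guaranteed outcomes.

```java
// Suggested experiments (HotSpot only):
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InliningSketch
//   java -XX:CompileCommand=dontinline,InliningSketch::same InliningSketch
// With the second command, 'right' is passed to a real call and can no
// longer be treated as NoEscape inside run().
final class InliningSketch {

    // Hypothetical value object standing in for EscapeTest.
    static final class Pair {
        final int a;
        final int b;

        Pair(final int a, final int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Small enough that a hot caller would normally have it inlined.
    static boolean same(final Pair left, final Pair right) {
        return left.a == right.a && left.b == right.b;
    }

    static int run(final int iterations) {
        int matches = 0;
        for (int i = 0; i < iterations; i++) {
            final Pair left = new Pair(i, i);
            final Pair right = new Pair(i, i); // passed as an argument, like t2
            if (same(left, right)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(final String[] args) {
        System.out.println(run(5_000_000)); // prints 5000000
    }
}
```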

So What?

Since you're probably already running escape analysis in your production environment, you may ask why I bother with this article. Simply, I urge you, as a dedicated developer, to freely create and use objects that encapsulate otherwise verbose, redundant, or messy code. This will allow you to communicate your intentions to other developers without worrying about garbage collection overhead or other related (perceived or real) performance hits.

Ultimately, I beseech you to use your understanding of escape analysis to prioritize readability and maintainability over optimization whenever reasonable.

References