Downcasting Longs To Ints On x86

Last week, my esteemed colleague and close friend asked a remarkably straightforward question about downcasting a long to an int in Java. I'll admit the question caught me off guard. While the JLS offered the correct answer, I couldn't help but ponder what's actually happening in the machine.

In this article I'm going to try to explain what actually happens (on x86) when you downcast (or narrow) a 64-bit long to a 32-bit int. I will work my way down from Java bytecode, through the JVM (focusing on HotSpot), down to the CPU. The answer is pretty simple (hint: not much), but getting to the answer is certainly an interesting lesson. As always, should this article contain any mistakes or misinformation, I would appreciate a heads-up.

Bytecode

The Java bytecode responsible for downcasting a long to an int is l2i (long to int). The bytecode expects a long to be on the top of the operand stack (in JVM lingo, this precondition is known as ltos: the top of the stack is a long) and will finish with an int on the top of the operand stack (itos). As a demonstration, the following code will generate a fairly straightforward example:

public static void main(final String[] args)
{
    final long g = Long.parseLong(args[0]);
    final int i = (int)g;
    System.out.println(i);
}

This will produce the following bytecode:

public static void main(java.lang.String[]);
    Code:
        stack=2, locals=4, args_size=1
        0: aload_0
        1: iconst_0
        2: aaload
        3: invokestatic  #2                  // Method java/lang/Long.parseLong:(Ljava/lang/String;)J
        6: lstore_1
        7: lload_1
        8: l2i
        9: istore_3
        10: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
        13: iload_3
        14: invokevirtual #4                  // Method java/io/PrintStream.println:(I)V
        17: return

Notice the aforementioned l2i operator at bytecode index (BCI) 8. As expected, the lload at BCI 7 loads a long onto the top of the stack, l2i replaces it with an int, and the istore at BCI 9 stores that int. At this point, we can step back and let the JVM do its thing.

It's worth mentioning that the Java compiler is smart enough to optimize away unnecessary casts. For example, if the source long is a compile-time constant, the compiler can simply calculate the resultant int itself and store that in the constant pool instead. This can be demonstrated with the following example:

public static void main(final String[] args)
{
    final long l = 50000000000L;
    final int i = (int)l;
    System.out.println(i);
}

Which produces the much more compact bytecode:

public static void main(java.lang.String[]);
    Code:
        stack=2, locals=4, args_size=1
        0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
        3: ldc           #3                  // int -1539607552
        5: invokevirtual #4                  // Method java/io/PrintStream.println:(I)V
        8: return
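
The folded constant can be checked by hand: narrowing keeps only the low-order 32 bits of the long and reinterprets them as a signed int. Below is a minimal sketch of that arithmetic. The class and method names are my own, made up for illustration, and the sketch deliberately avoids a plain (int) cast until the final comparison.

```java
class ConstantFoldCheck {
    // Hypothetical helper (not part of any library): narrows a long "by hand".
    static int narrowByHand(long value) {
        long low = value & 0xFFFFFFFFL;   // keep the low 32 bits: 0 .. 2^32 - 1
        if (low >= (1L << 31)) {
            low -= (1L << 32);            // reinterpret as signed two's complement
        }
        return Math.toIntExact(low);      // now guaranteed to fit in an int
    }

    public static void main(String[] args) {
        long l = 50000000000L;
        System.out.println(narrowByHand(l));             // prints -1539607552
        System.out.println(narrowByHand(l) == (int) l);  // prints true
    }
}
```

In other words, 50000000000 leaves 2755359744 after dropping everything above bit 31, and because bit 31 is set, that reads as 2755359744 - 4294967296 = -1539607552.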

JVM

On x86, the JVM doesn't actually have to do much (which makes this entire topic interesting). Unfortunately, there's a lot of complexity behind the little that has to be done. Because this topic is extremely complex and is an area with which I am still somewhat unfamiliar (and the focus of another article), I will try to keep the explanation at a high level.

In a nutshell, the JVM is, at least until the JIT compilers kick in, an interpreter. There are different flavours of interpreters, but the one most commonly used (i.e. the ubiquitous "interpreter") is the Template Interpreter. Each time a JVM starts up, this interpreter is generated as native code (yes, at runtime). Essentially, the interpreter consists of a "template" for each and every Java bytecode. Each template is created by emitting native code through HotSpot's built-in assembler. Accordingly, there is a template generated for the l2i bytecode.

When a template is invoked by the interpreter, the TOS element may be passed in different ways (in fact, each template may have different types of entry points allowing for a type of dynamic invocation). In some instances, it's passed in a register and other times it's passed on the native stack.
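
To make the idea of "one template per bytecode" a little more concrete, here is a toy model in Java. It is only a conceptual sketch and emphatically not how HotSpot is implemented: HotSpot's templates are native code, whereas this toy uses lambdas and a boxed operand stack. Every name in it (ToyTemplateTable, buildTemplates, run) is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class ToyTemplateTable {
    // Built once at "startup": one handler (our stand-in for a template) per opcode.
    static Map<String, Consumer<Deque<Object>>> buildTemplates() {
        Map<String, Consumer<Deque<Object>>> templates = new HashMap<>();
        templates.put("l2i", stack -> {
            long top = (Long) stack.pop();   // precondition: ltos
            stack.push((int) top);           // postcondition: itos
        });
        return templates;
    }

    // Pushes one long, runs the given opcodes in order, returns the top of the stack.
    static Object run(long input, String... opcodes) {
        Map<String, Consumer<Deque<Object>>> templates = buildTemplates();
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(input);
        for (String opcode : opcodes) {
            templates.get(opcode).accept(stack);
        }
        return stack.peek();
    }

    public static void main(String[] args) {
        System.out.println(run(50000000000L, "l2i"));   // prints -1539607552
    }
}
```

The toy always passes the top-of-stack element on its own stack. The real templates, as described above, may instead receive it in a register.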

CPU

It's hard to generalize what the l2i template looks like since each processor (and operating system) will have a slightly different version (remember, the template is generated at runtime). Because we're focusing on x86, there are two differing areas of focus: 32-bit and 64-bit.

x86_32

For the l2i template, the ltos is passed on the native stack. To continue, the ltos must be popped off the stack. Because x86's general-purpose registers are 32 bits wide, the l2i template must use two 32-bit registers to hold the ltos. In this case, EAX and EDX: EAX holds the low-order bits and EDX holds the high-order bits.

This template stores the resultant itos in the EAX register. To perform the actual cast, the template must only pop the ltos from the stack into two registers:

0xf36d4160: pop    %eax
0xf36d4161: pop    %edx

Now, EAX holds the low 32 bits, which represent the downcast int. That's it.
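
The two pops can be modelled in Java by writing a long into an 8-byte little-endian buffer (standing in for the two stack slots) and reading it back as two ints. This is only a sketch under the assumption that the low-order word sits at the lower address, which is what the pop order above implies. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class StackPopDemo {
    // Returns { eax, edx } as they would look after the two pops.
    static int[] popHalves(long ltos) {
        // Assumption: x86 is little-endian and the low word is on top of the stack.
        ByteBuffer slots = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        slots.putLong(0, ltos);
        int eax = slots.getInt(0);   // first pop: low-order 32 bits
        int edx = slots.getInt(4);   // second pop: high-order 32 bits
        return new int[] { eax, edx };
    }

    public static void main(String[] args) {
        long g = 50000000000L;
        int[] halves = popHalves(g);
        System.out.println(halves[0]);             // prints -1539607552
        System.out.println(halves[1]);             // prints 11
        System.out.println(halves[0] == (int) g);  // prints true
    }
}
```

The value that lands in "EAX" is exactly what the cast returns. The high-order half in "EDX" is simply ignored.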

x86_64

The x86_64 template is similarly straightforward, but, because the general-purpose registers are already 64 bits wide, the ltos can be passed in a register (RAX), thus avoiding a trip to the stack. Similar to x86_32, the resultant itos is stored in a register, RAX.

0x00007f9f48f01fe8: mov    %eax,%eax

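If you're unfamiliar with assembly, the instruction is moving the value in EAX back into EAX. At this point, you may have two questions: why is the template touching EAX when the ltos was passed in RAX, and how does moving a register into itself accomplish anything?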

In x86_64, the E** registers are aliases for the low 32 bits of the corresponding R** registers. For example, reading EAX yields the low-order 32 bits of the value in RAX. Writing to an E** register, however, does more than replace the low-order 32 bits: the high-order 32 bits of the R** register are "zeroed" as well. Given this information, reading the low-order bits of RAX (through EAX) and storing them back (through EAX) clears the high-order half of RAX while leaving the low-order half untouched, which effectively performs the downcast. RAX now contains the itos.
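
The effect of that single instruction can be mimicked in Java by treating a long as the contents of RAX. This is just an illustrative sketch: the movEaxEax helper is a made-up name that models the zeroing of the high-order bits with a mask.

```java
class MovEaxEaxDemo {
    // Hypothetical model of `mov %eax,%eax`: keep the low 32 bits, zero the high 32.
    static long movEaxEax(long rax) {
        return rax & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long rax = 50000000000L;
        long after = movEaxEax(rax);
        System.out.println(Long.toHexString(rax));     // prints ba43b7400
        System.out.println(Long.toHexString(after));   // prints a43b7400
        System.out.println((int) after);               // prints -1539607552
    }
}
```

The leading b (decimal 11) is the high-order half that gets zeroed. What remains, read as a signed 32-bit value, is the same int the cast produces.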

Considerations

This article probably isn't too useful unless you find yourself on the same journey as me: spelunking through the OpenJDK HotSpot JVM trying to become a better developer. In which case, I hope this article is useful and informative.