JVM: Compiling for the JVM
Overview
This chapter gives an overview of how compilation works: how the code that resides inside methods is converted to bytecode. Here, the word compilation means just the transformation from source code (a Java file) to Java bytecode.
The structure of a .class file (which contains the actual bytes of the bytecode, the constant pool and other data) is described in detail in chapter 4 of the JVM specification.
Visualization
To better visualize the stack, the frames, the local variables array and the operand stack (and to overcome the confusion between the call stack, stack frames and the operand stack), I needed a resource that would talk to my visual brain and show me what things look like.
One of the “visualization” resources I found was an old article at Artima by Bill Venners - with clear diagrams that somehow “talk to me”.
Well, yes, I get it - all this mental imagery is just a helper, a tool, and things don’t really look the way the pictures show. Besides, apart from pedantically following the .class file structure, JVM implementors are quite free to represent data in memory at runtime and to do calculations however they want. Anyway, this is how I understand it.
Let’s dive into the spec
What have I just learned?
Local Variables
- local variables are usually loaded using no-arg opcodes like iload_1, iload_2 etc. (the suffix denotes the index of the local variable to be pushed on the operand stack)
- they are put back into the local variable table with istore_1, istore_2 etc. (the prefix denotes the type)
- constants are loaded using either the no-arg iconst_<i> (variants for <i> in -1, 0, 1, 2, 3, 4 or 5, with -1 spelled iconst_m1) or a single-arg instruction such as bipush value
Example:
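A minimal sketch of the points above. The class and method names are made up for illustration, and the bytecode in the comments is what I would expect `javap -c` to show for javac output; a different compiler may produce something else.

```java
// Illustrative sketch; LocalsDemo and sum are hypothetical names.
class LocalsDemo {
    // Static method, so there is no `this` and `a` sits in local variable 0.
    static int sum(int a) {
        int b = 5;        // iconst_5, istore_1
        int c = 100;      // bipush 100, istore_2 (100 is outside the iconst range)
        return a + b + c; // iload_0, iload_1, iadd, iload_2, iadd, ireturn
    }

    public static void main(String[] args) {
        System.out.println(sum(1));
    }
}
```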
Bits manipulation
~x == -1 ^ x, so in order to calculate the bitwise negation an ixor instruction is used
Example:
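A small sketch, with made-up names, showing that both forms give the same result. For the first method javac typically emits an iconst_m1 followed by ixor, since there is no dedicated "not" instruction.

```java
// Illustrative sketch; BitsDemo and its methods are hypothetical names.
class BitsDemo {
    // Typically compiled as: iload_0, iconst_m1, ixor, ireturn
    static int not(int x) {
        return ~x;
    }

    // The same computation written out by hand.
    static int notViaXor(int x) {
        return -1 ^ x;
    }

    public static void main(String[] args) {
        System.out.println(not(5) == notViaXor(5));
    }
}
```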
Constant Pool
Numeric constants, objects, methods and fields are accessed through the constant pool; values are loaded from the constant pool using:
- ldc, ldc_w - for types other than long or double; the wide variant (_w) is used when the constant pool index does not fit in one byte
- ldc2_w - for loading a long or double from the constant pool
In the below example the pool looks like this:
Example:
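A sketch modelled on the spec's own numeric example (the class name is my assumption). The comments show which instruction I would expect for each constant; the pool indices are left out because they depend on the rest of the class.

```java
// Illustrative sketch; ConstantsDemo is a hypothetical name.
class ConstantsDemo {
    static double useManyNumeric() {
        int i = 100;             // bipush 100: fits in a byte, no pool entry needed
        int j = 1000000;         // ldc: too large for bipush or sipush
        long l1 = 1;             // lconst_1: 0 and 1 have dedicated opcodes
        long l2 = 0xffffffffL;   // ldc2_w: long constant from the pool
        double d = 2.2;          // ldc2_w: double constant from the pool
        return i + j + l1 + l2 + d;
    }

    public static void main(String[] args) {
        System.out.println(useManyNumeric());
    }
}
```

Running `javap -v` on the compiled class lists the constant pool entries next to the instructions.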
While loop
A simple while loop with an int control variable compiles differently on my machine than shown in the spec. Interesting.
Here’s my javap result and the spec suggestion:
The point made in the spec was that moving the comparison to the end of the compiled code, with an initial goto 8 that “jumps over” the code inside the loop, is very wise: in the usual case, when the loop executes more than once, at each iteration it is the conditional jump alone that transfers control to the loop body.
In my version of the bytecode (in the picture above: the one with the black background) each iteration needs an additional goto that transfers control back to the condition check.
I need to ask someone wiser. My guess is that this kind of “inefficiency” does not matter since it will be effectively “jit-ed” (i.e. during runtime just-in-time compilation would generate optimal machine code anyway).
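To make the two layouts easier to compare, here is a sketch of the loop (I added a return value so it can be checked; the names are mine). The comments describe the shape of the code with labels instead of offsets, since offsets depend on the compiler version.

```java
// Illustrative sketch; WhileDemo is a hypothetical name.
class WhileDemo {
    static int whileInt() {
        int i = 0;
        while (i < 100) {
            i++;
        }
        return i;
    }
    // Layout shown in the spec (condition at the bottom):
    //         goto COND
    //   BODY: iinc
    //   COND: iload, bipush 100, if_icmplt BODY
    //
    // Layout I would expect from a current javac (condition at the top):
    //   COND: iload, bipush 100, if_icmpge END
    //         iinc
    //         goto COND
    //   END:  ...

    public static void main(String[] args) {
        System.out.println(whileInt());
    }
}
```

In the first layout each iteration executes one jump; in the second one it executes a conditional jump that is not taken plus the goto.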
Receiving arguments
The only difference between a static and a non-static method is that:
- in a static method the passed arguments start from index 0
- in a non-static method the 0-th element of the local variables table is occupied by a reference to this, so the actual method arguments’ indices start from 1.
Example:
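A sketch with two otherwise identical methods. The name nonstaticAdd is used later in this post; staticAdd and the class name are my assumptions. The only difference I would expect in the bytecode is the shift of the local variable indices.

```java
// Illustrative sketch; ArgsDemo and staticAdd are hypothetical names.
class ArgsDemo {
    // No `this`: a is local 0, b is local 1.
    static int staticAdd(int a, int b) {
        return a + b;   // iload_0, iload_1, iadd, ireturn
    }

    // `this` occupies local 0: a is local 1, b is local 2.
    int nonstaticAdd(int a, int b) {
        return a + b;   // iload_1, iload_2, iadd, ireturn
    }

    public static void main(String[] args) {
        System.out.println(staticAdd(1, 2) + new ArgsDemo().nonstaticAdd(3, 4));
    }
}
```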
Invoking methods
- methods are dispatched based on the runtime type of the object using the invokevirtual instruction - it calls a virtual method and passes all the required arguments, which need to be prepared on the operand stack
- when a stack frame is created for the virtual method, the this reference will be the 0-th value in the local variables table and the other parameters will be passed as the 1-st, 2-nd etc. values
Example: Method addThree(int a, int b, int c) uses nonstaticAdd(int a, int b) from the example above. It is compiled in the following way:
- 0: the this reference is pushed
- 1: the 1-st variable (a) is pushed on the operand stack (those two values are needed for the outer nonstaticAdd call; the third value is also needed, so it will be calculated first and pushed on the stack)
- 2: the this reference is pushed again
- 3: the 2-nd variable (b) is pushed
- 4: the 3-rd variable (c) is pushed
- 5: this invokevirtual represents the inner call nonstaticAdd(b, c) - it eats up the values pushed in steps 2, 3 and 4 above (this, b and c) and pushes the result back on the stack
- 8: this invokevirtual represents the outer call nonstaticAdd(a, ...) - it eats up the three remaining values (this, a and the result pushed in step 5) from the stack and pushes the result back on the stack
- 11: ireturn returns the result of type int and control is passed back to the calling method
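The walkthrough above can be reproduced with a sketch like the one below. The method body is my reconstruction from the steps (a nested call), and the class name is made up; the offsets in the comment follow from the instruction sizes (1 byte for the loads, 3 bytes for each invokevirtual).

```java
// Illustrative sketch; InvokeDemo is a hypothetical name and the body of
// addThree is reconstructed from the step-by-step description.
class InvokeDemo {
    int nonstaticAdd(int a, int b) {
        return a + b;
    }

    int addThree(int a, int b, int c) {
        return nonstaticAdd(a, nonstaticAdd(b, c));
    }
    // Expected shape of addThree:
    //    0: aload_0                      this (for the outer call)
    //    1: iload_1                      a
    //    2: aload_0                      this (for the inner call)
    //    3: iload_2                      b
    //    4: iload_3                      c
    //    5: invokevirtual nonstaticAdd   inner call
    //    8: invokevirtual nonstaticAdd   outer call
    //   11: ireturn

    public static void main(String[] args) {
        System.out.println(new InvokeDemo().addThree(1, 2, 3));
    }
}
```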
Instruction invokevirtual needs 3 bytes: the opcode and two more bytes needed to calculate the index into the constant pool (index = (indexbyte1 << 8) | indexbyte2). The instruction really does a lot: see invokevirtual in the spec.
Instruction invokestatic is used to call a static method (this call does not require a this reference on the stack before the call).
Instruction invokespecial has two usages:
- instance initialization and
- calls to a super method
… and it is also non-trivial; see invokespecial in the spec.
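A sketch that puts the three invoke instructions side by side. All names here are my own; cpIndex is just a helper that spells out the index formula quoted above, it is not part of any API.

```java
// Illustrative sketch; Base, Derived, describe and cpIndex are hypothetical names.
class Base {
    String name() {
        return "base";
    }
}

class Derived extends Base {
    Derived() {
        super();                          // invokespecial Base.<init>
    }

    @Override
    String name() {
        return super.name() + "+derived"; // invokespecial Base.name (no dynamic dispatch)
    }

    static String describe(Base b) {
        return b.name();                  // invokevirtual: target chosen by runtime type
    }

    // How the two operand bytes form the constant pool index.
    static int cpIndex(int indexbyte1, int indexbyte2) {
        return (indexbyte1 << 8) | indexbyte2;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Derived())); // the call itself is an invokestatic
    }
}
```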
Objects
Creation of objects:
- new instruction
- dup for duplicating the reference on top of the operand stack - one reference is eaten up by instance initialization, the other is returned to the caller
Example:
- fields are accessed using getfield and putfield;
- those instructions take an index of a symbolic field reference in the constant pool (which is resolved at runtime)
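A sketch covering both object creation and field access. The class is made up for illustration; the comments show the instruction sequence I would expect from javac.

```java
// Illustrative sketch; Holder is a hypothetical name.
class Holder {
    int value;

    Holder(int value) {
        // aload_0, invokespecial Object.<init>, then:
        this.value = value;     // aload_0, iload_1, putfield value
    }

    static Holder create() {
        // new Holder, dup, bipush 7, invokespecial Holder.<init>, areturn
        // (the duplicated reference is what survives the constructor call)
        return new Holder(7);
    }

    int getValue() {
        return value;           // aload_0, getfield value, ireturn
    }

    public static void main(String[] args) {
        System.out.println(create().getValue());
    }
}
```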
Arrays
- have a unique set of instructions
- newarray - creates an array of a numeric element type
- anewarray - creates an array of a reference element type
- multianewarray - creates a multidimensional array
- arraylength - gives the length
Example:
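A sketch that touches each of the array instructions listed above. The names are mine and the instruction comments are what I would expect from javac.

```java
// Illustrative sketch; ArraysDemo is a hypothetical name.
class ArraysDemo {
    static int demo() {
        int[] ints = new int[10];        // bipush 10, newarray int
        String[] strs = new String[3];   // iconst_3, anewarray java/lang/String
        int[][] grid = new int[2][4];    // iconst_2, iconst_4, multianewarray (2 dimensions)
        ints[0] = 5;                     // aload, iconst_0, iconst_5, iastore
        // Each .length below is an arraylength instruction.
        return ints.length + strs.length + grid.length + grid[1].length + ints[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```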
Compiling switches
Instructions: tableswitch and lookupswitch
- tableswitch - used when the switch cases can be represented as a table whose values are offsets into the code; if the index is outside of the valid table index range, the default case is taken
- lookupswitch - used when the table is sparse; the keys are sorted
Example:
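A sketch based on the two switch examples from the spec (the class name is mine). I would expect the dense cases to become a tableswitch and the sparse ones a lookupswitch, but the choice is up to the compiler.

```java
// Illustrative sketch; SwitchDemo is a hypothetical name.
class SwitchDemo {
    // Dense keys 0..2: a jump table indexed by the value is compact.
    static int chooseNear(int i) {
        switch (i) {
            case 0:  return 0;
            case 1:  return 1;
            case 2:  return 2;
            default: return -1;
        }
    }

    // Sparse keys: a table from -100 to 100 would be mostly empty,
    // so sorted (key, offset) pairs are a better fit.
    static int chooseFar(int i) {
        switch (i) {
            case -100: return -1;
            case 0:    return 0;
            case 100:  return 1;
            default:   return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseNear(1) + " " + chooseFar(100));
    }
}
```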
Exceptions
This section of chapter 3 is great fun. Starting from a simple example, it builds up more complex scenarios with not only throwing but also try-catch, try-finally and the most complex try-catch-finally case.
Simple throw
Points:
- new is used to create the exception instance
- dup for reference duplication, needed for…
- invokespecial used to initialize the instance
- athrow to do the actual throw
Example:
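A sketch in the spirit of the spec's example. TestExc and cantBeZero follow the spec's naming; the class name and the throwsFor helper are mine and exist only so the behaviour can be checked.

```java
// Illustrative sketch; ThrowDemo and throwsFor are hypothetical names.
class ThrowDemo {
    static class TestExc extends Exception {}

    static void cantBeZero(int i) throws TestExc {
        if (i == 0) {
            // new TestExc, dup, invokespecial TestExc.<init>, athrow
            throw new TestExc();
        }
    }

    // Helper: reports whether cantBeZero threw for the given argument.
    static boolean throwsFor(int i) {
        try {
            cantBeZero(i);
            return false;
        } catch (TestExc e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsFor(0) + " " + throwsFor(1));
    }
}
```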
Try-catch
When catch is present, the compiled code contains an exception table which has entries for each of the sections (or: ranges) where an exception can be thrown. Each entry contains the range “guarded” by the catch clause and the offset of the catch clause code itself.
In case of a throw, when control is transferred to the target code listed in the exception table, the thrown value is available on the top of the stack.
If there was no throw, a goto jumps to the return instruction.
Example:
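A sketch of a single try-catch (the names are mine, except TestExc and tryItOut, which follow the spec's naming). The comments mark where I would expect the guarded range, the goto and the handler to be.

```java
// Illustrative sketch; TryCatchDemo is a hypothetical name.
class TryCatchDemo {
    static class TestExc extends Exception {}

    static void tryItOut(int i) throws TestExc {
        if (i == 0) {
            throw new TestExc();
        }
    }

    static String catchOne(int i) {
        String result = "ok";
        try {
            tryItOut(i);        // guarded range: one exception table entry for TestExc
        }                       // no throw: goto jumps over the handler
        catch (TestExc e) {     // handler starts with astore: the thrown reference
            result = "caught";  // is on top of the operand stack
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(catchOne(0) + " " + catchOne(1));
    }
}
```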
The case with two or more catch clauses (for more than one exception type) is similar - the only difference is that the exception table grows to encompass the new ranges.
Try-finally
A bit more complex variant with many possible code paths. The spec shows the usage of the jsr and ret instructions (jump to subroutine and return from subroutine), however my code does not use them - jsr is not allowed in class files of version 51.0 (Java 7) and later, and javac copies the finally code into each path instead.
Have a look:
Example:
- Here, if an exception is not thrown, the code in finally (instructions: 5 - 10) is executed right after the code in try (1 - 2).
- However, if an exception is thrown, according to the exception table control goes to instruction 16:
  - the exception object reference is stored in local variable 1
  - then the same code (17 - 25) as before (5 - 10) is generated again, and at the end
  - the exception object reference is loaded on the operand stack and the exception is re-thrown.
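The two paths described above can be observed with a sketch like this one. It is not the code from my listing; the names are made up and the outer try-catch is there only to record that the exception was re-thrown.

```java
// Illustrative sketch; TryFinallyDemo is a hypothetical name.
class TryFinallyDemo {
    static String run(boolean fail) {
        StringBuilder trace = new StringBuilder();
        try {
            try {
                trace.append("try;");
                if (fail) {
                    throw new IllegalStateException("boom");
                }
            } finally {
                // javac emits this code twice: once on the normal path and once in a
                // catch-all handler that stores the exception, runs it and rethrows.
                trace.append("finally;");
            }
        } catch (IllegalStateException e) {
            trace.append("rethrown");
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```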
Try-catch-finally
Here’s the most advanced variant of exception handling logic. Let’s see how it works.
Example:
The instructions representing the finally block are repeated three times - in instruction ranges:
- [5, 8, 10] - executed if cantBeZero(..) does not throw
- [22, 25, 27] - executed if cantBeZero(..) throws TextExc, after calling the handler in instruction 19
- [34, 37, 39] - executed if cantBeZero(..) throws another exception, or if the handler throws any exception
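The three paths can be exercised with a sketch like the one below. It is not my original listing: the class name, the mode parameter and the trace strings are made up, and the exception is named TestExc as in the spec.

```java
// Illustrative sketch; TryCatchFinallyDemo and the mode values are hypothetical.
class TryCatchFinallyDemo {
    static class TestExc extends Exception {}

    // mode 0: nothing is thrown, mode 1: TestExc, mode 2: some other exception
    static String run(int mode) {
        StringBuilder trace = new StringBuilder();
        try {
            try {
                if (mode == 1) throw new TestExc();
                if (mode == 2) throw new IllegalStateException("other");
                trace.append("try;");
            } catch (TestExc e) {
                trace.append("catch;");
            } finally {
                // Copy 1: after the try body, copy 2: after the TestExc handler,
                // copy 3: in the catch-all handler that rethrows.
                trace.append("finally;");
            }
        } catch (RuntimeException e) {
            trace.append("propagated");
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        for (int mode = 0; mode <= 2; mode++) {
            System.out.println(run(mode));
        }
    }
}
```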
Synchronization
Two bytecode instructions are used to compile synchronized statements in the Java language: monitorenter and monitorexit. Synchronized methods are recognized at runtime by the method-invoking instruction (by checking the ACC_SYNCHRONIZED flag attached to the method).
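A sketch contrasting the two forms (all names are mine). I would expect only the first method to contain monitorenter and monitorexit; the second one should have plain code plus the ACC_SYNCHRONIZED flag, which `javap -v` prints among the method flags.

```java
// Illustrative sketch; SyncDemo is a hypothetical name.
class SyncDemo {
    private final Object lock = new Object();
    private int count;

    void incrementBlock() {
        synchronized (lock) {   // the lock reference is pushed, then monitorenter
            count++;
        }                       // monitorexit on the normal path and in a
                                // catch-all handler, so the monitor is always released
    }

    // No monitor instructions in the body; the JVM locks `this`
    // because the method carries the ACC_SYNCHRONIZED flag.
    synchronized void incrementMethod() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    public static void main(String[] args) {
        SyncDemo demo = new SyncDemo();
        demo.incrementBlock();
        demo.incrementMethod();
        System.out.println(demo.get());
    }
}
```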
Code
All code snippets used in this post are available on GitHub.
Writing an interpreter 😂
- Worth reading: Crafting Interpreters
- Worth looking at: j-j-jvm in Java, luaj in Lua, toyJVM in C or LLVM-JVM in Haskell
This post is part of the jvm series.
- 2022-14-02 - JVM: Loading, Linking, Initializing
- 2022-11-02 - JVM: Verification and Checks
- 2022-06-02 - JVM: Fields, Methods, Attributes
- 2022-02-02 - JVM: The Constant Pool
- 2022-26-01 - JVM: Names and Descriptors
- 2022-22-01 - JVM: Class File Format - Structure
- 2022-16-01 - JVM: Compiling for the JVM
- 2022-08-01 - JVM: Instruction Set Summary
- 2022-07-01 - JVM: Structure of the JVM
- 2022-04-01 - JVM: Introduction