JVM: The Constant Pool

It’s snowy, windy and dark outside. It is perfect weather to read a technical spec, isn’t it?

Let’s read about a .class file’s constant pool today.

Section 4.4 of the JVM Specification lists and describes all the structures that form the constant pool. As you might remember from JVM: Class File Format - Structure, the constant pool data is placed right after the .class file’s version information:

(...)
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
(...)

What’s important:

  • JVM instructions don’t know the real run-time layout (i.e. the location in the process’ memory) of classes, interfaces or arrays
  • each instruction relies only on symbolic information from the constant pool

The Format

Each entry in the constant pool table has this format:

cp_info {
  u1 tag;
  u1 info[];
}

The one-byte tag determines the kind of constant and must be followed by two or more bytes that give information about the specific constant.

The format of the additional data in the info array depends on the tag byte.

The Constants

Here’s a list of constant kinds with their tags, the class file format version and the Java SE version that introduced each of them.

The attribute “L?” shows whether constants of this kind are “loadable”: only some constants may be pushed onto the stack at run time for further processing.

Constant Kind | tag | class ver | Java SE | L?
--- | --- | --- | --- | ---
CONSTANT_Utf8 | 1 | 45.3 | 1.0.2 | no
CONSTANT_Integer | 3 | 45.3 | 1.0.2 | yes
CONSTANT_Float | 4 | 45.3 | 1.0.2 | yes
CONSTANT_Long | 5 | 45.3 | 1.0.2 | yes
CONSTANT_Double | 6 | 45.3 | 1.0.2 | yes
CONSTANT_Class | 7 | 45.3 | 1.0.2 | yes
CONSTANT_String | 8 | 45.3 | 1.0.2 | yes
CONSTANT_Fieldref | 9 | 45.3 | 1.0.2 | no
CONSTANT_Methodref | 10 | 45.3 | 1.0.2 | no
CONSTANT_InterfaceMethodref | 11 | 45.3 | 1.0.2 | no
CONSTANT_NameAndType | 12 | 45.3 | 1.0.2 | no
CONSTANT_MethodHandle | 15 | 51.0 | 7 | yes
CONSTANT_MethodType | 16 | 51.0 | 7 | yes
CONSTANT_Dynamic | 17 | 55.0 | 11 | yes
CONSTANT_InvokeDynamic | 18 | 51.0 | 7 | no
CONSTANT_Module | 19 | 53.0 | 9 | no
CONSTANT_Package | 20 | 53.0 | 9 | no

The Structures

The JVM Specification describes the structures containing constant data in the form of pseudocode structs.

  • The semantics of same-named items are similar across structures (e.g. class_index always means an index into the constant pool table),
  • but the structures might differ in what such an item “accepts” as valid (e.g. the class_index item in CONSTANT_Methodref_info must be a valid index into the constant_pool table, and the entry at that index must be a CONSTANT_Class_info structure).

For the detailed description of each structure please see the JVM spec, specifically The Constant Pool.

Example

The knowledge of the constant_pool layout and the tag values let me identify the constants, at least in simple class files. Let’s have a look at a .class file compiled from Constant.java:

class Constant {
  private static int SIMPLE = 123;
}

Here are the .class contents in hexedit and the constants reported by javap: /en/posts/jvm-spec/chapter_4_constant_pool/hexedit_and_constants.png

So constant_pool_count is 0x12 (18), which means there are 17 actual entries (the table is indexed from 1 to constant_pool_count-1, so it holds one entry fewer than the count).

The first constant pool entry has the tag 0x0A (10), which is CONSTANT_Methodref. The tag is followed by two two-byte items:

  • class_index of value 0x0002
  • name_and_type_index of value 0x0003

So, to know what class and method this entry refers to, we need to parse the second and third entries.

The second constant pool entry has the tag 0x07 (7), which is CONSTANT_Class, as expected. The next two bytes (0x0004) are the constant pool index of the CONSTANT_Utf8_info entry that holds the class name.

The third has tag 0x0C (12) denoting CONSTANT_NameAndType and following bytes show name_index (0x0005) and descriptor_index (0x0006).

The fourth has tag 0x01 (a UTF-8 string), its length is 0x0010 (16) and the next 16 bytes are the “meat” of the string java/lang/Object:

/en/posts/jvm-spec/chapter_4_constant_pool/hex_string.png

The fifth is a UTF-8 string (tag: 0x01) of length 6 and value <init>.

The sixth is a UTF-8 string (tag: 0x01) of length 3 and value ()V.

So, what we decoded so far is the reference to java.lang.Object constructor.

The seventh (tag 0x09) is a CONSTANT_Fieldref so what follows is class_index (0x08) and name_and_type_index (0x09).

The eighth (tag: 0x07) is CONSTANT_Class which points to 10th element of the constant pool.

… and so on. Continuing step by step would allow us to find out what all the constants are, with lots of indirection, and finally get the whole list of 17 constants (nicely laid out with indices by javap -v).
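The manual walk above can be automated. Here is a minimal sketch of it, assuming the whole class file is already in a byte array. The class name ConstantPoolWalker and the output format are my own choices for illustration; the item sizes per tag follow the structures in section 4.4. It only prints the raw indices and does not resolve the indirection.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class ConstantPoolWalker {

    // Returns one line per entry, e.g. "#1 Methodref 2 3" (index, kind, raw items).
    static List<String> walk(byte[] classBytes) {
        List<String> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("bad magic");
            }
            in.readUnsignedShort();              // minor_version
            in.readUnsignedShort();              // major_version
            int count = in.readUnsignedShort();  // constant_pool_count
            for (int i = 1; i < count; i++) {    // valid indices are 1 .. count-1
                int tag = in.readUnsignedByte();
                String p = "#" + i + " ";
                switch (tag) {
                    case 1:  out.add(p + "Utf8 " + in.readUTF()); break; // u2 length + bytes
                    case 3:  out.add(p + "Integer " + in.readInt()); break;
                    case 4:  out.add(p + "Float " + in.readFloat()); break;
                    case 5:  out.add(p + "Long " + in.readLong()); i++; break;     // takes two slots
                    case 6:  out.add(p + "Double " + in.readDouble()); i++; break; // takes two slots
                    case 7:  out.add(p + "Class " + in.readUnsignedShort()); break;
                    case 8:  out.add(p + "String " + in.readUnsignedShort()); break;
                    case 9:  out.add(p + "Fieldref " + twoIndices(in)); break;
                    case 10: out.add(p + "Methodref " + twoIndices(in)); break;
                    case 11: out.add(p + "InterfaceMethodref " + twoIndices(in)); break;
                    case 12: out.add(p + "NameAndType " + twoIndices(in)); break;
                    case 15: out.add(p + "MethodHandle " + in.readUnsignedByte()
                                     + " " + in.readUnsignedShort()); break;
                    case 16: out.add(p + "MethodType " + in.readUnsignedShort()); break;
                    case 17: out.add(p + "Dynamic " + twoIndices(in)); break;
                    case 18: out.add(p + "InvokeDynamic " + twoIndices(in)); break;
                    case 19: out.add(p + "Module " + in.readUnsignedShort()); break;
                    case 20: out.add(p + "Package " + in.readUnsignedShort()); break;
                    default: throw new IllegalArgumentException("unknown tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Reads two consecutive u2 items and renders them as "a b".
    private static String twoIndices(DataInputStream in) throws IOException {
        return in.readUnsignedShort() + " " + in.readUnsignedShort();
    }
}
```

Calling ConstantPoolWalker.walk(Files.readAllBytes(Path.of("Constant.class"))) should give a list comparable to the constant pool section of javap -v, minus the resolved names.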

My Findings

My notes - sometimes just quotes from the spec:

Int and Float

Those are the CONSTANT_Integer_info and CONSTANT_Float_info structures, which share the same format:

CONSTANT_XXXX_info {
  u1 tag;
  u4 bytes;
}

The values are stored in big-endian order (high byte first). In order to calculate the actual value, the four bytes are first interpreted as an int value.

In the case of float, I found the values of infinity :)

  • the positive infinity is 0x7f800000
  • the negative infinity is 0xff800000

And NaN is

  • every value in the range 0x7f800001 to 0x7fffffff or 0xff800001 to 0xffffffff (inclusive)

For all other bit patterns, the value calculation is

int s = ((bits >> 31) == 0) ? 1 : -1;
int e = ((bits >> 23) & 0xff);
int m = (e == 0) ?
  (bits & 0x7fffff) << 1 :
  (bits & 0x7fffff) | 0x800000;

And the resulting float is the value of $$s \cdot m \cdot 2^{e-150}$$
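To convince myself that the formula works, here is a small sketch that applies it by hand. The class name FloatBits is made up for this post. In real code Float.intBitsToFloat does the same job, so it also serves as a reference to compare against.

```java
// Hypothetical helper: decodes the u4 "bytes" item of a CONSTANT_Float_info by hand.
class FloatBits {

    static float decode(int bits) {
        if (bits == 0x7f800000) return Float.POSITIVE_INFINITY;
        if (bits == 0xff800000) return Float.NEGATIVE_INFINITY;

        int e = (bits >> 23) & 0xff;
        // Exponent of all ones with a non-zero fraction: the NaN ranges listed above
        // (the two infinities were already handled).
        if (e == 0xff) return Float.NaN;

        int s = ((bits >> 31) == 0) ? 1 : -1;
        int m = (e == 0)
                ? (bits & 0x7fffff) << 1         // subnormal: no implicit leading 1
                : (bits & 0x7fffff) | 0x800000;  // normal: implicit leading 1

        // s * m * 2^(e-150); Math.scalb multiplies by a power of two.
        return (float) (s * Math.scalb((double) m, e - 150));
    }
}
```

For example, FloatBits.decode(0x42f60000) and Float.intBitsToFloat(0x42f60000) should both give 123.0f.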

Long and Double

These are 8-byte numeric values, and each of them takes up two entries in the constant pool table (the spec itself admits that this design decision was a poor choice). If such a constant sits at index n, the next usable entry is at index n+2; index n+1 is considered unusable, so a constant pool index that would “point into” the second half of a long/double entry is not allowed.

Here the calculation of the value is quite similar to the calculation of the float value. The bits (first combined into a long value as ((long) high_bytes << 32) + low_bytes) are checked for (+ or -) infinity and NaN, and then the value is calculated:

int s = ((bits >> 63) == 0) ? 1 : -1;
int e = (int)((bits >> 52) & 0x7ffL);
long m = (e == 0) ?
  (bits & 0xfffffffffffffL) << 1 :
  (bits & 0xfffffffffffffL) | 0x10000000000000L;

as $$s \cdot m \cdot 2^{e-1075}$$
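The same experiment for double, as a sketch. DoubleBits is a made-up name. One assumption worth calling out: if high_bytes and low_bytes are read as Java ints (e.g. with DataInputStream.readInt), low_bytes has to be masked before it is combined, because a Java int is signed and would otherwise be sign-extended.

```java
// Hypothetical helper: rebuilds a CONSTANT_Double_info value from its two u4 items.
class DoubleBits {

    // highBytes and lowBytes are assumed to have been read as signed Java ints.
    static long join(int highBytes, int lowBytes) {
        return ((long) highBytes << 32) | (lowBytes & 0xFFFFFFFFL);
    }

    static double decode(long bits) {
        if (bits == 0x7ff0000000000000L) return Double.POSITIVE_INFINITY;
        if (bits == 0xfff0000000000000L) return Double.NEGATIVE_INFINITY;

        int e = (int) ((bits >> 52) & 0x7ffL);
        // Exponent of all ones with a non-zero fraction is NaN
        // (the two infinities were already handled).
        if (e == 0x7ff) return Double.NaN;

        int s = ((bits >> 63) == 0) ? 1 : -1;
        long m = (e == 0)
                ? (bits & 0xfffffffffffffL) << 1                  // subnormal
                : (bits & 0xfffffffffffffL) | 0x10000000000000L;  // normal

        // s * m * 2^(e-1075)
        return s * Math.scalb((double) m, e - 1075);
    }
}
```

In practice Double.longBitsToDouble (or simply DataInputStream.readDouble) replaces all of this; the sketch is only here to check the formula.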

Fields, Methods and Interface Methods

Fields, methods and interface methods are represented with

  • CONSTANT_Fieldref_info (tag: 9)
  • CONSTANT_Methodref_info (tag: 10)
  • CONSTANT_InterfaceMethodref_info (tag: 11)

which have a similar structure:

CONSTANT_XXX_info {
  u1 tag;
  u2 class_index;
  u2 name_and_type_index;
}

Here class_index is the constant pool index of a CONSTANT_Class_info entry and name_and_type_index is the index of a CONSTANT_NameAndType_info entry.

Restrictions apply:

  • class_index must point to a class, and not an interface, in the case of CONSTANT_Methodref_info,
  • class_index must point to an interface, and not a class, in the case of CONSTANT_InterfaceMethodref_info,
  • name_and_type_index must point to an entry with a field descriptor in the case of CONSTANT_Fieldref_info,
  • name_and_type_index must point to an entry with a method descriptor in the case of CONSTANT_Methodref_info or CONSTANT_InterfaceMethodref_info.

Field or method

Those are represented as structures:

CONSTANT_NameAndType_info {
  u1 tag;
  u2 name_index;
  u2 descriptor_index;
}

with an unqualified name (name_index) and a descriptor (descriptor_index), both given as indices into the constant pool (see my previous Descriptors post).

Interestingly, this structure carries no information about which class the field or method comes from.

Method Type

This is represented as CONSTANT_MethodType_info

CONSTANT_MethodType_info {
  u1 tag;
  u2 descriptor_index;
}

This constant represents the type of a method: descriptor_index “points” to the constant pool entry (a CONSTANT_Utf8_info) where the appropriate method descriptor is placed.

String constant

In the structure

CONSTANT_Utf8_info {
  u1 tag;
  u2 length;
  u1 bytes[length];
}

the length is given in bytes (not characters) and the bytes hold the string in modified UTF-8; the string is not null-terminated:

  • the null character (U+0000) is encoded using the 2-byte format, so the encoded bytes never contain an embedded zero byte
  • non-null ASCII characters are represented using only 1 byte per code point
  • only the 1-, 2- and 3-byte formats of standard UTF-8 are used; the JVM does not recognize the 4-byte format
  • instead, the JVM uses its own two-times-three-byte format for supplementary characters (above U+FFFF): each of the two surrogate code units is represented by three bytes.
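These rules are easy to observe from Java itself, because DataOutputStream.writeUTF writes a two-byte length followed by modified UTF-8, which is the same layout as the length and bytes items above. A small sketch (the class name ModifiedUtf8Demo is made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows the bytes writeUTF produces for a string.
class ModifiedUtf8Demo {

    // Returns the u2 length followed by the modified UTF-8 bytes, as lowercase hex.
    static String encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buf.toByteArray()) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        return hex.toString().trim();
    }
}
```

With this, encode("A") gives 00 01 41, encode("\0") gives 00 02 c0 80 (the null character in the 2-byte format), and encode("\uD83D\uDE00") (U+1F600 as a surrogate pair) gives 00 06 ed a0 bd ed b8 80, i.e. two times three bytes where standard UTF-8 would use four.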

Method Handle

CONSTANT_MethodHandle_info {
  u1 tag;
  u1 reference_kind;
  u2 reference_index;
}

This is the CONSTANT_MethodHandle_info struct, which is a bit more complex:

  • next to the tag (of value 15) there is an item reference_kind with a value in the range 1 to 9
  • this value represents the kind of bytecode behavior of the handle
  • it also tells what should be in the constant pool under reference_index

reference_kind | entry at reference_index
--- | ---
1, 2, 3, 4 | CONSTANT_Fieldref_info for which a getter/setter is to be created
5 or 8 | CONSTANT_Methodref_info representing a method (5) or a constructor (8) for which a handle is to be created
6, 7 | CONSTANT_Methodref_info or, in class files of version 52.0 and above, CONSTANT_InterfaceMethodref_info
9 | CONSTANT_InterfaceMethodref_info

So this constant represents a kind of a method pointer (hence handle).
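The mapping from reference_kind to the expected entry can be written down as a small lookup. This is only a sketch: the class ReferenceKinds and its methods are made up, the REF_* names are the ones the spec uses for the nine kinds, and the returned numbers are the constant tags from the table in The Constants section (9 = Fieldref, 10 = Methodref, 11 = InterfaceMethodref).

```java
// Hypothetical lookup helper for CONSTANT_MethodHandle_info.reference_kind.
class ReferenceKinds {

    private static final String[] NAMES = {
        "REF_getField", "REF_getStatic", "REF_putField", "REF_putStatic",
        "REF_invokeVirtual", "REF_invokeStatic", "REF_invokeSpecial",
        "REF_newInvokeSpecial", "REF_invokeInterface"
    };

    // Name of the bytecode behavior for kinds 1..9.
    static String name(int referenceKind) {
        if (referenceKind < 1 || referenceKind > 9) {
            throw new IllegalArgumentException("reference_kind must be 1..9: " + referenceKind);
        }
        return NAMES[referenceKind - 1];
    }

    // Tags that the entry at reference_index may have, for a given class file major version.
    static int[] allowedTargetTags(int referenceKind, int majorVersion) {
        switch (referenceKind) {
            case 1: case 2: case 3: case 4:
                return new int[] {9};                  // Fieldref
            case 5: case 8:
                return new int[] {10};                 // Methodref
            case 6: case 7:
                return majorVersion >= 52
                        ? new int[] {10, 11}           // Methodref or InterfaceMethodref
                        : new int[] {10};              // Methodref only before 52.0
            case 9:
                return new int[] {11};                 // InterfaceMethodref
            default:
                throw new IllegalArgumentException("reference_kind must be 1..9: " + referenceKind);
        }
    }
}
```

The same numeric values are also available as constants in java.lang.invoke.MethodHandleInfo (e.g. MethodHandleInfo.REF_invokeVirtual).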

Dynamic

Most structures in the constant_pool represent “entities” (methods, fields or constants) directly. But there is also a way to generate such a representation dynamically. This is accomplished by two constants:

  • CONSTANT_Dynamic_info - represents a dynamically computed constant, produced by a bootstrap method called during processing of an instruction such as ldc
  • CONSTANT_InvokeDynamic_info - represents a dynamically computed call site (java.lang.invoke.CallSite), produced by invocation of a bootstrap method during processing of an invokedynamic instruction

These constants represent an indirect way of getting direct information about “entities”. Interesting. When is this needed? How is it used? How does invokedynamic work? A good topic for a blog post, I guess.

Package and Module

The two last constants exist only in class files that declare a module (i.e. those generated from a module descriptor): CONSTANT_Package_info and CONSTANT_Module_info.

Summary

This was a short and easy read through types of constants in a constant_pool.

Hopefully I won’t need to bit-fiddle in order to read float or put together bit patterns to recognize UTF-8 strings (I would probably use DataInputStream or ASM library for reading .class file) in the rest of my career, but who knows?

The next sections of the .class file spec cover fields and methods, and then attributes. After that, three sections are left in order to complete chapter 4:

  • format checking
  • constraints
  • verification

And I’ll be ready to start the most mysterious part of the spec: Loading, Linking, and Initializing.

See you next time!
