JVM: Loading, Linking, Initializing

Chapter 5 of the JVM Specification describes three phases in the life of a class: loading, linking and initialization.

Good read
A good read from a Java programmer's perspective is chapter 12 of the Java Language Specification, which describes the execution of a Java program starting from the main method of the initial class (also called the main class).

To start executing the main method (public static void main(String[] args)) of the main (or initial) class, the JVM needs a runtime representation of that class. If the JVM does not have one, it has to construct it from the binary representation. This is called loading.

What is main class

What is “main class”? It is implementation dependent:

  • it may be a fully qualified name given on the command line or
  • it may be a class whose name is written in the manifest file of an executable .jar file or
  • it may be provided in any other way (depending on which Java runtime is responsible for executing the main class) or
  • it may be generated by the JVM on the fly (JShell)

Loading

Loading is the process of finding the binary form of a class/interface with a given name and creating a Class object that represents that class/interface.

Loading is done by class loaders, which (apart from the bootstrap class loader) are instances of subclasses of ClassLoader. The defineClass method can be used to construct a Class object; the class representation is stored in the method area of the JVM (see the JVM: Structure of the JVM post).

Types of class loaders

There are two types of class loaders:

  • bootstrap class loader (provided by the JVM) and
  • user-defined class loader, which must be an instance of a subclass of the ClassLoader class

User-defined class loaders are created to extend the capabilities of the bootstrap class loader: they may know how to load classes from different sources (network, database, encrypted files, etc.).
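
To make this more concrete, here is a minimal sketch of a user-defined class loader. ResourceLoaderDemo, ResourceClassLoader and Greeter are hypothetical names, and the sketch assumes the compiled .class files are reachable as ordinary classpath resources (the usual javac + java setup). The loader reads the bytes itself and hands them to defineClass, which makes it the defining loader of the class.

```java
import java.io.IOException;
import java.io.InputStream;

class ResourceLoaderDemo {
    /** Hypothetical class that we load through our own loader. */
    static class Greeter {
        @Override
        public String toString() { return "hello"; }
    }

    /** Locates the class bytes itself instead of relying on the built-in loaders. */
    static class ResourceClassLoader extends ClassLoader {
        ResourceClassLoader() {
            super(null); // parent = bootstrap loader, which cannot find Greeter
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Assumption: the class file is available as a classpath resource.
            String resource = "/" + name.replace('.', '/') + ".class";
            try (InputStream in = ResourceLoaderDemo.class.getResourceAsStream(resource)) {
                if (in == null) throw new ClassNotFoundException(name);
                byte[] bytes = in.readAllBytes();
                // defineClass asks the JVM to derive the class from the bytes.
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Greeter through a fresh ResourceClassLoader. */
    static Class<?> loadGreeter() {
        try {
            return new ResourceClassLoader().loadClass(Greeter.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> loaded = loadGreeter();
        System.out.println(loaded.getName());
        System.out.println(loaded.getClassLoader() instanceof ResourceClassLoader);
        System.out.println(loaded == Greeter.class);
    }
}
```

The parent is set to null on purpose: with the default parent the request would be delegated to the application class loader, which would find Greeter first and findClass would never run.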

How is loading performed?

The loading of a class C with name N:

  • is initiated by another class D that refers to class C in its runtime constant pool (the per-class runtime representation of the constant pool)
  • includes asking some class loader L to locate a binary representation of C, then deriving C from it (parsing the bytes) and creating C in the method area

If JVM asks class loader L to load C, L may load it:

  • directly (by locating binary representation and then asking JVM to create C in method area) - here we say that L is a defining class loader
  • indirectly (by delegating to another class loader which then loads C - either directly or indirectly)

If loader L1 initiated the loading of class C and, after delegation, loader L2 completed it, then both L1 and L2 are called initiating loaders of C; other loaders in the delegation chain between L1 and L2 are not considered initiating loaders.
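
Delegation is easy to observe from plain Java. The sketch below (DelegationDemo is a hypothetical name) asks the system class loader for java.lang.String; the request is delegated, and the class turns out to be defined by the bootstrap class loader, which getClassLoader() reports as null.

```java
class DelegationDemo {
    /** Asks the system (application) class loader to load the named class. */
    static Class<?> loadViaSystemLoader(String binaryName) {
        try {
            return ClassLoader.getSystemClassLoader().loadClass(binaryName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> string = loadViaSystemLoader("java.lang.String");
        // The loader we asked is a real object...
        System.out.println(ClassLoader.getSystemClassLoader() != null); // true
        // ...but the defining loader is the bootstrap loader, shown as null.
        System.out.println(string.getClassLoader());                    // null
        // Asking again, or using the class literal, yields the same Class object.
        System.out.println(string == String.class);                     // true
        System.out.println(string == loadViaSystemLoader("java.lang.String")); // true
    }
}
```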

Properties of well-behaved loader

  • If I ask a loader to load me a class with a name, the class loader should provide the same class object each time I ask
  • If I ask L1 to load a class, but L1 delegates to L2, then if I ask L1 or L2 for related classes (types of fields, method parameters or superclass/interface) - they both should give me same class objects
  • Errors caused by prefetching classes or by loading a group of related classes together should surface only at the points where they could have arisen without such special loading

What defines a class?

A class is determined by a pair:

class === class’ binary name + class’ defining class loader
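
A small experiment illustrates the pair. In this sketch ClassIdentityDemo and Payload are hypothetical names, and the code assumes the demo was started from a directory or jar on the classpath, so that its code source location is known. The same class file is loaded by two unrelated loaders, which gives two distinct classes with the same binary name.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

class ClassIdentityDemo {
    /** Hypothetical class that is loaded twice below. */
    static class Payload { }

    /** Loads Payload through two unrelated loaders and returns both Class objects. */
    static Class<?>[] loadTwice() {
        // Assumption: this class was loaded from a directory or jar on the classpath.
        URL[] path = { ClassIdentityDemo.class.getProtectionDomain().getCodeSource().getLocation() };
        String name = Payload.class.getName();
        // A null parent means "delegate only to the bootstrap loader",
        // so each URLClassLoader has to define Payload itself.
        try (URLClassLoader first = new URLClassLoader(path, null);
             URLClassLoader second = new URLClassLoader(path, null)) {
            return new Class<?>[] { first.loadClass(name), second.loadClass(name) };
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?>[] pair = loadTwice();
        System.out.println(pair[0].getName().equals(pair[1].getName())); // true: same binary name
        System.out.println(pair[0] == pair[1]);                          // false: different defining loaders
        System.out.println(pair[0] == Payload.class);                    // false for the same reason
    }
}
```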

What is a runtime package?

A runtime package of a class/interface is determined by a pair:

runtime package === package name + class’ defining class loader

Loading errors

Errors that may appear during loading are subclasses of LinkageError:

  • ClassCircularityError - when a class/interface, if loaded, would be its own superclass/superinterface
  • ClassFormatError - malformed binary data of compiled class
  • UnsupportedClassVersionError - data is in class file format, but its major or minor version is not supported
  • IncompatibleClassChangeError
    • if the superclass is in fact an interface or a final class
    • if the direct superclass has a PermittedSubclasses attribute and
      • is in a different run-time module or
      • is in the same module but a different runtime package and the loaded class does not have the ACC_PUBLIC flag set or
      • its PermittedSubclasses attribute does not contain the name of the loaded class
  • NoClassDefFoundError - the class loader could not find the requested class

Unrelated to loading but possible:

  • OutOfMemoryError - due to errors in memory allocation for new data structures
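
One of these errors is easy to provoke on purpose. The sketch below (LoadingErrorDemo and RawLoader are hypothetical names) exposes defineClass and feeds it bytes that are not a class file; the JVM rejects them with a ClassFormatError, which is a LinkageError.

```java
class LoadingErrorDemo {
    /** Exposes the protected defineClass so that arbitrary bytes can be passed in. */
    static class RawLoader extends ClassLoader {
        Class<?> define(byte[] bytes) {
            // The name may be null when it is not known in advance.
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    /** Tries to derive a class from garbage and returns the resulting error, if any. */
    static Throwable defineGarbage() {
        byte[] notAClassFile = { 1, 2, 3, 4, 5, 6, 7, 8 };
        try {
            new RawLoader().define(notAClassFile);
            return null;
        } catch (LinkageError e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable error = defineGarbage();
        System.out.println(error instanceof ClassFormatError); // true
        System.out.println(error instanceof LinkageError);     // true
    }
}
```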

Modules and layers

Modules

  • a module controls access to the classes inside it from classes outside the module
  • a program declares the names of packages in a module, and the class loaders that will load classes in each of those packages
  • package names and class loaders are parameters of the defineModules method of the ModuleLayer class, which is used to create runtime modules
  • a class is associated with a runtime module if its runtime package is associated with that module
  • a class created by a class loader is in exactly one runtime package, and therefore in exactly one runtime module

Layers

A layer is a set of class loaders that together serve to create classes in a set of runtime modules.

  • boot layer - created by the JVM on startup, loads classes from standard packages like java.lang in the java.base module using the bootstrap class loader
  • user-defined layers - created by programs, consist of user-defined class loaders that load packages from modules that depend on java.base

A runtime module is part of exactly one runtime layer.
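
The relation between a class, its runtime module and the layer can be inspected through the reflection API. This is only a sketch (ModuleLayerDemo and its helper methods are hypothetical names); it uses Class.getModule, Module.getLayer and ModuleLayer.boot from the standard library.

```java
class ModuleLayerDemo {
    /** Name of the runtime module a class belongs to (null for an unnamed module). */
    static String moduleNameOf(Class<?> type) {
        return type.getModule().getName();
    }

    /** True if the module of the class was defined in the boot layer. */
    static boolean inBootLayer(Class<?> type) {
        return type.getModule().getLayer() == ModuleLayer.boot();
    }

    public static void main(String[] args) {
        System.out.println(moduleNameOf(String.class));     // java.base
        System.out.println(String.class.getPackageName());  // java.lang
        System.out.println(inBootLayer(String.class));      // true
        System.out.println(String.class.getClassLoader());  // null, i.e. the bootstrap loader

        // A class started from the classpath ends up in an unnamed module;
        // what gets printed here depends on how the demo was launched.
        Module mine = ModuleLayerDemo.class.getModule();
        System.out.println(mine.isNamed() + " " + mine.getLayer());
    }
}
```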

Linking

Linking of a class is a multiphase process of verifying and preparing:

  • this class
  • its superclass and its superinterfaces
  • its element type (if this class is array class)

and it also includes resolution of symbolic references.

The rules of linking

  • a class must be completely loaded before it can be linked
  • a class is completely verified and prepared before it can be initialized
  • errors detected during linkage must be thrown at a point in the program where some action might, directly or indirectly, require linkage to the class/interface involved in the error
  • a symbolic reference to a dynamically-computed constant is not resolved until an ldc, ldc_w or ldc2_w instruction that refers to it is executed, or a bootstrap method that refers to it as a static argument is invoked
  • a symbolic reference to a dynamically-computed call site is not resolved until a bootstrap method that refers to it as a static argument is invoked

Strategies

JVMs might have different linking strategies, e.g.

  • lazy - each symbolic reference is resolved individually when it is used
  • eager - all symbolic references are resolved at once when a class is verified

Linking requires memory allocation and may therefore throw OutOfMemoryError.

The process

Here are the phases of linking:

  • Verification ensures that the binary representation is structurally correct (see JVM: Loading, Linking, Initializing post).

  • Preparation creates static fields and initializes them with default values (does not require execution of bytecode); calculating static initializers is part of initialization, not preparation

  • Resolution is the process of determining concrete values for symbolic references (usually bytecode instructions refer to constant-pool’s symbolic references)

    Instructions requiring resolution of symbolic references:
    anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, ldc2_w, multianewarray, new, putfield, and putstatic
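
The difference between preparation and initialization can be seen from Java code. In this sketch (PreparationDemo and Holder are hypothetical names) a static initializer reads another static field before that field's own initializer has run, so it sees the default value that preparation put there.

```java
class PreparationDemo {
    static class Holder {
        // Runs first during initialization; 'second' has only been prepared so far.
        static int first = peek();
        // Assigned later, in textual order, by the class initialization method.
        static int second = 42;

        static int peek() {
            return second;
        }
    }

    public static void main(String[] args) {
        System.out.println(Holder.first);  // 0: the default value set by preparation
        System.out.println(Holder.second); // 42: assigned during initialization
    }
}
```

The field second is deliberately not final: a static final int with a constant initializer would be a compile-time constant and would not show the effect.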

Resolution

Symbolic references in a constant-pool that need resolution are:

  • classes or interfaces
  • fields
  • methods
  • method types
  • method handles
  • dynamically-computed constants

JVM Spec gives very precise rules regarding:

  • when a symbolic reference may be re-evaluated
  • how field/method lookup proceeds when it is applied recursively to superclasses
  • what the loading constraints are (return types and parameter types should resolve to the same types even if a class and its superclass are loaded by different class loaders, etc.)
  • what the rules for resolving method handles are

It also explains the application of access control rules, the conditions for method overriding, and the selection of a method for the invokevirtual and invokeinterface instructions.

Initialization

In this phase the JVM executes the code that initializes classes or interfaces: it calls the class/interface initialization method.

A class/interface can be initialized after:

  • one of the bytecode instructions new, getstatic, putstatic or invokestatic is executed
  • the first invocation of a MethodHandle that is the result of resolving a method handle of kind 2, 4, 6 or 8
  • the initialization of a subclass of this class
  • a class that implements this interface is initialized and this interface declares a non-static, non-abstract method
  • this class is designated as the initial class, whose name is given at JVM startup
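
Two of these triggers can be watched in action. The sketch below (InitTriggerDemo, Parent and Child are hypothetical names) records when static initializers run: obtaining a class literal does not initialize the class, while reading a static field (getstatic) does, and the superclass is initialized first.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggerDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent initialized"); }
    }

    static class Child extends Parent {
        static int counter = 1; // not a compile-time constant, so reads use getstatic
        static { LOG.add("Child initialized"); }
    }

    /** Touches Child in two different ways and returns the recorded events. */
    static List<String> run() {
        Class<?> literal = Child.class;             // a class literal alone does not initialize
        LOG.add("got " + literal.getSimpleName());
        int value = Child.counter;                  // getstatic triggers initialization
        LOG.add("read " + value);
        return LOG;
    }

    public static void main(String[] args) {
        // Prints: got Child, Parent initialized, Child initialized, read 1
        run().forEach(System.out::println);
    }
}
```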

Initialization procedure

The algorithm for initialization of a class/interface is described nicely in just 12 steps:

  • it assumes that each class C has a unique initialization lock LC (and uses this lock to mark that the class is being initialized, and also to mark the class as erroneous)
  • the algorithm is recursive (it is invoked for each superclass/superinterface), so the JVM needs to know whether initialization of class C is already in progress
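
The erroneous state mentioned above is visible from Java code. In this sketch (ErroneousStateDemo and Broken are hypothetical names) the static initializer of Broken throws: the first access fails with ExceptionInInitializerError, and every later access fails with NoClassDefFoundError because the class has already been marked as erroneous.

```java
class ErroneousStateDemo {
    static class Broken {
        static int value = fail(); // the initializer always throws

        static int fail() {
            throw new IllegalStateException("boom");
        }
    }

    /** Touches Broken.value and reports which error, if any, came out. */
    static String touch() {
        try {
            return "ok: " + Broken.value;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // ExceptionInInitializerError
        System.out.println(touch()); // NoClassDefFoundError
    }
}
```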

Binding

The specification also mentions binding, a process in which code written in a different programming language (usually C) is integrated into the JVM so that it can be executed.

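Exit

The JVM exits when some thread invokes the exit method of class Runtime or class System, or the halt method of class Runtime.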

Chapter 5 status: done.

Conclusions

I’ve just discovered I’m short of coffee ☕. I’ve also discovered a Polish Government Weather Alert saying that I should stay at home due to rain, wind and all the bad things. No coffee today, I guess.

Now I’d like to say a big Thank You! to all the people who wrote and implemented the JVM (and Java Language) specification. I’m sure you buy and drink the best coffee in the world (which you rightly deserve) because otherwise I have no idea how you guys function. Even if you never ever stumble upon my little blog, I send you my love, appreciation and admiration. Good job.


This post is part of the jvm series.