
Java bytecode manipulation

Motivation

Depending on your motivation, your approach to learning JVM bytecode will differ. For example:

  • if you build your own JVM, you probably need a very deep understanding of the JVM Specification, and a lot more
  • similarly, if you write a bytecode-generating library, you must know the intricate details of the class file structure and the ins and outs of class loading
  • if you are a Java programmer who wants to understand the output of javap, then knowledge of stack-based computation and of what specific bytecodes do is a good start
  • if for some reason you happen to work with a bytecode-generating library (for example, you want to write your own Java agent), then you also need an overall knowledge of what is possible
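To make the "stack-based computation" point concrete, here is a small method together with the bytecode javac emits for it, as printed by `javap -c`. The class and method names are made up for this illustration. Each instruction pushes values onto, or pops them off, the operand stack:

```java
public class StackDemo {

    // `javap -c StackDemo` shows this method body as:
    //
    //   0: iload_0   // push local variable 0 (a) onto the operand stack
    //   1: iload_1   // push local variable 1 (b)
    //   2: iadd      // pop a and b, push a + b
    //   3: iconst_2  // push the int constant 2
    //   4: imul      // pop (a + b) and 2, push their product
    //   5: ireturn   // pop the result and return it
    //
    // In a static method, local slot 0 holds the first parameter;
    // in an instance method it would hold `this`.
    static int sumTimesTwo(int a, int b) {
        return (a + b) * 2;
    }

    public static void main(String[] args) {
        System.out.println(sumTimesTwo(3, 4)); // prints 14
    }
}
```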
You need coffee
Before you start reading the spec, beware: formal language, repetition and lack of examples may put you to sleep.

Or you may just want to read the spec because you are curious, because you treat it as an intellectual challenge, or because you think that a programmer should know one of the most powerful code-running platforms in the world.

Compatibility

The good part is that, ultimately, JVM bytecode does not change very often, and when it does, the change is (almost) always non-breaking and backward-compatible (i.e. old tools still work with old bytecode, and new tools support all previous JVM bytecode versions). See this SO question, “Is Java bytecode always forward compatible”:

  • JVM bytecode is not forward compatible, i.e. future versions of the bytecode may introduce new opcodes (or new semantics for previously ignored values, as with ACC_SUPER)
  • the JVM is backward compatible, i.e. a JVM of version Y can read and interpret bytecode of version X (generated by a compiler from JDK X) where Y >= X
  • there have been cases of bytecode “removal”, which may be considered a violation of the JVM’s backward compatibility; however, it is the versioning of the class file format that allows tool and library authors to properly generate and interpret bytecode for a given version, e.g.:

    If the class file version number is 51.0 or above, then instances of instructions using the jsr, jsr_w, or ret opcodes must not appear in the code array.
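As a small illustration of that versioning, the sketch below reads the major and minor version from the header of a class file (the `0xCAFEBABE` magic number followed by two unsigned 16-bit values) and applies the version 51 rule quoted above. The class and method names are hypothetical; only standard JDK classes are used:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassFileVersion {

    /** Returns {major, minor} read from the header of a class file. */
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian, as in the class file format
        int magic = buf.getInt();                      // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file: 0x" + Integer.toHexString(magic));
        }
        int minor = Short.toUnsignedInt(buf.getShort()); // u2 minor_version
        int major = Short.toUnsignedInt(buf.getShort()); // u2 major_version
        return new int[] {major, minor};
    }

    /** jsr, jsr_w and ret must not appear in class files of version 51.0 (Java 7) or above. */
    static boolean mayContainJsr(int major) {
        return major < 51;
    }

    public static void main(String[] args) throws IOException {
        // Read the bytes of a JDK class from the runtime image (any .class file would do).
        try (InputStream in = Object.class.getResourceAsStream("/java/lang/Object.class")) {
            if (in == null) {
                System.out.println("class file not found");
                return;
            }
            int[] version = readVersion(in.readAllBytes());
            System.out.println("major=" + version[0] + ", minor=" + version[1]
                    + ", jsr/ret allowed: " + mayContainJsr(version[0]));
        }
    }
}
```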

I would say that, if you treat learning the JVM as an investment, it has very low maintenance costs 😄

Libraries

Several libraries have been created for bytecode manipulation. They are used to:

  • create proxy classes at runtime (by frameworks)
  • create Java agents for
    • code instrumentation
    • profiling
    • aspect-oriented programming
    • mutation testing

Each of these use cases is interesting on its own and worth checking out.
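The simplest form of runtime class generation ships with the JDK itself: `java.lang.reflect.Proxy` generates a class implementing the given interfaces at runtime and routes every call through an `InvocationHandler`. The sketch below wraps an object in such a proxy and records each call; the `Greeter` interface and the helper names are hypothetical. Libraries like Byte Buddy go further, for example by subclassing concrete classes, which `Proxy` cannot do:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyDemo {

    // Hypothetical interface used only for this example.
    public interface Greeter {
        String greet(String name);
    }

    /** Wraps a Greeter in a runtime-generated proxy class that logs each call before forwarding it. */
    static Greeter logging(Greeter target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("calling " + method.getName());
            return method.invoke(target, args); // forward to the real object
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter greeter = logging(name -> "Hello, " + name, log);
        System.out.println(greeter.greet("JVM")); // Hello, JVM
        System.out.println(log);                  // [calling greet]
        // The proxy class name is chosen by the JDK and varies between versions.
        System.out.println(greeter.getClass().getName());
    }
}
```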

The first library I found when I was reading the JVM Spec and playing with Java bytecode was Asm. The second one, currently the go-to library for bytecode manipulation and Java agent implementation, is Byte Buddy (which uses Asm underneath).

Asm

Asm is a Java bytecode manipulation and analysis framework. It is used by Groovy and Gradle, in the Groovy and Kotlin compilers, in code coverage libraries (JaCoCo and Cobertura) and in the Mockito and EasyMock testing libraries.

This is an interesting, simple library that has two APIs:

  • “push” (event-based API):
    • when the library encounters a specific entity (like a method handle, constant, type etc.) while traversing the .class file, it calls your code (you implement a visitor for the specific entity you want to “process”)
    • fast
    • the programmer has to manage state manually, if any state must be kept during processing
    • the library calls the visit methods in a strict order, which library users must follow
  • “pull” (tree API)
    • the entities (i.e. elements of the .class file format) are all read into memory and are ready to be inspected; they form a hierarchical structure of tree nodes
    • easy to process, manipulate or transform nodes
    • slower, and uses more memory since the whole class is kept in memory
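The difference between the two APIs is easiest to see side by side. This sketch assumes ASM 9.x on the classpath (`org.ow2.asm:asm` for the core API and `org.ow2.asm:asm-tree` for the tree API), in a version recent enough to read your JDK's class files; the class and helper names are hypothetical. Both methods list the methods of `java.lang.Runnable`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.tree.ClassNode;
import org.objectweb.asm.tree.MethodNode;

public class AsmMethodLister {

    /** Reads a class file from the classpath, e.g. "java.lang.Runnable". */
    static ClassReader readerFor(String className) {
        try {
            return new ClassReader(className);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** "Push" API: ClassReader calls visitMethod on our visitor for each method it encounters. */
    static List<String> methodsViaVisitor(ClassReader reader) {
        List<String> methods = new ArrayList<>();
        reader.accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String descriptor,
                                             String signature, String[] exceptions) {
                methods.add(name + descriptor);
                return null; // null = do not visit this method's body
            }
        }, 0);
        return methods;
    }

    /** "Pull" API: the whole class is first read into a tree of nodes, then inspected. */
    static List<String> methodsViaTree(ClassReader reader) {
        ClassNode classNode = new ClassNode();
        reader.accept(classNode, 0);
        List<String> methods = new ArrayList<>();
        for (MethodNode method : classNode.methods) {
            methods.add(method.name + method.desc);
        }
        return methods;
    }

    public static void main(String[] args) {
        ClassReader reader = readerFor("java.lang.Runnable");
        System.out.println(methodsViaVisitor(reader)); // [run()V]
        System.out.println(methodsViaTree(reader));    // [run()V]
    }
}
```

Note how the visitor version has to collect results into state it manages itself, while the tree version simply walks `classNode.methods` after reading.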

ByteBuddy

The ByteBuddy home page states that:

In order to use Byte Buddy, one does not require an understanding of Java byte code or the class file format. In contrast, Byte Buddy’s API aims for code that is concise and easy to understand for everybody.

The list of projects using ByteBuddy is truly impressive:

  • Mockito: Mocking framework.
  • Hibernate: ORM framework.
  • Bazel: Build system by Google.
  • Jackson: Marshalling and unmarshalling library.
  • Aeron: High performance UDP and IPC.
  • Stagemonitor: Application runtime monitoring.
  • Selenium: Browser test automation.
  • Spock: Testing and mocking framework.
  • Quasar: Actor framework.
  • EqualsVerifier: Testing library for object equality.
  • Instana: APM tool.
  • Apache Beam: Parallel processing framework.
  • Ui4j: Web Automation for Java.
  • Flow: Tool to better understand Java applications.
  • Redisson: In-Memory data grid.
  • Power Mock: Mocking library.
  • Census tracer: Instrumentation for Census Tracer.
  • Apache Skywalking: APM tool.
  • Datadog APM: APM tool.
  • Elastic APM: APM tool.
  • AssertJ: Java assertion library.

Some simple use cases are documented on the tutorial page. It is worth noting that, compared with Asm, ByteBuddy users really don’t need to know anything about JVM bytecode.

In practice, however, some basic knowledge of the class file structure and class loading is necessary, for example to understand the library’s Javadoc.

Some of the cool things you can do with this library are:

  • create and instantiate a new class at runtime
  • match specific methods and delegate their execution to another class (with or without running the original method)
  • match methods by parameter types or annotations
  • write mocks (or spies) in testing code
  • write an AOP agent that monitors or logs the execution of specific methods (e.g. network call endpoints)
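A minimal sketch of the first two items, modeled on the introductory examples in Byte Buddy's README and assuming a recent Byte Buddy 1.x release (`net.bytebuddy:byte-buddy`) that supports your JDK; `ByteBuddyDemo` and `GreetingInterceptor` are hypothetical names:

```java
import java.util.function.Function;

import net.bytebuddy.ByteBuddy;
import net.bytebuddy.implementation.FixedValue;
import net.bytebuddy.implementation.MethodDelegation;
import net.bytebuddy.matcher.ElementMatchers;

public class ByteBuddyDemo {

    /** Creates, loads and instantiates a new subclass of Object whose toString() returns a fixed value. */
    static Object newHelloWorldObject() {
        Class<?> type = new ByteBuddy()
                .subclass(Object.class)
                .method(ElementMatchers.named("toString"))
                .intercept(FixedValue.value("Hello World!"))
                .make()
                .load(ByteBuddyDemo.class.getClassLoader())
                .getLoaded();
        return instantiate(type);
    }

    /** Hypothetical interceptor: Byte Buddy binds the intercepted method's argument to greet(...). */
    public static class GreetingInterceptor {
        public Object greet(Object argument) {
            return "Hello from " + argument;
        }
    }

    /** Implements Function.apply by delegating every call to a GreetingInterceptor instance. */
    @SuppressWarnings("unchecked")
    static Function<Object, Object> newDelegatingFunction() {
        Class<?> type = new ByteBuddy()
                .subclass(Function.class)
                .method(ElementMatchers.named("apply"))
                .intercept(MethodDelegation.to(new GreetingInterceptor()))
                .make()
                .load(ByteBuddyDemo.class.getClassLoader())
                .getLoaded();
        return (Function<Object, Object>) instantiate(type);
    }

    private static Object instantiate(Class<?> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newHelloWorldObject());                       // Hello World!
        System.out.println(newDelegatingFunction().apply("Byte Buddy")); // Hello from Byte Buddy
    }
}
```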

Writing a real agent means you can shoot yourself in the foot, so be careful.
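Whichever library you choose, an agent starts from the JDK's `java.lang.instrument` API: the JVM calls a `premain` method before `main`, and the agent registers a `ClassFileTransformer` that sees the bytes of each class as it is loaded. The sketch below (class names are hypothetical) only records class names and returns `null`, which tells the JVM to keep the original bytecode, a safe starting point given the warning above:

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LoggingAgent {

    // Internal names (e.g. "java/util/ArrayList") of classes seen by the transformer.
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    /** Observes classes being loaded; returning null means "leave the bytecode unchanged". */
    public static final class ObservingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null) { // className can be null, e.g. for some generated classes
                SEEN.add(className);
            }
            return null;
        }
    }

    /** Called by the JVM before main() when started with -javaagent:agent.jar. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ObservingTransformer());
    }
}
```

To try it, package the class in a jar whose manifest contains `Premain-Class: LoggingAgent` and start the application with `-javaagent:path/to/agent.jar`.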

Summary

Writing Java agents in real life is usually left to companies that make real money doing runtime monitoring, live reload servers or any other kind of dark magic with bytecode.

However, learning and using libraries that manipulate bytecode directly (Asm) or help with more complex AOP (ByteBuddy) can be fun, and by fun I mean educational and entertaining at the same time 😄.
