Skip to content

Implementation details

Rohan Padhye edited this page Dec 12, 2017 · 6 revisions

Instrumentation

JQF uses the ASM toolkit to instrument Java bytecode on-the-fly as classes are loaded by the JVM. JQF uses a heavily modified version of the janala2 classes, originally developed by Koushik Sen for performing concolic execution of Java programs.

Janala instruments all applications classes and several JDK classes (classes with names java.lang.* and sun.* are not instrumented). The instrumentation injects a static method call after every bytecode instruction to a class that essentially snoops on everything that is being executed. There is one static method call per bytecode instruction type; the arguments to this call are a uniquely generated instruction identifier (the iid) as well as instruction operands.

The snooping class can therefore recreate the entire execution state of any JVM thread by maintaining a shadow operand stack and call stack.

Trace Events

In JQF, we ignore snoops on most arithmetic and logical data manipulating instructions. We are mainly interested in control-flow instructions such as method calls and branches. Unfortunately, Java has many different ways of invoking method calls (e.g. INVOKESTATIC, INVOKEINTERFACE, INVOKEVIRTUAL, INVOKESPECIAL) as well as branching (e.g. IF_CMPNE, GOTO, TABLESWITCH, LOOKUPSWITCH) plus many other control-flow instructions (e.g. IRETURN, ARETURN, ATHROW). Additionally, there is a lot of noise from instrumentation across multiple JVM threads plus the fact that the JVM hijacks application threads to invoke class loaders when classes are referenced for the first time. Therefore, JQF collects these bytecode instructions and forms a coherent sequence of trace events that are emitted to a Guidance instance.

Trace events provide a higher-level abstraction than bytecode instructions. For example, conditional jumps, gotos and switch statements are analyzed and a BRANCH event is emitted. Similarly, the various method call and return instructions are coalesced into CALL and RETURN events.

By matching call and return events, JQF builds a shadow call stack per-thread. This is useful for many reasons such as ignoring events generated before and after a test method is invoked in the fuzzing loop (so as not to slow-down the test input generation) as well as ignoring events generated by JVM class-loading.

High-performance queues

For maximizing performance, JQF maintains a shadow thread for each application thread, called a tracer. A tracer processes the bytecode instructions executed by its application thread and emits the appropriate trace events to the Guidance instance in a coherent sequence.

The static methods invoked by the injected code in application threads put these instructions on thread-local queues, which are shared only with the thread's own tracer. These queues are implemented as a lock-free single-producer single-consumer circular buffers with busy waiting -- this is only guaranteed to be deadlock-free if the producer and consumer thread are fixed and distinct, which is true for JQF because the producer is always the application thread and the consumer is always its shadow thread.