Skip to content

Latest commit

 

History

History
144 lines (101 loc) · 6.64 KB

program-abstraction.adoc

File metadata and controls

144 lines (101 loc) · 6.64 KB

Program Abstraction in Tai-e (core classes and IR)

This document introduces Tai-e’s abstraction of the Java program being analyzed. You will likely need to use the classes introduced in this document when developing analyses on top of Tai-e. See Section 2 of Tai-e’s paper for more discussions.

Core Classes

  • JClass (in pascal.taie.language.classes) represents classes in the program. Each instance contains various information of a class, such as class name, modifiers, declared methods and fields, etc.

  • JMethod and JField: (in pascal.taie.language.classes): represents class members, i.e., methods and fields in the program. Each JMethod/JField instance contains various information of a method/field, such as declaring class, name, etc.

  • ClassHierarchy (in pascal.taie.language.classes): manages all the classes of the program. It offers APIs to query class hierarchy information, such as method dispatching, subclass checking, etc.

  • Type (in pascal.taie.language.type): represents types in the program. It has several subclasses, e.g., PrimitiveType, ClassTyp, and ArrayType, representing different kinds of Java types.

  • TypeSystem (in pascal.taie.language.type): provides APIs for retrieving specific types and subtype checking.

  • World (in pascal.taie): manages the whole-program information of the program. By using its getters, you can access these information, e.g., ClassHierarchy and TypeSystem. World is essentially a singleton class, and you can obtain the instance by calling World.get().

Tai-e IR

Tai-e IR is typed, 3-address, statement and expression based representation of Java method body.

You could dump IR for the classes of input program to .tir files via option -a ir-dumper. By default, Tai-e dumps IR to its default output directory output/. If you want to dump IR to a specific directory, just use option -a ir-dumper=dump-dir:path/to/dir. ir-dumper is implemented as a class analysis, thus the scope of the classes it dumps are affected by option -scope.

The IR classes reside in package pascal.taie.ir and its sub-packages.

There are three core classes in Tai-e IR:

  • IR is the central data structure of intermediate representation in Tai-e, and each IR instance can be seen as a container of the information for the body of a particular method, such as variables, parameters, statements, etc. You could easily obtain IR instance of a method by JMethod.getIR() (providing the method is not abstract).

  • Stmt represents all statements in the program. This interface has a dozen of subclasses, corresponding to various statements. Stmts are stored in IR, and you could obtain them via IR.getStmts().

  • Exp represents all expressions in the program. This interface has dozens of subclasses, corresponding to various expressions. Exps are associated with Stmts, and you could obtain them via specific APIs of Stmt.

We believe that the API of IR is self-documenting and easy to use. To make IR more intelligible, we present a formal definition (i.e., context-free grammar) below that illustrates all kinds of expressions and statements in the IR, and how Stmt are formed by Exp. Most non-terminals in the grammar corresponds to classes in pascal.taie.ir.

Grammar of Expressions

Exp → Var | Literal | FieldAccess | ArrayAccess | NewExp | InvokeExp | UnaryExp | BinaryExp | InstanceOfExp | CastExp

  • Var → Identifier

  • Literal → IntLiteral | LongLiteral | FloatLiteral | DoubleLiteral | StringLiteral | ClassLiteral | NullLiteral | MethodHandle | MethodType

  • FieldAccess → InstanceFieldAccess | StaticFieldAccess

    • InstanceFieldAccess → Var.FieldRef

    • StaticFieldAccess → FieldRef

    • FieldRef → <ClassType: Type FieldName>

    • FieldName → Identifier

  • ArrayAccess → Var[Var]

  • NewExp → NewInstance | NewArray | NewMultiArray

    • NewInstance → new ClassType

    • NewArray → new Type[Var]

    • NewMultiArray → new Type LengthList EmptyList

    • LengthList → [Var] | [Var]LengthList

    • EmptyList → ε | []EmptyList

  • InvokeExp → InvokeVirtual | InvokeInterface | InvokeSpecial | InvokeStatic | InvokeDynamic

    • InvokeVirtual → invokevirtual Var.MethodRef(ArgList)

    • InvokeInterface → invokeinterface Var.MethodRef(ArgList)

    • InvokeSpecial → invokespecial Var.MethodRef(ArgList)

    • InvokeStatic → invokestatic MethodRef(ArgList)

    • InvokeDynamic → invokedynamic BootstrapMethodRef MethodName MethodType [BootstrapArgList] (ArgList)

    • MethodRef → <ClassType: Type MethodName(TypeList)>

    • MethodName → Identifier

    • TypeList → ε | Type TypeList'

    • TypeList' → ε | , Type TypeList'

    • ArgList → ε | Var ArgList'

    • ArgList' → ε | , Var ArgList'

    • BootstrapMethodRef → MethodRef

    • BootstrapArgList → ε | Literal BootstrapArgList'

    • BootstrapArgList' → ε | , Literal BootstrapArgList'

  • UnaryExp → NegExp | ArrayLengthExp

    • NegExp → !Var

    • ArrayLengthExp → Var.length

  • BinaryExp → ArithmeticExp | BitwiseExp | ComparisonExp | ConditionExp | ShiftExp

    • ArithmeticExp → Var ArithmeticOp Var

    • ArithmeticOp → + | - | * | / | %

    • BitwiseExp → Var BitwiseOp Var

    • BitwiseOp → "|" | & | ^

    • ComparisonExp → Var ComparisonOp Var

    • ComparisonOp → cmp | cmpl | cmpg

    • ConditionExp → Var ConditionOp Var

    • ConditionOp → == | != | < | > | ⇐ | >=

    • ShiftExp → Var ShiftOp Var

    • ShitOp → << | >> | >>>

  • InstanceOfExp → Var instanceof Type

  • CastExp → (Type) Var

Grammar of Statements

Stmt → AssignStmt | JumpStmt | Invoke | Return | Throw | Catch | Monitor | Nop

  • AssignStmt → New | AssignLiteral | Copy | LoadArray | StoreArray | LoadField | StoreField | Unary | Binary | InstanceOf | Cast

    • New → Var = NewExp;

    • AssignLiteral → Var = Literal;

    • Copy → Var = Var;

    • LoadArray → Var = ArrayAccess;

    • StoreArray → ArrayAccess = Var;

    • LoadField → Var = FieldAccess;

    • StoreField → FieldAccess = Var;

    • Unary → Var = UnaryExp;

    • Binary → Var = BinaryExp;

    • InstanceOf → Var = InstanceOfExp;

    • Cast → Var = CastExp;

  • JumpStmt → Goto | If | Switch

    • Goto → goto Label;

    • If → if ConditionExp goto Label;

    • Switch → TableSwitch | LookupSwitch

    • TableSwitch → tableswitch (Var) { CaseList default: goto Label; }

    • LookupSwitch → lookupswitch (Var) { CaseList default: goto Label; }

    • Label → IntLiteral

    • CaseList → ε | case IntLiteral: goto Label; CaseList

  • Invoke → InvokeExp; | Var = InvokeExp;

  • Return → return; | return Var;

  • Throw → throw Var;

  • Catch → catch Var;

  • Monitor → monitorenter Var; | monitorexit Var;

  • Nop → nop;