-
Notifications
You must be signed in to change notification settings - Fork 223
CAst Call Graph Details
Here, we give some technical details for those interested in the implementation of the call graph builder for CAst-based languages, i.e., the Java source front ends and JavaScript.
CAst allows for languages with nested functions and accesses from nested functions to variables declared in enclosing functions. We call such an access a lexical access, and we say a variable accessed from a nested function is exposed. Here we discuss how lexical accesses are handled during WALA's pointer analysis. Note that while lexical accesses from anonymous classes are translated away in Java bytecodes, such accesses are handled directly when analyzing Java source code using CAst.
Originally, WALA went to great lengths to try to maintain some kind of SSA representation even for exposed variables, to improve precision with some level of flow sensitivity. In this approach, lexical reads from calls to nested functions must access the correct SSA name for an exposed variable, and lexical writes from such calls must be treated as a "def" and create a new SSA name. Unfortunately, this approach becomes rather complex since the analysis must compute call graph information to determine when nested functions are being invoked. Hence, SSA renaming and call graph construction must be performed simultaneously, a tricky proposition. Issues are further complicated in JavaScript, since nested functions may be returned as closures, in which case lexical writes operate on the closure rather than a local variable on the call stack. While the code to implement this technique lives on (e.g., see SSAConversion), we have transitioned to a simpler approach that we believe should have little precision loss in practice.
Our new approach essentially works by heap allocating any exposed variable and treating lexical accesses appropriately. Specifically:
- Lexical accesses are determined by building symbol tables while
generating the CAst from a language-specific AST. The relevant code
is in
AstTranslator
, e.g., thedoLexicallyScopedRead
anddoLexicallyScopedWrite
methods. Note that with this scheme, we expect the implementation ofAstTranslator.useLocalValuesForLexicalVars()
to returnfalse
. - Conceptually, the pointer analysis maintains a copy of an exposed
variable for each
CGNode
representing the declaring function (see theUpwardFunargPointerKey
inner class inAstSSAPropagationCallGraphBuilder
). - Given a lexical access of v declared in function f from
CGNode
n, theAstSSAPropagationCallGraphBuilder.getLexicalDefiners()
method discovers the relevantCGNode
s for f. First, it computes the current points-to set ofv1
for n, which holds the "receiver" for n. (In JavaScript,v1
holds the abstract location(s) for the invoked function, not thethis
argument.) The receiver / function is assumed to be of typeScopeMappingInstanceKey
, anInstanceKey
that maintains information about lexical parents of the method allocating the function. For eachScopeMappingInstanceKey
, we invokegetFunargNodes
to find the appropriateCGNode
s for f. NOTE: for JavaScript,getFunargNodes
relies on the (synthetic) contructor allocating the function to be analyzed with a caller or call-string context, in order to determine which function invoked the constructor. - Once we have the appropriate CGNodes for f, we create
corresponding
UpwardFunargPointerKey
objects for v, and then add the appropriate constraints on the key depending on whether the access is a read or a write.