-
Notifications
You must be signed in to change notification settings - Fork 42
In depth overview
Note: Chinese users may want to refer to @maxiee's blog posts about dart_eval.
dart_eval has two fundamental components: the compiler and the runtime. The runtime can be further broken down into sub-components: the bytecode VM and the runtime bridge. Furthermore, there are several ‘supplemental’ components such as the eval() function, runtime overrides, and the Dart standard library bindings.
Let’s go over each of these in a bit more detail, starting with the compiler.
The compiler’s entrypoint is compiler.dart. This file contains most of the steps needed to prepare for compilation. A user typically calls it using the compile() method, which is simply a shorthand for the compileSources() method that performs the real processing.
Before we actually compile the Dart code, a few things must be set up.
First, the source code must be parsed, from text into an abstract syntax tree (AST). While parsing is a whole topic unto itself, dart_eval simply relegates this task to the Dart analyzer package using its parseString() method. This turns code similar to this:
import 'dart:math';
int main() {
return min(1, 2 + 3);
}
Into a tree like this (simplified for clarity):
CompilationUnit (main.dart)
> ImportDirective ('dart:math')
> FunctionDeclaration ('main', returns: 'int')
> BlockStatement
> ReturnStatement
> MethodInvocation ('min')
> IntegerLiteral (1)
> BinaryExpression (+)
> IntegerLiteral (2)
> IntegerLiteral (3)
Second, we must combine Dart files into logical “libraries”. For most code, each Dart file is a standalone library, but using the part/part of syntax multiple Dart files can be combined into a single library. This is a classic graph algorithm, and like the official Dart compiler, dart_eval uses the Djikstra path-based strong component algorithm to find "strongly connected components", i.e. groups of files where there is both a reference from the root file to its parts (“part x.dart”) and from each part back to the root (“part of y.dart”). Also in this step, bridge libraries that share an identical URI have their declarations merged with parsed libraries.
dart_eval uses tree shaking for dead code elimination, and the first step of this is simply discovering all libraries which are reachable through imports and exports. First, we build an import/export graph from all libraries' imports and exports. Unlike the part/part of graph from earlier, this is a directed graph - essentially a tree, except permissive of cyclic references - linking together libraries based on their import and export directives. We're given a set of entrypoint files such as main.dart
, and crawl through the graph starting at each entrypoint to discover all files which are eventually reachable by the program.
The next step of tree shaking is to find a list of identifiers that each code declaration (like a function or a class) uses. To do this, the TreeShakeVisitor
recursively walks the AST of each declaration and emits strings for each type name and identifier it finds. For example, the following code:
Future<String> getData(int index) async {
final result = await Api().fetchData('/v1/${index}', config);
return result.data;
}
would return ['Future', 'String', 'getData', 'index', 'result', 'Api', 'fetchData', 'config', 'data']
.
Next, we must resolve each library’s imports and exports. To be exact, the goal of this step is to find all of the declarations that should be visible to each library when it is compiled - i.e. if ClassA
is defined in a.dart
and b.dart
imports a.dart
, a function in b.dart
should be able to instantiate ClassA
; however, a function in c.dart
that does not import a.dart
should throw a compile error if you attempt to instantiate ClassA.
This step uses a two-pass algorithm to resolve all declarations visible to each library.
First, build an export graph from all of the libraries’ exports.
Second, for each library we're resolving, iterate through each of its imports and find all declarations exposed by the import, both directly and through the export graph, taking into account show
and hide
combinators. This step also groups prefixed imports into a second layer.
To tree-shake, we start off by scanning through all entrypoint libraries' imports. For each declaration in those imported libraries, check the list of identifiers that was created earlier to see if that declaration was used. If it is, we mark it as used and mark the imported library as needing to be processed.
Next, we run the following loop until their are no more libraries to be processed:
- Get a library from the list to process
- For each declaration in that library that is marked as used, get the list of identifiers that it uses.
- Loop through the library's imports (including the implicit import of itself), and for each declaration in the imported library, check if it is in the list of used identifiers.
- If it is, mark that declaration as used and add the imported library to the list of libraries to be processed.
To avoid infinite loops with cyclic imports, we also maintain a list of every import we've processed, and skip re-processing ones that we've already seen.
Finally, simply filter out all declarations in every library that are not marked as used.
During compilation, the compiler needs to be able to look up declarations quickly based on their names. In this step, several maps are formed to allow this, including a top-level declaration map and instance declarations map.
The compiler now creates and caches types for each class present in the source code and each bridge class. These types are unresolved for now, meaning they don't have information about what other types they extend/implement/mixin.
The compiler now uses the cached type refs and the resolved imports and exports to build a simple String mapping of all types visible to each library. When compiling the code, we can reference this map to figure out what, for example, ClassA
means in the current context. Since earlier types are overwritten by later ones, it will only resolve to one type.
For efficient access, all static functions and global values (top-level variables) are assigned a unique integer index, starting at 0 and incrementing.
Finally, it's time to actually compile some code!
The dart_eval compiler starts by compiling static fields and top-level variables to infer their type, before moving on to compiling the rest of the code. Unsurprisingly, compilation can take many different paths depending on what is being compiled. In general, compilation starts at the declaration level (class, function, field, constructor). Many declarations contain statements, such as blocks, loops and returns, as well as expressions, such as assignments and method calls.
This document is incomplete. It will continue to be updated.