You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The DDlog 2.0 compiler has to be modular, to enable incremental compiler evolution and experimentation (e.g., adding syntax desugaring passes, extending the language with additional features - e.g., traits, adding new code generators, adding new optimizations).
The experience with the P4 compiler based on a visitor architecture and nano-passes has been very positive. That compiler is written in C++, but in a functional style; it uses the C++ garbage collector and it relies strongly on the const C++ keyword to enforce immutability. The P4 compiler uses a purely functional IR, and tools that automatically generate the visitors for the IR. The IR itself is extensible: even external users can add new classes to the IR (e.g., classes used in some specific back-ends).
The compiler is literally a few hundred passes, some of which are repeated. Each pass consumes an IR and produces a new IR, which is copy-on-write modified from the previous one. There are no side-channels between passes where they can exchange information; if a pass needs to produce additional information for other passes (e.g., type information), that information is conveyed into a separate data structure that is passed alongside with the IR representation.
The visitors are strongly typed and make is easy to write analyses that only modify some parts of the code. For example, strength-reduction only applies to expressions, and you need to write no code that touches statements.
The visitor also allows access to the context of an IR node. IR nodes have no parent pointers, but the visitor traverses recursively all children of a node and provides information to the path from the root that has been taken to reach a node.
Many of these features are present in the Haskell DDlog compiler too, but the visitor API is not specifically spelled out, and the use of side data structures is not explicit.
There are some downsides in the P4 implementation: the IR is a DAG that reuses nodes, and this is a source of some hard to track bugs (sometimes the meaning of an expression is dependent on the context; having the same subtree to represent identical expressions in two different places makes is difficult to distinguish between instances).
In the P4 compiler the type information is not part of the IR, it is a separate data structure. Thus the P4 compiler reruns type inference after any IR tree modification. The benefit is that this is an automatic safety check, that will catch immediately changes to the IR which violate typing. Another benefit is that when you modify the IR you don't also need to also modify the type information (to some degree), so the code is somewhat simpler. The downside is that a lot of time is spent in type checking.
If the compiler will be written in Rust, I am proposing to start by defining the structure of the IR and of the visitors that will be used and by spelling out these rules explicitly. The C++ P4 compiler uses dynamic double dispatch using a standard C++ pattern to dispatch methods based both on dynamic IR node type and on visitor type, using the single-dispatch mechanism of C++ virtual functions. We have to figure out how this should be done using traits. In the P4 compiler there are several visitor base classes: Inspector (IR is read-only, produces side data structures), Transform (modifies the IR), and users subclass one of these. There are also predefined visitor combinators to embed a sequence of visitors, to repeat a sequence of visitors, to choose some visitors conditionally, etc.
Another very powerful feature of the P4 compiler is most of the IR is convertible back to legal P4 that can be compiled and validated. This makes debugging by humans much easier, and helps with compiler validation as well. Only the last few back-end passes which generate code in a different target generate a new kind of IR.
The text was updated successfully, but these errors were encountered:
the parser-produced IR. If we use a parser generator tool like ANTLR this produces it's own AST that tracks closely the grammar, and the visitors to traverse it. This is very ephemeral, and has to be converted immediately to the desired IR. A tool like bison on the other hand gives actions that can build the resulting IR directly.
the Datalog-layer IR. This will represent the programs as a set of rules and functions, where each rule computes a relation. Very similar to the Haskell IR.
the high-level circuits IR. This represents the Datalog IR as a set of circuits (recursive function compositions) that operate on streams. The semantics of this is based on the model in the paper draft. This is also convertible back to the LINQ-like textual notation. This representation operates on streams of streams.
the low-level circuits IR. This representation is closer to DD - it models the program as a set of operators on streams where each element is optionally tagged with a time. (The translation between the circuits and low-level circuits is not part of the model yet.) There are no longer streams of streams.
the target-specific IR. We should support multiple code generators, including for DD, but possibly for other targets (raw Rust, C++ to try faster compilation speeds).
ryzhyk
changed the title
Modular compiler design architecture
[RFC] Modular compiler design architecture
Sep 20, 2021
ryzhyk
changed the title
[RFC] Modular compiler design architecture
[RFC] Modular compiler architecture
Sep 20, 2021
For now this is a stub for an RFC.
The DDlog 2.0 compiler has to be modular, to enable incremental compiler evolution and experimentation (e.g., adding syntax desugaring passes, extending the language with additional features - e.g., traits, adding new code generators, adding new optimizations).
The experience with the P4 compiler based on a visitor architecture and nano-passes has been very positive. That compiler is written in C++, but in a functional style; it uses the C++ garbage collector and it relies strongly on the
const
C++ keyword to enforce immutability. The P4 compiler uses a purely functional IR, and tools that automatically generate the visitors for the IR. The IR itself is extensible: even external users can add new classes to the IR (e.g., classes used in some specific back-ends).The compiler is literally a few hundred passes, some of which are repeated. Each pass consumes an IR and produces a new IR, which is copy-on-write modified from the previous one. There are no side-channels between passes where they can exchange information; if a pass needs to produce additional information for other passes (e.g., type information), that information is conveyed into a separate data structure that is passed alongside with the IR representation.
The visitors are strongly typed and make is easy to write analyses that only modify some parts of the code. For example, strength-reduction only applies to expressions, and you need to write no code that touches statements.
The visitor also allows access to the context of an IR node. IR nodes have no parent pointers, but the visitor traverses recursively all children of a node and provides information to the path from the root that has been taken to reach a node.
Many of these features are present in the Haskell DDlog compiler too, but the visitor API is not specifically spelled out, and the use of side data structures is not explicit.
There are some downsides in the P4 implementation: the IR is a DAG that reuses nodes, and this is a source of some hard to track bugs (sometimes the meaning of an expression is dependent on the context; having the same subtree to represent identical expressions in two different places makes is difficult to distinguish between instances).
In the P4 compiler the type information is not part of the IR, it is a separate data structure. Thus the P4 compiler reruns type inference after any IR tree modification. The benefit is that this is an automatic safety check, that will catch immediately changes to the IR which violate typing. Another benefit is that when you modify the IR you don't also need to also modify the type information (to some degree), so the code is somewhat simpler. The downside is that a lot of time is spent in type checking.
If the compiler will be written in Rust, I am proposing to start by defining the structure of the IR and of the visitors that will be used and by spelling out these rules explicitly. The C++ P4 compiler uses dynamic double dispatch using a standard C++ pattern to dispatch methods based both on dynamic IR node type and on visitor type, using the single-dispatch mechanism of C++ virtual functions. We have to figure out how this should be done using traits. In the P4 compiler there are several visitor base classes: Inspector (IR is read-only, produces side data structures), Transform (modifies the IR), and users subclass one of these. There are also predefined visitor combinators to embed a sequence of visitors, to repeat a sequence of visitors, to choose some visitors conditionally, etc.
Another very powerful feature of the P4 compiler is most of the IR is convertible back to legal P4 that can be compiled and validated. This makes debugging by humans much easier, and helps with compiler validation as well. Only the last few back-end passes which generate code in a different target generate a new kind of IR.
The text was updated successfully, but these errors were encountered: