
[RFC] Expression compiler #7

Open
ryzhyk opened this issue Sep 20, 2021 · 5 comments
Labels
rfc Request for Comments

Comments

@ryzhyk
Contributor

ryzhyk commented Sep 20, 2021

Intro

This RFC focuses on efficient compilation of the DDlog-2 expression language
(i.e., the non-relational subset of the language). Our language design choices,
as well as the choice of the backends we want to support, affect the complexity
(and feasibility) of building an efficient compiler.

Here is the expression language design we've discussed so far:

  1. Syntactically, our expression language is a subset of Rust.

  2. Semantically, it is a pure functional language with pass-by-value semantics
    (unless we allow mut variables), i.e., copying a variable to another variable
    or field, or passing it as an argument to a function is semantically equivalent
    to creating a copy of its value.

  3. Our primary target is Rust, hence we cannot rely on a garbage collector for
    automatic memory management (all we have is (1) Rust's ownership model and (2)
    limited forms of GC that can be implemented as libraries, e.g., reference
    counters).

To the best of my knowledge, the combination of 2 and 3 is unique: functional
languages typically use a full-fledged mark-and-sweep GC. The only language I
know of that relies on reference counting for GC is Swift, and Swift is
pass-by-reference, not pass-by-value (to be more precise, Swift has structs,
which are not reference counted and are pass-by-value, and classes, which are
reference counted and are pass-by-reference). "Unique" is not a good thing in
this case: I'd rather rely on a well-understood technology, but our requirements
seem to dictate these choices.

So we need a compiler + runtime design that implements pass-by-value semantics,
supports automatic memory management, and avoids actual cloning as much as
possible. We discuss a few strawman designs below. A practical solution may
combine more than one of these designs.

Strawman designs

Design 1: Wrap all objects in a refcount

The compiler wraps all data types, except primitive types, in a reference
counted pointer. If we want to support mut variables, we probably want to use
a reference counter with copy-on-write support, so that copying is O(1) even
if the value may get mutated later. In addition, the compiler can perform data
flow analysis to avoid incrementing/decrementing the counter when possible.

Pros:

  • Minimizes cloning.

Cons:

  • Wastes memory and CPU to store and update a reference counter with each
    object.

  • Even small objects must be heap-allocated. This alone can kill
    performance.

Design 2: Let the user decide

With this approach the user annotates each type to tell the compiler whether
they want it wrapped in a reference counter or not. In Swift, for example, the
class keyword generates a reference-counted type, while structs and tuples are
not reference counted.

Pros:

  • The user can make optimal decisions based on domain-specific knowledge that
    is not available to the compiler.

Cons:

  • In practice, most users won't know how to decide.

Design 3: Let the compiler decide

The compiler can decide which types to wrap in a reference counter, e.g., based
on their size.
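
A size-based heuristic could be as simple as the following sketch. The threshold and the should_wrap helper are entirely hypothetical; a real compiler would make this decision statically, per type, rather than at runtime.

```rust
use std::mem::size_of;

// Hypothetical cutoff: types larger than this get wrapped in an Arc.
const WRAP_THRESHOLD: usize = 64;

// Illustrative stand-in for a static, per-type compiler decision.
fn should_wrap<T>() -> bool {
    size_of::<T>() > WRAP_THRESHOLD
}

fn main() {
    assert!(!should_wrap::<i64>()); // 8 bytes: kept inline
    assert!(should_wrap::<[u8; 128]>()); // 128 bytes: wrapped
}
```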

Pros:

  • Improved performance compared to Design 1.

Cons:

  • The compiler doesn't have enough information to make an optimal decision.
  • The API between native Rust and DDlog code becomes unstable, as the compiler
    can arbitrarily wrap types in an Arc.

Design 4: Keep reference counters inside libraries

A copy-on-write reference counter is really just one example of a functional
data structure. Functional languages typically come with libraries that
implement many others, e.g., functional maps, sets, heaps, lists.

In this design the compiler does not inject (and is not even aware of)
reference counters. User-defined types translate directly into equivalent Rust
structs, enums, or tuples. The compiler uses a combination of pass-by-reference
(borrowing), ownership transfer, and cloning to maintain the pass-by-value
semantics. Library data types, especially container types, internally use
functional data structures to reduce the amortized cost of cloning and
modification operations.

Pros:

  • Simpler compiler design.
  • Generated Rust types are near-identical to DDlog declarations, so no
    surprises for people implementing extern functions.

Cons:

  • User-defined types, e.g., structs, are not reference counted and will end up
    getting cloned when the compiler needs to copy the value (i.e., it cannot
    simulate pass-by-value by borrowing or ownership transfer). This can be
    expensive, especially for structs with many fields.

Compiler design

All alternatives outlined above require an expression compiler that can maintain
pass-by-value semantics while minimizing actual cloning. In a nutshell,
an expression of the form let x = y can be compiled to Rust in several ways:

// Option 1: Cloning: always correct, but expensive.
let x = y.clone();

// Option 2: Ownership transfer: variable `y` is no longer used.
let x = y;

// Option 3: Borrowing: `y` outlives `x`; both variables are immutable during
// the lifetime of `x`.
let x = &y;

Option 1 is always valid, but can be expensive, whereas options 2 and 3 require
dataflow analysis to establish the lifetimes and access patterns of x and y.
In general, more complex program rewriting may be required to generate optimal
code, e.g., depending on the context, the following code

let x = y;
let z = y;

may get compiled to:

let x = y;
let z = &x;

In general, even ignoring the algorithmic complexity of optimization, it may
be impossible to statically determine the best implementation.

Optimizations cannot always be applied across function boundaries. For example,
passing input arguments by reference and returning results by value is a
reasonable default calling convention. However, there are cases when
passing by value or returning by reference makes more sense:

// Example: `f1` needs an owned copy of `x`, so `x` should be passed by value
// to avoid cloning.
fn f1(x: LargeType) -> SomeStruct {
    SomeStruct {
        x: x
    }
}

// Example: `f2` returns a field of its argument `s`, and can return
// by-reference to avoid cloning.
fn f2(s: SomeStruct) -> LargeType {
    s.x
}

In this example, the compiler has sufficient information to derive these
optimizations, but if functions f1 and f2 are extern or are declared
without an implementation as part of a trait, then the compiler must use
default calling conventions or rely on user annotations.

@ryzhyk ryzhyk added the rfc Request for Comments label Sep 20, 2021
@mihaibudiu

Our task is a bit simpler than other languages since we don't have "new".
If we give up on multithreading does this simplify the problem? Is Rc substantially more efficient than Arc?

@ryzhyk
Contributor Author

ryzhyk commented Sep 20, 2021

Our task is a bit simpler than other languages since we don't have "new".

Right, all heap allocation happens inside libraries. This is the same in Rust. Not sure what the implications are.

If we give up on multithreading does this simplify the problem? Is Rc substantially more efficient than Arc?

This is a bit tricky. We are single-threaded in the sense that all "our" code runs in the context of a worker thread and doesn't share state with other workers. But if we use a functional map that internally implements lazy copying, it must be implemented using Arcs, as logical clones of this map (which share state internally) can be accessed by multiple workers.

@mihaibudiu

But what if we give up altogether on using multiple workers (when not targeting DD)? Assuming all work in DDlog is done on the calling thread.

@ryzhyk
Contributor Author

ryzhyk commented Sep 20, 2021

But what if we give up altogether using multiple workers (when not targeting DD)? Assuming all work in DDlog is done on the calling thread.

Yeah, I'm not sure. Theoretically, Arc is more expensive as it uses atomic instructions.

@silvanshade

It might be worth looking at something like symmetric interaction calculus, or some of the later variants the author has developed, as a potential model. It has the advantage of not needing GC, has efficient parallel reduction semantics, etc., but there are some interesting design constraints. See the points mentioned in the README.

There are also several blog posts from the author on developing that further (later variants of it are used in a fintech computation engine of some sort, but I don't remember the details).
