
[RFC] Expression compiler #7

Open
ryzhyk opened this issue Sep 20, 2021 · 5 comments
Labels
rfc Request for Comments

Comments

@ryzhyk
Contributor

ryzhyk commented Sep 20, 2021

Intro

This RFC focuses on efficient compilation of the DDlog-2 expression language
(i.e., the non-relational subset of the language). Our language design choices,
as well as the choice of the backends we want to support, affect the complexity
(and feasibility) of building an efficient compiler.

Here is the expression language design we've discussed so far:

  1. Syntactically, our expression language is a subset of Rust.

  2. Semantically, it is a pure functional language with pass-by-value semantics
    (unless we allow mut variables), i.e., copying a variable to another variable
    or field, or passing it as an argument to a function is semantically equivalent
    to creating a copy of its value.

  3. Our primary target is Rust, hence we cannot rely on a garbage collector for
    automatic memory management (all we have is (1) Rust's ownership model and (2)
    limited forms of GC that can be implemented as libraries, e.g., reference
    counters).

To the best of my knowledge, the combination of 2 and 3 is unique: functional
languages typically use a full-fledged mark-and-sweep GC. The only language I
know of that relies on reference counting for GC is Swift, and Swift is
pass-by-reference, not pass-by-value (to be more precise, Swift has structs,
which are not reference counted and are pass-by-value, and classes, which are
reference counted and are pass-by-reference). "Unique" is not a good thing in
this case: I'd rather rely on a well-understood technology, but our requirements
seem to dictate these choices.

So we need a compiler + runtime design that implements pass-by-value semantics,
supports automatic memory management, and avoids actual cloning as much as
possible. We discuss a few strawman designs below. A practical solution may
combine more than one of these designs.

Strawman designs

Design 1: Wrap all objects in a refcount

The compiler wraps all data types, except primitive types, in a reference
counted pointer. If we want to support mut variables, we probably want to use
a reference counter with copy-on-write support, so that copying is O(1) even
if the value may get mutated later. In addition, the compiler can perform data
flow analysis to avoid incrementing/decrementing the counter when possible.

Pros:

  • Minimizes cloning.

Cons:

  • Wastes memory and CPU to store and update a reference counter with each
    object.

  • Even small objects must be heap-allocated. This alone can kill
    performance.

Design 2: Let the user decide

With this approach the user annotates each type to tell the compiler whether
they want it wrapped in a reference counter or not. In Swift, for example, the
class keyword generates a reference-counted type, while structs and tuples are
not reference counted.

Pros:

  • The user can make optimal decisions based on domain-specific knowledge that
    is not available to the compiler.

Cons:

  • In practice, most users won't know how to decide.

Design 3: Let the compiler decide

The compiler can decide which types to wrap in a reference counter, e.g., based
on their size.
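
A size-based heuristic could be as simple as the following sketch. The threshold and the should_wrap helper are entirely hypothetical; a real compiler would make this decision statically, per type, rather than at runtime.

```rust
use std::mem::size_of;

// Hypothetical cutoff: types larger than this get wrapped in an Arc.
const WRAP_THRESHOLD: usize = 64;

// Illustrative stand-in for a static, per-type compiler decision.
fn should_wrap<T>() -> bool {
    size_of::<T>() > WRAP_THRESHOLD
}

fn main() {
    assert!(!should_wrap::<i64>()); // 8 bytes: kept inline
    assert!(should_wrap::<[u8; 128]>()); // 128 bytes: wrapped
}
```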

Pros:

  • Improved performance compared to Design 1.

Cons:

  • The compiler doesn't have enough information to make an optimal decision.
  • The API between native Rust and DDlog code becomes unstable, as the compiler
    can arbitrarily wrap types in an Arc.

Design 4: Keep reference counters inside libraries

A copy-on-write reference counter is really just one example of a functional
data structure. Functional languages typically come with libraries that
implement many others, e.g., functional maps, sets, heaps, lists.

In this design the compiler does not inject (and is not even aware of)
reference counters. User-defined types translate directly into equivalent Rust
structs, enums, or tuples. The compiler uses a combination of pass-by-reference
(borrowing), ownership transfer, and cloning to maintain the pass-by-value
semantics. Library data types, especially container types, internally use
functional data structures to reduce the amortized cost of cloning and
modification operations.

Pros:

  • Simpler compiler design.
  • Generated Rust types are near-identical to DDlog declarations, so no
    surprises for people implementing extern functions.

Cons:

  • User-defined types, e.g., structs, are not reference counted and will end up
    getting cloned when the compiler needs to copy the value (i.e., it cannot
    simulate pass-by-value by borrowing or ownership transfer). This can be
    expensive, especially for structs with many fields.

Compiler design

All alternatives outlined above require an expression compiler that can maintain
pass-by-value semantics while minimizing actual cloning. In a nutshell,
an expression of the form let x = y can be compiled to Rust in several ways:

// Option 1: Cloning: always correct, but expensive.
let x = y.clone();

// Option 2: Ownership transfer: variable `y` is no longer used.
let x = y;

// Option 3: Borrowing: `y` outlives `x`; both variables are immutable during
// the lifetime of `x`.
let x = &y;

Option 1 is always valid, but can be expensive, whereas options 2 and 3 require
dataflow analysis to establish the lifetimes and access patterns of x and y.
In general, more complex program rewriting may be required to generate optimal
code, e.g., depending on the context, the following code

let x = y;
let z = y;

may get compiled to:

let x = y;
let z = &x;

In general, even ignoring the algorithmic complexity of optimization, it may
be impossible to statically determine the best implementation.

Optimizations cannot always be applied across function boundaries. For example,
passing input arguments by reference and returning results by value is a
reasonable default calling convention. However, there are cases when
passing by value or returning by reference makes more sense:

// Example: `f1` needs an owned copy of `x`, so `x` should be passed by value
// to avoid cloning.
fn f1(x: LargeType) -> SomeStruct {
    SomeStruct {
        x: x
    }
}

// Example: `f2` returns a field of its argument `s`, and can return
// by-reference to avoid cloning.
fn f2(s: SomeStruct) -> LargeType {
    s.x
}

In this example, the compiler has sufficient information to derive these
optimizations, but if functions f1 and f2 are extern or are declared
without an implementation as part of a trait, then the compiler must use
default calling conventions or rely on user annotations.

@ryzhyk ryzhyk added the rfc Request for Comments label Sep 20, 2021
@mihaibudiu

Our task is a bit simpler than other languages since we don't have "new".
If we give up on multithreading does this simplify the problem? Is Rc substantially more efficient than Arc?

@ryzhyk
Contributor Author

ryzhyk commented Sep 20, 2021

Our task is a bit simpler than other languages since we don't have "new".

Right, all heap allocation happens inside libraries. This is the same in Rust. Not sure what the implications are.

If we give up on multithreading does this simplify the problem? Is Rc substantially more efficient than Arc?

This is a bit tricky. We are single-threaded in the sense that all "our" code runs in the context of a worker thread and doesn't share state with other workers. But if we use a functional map that internally implements lazy copying, it must be implemented using Arcs, as logical clones of this map (which share state internally) can be accessed by multiple workers.

@mihaibudiu

But what if we give up altogether on using multiple workers (when not targeting DD)? Assuming all work in DDlog is done on the calling thread.

@ryzhyk
Contributor Author

ryzhyk commented Sep 20, 2021

But what if we give up altogether using multiple workers (when not targeting DD)? Assuming all work in DDlog is done on the calling thread.

Yeah, I'm not sure. Theoretically, Arc is more expensive as it uses atomic instructions.

@silvanshade

It might be worth looking at something like symmetric interaction calculus, or some of the later variants the author has developed, as a potential model. It has the advantage of not needing GC, has efficient parallel reduction semantics, etc., but there are some interesting design constraints. See the points mentioned in the README.

There are also several blog posts from the author on developing that further (later variants of it are used in a fintech computation engine of some sort, but I don't remember the details).
