-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Embedding Datalog in a functional language #2
Comments
Summary of discussion with @Kixiron:
|
Additionally, I think the first-class relations aspect of DDlog is a really promising avenue for both simplifying the language as well as making it more powerful and approachable |
Yep, that's part of the appeal. In a way it makes the language a bit too powerful, as we can now write programs that our backend does not support, but that's a much lesser evil. |
I talked to some ddlog users and they listed a few things
|
We certainly need better optimizers for both declarative and imperative code. One thing I've learned from working with DDlog though is that oftentimes you can only get performance through macro-optimizations involving code refactoring that you cannot possibly expect the compiler to figure out. I'm just saying, realistically performance will always be a programmer's concern to a large extent. Completely agree with all other points. Also, you clearly know about more DDlog users than I do :) |
Let me move this comment to #1 |
This is because often the optimal program depends on the data that you are feeding the program. The compiler has no way of knowing that. In some remote day perhaps the program runtime will monitor its own execution and observe and report inefficiencies. |
That's part of the problem, but profile-driven optimization aside, macro-optimizations are just really hard. You are asking the compiler to synthesize an equivalent program that is more optimal in terms of some metric. This is undecidable in theory and in practice. |
Perhaps we should write these RFCs as rst files instead. Would make it easier to comment and to convert into documentation later. |
Does github support .rst in issues? I don't think it does. |
I am suggesting contributing the RFC as a document in a docs directory that we can then review using normal github tools. |
I would much prefer markdown over rst |
There is only one way to settle this: https://www.rpsgame.org/ |
Markdown integrates super easily with mdbook which would allow us one unified site for documentation and rfcs |
md is fine, the point is to have a document instead of an issue. |
Is DDlog-2 going to have original DDlog-1 syntax, or are we verging more towards the D-Rust sample language in this RFC? After inspecting LINQ, I have to say that the explicit function calls (filter, map, etc) are more accessible than the sequence of (implicit) operators in DDlog. D-Rust seems to be a nice in-between that isn't as verbose as Rust but isn't as opaque as DDlog. Having said that, I haven't talked with many DDlog users, so maybe users don't actually find this to be a problem, in which case we stick with original DDlog-1 syntax. |
@amytai, the consensus seems to be that we're moving in the D-Rust direction, specifically:
|
I like to look at it more from the angle of treating relations as first-class objects within the language that works in tandem with the more traditional Datalog syntax so that you can switch between writing imperative code and declarative logic over relations function join_neighbors(nodes: Relation[(Node, NodeData)], edges: Relation[(Node, Node)]): Relation[(NodeData, NodeData)] {
nodes
.join(edges)
.map(|(_src, (data, dest))| (dest, data))
.join(edges)
.map(|(_dest, (src_data, dest_data))| (src_data, dest_data))
}
output relation Neighbor(src: NodeData, dest: NodeData)
Neighbor(src, dest) :- let (src, dest) = join_neighbors(Nodes, Edges). |
I think we're an agreement. But I think it is nice that relational stuff can be expressed in the purely functional subset of the language after desugaring (even though the compiler will handle functions on relations differently). |
PS. In the |
I disagree with this statement. It is harder to understand because the column names are meaningless. I think that the join in Datalog is in fact easier to get wrong; if you transpose the column names in some relations no one will even notice. I believe that the LINQ-like syntax is great. If there is a systematic way to translate Datalog into LINQ we should do it. |
Speaking of this, I would very much like a design of the compiler that makes is easy to insert passes like this - desugaring passes, optimization passes. This makes the compiler extensibility and incremental development much simpler. Perhaps we should start a new issue about the compiler architecture? |
I agree, we should start with the general-purpose subset. It's not really LINQ though, just a general-purpose functional language. It is not true in my opinion that LINQ is always better than Datalog. Long sequences of joins are tedious and error-prone. Like Datalog, LINQ does not enforce consistent naming of columns, but also forces you to manually form tuples and then manually bind tuple fields to variables. |
Ah cool I hadn't thought of this. So then Datalog-syntax will be syntactic sugar that should be expressible via #3 and common LINQ-style operators such as join, filter will also (?) be syntactic sugar (i.e. non-UDF relational operators exposed via some std lib). |
Yes, that's right, the canonical syntax is the LINQ-style, since it is unambiguous. The others are just syntactic sugar. |
RFC: Embedding Datalog in a functional language
This RFC sketches a Datalog dialect implemented as syntactic sugar on top of a general-purpose functional language (e.g., Rust). It strives to combine the expressiveness and flexibility of a general-purpose language with conciseness and clarity of Datalog.
LINQ and Differential Dataflow are two examples of general-purpose languages that support declarative queries over relational data. To this end, they implement all standard relational operators as higher-order functions (
filter
,map
,join
,group
, etc.) This design has many important advantages:Flattens the learning curve by relying on familiar programming constructs.
Unifies relational and imperative subsets of the language. Relations are first-class objects and can be passed around as regular values. In particular, relation transformers can be implemented as regular functions.
Takes advantage of existing language infrastructure: type system, expression syntax, parser, type checker, IDE integration and other tooling, documentation, etc.
The main downside of the approach is that it can be much more verbose than Datalog, as one must supply closures to each operator to unwrap the input record, bind its columns to local variables, and wrap the result of the operator back into a struct or tuple. Here is the same rule written in DDlog and in a Rust-like language:
Clearly, the second version is harder to understand and easier to get wrong.
We propose a syntactic sugar on top of the functional syntax, which eliminates this overhead while keeping the above advantages. The idea is that instead of writing closures explicitly, the programmer writes patterns that contain variable bindings, and the compiler automatically derives closures that produce and consume tuples containing bound variables. For example, the above rule would be written as:
which translates to the Rust code above. Note that this rule is syntactically close to Datalog.
In the rest of this RFC, we define a vocabulary of relational operators and their pattern-based versions. We leave the following questions outside the scope for now:
The choice of the base language. While we can define the new language from scratch (similar to DDlog-1), it can also be designed as an extension of an existing language, e.g., Rust. There are all kinds of pros and cons to both options and this choice will have major impact on the structure of the project.
Support subset of the language. The exact subset of the language that can be compiled into efficient incremental code is limited by the capabilities of the compiler and the backend. For example, even though we can manipulate relations as values in the language, Differential Dataflow does not support this, so we may only be able to compile programs where all rules can be evaluated statically.
Compiler architecture. Depending on the answer to question 1, the compiler will be built as an extension of an existing compiler (e.g.,
rustc
), an embedded DSL (e.g., in GHC or C#), or a standalone compiler.Streaming features.
In the following sections we sketch the translation of various DDlog-1 elements to a pure functional language based on Rust (ok, it's really just sloppy Rust, where I ignore lifetimes and borrowing), and
D-Rust
, a hypothetical new language that adds syntactic shortcuts on top of Rust. These shortcuts are optional. The programmer may choose to write some or all rules using vanilla Rust.Relation declarations
DDlog
Rust
Note that DDlog combines record type declaration, relation type declaration, and relation instance declaration in a single line, while Rust requires three separate declarations. Note also that Rust doesn't require
input
/output
annotations: input relations are relations passed as arguments to themain
function, output relations are returned by the function, and intermediate relations are local variables declared inside the function. This enables modularity and reuse, as an output of one function can be used as an input to the other, which amounts to relation transformer composition.D-Rust
TODO: I am not sure how to define meaningful shorthand syntax for relation type and instance delarations.
Relational operators
We define relational operators as methods of type
Relation<>
.join
,antijoin
DDlog
Rust
D-Rust
fn main(r1: Relation, r2: Relation): (Relation, Relation)
{
let r3 = r1(var x, var y).join(r2(y, var z)) -> R3(x, z);
let r4 = r1(var x, var y).antijoin(r2(y)) -> R4(x);
}
map
,filter
DDlog
Rust
D-Rust
Here I use
bind
as a more intuitive synonym formap
.group_by
DDlog
Rust
D-Rust
flatten
DDlog
Rust
D-Rust
Union, recursion
DDlog
Rust
D-Rust
We may want the compiler to detect recursive components automatically (Datalog-style). Although there is also a lot to be said for forcing the programmer to be explicit about recursion.
The text was updated successfully, but these errors were encountered: