[RFC] DDlog Packages and Rust integration #10

Open
ryzhyk opened this issue Oct 6, 2021 · 3 comments
Labels
rfc Request for Comments

Comments

@ryzhyk
Contributor

ryzhyk commented Oct 6, 2021

[RFC] DDlog Packages and Rust integration

This RFC proposes a systematic way to organize DDlog code into packages and map
these packages into Rust crates. As part of this, we propose a design for
integrating native Rust code into a DDlog project.

Motivation

We address the following limitations of DDlog-1:

  • DDlog-1 does not have the notion of a project or package. Starting with the
    user-specified top module, the compiler finds all its transitive dependencies and
    generates a Rust project by automatically partitioning these dependencies into
    crates based on heuristics. This approach has proved flawed in multiple ways.

    • Suboptimal crate structure (with too many or too few crates).
    • Configuration options, e.g., library paths, must be specified as CLI arguments
      to the DDlog compiler.
    • The lack of a project metadata file complicates IDE integration.
    • It is near-impossible to write native Rust code that depends on definitions
      from other modules, as these modules may end up in arbitrary crates.
    • It is hard to distribute DDlog libraries.
    • Managing dependencies on extern Rust crates is tricky. One must make sure
      that at most one DDlog module imports each Rust crate in its module.toml
      file.
  • Native Rust code is not self-contained. A .dl module can be accompanied by
    a .rs file that implements extern functions and types declared in this module.
    The contents of the .rs file are concatenated with DDlog-generated Rust code and
    copied to the generated Rust project. This leads to poor ergonomics, as
    developers cannot use standard Rust development tools to write these files.
    Moreover, Rust compiler errors point to locations in the generated file that must be
    manually mapped back to the original Rust code.

Packages

DDlog-2 code is organized in packages. Like a Rust crate, a package
consists of a tree of modules and a metadata file specifying package
dependencies. There are two types of modules: DDlog modules and native
Rust modules. The DDlog-2 compiler converts the package into a Rust crate
by generating Cargo.toml for the package and a Rust module module.rs for
each DDlog module module.dl. Native Rust modules are included in the Rust
project as is, and in place, so that Rust compiler messages point to actual
source code locations.

Package structure

A DDlog package looks a lot like a Rust crate. The package.toml file in the
root directory contains package metadata: name, version, description, etc., path
to the main module (e.g., lib.dl), and dependencies. A package can have two
kinds of dependencies: Rust crates and other DDlog packages. The former can
point to crates.io, a git repository, or a local folder. The latter can initially only
point to a local folder or a git repo, but it should be possible to implement support
for both git repositories and crates.io in the future (see below).

my_package/
├── package.toml
└── src/
    ├── lib.dl
    ├── mod1.dl
    ├── mod2.dl
    ├── mod2.rs
    └── mod3/
        └── mod.dl

A module can consist of a single file or a file tree. In the above example,
mod1.dl is a single-file DDlog module, and mod2.rs is a single-file Rust
module. mod2.dl contains DDlog bindings for Rust definitions
in mod2.rs. This file can contain only type definitions and function prototypes
without implementations (see the discussion of extern types below).
Ideally, this file should be generated automatically from mod2.rs, but we may
want to leave this for future work.

Similar to lib and bin crates in Rust, we may want to distinguish library
packages and executable packages, where only the latter can be used to
instantiate a dataflow.
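
To make the metadata described in this section concrete, here is a minimal Rust sketch of how a tool might model the contents of package.toml in memory. All type and field names (`PackageManifest`, `Dependency`, `Source`, `PackageKind`) are assumptions made for illustration; the RFC does not define the file schema. The `is_supported_initially` check encodes only one restriction stated above: a DDlog package dependency cannot initially come from crates.io.

```rust
use std::path::PathBuf;

/// Where a dependency is fetched from (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum Source {
    CratesIo { version: String },
    Git { url: String },
    Path(PathBuf),
}

/// The two kinds of dependencies a package can have.
#[derive(Debug, Clone, PartialEq)]
pub enum Dependency {
    RustCrate { name: String, source: Source },
    DdlogPackage { name: String, source: Source },
}

/// Library vs. executable package, mirroring Rust's lib and bin crates.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PackageKind {
    Library,
    Executable,
}

/// In-memory view of a package.toml file (assumed field set).
#[derive(Debug, Clone, PartialEq)]
pub struct PackageManifest {
    pub name: String,
    pub version: String,
    pub description: String,
    /// Path to the main module, e.g. `src/lib.dl`.
    pub main_module: PathBuf,
    pub kind: PackageKind,
    pub dependencies: Vec<Dependency>,
}

impl Dependency {
    /// DDlog packages cannot initially be fetched from crates.io;
    /// Rust crates can come from any source.
    pub fn is_supported_initially(&self) -> bool {
        !matches!(
            self,
            Dependency::DdlogPackage { source: Source::CratesIo { .. }, .. }
        )
    }
}
```

A loader could run this check on every entry of `dependencies` and report unsupported sources before code generation starts.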

Generated code structure

In contrast to DDlog-1, we place the generated Rust code under the package directory.
We generate Cargo.toml in the top-level folder. Each .dl module is compiled to a
Rust module and stored in the src_rs directory that mirrors the module structure
of the DDlog package. Native Rust modules remain unmodified at their original location.
Rust's #[path] attribute is used to link the native modules to the generated
Rust project:

my_package/
├── Cargo.toml
├── package.toml
├── src/
│   ├── lib.dl
│   ├── mod1.dl
│   ├── mod2.dl
│   ├── mod2.rs
│   └── mod3/
│       └── mod.dl
└── src_rs/
    ├── lib.rs
    ├── mod1.rs
    └── mod3/
        └── mod.rs
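
As a rough illustration of the `#[path]` linkage, the sketch below renders the module declarations that a generated src_rs/lib.rs might contain for this layout. The function `render_lib_rs` and the `ModuleKind` enum are hypothetical and not part of any DDlog compiler; the only Rust fact relied on is that `#[path]` on a `mod` declaration is resolved relative to the directory of the file that contains it.

```rust
/// Kind of module found in the package's `src/` directory (hypothetical).
pub enum ModuleKind {
    /// Compiled from `src/<name>.dl` into `src_rs/<name>.rs`.
    Ddlog,
    /// Hand-written `src/<name>.rs`, used in place.
    Native,
}

/// Renders the `mod` declarations for a generated `src_rs/lib.rs`.
/// Native modules get a `#[path]` attribute pointing back into `src/`.
pub fn render_lib_rs(modules: &[(&str, ModuleKind)]) -> String {
    let mut out = String::new();
    for (name, kind) in modules {
        if let ModuleKind::Native = kind {
            out.push_str(&format!("#[path = \"../src/{}.rs\"]\n", name));
        }
        out.push_str(&format!("pub mod {};\n", name));
    }
    out
}
```

For the package above, calling `render_lib_rs` with `mod1` and `mod3` as DDlog modules and `mod2` as a native module yields a plain `pub mod` line for the first two and a `#[path = "../src/mod2.rs"]` attribute in front of `pub mod mod2;`.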

Native types and functions

As discussed above, a DDlog package can contain native modules implemented in Rust.
A native module is accompanied by a .dl file that declares DDlog bindings for
types, functions, and trait implementations exported by the Rust module, e.g.,

/// Function signature (no implementation).
pub fn f<T: Ord>(arg: T) -> bool;

/// `impl` block consisting of function signatures only.
impl MyStruct {
    fn f1(self) -> bool;
}

/// Trait `impl` without a body.
impl Ord for MyStruct;

/// OpaqueType can be declared in Rust as a struct, enum, or alias.
type OpaqueType;
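
For comparison, a native Rust module that satisfies the bindings above could look like the following sketch. The function bodies, the `flag` field, and the choice of `BTreeSet<u64>` for `OpaqueType` are placeholders invented for illustration; only the signatures come from the bindings.

```rust
use std::collections::BTreeSet;

/// Matches `pub fn f<T: Ord>(arg: T) -> bool;`.
/// Placeholder body: comparing a value with itself is always `Equal`.
pub fn f<T: Ord>(arg: T) -> bool {
    arg.cmp(&arg).is_eq()
}

/// Deriving `Ord` (and the traits it requires) provides the
/// implementation announced by `impl Ord for MyStruct;`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct MyStruct {
    pub flag: bool, // hypothetical field
}

impl MyStruct {
    /// Matches `fn f1(self) -> bool;` in the `impl` block of the binding.
    pub fn f1(self) -> bool {
        self.flag
    }
}

/// Matches `type OpaqueType;`. Here it is an alias, but a struct or an
/// enum would work equally well, since DDlog never looks inside it.
pub type OpaqueType = BTreeSet<u64>;
```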

We sometimes want to expose Rust types like Option and Result to DDlog not as
opaque types but as structs or enums with constructors. In DDlog-1 we did so by
re-declaring the types in DDlog and implementing conversion functions to/from
std and ddlog_std versions of the type. This did not exactly improve Rust
developers' experience.

We therefore propose that in DDlog-2 one can use Rust structs and enums directly
by describing the structure of the struct or enum to the DDlog compiler. Consider
the following native Rust module and accompanying DDlog binding that expose the
Rust Option type to DDlog:

/// option.rs

// Re-export Rust `Option` type to DDlog
pub use std::option::Option;

/// option.dl

// Instead of declaring `Option` as an opaque type, tell DDlog about its constructors.
// DDlog knows that `module.dl` contains bindings for native Rust code in `module.rs` and
// will not generate a duplicate Rust definition for this type.
pub enum Option<T> {
    None,
    Some(T)
}
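
The practical effect is that Rust code generated from DDlog can operate on the standard `Option` directly, with no conversion layer. The sketch below imitates this with two inline modules: `option` stands in for option.rs, and `client` stands in for Rust code generated from a DDlog module that imports it. The `client` module and its `get_or` function are hypothetical.

```rust
/// Stand-in for the native module `option.rs`.
pub mod option {
    pub use std::option::Option;
}

/// Stand-in for Rust code generated from a DDlog module that uses `Option`.
pub mod client {
    use super::option::Option;

    /// Pattern-matches on the constructors described in `option.dl`.
    pub fn get_or(opt: Option<u32>, default: u32) -> u32 {
        match opt {
            Option::Some(x) => x,
            Option::None => default,
        }
    }
}
```

Because `option::Option` is the standard library type, a value produced by any Rust API, such as `slice::first`, can be passed to `client::get_or` unchanged.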

Distributing DDlog package via crates.io

One advantage of the proposed design is that DDlog packages map directly to Rust crates and
can be distributed as such. This has dual benefits: DDlog developers can use the crates
ecosystem for software distribution and dependency management; conversely, Rust developers
can easily incorporate DDlog libraries in their programs. Since crates.io doesn't support the DDlog
package file format, one must include the generated Cargo.toml file in the distribution.
The next question is whether we need to distribute generated Rust sources along with .dl
files. This is probably undesirable and can be avoided by using a build.rs that invokes
the DDlog compiler to convert DDlog sources to Rust at build time.
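
A build.rs along these lines might be sketched as follows. The compiler binary name `ddlog2` and the `--package` and `--out-dir` flags are assumptions; the RFC does not specify a command-line interface. The `cargo:rerun-if-changed` directive and `std::process::Command` are standard Cargo and Rust facilities.

```rust
use std::path::Path;
use std::process::Command;

/// Builds the compiler invocation. The program name `ddlog2` and both
/// flags are hypothetical placeholders for the real DDlog-2 CLI.
pub fn ddlog_command(package_dir: &Path, out_dir: &Path) -> Command {
    let mut cmd = Command::new("ddlog2");
    cmd.arg("--package")
        .arg(package_dir)
        .arg("--out-dir")
        .arg(out_dir);
    cmd
}

// In an actual `build.rs`, `main` might look like this:
//
// fn main() {
//     // Re-run the script when DDlog sources or metadata change.
//     println!("cargo:rerun-if-changed=src");
//     println!("cargo:rerun-if-changed=package.toml");
//     let status = ddlog_command(Path::new("."), Path::new("src_rs"))
//         .status()
//         .expect("failed to run the DDlog compiler");
//     assert!(status.success(), "DDlog compilation failed");
// }
```

One open design point: Cargo's documentation asks build scripts to write only under `OUT_DIR`, so a published crate would probably need to generate its Rust sources there rather than in src_rs.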

@ryzhyk ryzhyk added the rfc Request for Comments label Oct 6, 2021
@mihaibudiu

Frankly this proposal is great. If you can achieve all this, it's fabulous.
But are there some obstacles which could make this difficult? You mentioned circular dependencies between modules.
Do we have workarounds for such obstacles?
Do we need circular module dependencies?

@mihaibudiu

Adding traits to existing types is another one that comes to mind.

@ryzhyk
Contributor Author

ryzhyk commented Oct 7, 2021

You mentioned circular dependencies between modules.

Circular module dependencies are not a problem, as they are allowed by Rust.
Circular crate dependencies, on the other hand, are illegal. Currently, the compiler automatically splits the module dependency graph into strongly connected components (SCCs) to form crates. In DDlog-2, the programmer will be responsible for this, which I think is better in practice.
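
As a small illustration of the first point, two Rust modules within one crate can refer to each other freely. The module and function names below are made up for the example.

```rust
pub mod even {
    /// Depends on `odd`, which in turn depends on `even`.
    pub fn is_even(n: u32) -> bool {
        if n == 0 { true } else { super::odd::is_odd(n - 1) }
    }
}

pub mod odd {
    pub fn is_odd(n: u32) -> bool {
        if n == 0 { false } else { super::even::is_even(n - 1) }
    }
}
```

If `even` and `odd` lived in separate crates, each would have to list the other as a dependency, and Cargo would reject the cycle.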

Adding traits to existing types is another one that comes to mind.

That's a good point. This will still require wrapper types, unless you control either the crate that declares the type or the crate that declares the trait.
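
The wrapper-type workaround is the usual Rust newtype pattern. A minimal sketch, with `Names` as a hypothetical wrapper name: both `Vec` and `Display` are defined outside the current crate, so `impl Display for Vec<String>` would be rejected by the orphan rule, while an implementation for a local wrapper is accepted.

```rust
use std::fmt;

/// Local wrapper around a foreign type. Because `Names` is declared in
/// this crate, it may implement the foreign trait `Display`.
pub struct Names(pub Vec<String>);

impl fmt::Display for Names {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}
```

The cost is that callers must wrap and unwrap values at the boundary, for example `Names(v).to_string()`.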
