Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move core to workspace crate: rune-core #47

Open
Qkessler opened this issue Dec 9, 2023 · 16 comments
Open

Move core to workspace crate: rune-core #47

Qkessler opened this issue Dec 9, 2023 · 16 comments

Comments

@Qkessler
Copy link
Contributor

Qkessler commented Dec 9, 2023

This is the issue tracking my work to move the core out to the workspace crate rune-core. I have the code in my fork, almost ready to PR, though I had some issues with Miri. I wanted to create the issue to get some ideas over ways to resolve the problems, and introduce @CeleritasCelery to the different changes on my PR first to get some early feedback. Here's the commit.

It's key to note that in the process, we include different changes that guide the build process of the crates, in order
to arrive at a place that is both ergonomic and easy to maintain:

We removed the defvar, defvar_bool and defsym macros. They were not adding that much regarding work saved (just
create a constant and initialize it) and they were convoluting the dependency graph between the crates. It's key that
the core crate can use the symbols that were previously generated on the build.rs file, but it does not need to know the
defuns.

For that, we moved the variables and symbols (not the defun symbols) to a static file env/sym.rs. That file contains
all the required constants for the variables and symbols.

// Symbols without the defuns
pub static BUILTIN_SYMBOLS: [SymbolCell; 84] = [
    SymbolCell::new_const("nil"),
    SymbolCell::new_const("t"),
    SymbolCell::new(":test"),
    SymbolCell::new(":documentation"),
    SymbolCell::new("md5"),
    SymbolCell::new("sha1"),
    SymbolCell::new("sha224"),
    SymbolCell::new("sha256"),
    ...
];

// also include init_variables.

On the other hand, the defun symbols and constants are still generated with the build script, to avoid loosing the
convenience of defining the functions with the defun proc macro.

/// Contains all the dynamically generated symbols for `#[defun]` marked functions.
pub(super) static DEFUN_SYMBOLS: [SymbolCell; 202] = [
    SymbolCell::new("floor"),
    SymbolCell::new("float"),
    SymbolCell::new("make-keymap"),
    SymbolCell::new("make-sparse-keymap"),
    SymbolCell::new("use-global-map"),
    SymbolCell::new("set-keymap-parent"),
    SymbolCell::new("define-key"),
    SymbolCell::new("message"),
    SymbolCell::new("format"),
    SymbolCell::new("format-message"),
    ...
];

// init_defun now does not need to take a range on the array.
pub(crate) fn init_defun() {
    for (sym, func) in DEFUN_SYMBOLS.iter().zip(SUBR_DEFS.iter()) {
        unsafe { sym.set_func((*func).into()).unwrap(); }
    }
}

The issue is that Symbol::get makes a ptr dereference over the BUILTIN_SYMBOLS (which now don't include the defuns) so if we try to get a defun symbol, we UB. Here's the test triggering it:

#[test]
fn test_bytecode_call() {
    use OpCode::*;
    let roots = &RootSet::default();
    let cx = &mut Context::new(roots);
    defun::init_defun();

    // (lambda (x) (symbol-name x))
    make_bytecode!(
        bytecode,
        257,
        [Constant0, StackRef1, Call1, Return],
        [defun::SYMBOL_NAME],
        cx
    );
    check_bytecode!(bytecode, [defun::SYMBOL_NAME], "symbol-name", cx);
}

There we check the bytecode for defun::SYMBOL_NAME, which is no longer part of BUILTIN_SYMBOLS. What do you think? What could potentially be a solution?

@CeleritasCelery
Copy link
Owner

CeleritasCelery commented Dec 9, 2023

Thanks for looking at this. There are a few issues I see with splitting the symbols:

First, since we are overlapping symbol values they are no longer unique. Both floor and nil are initialized with Symbol::new_builtin(0); meaning that symbols are not longer unique identifiers. If you get the value 0, you can't distinguish which symbol it was. If I do cargo run --release the program fails with a bus error, meaning that we triggered UB somewhere.

Second is the issue you saw with pointer provenance in miri. This is related to symbols no longer being unique. Since symbols are just integers, we need to give them provenance before they can be used. But now we don't know which provenance to restore them to, since we have two unique ranges (BUILTIN_SYMBOLS and DEFUN_SYMBOLS).

Last is that I think this is a less than ideal mental model for symbols. Fundamentally there is no difference between "defun" symbols and "variable" symbols. All elisp symbols can hold both a function and value. The only difference in this case is their initial values. it feels weird to have this distinction.

As far as ergonomics go, I see a few issues:

  1. Each new non-defun symbol needs to be added in three separate places instead of one.
  2. It would be really easy to mess up the ordering between the BUILTIN_SYMBOLS and the constant declarations. If someone puts one in the wrong order that will create some very tricky bugs.
  3. It feels like an abstraction breach to split the symbols across crates. Often when I am implementing a defun, I will see that it looks at certain symbol values to determine behavior. So I will add a defvar! or defsym! near the defun to declare them. But with the new scheme I would instead have to go an edit a different crate every time. It seems like declarations that work together should live together.

I think the best way to solve these issues is to remove declarations from the core module. They are not needed there and are associated with higher level operations (like the interpreter and auxiliary functions). That will also let us avoid the "two different types of symbols" problem. I am not necessarily opposed to getting rid of defvar! and defsym! and just putting them in a giant manual list, but we should only need to specify a symbol once, not three times. This will probably require macros.

@Qkessler
Copy link
Contributor Author

Qkessler commented Dec 9, 2023

Thanks for the feedback, I agree with your points. I think the suggestion for splitting them comes from the symbols being consumed on the rune-core crate, even though probably that's something that can change and we could potentially move that code to the main package (haven't seen how, will investigate that next).

I think the best way to solve these issues is to remove declarations from the core module. They are not needed there and are associated with higher level operations (like the interpreter and auxiliary functions).

Right, that makes sense, looking into that next.

@Qkessler
Copy link
Contributor Author

I'm good with keeping the symbols together, it makes sense. What I'm thinking is that if we were to move all the symbols back to the main crate (rune) we would tie all the files that use them to the main crate (if we don't refactor again). Specially thinking about interpreter and reader. They need the symbols and it's tied to their responsability list. Nevertheless, by moving the symbols to rune, we can't really move interpreter and reader to a hypothetical rune-eval as we were discussing by email, as rune-eval can't depend on the main crate.

I'm thinking on different solutions.

@Qkessler
Copy link
Contributor Author

Qkessler commented Dec 10, 2023

Let's remove the separation between the symbols, so we would have everything on the build.rs. Let's think about the references of that build script on the rune-core, and maybe we remove those refs to interpreter or others (probably those will move to rune-eval in the future).

Searching for the references on upstream/master, we see there are three modules where the symbols are used in core (and probably shouldn't be):

  • tagged: implements IntoObject for NIL and TRUE: we can probably move that, no need to implement there (probably). The cons is that we would be moving something just because we don't have the symbols, although we could have an enum for NIL and TRUE and then convert the actual symbol when we need it (newtype pattern). REVIEW: The issue with this is that we won't have the pointers for the NIL and TRUE symbols, as they are defined in the main crate, and not core, which will lead us to probably move IntoObject implementations around.
  • env: We have a test that is using sym::init_symbols, but definitely something we can move + using sym::NIL.
  • symbol: BUILTIN_SYMBOLS directly, here.

We can't really move the symbols to rune-core, since we don't have the definition for all the functions that are generated with #[defun]. That said, we need to move stuff out of the core.

@Qkessler
Copy link
Contributor Author

For the tagged issue. If we were to move the IntoObject outside the rune-core crate, say with something like this to avoid the orphan rule:

struct Bool(bool);

impl IntoObject for Bool {
    type Out<'a> = Symbol<'a>;

    fn into_obj<const C: bool>(self, _: &Block<C>) -> Gc<Self::Out<'_>> {
        let sym = match self.0 {
            true => sym::TRUE.into(),
            false => sym::NIL.into(),
        };
        unsafe { Self::Out::tag_ptr(sym.get_ptr()) }
    }
}

We would need to import the TaggedPtr trait, which is also not ideal. Looking into alternatives.

@Qkessler
Copy link
Contributor Author

Qkessler commented Dec 10, 2023

Also wondering whether special symbols, like true or nil (specially nil, since we use it everywhere) should be treated differently. I'm thinking whether Symbol should be placed on the main crate (for now, we can think about moving them around to another crate in the future).

Kinda ties to thinking whether Env and its corresponding symbol.rs and env.rs should be moved to the main crate (for now, again). I'll find references:

  • crates/rune-core/src/object/func.rs: We have the concept of BuiltInFn, which is the subr of SubrFn, so it could be hard to move around.
  • crates/rune-core/src/error.rs: used on signal and throw, but definitely something that we can consider moving. Not a blocker.
  • crates/rune-core/src/env.rs: Moved as part of the hypothesis.

Overall it kinda seems like the core was architected at the beginning without the intention of moving it. I believe the symbols are hard to place in the different crates. We should be able to do it, but it's a hard effort to reason about.

@CeleritasCelery
Copy link
Owner

If we look at these issues one at a time:

  1. nil and true in tagged.rs: all symbols are represented as integers and these two are 0 and 1 respectively. We should be able to define them in both crates (in core for tagged and in the other crate where symbols are defined for BUILTIN_SYMBOLS).
  2. Env : this can be moved out of core.
  3. EvalError : this is only used in the interpreter, so we can move it out. TypeError can stay though.
  4. BuiltInFn : this one might be a little more tricky, but we probably make that type take a generic type for env if we decide to move env out of core.

@Qkessler
Copy link
Contributor Author

Thanks for that, really helpful. We still have the issue of the Symbol get method, using the BUILTIN_SYMBOLS!

@CeleritasCelery
Copy link
Owner

Hmm. That one is harder. I can’t think of a way to solve that cleanly. Will have to think about it more.

@Qkessler
Copy link
Contributor Author

Hi there @CeleritasCelery. I'm thinking about this. I should be able to squeeze some time into it this weekend.

I see that you have included the stack in env as part of one of the latest commits. I'll merge and continue with my refactor.

@Qkessler
Copy link
Contributor Author

Will schedule some time this week for this. I plan to continue with the core separation, although the path is harder now, since we have most of the remaining files on the core directory depending on "each other" (not really, but there are some core modules that are the most depended on, in conjunction).

Asking for some guidance regarding the remaining open questions:

  • What is the plan with env.rs? Do we plan to move it out of core? If so, do all files that depend on it going to need to move their Env dependant functionality out of core as well?

I'll edit this when others come to mind.

@CeleritasCelery
Copy link
Owner

I think we can move env.rs out of core, but we can do that later. If it is easier to leave it in for now, than leave it in. The only things that depend on it are EvalError (which can move out of core) and BuiltInFn (which we could make generic over the the env argument).

The one that I still can't seem to figure out is Symbol. The Deref implementation for it relies on BUILTIN_SYMBOLS, which will be in the main crate. I can't seem to think of a way to solve it. That is the biggest blocker I can see.

@CeleritasCelery
Copy link
Owner

Looks like we might be able to use extern statics to get around that issue.

@Qkessler
Copy link
Contributor Author

Qkessler commented Jan 4, 2024

I think we also have the opportunity to think on how we want the core to look like. If this Move the core out is taking as long as its taking, I would recommend we think on the structure we want to have.

Being concrete, it seems like we have dependencies shared across the different modules in core, even crossing. Maybe I'm reading reading something wrong, but I see that lots of the core modules (env, gc) depend on object (and its submodules), but Object is created in tagged.rs, which also depends on env, gc.

I'm missing the big picture, but probably I'll try to find some time to draw something that may be helpful here, with the current dependency graph and the modules we have.

@CeleritasCelery
Copy link
Owner

We can use cargo-modules to visualize the dependencies. Different modules in the core can have crossing dependencies, but what we want to avoid is having core depend on things outside of core.

@Qkessler
Copy link
Contributor Author

Qkessler commented Jan 4, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants