Move core to workspace crate: rune-core #47

Qkessler · 2023-12-09T10:18:38Z

This is the issue tracking my work to move the core out to the workspace crate rune-core. I have the code in my fork, almost ready to PR, though I had some issues with Miri. I wanted to create the issue to get some ideas over ways to resolve the problems, and introduce @CeleritasCelery to the different changes on my PR first to get some early feedback. Here's the commit.

It's key to note that in the process, we include different changes that guide the build process of the crates, in order
to arrive at a place that is both ergonomic and easy to maintain:

We removed the defvar, defvar_bool and defsym macros. They were not adding that much regarding work saved (just
create a constant and initialize it) and they were convoluting the dependency graph between the crates. It's key that
the core crate can use the symbols that were previously generated on the build.rs file, but it does not need to know the
defuns.

For that, we moved the variables and symbols (not the defun symbols) to a static file env/sym.rs. That file contains
all the required constants for the variables and symbols.

// Symbols without the defuns
pub static BUILTIN_SYMBOLS: [SymbolCell; 84] = [
    SymbolCell::new_const("nil"),
    SymbolCell::new_const("t"),
    SymbolCell::new(":test"),
    SymbolCell::new(":documentation"),
    SymbolCell::new("md5"),
    SymbolCell::new("sha1"),
    SymbolCell::new("sha224"),
    SymbolCell::new("sha256"),
    ...
];

// also include init_variables.

On the other hand, the defun symbols and constants are still generated with the build script, to avoid loosing the
convenience of defining the functions with the defun proc macro.

/// Contains all the dynamically generated symbols for `#[defun]` marked functions.
pub(super) static DEFUN_SYMBOLS: [SymbolCell; 202] = [
    SymbolCell::new("floor"),
    SymbolCell::new("float"),
    SymbolCell::new("make-keymap"),
    SymbolCell::new("make-sparse-keymap"),
    SymbolCell::new("use-global-map"),
    SymbolCell::new("set-keymap-parent"),
    SymbolCell::new("define-key"),
    SymbolCell::new("message"),
    SymbolCell::new("format"),
    SymbolCell::new("format-message"),
    ...
];

// init_defun now does not need to take a range on the array.
pub(crate) fn init_defun() {
    for (sym, func) in DEFUN_SYMBOLS.iter().zip(SUBR_DEFS.iter()) {
        unsafe { sym.set_func((*func).into()).unwrap(); }
    }
}

The issue is that Symbol::get makes a ptr dereference over the BUILTIN_SYMBOLS (which now don't include the defuns) so if we try to get a defun symbol, we UB. Here's the test triggering it:

#[test]
fn test_bytecode_call() {
    use OpCode::*;
    let roots = &RootSet::default();
    let cx = &mut Context::new(roots);
    defun::init_defun();

    // (lambda (x) (symbol-name x))
    make_bytecode!(
        bytecode,
        257,
        [Constant0, StackRef1, Call1, Return],
        [defun::SYMBOL_NAME],
        cx
    );
    check_bytecode!(bytecode, [defun::SYMBOL_NAME], "symbol-name", cx);
}

There we check the bytecode for defun::SYMBOL_NAME, which is no longer part of BUILTIN_SYMBOLS. What do you think? What could potentially be a solution?

The text was updated successfully, but these errors were encountered:

CeleritasCelery · 2023-12-09T17:35:10Z

Thanks for looking at this. There are a few issues I see with splitting the symbols:

First, since we are overlapping symbol values they are no longer unique. Both floor and nil are initialized with Symbol::new_builtin(0); meaning that symbols are not longer unique identifiers. If you get the value 0, you can't distinguish which symbol it was. If I do cargo run --release the program fails with a bus error, meaning that we triggered UB somewhere.

Second is the issue you saw with pointer provenance in miri. This is related to symbols no longer being unique. Since symbols are just integers, we need to give them provenance before they can be used. But now we don't know which provenance to restore them to, since we have two unique ranges (BUILTIN_SYMBOLS and DEFUN_SYMBOLS).

Last is that I think this is a less than ideal mental model for symbols. Fundamentally there is no difference between "defun" symbols and "variable" symbols. All elisp symbols can hold both a function and value. The only difference in this case is their initial values. it feels weird to have this distinction.

As far as ergonomics go, I see a few issues:

Each new non-defun symbol needs to be added in three separate places instead of one.
It would be really easy to mess up the ordering between the BUILTIN_SYMBOLS and the constant declarations. If someone puts one in the wrong order that will create some very tricky bugs.
It feels like an abstraction breach to split the symbols across crates. Often when I am implementing a defun, I will see that it looks at certain symbol values to determine behavior. So I will add a defvar! or defsym! near the defun to declare them. But with the new scheme I would instead have to go an edit a different crate every time. It seems like declarations that work together should live together.

I think the best way to solve these issues is to remove declarations from the core module. They are not needed there and are associated with higher level operations (like the interpreter and auxiliary functions). That will also let us avoid the "two different types of symbols" problem. I am not necessarily opposed to getting rid of defvar! and defsym! and just putting them in a giant manual list, but we should only need to specify a symbol once, not three times. This will probably require macros.

Qkessler · 2023-12-09T18:56:23Z

Thanks for the feedback, I agree with your points. I think the suggestion for splitting them comes from the symbols being consumed on the rune-core crate, even though probably that's something that can change and we could potentially move that code to the main package (haven't seen how, will investigate that next).

I think the best way to solve these issues is to remove declarations from the core module. They are not needed there and are associated with higher level operations (like the interpreter and auxiliary functions).

Right, that makes sense, looking into that next.

Qkessler · 2023-12-10T06:47:54Z

I'm good with keeping the symbols together, it makes sense. What I'm thinking is that if we were to move all the symbols back to the main crate (rune) we would tie all the files that use them to the main crate (if we don't refactor again). Specially thinking about interpreter and reader. They need the symbols and it's tied to their responsability list. Nevertheless, by moving the symbols to rune, we can't really move interpreter and reader to a hypothetical rune-eval as we were discussing by email, as rune-eval can't depend on the main crate.

I'm thinking on different solutions.

Qkessler · 2023-12-10T06:58:20Z

Let's remove the separation between the symbols, so we would have everything on the build.rs. Let's think about the references of that build script on the rune-core, and maybe we remove those refs to interpreter or others (probably those will move to rune-eval in the future).

Searching for the references on upstream/master, we see there are three modules where the symbols are used in core (and probably shouldn't be):

tagged: implements IntoObject for NIL and TRUE: we can probably move that, no need to implement there (probably). The cons is that we would be moving something just because we don't have the symbols, although we could have an enum for NIL and TRUE and then convert the actual symbol when we need it (newtype pattern). REVIEW: The issue with this is that we won't have the pointers for the NIL and TRUE symbols, as they are defined in the main crate, and not core, which will lead us to probably move IntoObject implementations around.
env: We have a test that is using sym::init_symbols, but definitely something we can move + using sym::NIL.
symbol: BUILTIN_SYMBOLS directly, here.

We can't really move the symbols to rune-core, since we don't have the definition for all the functions that are generated with #[defun]. That said, we need to move stuff out of the core.

Qkessler · 2023-12-10T07:31:13Z

For the tagged issue. If we were to move the IntoObject outside the rune-core crate, say with something like this to avoid the orphan rule:

struct Bool(bool);

impl IntoObject for Bool {
    type Out<'a> = Symbol<'a>;

    fn into_obj<const C: bool>(self, _: &Block<C>) -> Gc<Self::Out<'_>> {
        let sym = match self.0 {
            true => sym::TRUE.into(),
            false => sym::NIL.into(),
        };
        unsafe { Self::Out::tag_ptr(sym.get_ptr()) }
    }
}

We would need to import the TaggedPtr trait, which is also not ideal. Looking into alternatives.

Qkessler · 2023-12-10T09:04:32Z

Also wondering whether special symbols, like true or nil (specially nil, since we use it everywhere) should be treated differently. I'm thinking whether Symbol should be placed on the main crate (for now, we can think about moving them around to another crate in the future).

Kinda ties to thinking whether Env and its corresponding symbol.rs and env.rs should be moved to the main crate (for now, again). I'll find references:

crates/rune-core/src/object/func.rs: We have the concept of BuiltInFn, which is the subr of SubrFn, so it could be hard to move around.
crates/rune-core/src/error.rs: used on signal and throw, but definitely something that we can consider moving. Not a blocker.
crates/rune-core/src/env.rs: Moved as part of the hypothesis.

Overall it kinda seems like the core was architected at the beginning without the intention of moving it. I believe the symbols are hard to place in the different crates. We should be able to do it, but it's a hard effort to reason about.

CeleritasCelery · 2023-12-10T14:31:18Z

If we look at these issues one at a time:

nil and true in tagged.rs: all symbols are represented as integers and these two are 0 and 1 respectively. We should be able to define them in both crates (in core for tagged and in the other crate where symbols are defined for BUILTIN_SYMBOLS).
Env : this can be moved out of core.
EvalError : this is only used in the interpreter, so we can move it out. TypeError can stay though.
BuiltInFn : this one might be a little more tricky, but we probably make that type take a generic type for env if we decide to move env out of core.

Qkessler · 2023-12-10T15:04:43Z

Thanks for that, really helpful. We still have the issue of the Symbol get method, using the BUILTIN_SYMBOLS!

CeleritasCelery · 2023-12-11T03:03:21Z

Hmm. That one is harder. I can’t think of a way to solve that cleanly. Will have to think about it more.

Qkessler · 2023-12-16T08:25:42Z

Hi there @CeleritasCelery. I'm thinking about this. I should be able to squeeze some time into it this weekend.

I see that you have included the stack in env as part of one of the latest commits. I'll merge and continue with my refactor.

Qkessler · 2023-12-23T16:02:26Z

Will schedule some time this week for this. I plan to continue with the core separation, although the path is harder now, since we have most of the remaining files on the core directory depending on "each other" (not really, but there are some core modules that are the most depended on, in conjunction).

Asking for some guidance regarding the remaining open questions:

What is the plan with env.rs? Do we plan to move it out of core? If so, do all files that depend on it going to need to move their Env dependant functionality out of core as well?

I'll edit this when others come to mind.

CeleritasCelery · 2023-12-23T22:35:06Z

I think we can move env.rs out of core, but we can do that later. If it is easier to leave it in for now, than leave it in. The only things that depend on it are EvalError (which can move out of core) and BuiltInFn (which we could make generic over the the env argument).

The one that I still can't seem to figure out is Symbol. The Deref implementation for it relies on BUILTIN_SYMBOLS, which will be in the main crate. I can't seem to think of a way to solve it. That is the biggest blocker I can see.

CeleritasCelery · 2023-12-24T00:27:30Z

Looks like we might be able to use extern statics to get around that issue.

Qkessler · 2024-01-04T18:39:36Z

I think we also have the opportunity to think on how we want the core to look like. If this Move the core out is taking as long as its taking, I would recommend we think on the structure we want to have.

Being concrete, it seems like we have dependencies shared across the different modules in core, even crossing. Maybe I'm reading reading something wrong, but I see that lots of the core modules (env, gc) depend on object (and its submodules), but Object is created in tagged.rs, which also depends on env, gc.

I'm missing the big picture, but probably I'll try to find some time to draw something that may be helpful here, with the current dependency graph and the modules we have.

CeleritasCelery · 2024-01-04T19:23:28Z

We can use cargo-modules to visualize the dependencies. Different modules in the core can have crossing dependencies, but what we want to avoid is having core depend on things outside of core.

Qkessler · 2024-01-04T21:31:53Z

Agree! Nevertheless, it could show that we could improve the structure, to be one-directional, which would potentially force us to declare individual modules, easier to understand and maintain! Enrique Kessler Martínez. c/ Jiménez de la espada, Cartagena, España.

…

On Thu, Jan 4, 2024 at 20:23 Troy Hinckley ***@***.***> wrote: We can use cargo-modules <https://github.com/regexident/cargo-modules> to visualize the dependencies. Different modules in the core can have crossing dependencies, but what we want to avoid is having core depend on things outside of core. — Reply to this email directly, view it on GitHub <#47 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ALN7QXF6LIN7ZMKEGKHV4QDYM36TVAVCNFSM6AAAAABANWEZUSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZXGY2DANJQGY> . You are receiving this because you authored the thread.Message ID: ***@***.***>

Qkessler mentioned this issue Dec 17, 2023

Create initial rune-core crate with only hashmap #52

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move core to workspace crate: rune-core #47

Move core to workspace crate: rune-core #47

Qkessler commented Dec 9, 2023 •

edited

Loading

CeleritasCelery commented Dec 9, 2023 •

edited

Loading

Qkessler commented Dec 9, 2023

Qkessler commented Dec 10, 2023

Qkessler commented Dec 10, 2023 •

edited

Loading

Qkessler commented Dec 10, 2023

Qkessler commented Dec 10, 2023 •

edited

Loading

CeleritasCelery commented Dec 10, 2023

Qkessler commented Dec 10, 2023

CeleritasCelery commented Dec 11, 2023

Qkessler commented Dec 16, 2023

Qkessler commented Dec 23, 2023

CeleritasCelery commented Dec 23, 2023

CeleritasCelery commented Dec 24, 2023

Qkessler commented Jan 4, 2024

CeleritasCelery commented Jan 4, 2024

Qkessler commented Jan 4, 2024 via email

Move core to workspace crate: rune-core #47

Move core to workspace crate: rune-core #47

Comments

Qkessler commented Dec 9, 2023 • edited Loading

CeleritasCelery commented Dec 9, 2023 • edited Loading

Qkessler commented Dec 9, 2023

Qkessler commented Dec 10, 2023

Qkessler commented Dec 10, 2023 • edited Loading

Qkessler commented Dec 10, 2023

Qkessler commented Dec 10, 2023 • edited Loading

CeleritasCelery commented Dec 10, 2023

Qkessler commented Dec 10, 2023

CeleritasCelery commented Dec 11, 2023

Qkessler commented Dec 16, 2023

Qkessler commented Dec 23, 2023

CeleritasCelery commented Dec 23, 2023

CeleritasCelery commented Dec 24, 2023

Qkessler commented Jan 4, 2024

CeleritasCelery commented Jan 4, 2024

Qkessler commented Jan 4, 2024 via email

Qkessler commented Dec 9, 2023 •

edited

Loading

CeleritasCelery commented Dec 9, 2023 •

edited

Loading

Qkessler commented Dec 10, 2023 •

edited

Loading

Qkessler commented Dec 10, 2023 •

edited

Loading