Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explain why Copy is required for a type specified %parse-param #404

Open
FranklinChen opened this issue May 20, 2023 · 13 comments
Open

Explain why Copy is required for a type specified %parse-param #404

FranklinChen opened this issue May 20, 2023 · 13 comments

Comments

@FranklinChen
Copy link

It might be useful for the book to explain briefly why Copy is required for the type specified in %parse-param. This restriction has led me to have to pass in mutable state using a layer of indirection with &RefCell<State> and repeatedly using borrow_mut() in actions, so I wonder if this is really necessary.

@ratmice
Copy link
Collaborator

ratmice commented May 20, 2023

It is theoretically possible that it could be relaxed such that %parse-param could allow of an &mut reference, but it is exceedingly difficult. With the main reason being the unique ownership of an &mut, the type for action functions needs to be changed to a type of function which cannot capture ownership of the &mut (so that when the reference passed to the action falls out of scope it can be passed to the next action). Rust can type such functions as an HRTB/Higher ranked lifetimes using for <'a>.

One of the things that makes this difficult may be the pattern matching involved in generated code, and the difference between FunctionParamPattern and Pattenrs.

Copy patterns are fairly easy to implement without actually parsing the %parse-param using the argument in the generated code verbatim. There were a lot of attempts to get mutable references working in #214
But I kind of ran out of steam on that, and the RefCell workaround.

@ltratt
Copy link
Member

ltratt commented May 20, 2023

It would be nice if it supported &mut but as @ratmice said, I'm not sure how practical that is. One of the challenges of very-clever type systems is that it can be hard to experiment on existing codebases, and we've definitely encountered that challenge!

Another possibility for your case might be to use interior mutation: so you could pass a type T: Copy around which internally has a RefCell or whatnot that hides some of the horrors of borrow_mut. It's still not completely ideal, of course.

@FranklinChen
Copy link
Author

I guess that lalrpop (which I was using before) doesn't have this problem for its parameter passing mechanics https://lalrpop.github.io/lalrpop/tutorial/009_state_parameter.html because it generates a parser struct for each exported rule, whereas lrpar generates static functions. In lalrpop, I passed mutable state using the directive

grammar<'extra>(state: &'extra mut Vec<u8>);

whereas for lrpar I am using

%parse-param state: &RefCell<Vec<u8>>

@ltratt
Copy link
Member

ltratt commented May 20, 2023

To some extent, the current design reflects the fact that I always write parsers that bubble state up rather than mutate state: honestly, it didn't really occur to me to deal with mutable state! Could lrpar be adapted to deal with mutable state? I guess it probably could. I must admit that I don't think I'll be the person who does that though :/ Sorry, that's not a very satisfactory answer on my part!

@FranklinChen
Copy link
Author

I personally prefer to program purely functionally, but for performance I've been collecting some things during parsing mutably, which is very cheap when using a Vec and simply pushing to it. I've considered passing everything upward instead, at the expense of basically manually threading state through everything, and using Rust doubly LinkedList to collect things recursively without paying a quadratic concatenation penalty. I haven't done that yet for comparative benchmarking purposes.

The other thing I want to do, where mutable state seems particularly sensible, is to catch semantic errors during parsing and log error objects into a Vec before doing recovery and going on, since I don't want to fail fast in parsing but want to generate as many errors as possible for the end user.

@ratmice
Copy link
Collaborator

ratmice commented May 20, 2023

I don't mind having another look at it, though I can't promise I'll be any more successful than last time I attempted to do so.
I think that the %parse-param work that made it's way in tree may have simplified things.

The primary difference being in these 2 commits, where we went from accepting multiple named arguments to a single named argument.
3298c50
#216

So, if we can in fact go with the simpler single named argument approach it may well avoid a lot of the difficulty I encountered in my previous attempt. As such I think there is reason to hope that much of the difficulty I'd previously encountered can be avoided.

@ltratt
Copy link
Member

ltratt commented May 21, 2023

If it is doable, that would be great!

@Cypher1
Copy link

Cypher1 commented Sep 28, 2024

This would be handy. It wasn't clear to me if this meant that the state data might need to deal with backtracking, though I had previously been sure it wasn't necessary.

My solution was the following

%start Expr
%parse-param ctx: *mut crate::parser::Ast
%%

Then in the code area:

fn with(ctx: *mut crate::parser::Ast) -> &'static mut Ast {
    // TODO: Replace with this something safer... but... for now
    // the context is guaranteed to live long enough and not be moved.
    unsafe {&mut (*ctx)}
 }

This let's actions use the ctx without a copy or a check, and makes the use of slab allocation simple enough.
I also found it convenient to implement Into for spans to shrink them to a smaller type (as I don't expect my language to need to handle any files > 2^16 bytes long).

     let op = with(ctx).add_op(Op{
          op: Symbol::Try,
          args: smallvec![$1?]
        },
        $span.into()
      );
     Ok(op)

@ltratt
Copy link
Member

ltratt commented Sep 28, 2024

In general I'm leary about passing *mut around because you can easily violate Rust's semantics. Well, in a sense, I just lied: Rust doesn't yet have a proper semantics around derefing pointers! Stacked borrows and its successors are the current best guide but they're not yet formally blessed.

I do agree, though, that Copy is a tough restriction to deal with. Relaxing that restriction would be nice.

@ratmice
Copy link
Collaborator

ratmice commented Sep 28, 2024

The feeling I get is that we just can't have both &mut, and T where T: Copy using the same API, this is just a part of rusts Mutable XOR Shared semantics. So I think we would need to add additional, or different entry points for runtime parsers and compile-time parsers, and have separate entry points for params with copy bounds (which should also work for references which are Copy) and those with mut references.

So I'm not sure that we can really relax it in any way that could actually support both Copy and mutable. So much as either switch from Copy to mutable, or alternately provide separate entry points %parse_param_mut/parse_mut which would then generate either a parse function, or a parse_mut function (which would not be available when using RTParserBuilder probably.

So I think there are at least some decisions we can think about even before we try to relax that restriction which affect how we would want to approach it.

@ltratt
Copy link
Member

ltratt commented Sep 29, 2024

I wonder if Clone as a bound would be possible? That would make it possible to e.g. pass things like Rc<RefCell<..>> in.

@ratmice
Copy link
Collaborator

ratmice commented Sep 29, 2024

I think it should be possible, we just need to add the explicit calls to clone, and change the bounds,
I think this would even be backwards compatible because the Copy trait inherits Clone here?

@ltratt
Copy link
Member

ltratt commented Sep 29, 2024

I think this would even be backwards compatible because the Copy trait inherits Clone here?

I think you're right!

Maybe this isn't too difficult a change. I'm not sure I'll get to it very soon, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants