Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support left recursion syntax #533

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Abdillah
Copy link

@Abdillah Abdillah commented Aug 21, 2021

Add support to left recursion PEG by adding marker syntax.
Inspired by this awesome paper https://www.sciencedirect.com/science/article/pii/S0167642314000288.

Background

Pest's PEG implementation not supporting left recursive syntax. So, anything that repeated on the left-most of a sequence will trigger the validation as semantic error. As a result, our syntax definition needs to be adjusted using many ways. But, there are cases where workaround is not proper at all (at least as far as I know). For example on language with first-class function.

function_call = { (function_call | symbol) ~ argument_list }

When using pest, we need to adjust the syntax as to match,

function_call = { symbol ~ argument_list+ }

which will cause the resulting AST far from expectation.

Changes

Supporting left-recursive syntax is explained in more details in the paper, but I want to iterate my implementation.

This PR adds recursion consume handler, rule modifier syntax, and code generation between them. The recursion consume handler is basically testing the input against the rule, if the rule checks out it will try more deeper into the recursion. If the rule at last failed, it will returns the last result from the trial. In addition, this handler also protects against infinite recursion.

Concern

The implemented recursion handler is a trial-and-error loop on the level of the recursion. So, it's violating PEG non-backtracking principle. In my opinion, because we mark the rule as recursive, this makes the user aware of this feature and maybe its consequences on performance (if any).

Checklist

  • Add single rule recursive processing.
  • Add single rule parser tests.
  • Add multiple rule recursive processing.
  • Add recursive marker syntax.
  • Document the syntax.

@Abdillah
Copy link
Author

There is still some problem about the loop, sometime it loops > 255 and gets panicked.
But, for a simple syntax, it is good enough.

@Abdillah
Copy link
Author

Hi @dragostis, this PR is in concept-complete and in minimal viable state, despite–I'm sure–still not covering edge cases especially in cases' validations.

But, since this PR is actually violating PEG principle "non-backtracking", I want to know what's your opinion as, I believe, this project's maintainer.

@oovm
Copy link
Contributor

oovm commented Oct 17, 2021

Any chance for merge this?

I want to try it, but I don't know how to configure

I have imported the dependence

pest = { version = "2.1", git = "https://github.com/Abdillah/pest.git" }
expr = *{
    "(" ~ expr ~ ")"
  | expr ~ ("/"|"*") ~ expr
  | expr ~ ("+"|"-") ~ expr
}
error parsing 
  --> 40:9
   |
40 | expr  = *{
   |         ^---
   |
   = expected `{`, `_`, `@`, `$`, or `!`

  --> 42:5
   |
42 |   | expr ~ ("*"|"/") ~ expr␊
   |     ^--^
   |
   = rule expr is left-recursive (expr -> expr); pest::prec_climber might be useful in this case

error: failed to run custom build command for `pest_meta v2.1.3 (https://github.com/Abdillah/pest.git?branch=leftrecursive#6390c304)`

@oovm
Copy link
Contributor

oovm commented Oct 19, 2021

It seems that two branches cannot be both rec.

program = {SOI ~ statement* ~ EOI}

statement = { expr ~ ";"? }
expr  = *{
    expr ~ "*" ~ expr
  | expr ~ "+" ~ expr
  | integer
}
integer = @{"0"|ASCII_NONZERO_DIGIT ~ ("_"? ~ ASCII_DIGIT)*}

WHITESPACE = ${WHITE_SPACE|NEWLINE}

when input 1 + 2 * 3 gives:

attempt to add with overflow
thread 'item' panicked at 'attempt to add with overflow', D:\Rust\pest\pest\src\parser_state.rs:509:17
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\/library\std\src\panicking.rs:517
   1: core::panicking::panic_fmt
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\/library\core\src\panicking.rs:100
   2: core::panicking::panic
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\/library\core\src\panicking.rs:50
   3: pest::parser_state::ParserState<enum$<lists::Rule> >::recursive<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::expr::closure$0::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:509
   4: lists::impl$0::parse::rules::visible::expr::closure$0
             at .\tests\lists.rs:15
   5: pest::parser_state::ParserState<enum$<lists::Rule> >::rule<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::expr::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:225
   6: lists::impl$0::parse::rules::visible::expr
             at .\tests\lists.rs:15
   7: lists::impl$0::parse::rules::visible::statement::closure$0::closure$0
             at .\tests\lists.rs:15
   8: pest::parser_state::ParserState<enum$<lists::Rule> >::sequence<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::statement::closure$0::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:376
   9: lists::impl$0::parse::rules::visible::statement::closure$0
             at .\tests\lists.rs:15
  10: pest::parser_state::ParserState<enum$<lists::Rule> >::rule<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::statement::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:225
  11: lists::impl$0::parse::rules::visible::statement
             at .\tests\lists.rs:15
  12: lists::impl$0::parse::rules::visible::program::closure$0::closure$0::closure$1::closure$0::closure$0
             at .\tests\lists.rs:15
  13: pest::parser_state::ParserState<enum$<lists::Rule> >::optional<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::program::closure$0::closure$0::closure$1::closure$0::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:463
  14: lists::impl$0::parse::rules::visible::program::closure$0::closure$0::closure$1::closure$0
             at .\tests\lists.rs:15
  15: pest::parser_state::ParserState<enum$<lists::Rule> >::sequence<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::program::closure$0::closure$0::closure$1::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:376
  16: lists::impl$0::parse::rules::visible::program::closure$0::closure$0::closure$1
             at .\tests\lists.rs:15
  17: enum$<core::result::Result<alloc::boxed::Box<pest::parser_state::ParserState<enum$<lists::Rule> >,alloc::alloc::Global>,alloc::boxed::Box<pest::parser_state::ParserState<enum$<lists::Rule> >,alloc::alloc::Global> > >::and_then<alloc::boxed::Box<pest::pars
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\library\core\src\result.rs:965
  18: lists::impl$0::parse::rules::visible::program::closure$0::closure$0
             at .\tests\lists.rs:15
  19: pest::parser_state::ParserState<enum$<lists::Rule> >::sequence<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::program::closure$0::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:376
  20: lists::impl$0::parse::rules::visible::program::closure$0
             at .\tests\lists.rs:15
  21: pest::parser_state::ParserState<enum$<lists::Rule> >::rule<enum$<lists::Rule>,lists::impl$0::parse::rules::visible::program::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:225
  22: lists::impl$0::parse::rules::visible::program
             at .\tests\lists.rs:15
  23: lists::impl$0::parse::closure$0
             at .\tests\lists.rs:15
  24: pest::parser_state::state<enum$<lists::Rule>,lists::impl$0::parse::closure$0>
             at D:\Rust\pest\pest\src\parser_state.rs:87
  25: lists::impl$0::parse
             at .\tests\lists.rs:15
  26: lists::item
             at .\tests\lists.rs:22
  27: lists::item::closure$0
             at .\tests\lists.rs:20
  28: core::ops::function::FnOnce::call_once<lists::item::closure$0,tuple$<> >
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\library\core\src\ops\function.rs:227
  29: core::ops::function::FnOnce::call_once
             at /rustc/d7c97a02d1215e4ef26c31cb72dbaf16fd548b2c\library\core\src\ops\function.rs:227
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@Abdillah
Copy link
Author

Well, it seems it needs some works. Thank you for testing.

@NoahTheDuke
Copy link
Member

NoahTheDuke commented Oct 20, 2021

I wonder if using the algorithm described in this paper would help the looping and backtracking issues.

@CAD97
Copy link
Contributor

CAD97 commented Oct 20, 2021

Personally, I see PEG as a way to declaratively specify a parser, moreso than a grammar. This is because of its "first branch wins" behavior, which matches what you'd first write when writing a recursive descent parser. Any real use of pest as a parser generator is almost certainly going to want to post process the pest parse tree to construct an abstract syntax tree. (You can provide an AST view into a parse tree, but this is only realistically useful if you have a concrete syntax tree (basically: lossless AST), rather than a parse tree.)

If we can support left recursion without the cost of extra looping/backtracking, as described in NoahTheDuke's linked paper or the Pegged wiki, I'm fine with merging support for left recursive rules, and removing the requirement that rules aren't left recursive. But if it requires backtracking, I'm against, even with colored rules. (Mainly, because how does it handle hidden/mutual/indirect left recursion?)

@tomtau tomtau added this to the v3.0 milestone Nov 26, 2022
@ethindp
Copy link

ethindp commented Aug 28, 2023

I too would appreciate this kind of feature. What blocks support of this being added?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants