Small Pegex language extension allowing production of ASTs without receiver class #69

Open
mohawk2 opened this issue Apr 26, 2018 · 3 comments


@mohawk2
Collaborator

mohawk2 commented Apr 26, 2018

I propose making an extra annotation available in Pegex grammars, so that the result returned from a suitable receiver class (possibly the default one) is a sensible Abstract Syntax Tree (AST).

This is based on "invisible XML" ideas by Steven Pemberton: https://homepages.cwi.nl/~steven/ixml/

For example, this rule:

```
assignment: @name .(EQUAL) -(value | variablename)
```

would produce, on matching, this data structure:

```
{
  "name": "the variable name",
  "children": [
    # whatever the value and variablename productions returned
  ]
}
```

The rule above uses two additions to Pegex as it currently stands:

  • @ means: do not make a "child" node, but a hash entry instead (like an XML attribute)
  • - means: include the annotated entity directly, rather than wrapping it in an extra level of child
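To make the two annotations concrete, here is a minimal Python sketch of how a generic receiver might fold one rule's matched pieces into a node. Everything in it is an assumption: `build_node` and its `(prefix, name, value)` input are hypothetical and not part of Pegex, the `.` case simply discards its match (as `.(EQUAL)` does in the rule above), and the node uses the type/attributes/children shape suggested in the next paragraph rather than the flat `name` key of the example.

```python
from typing import Any

def build_node(rule: str, pieces: list[tuple[str, str, Any]]) -> dict:
    """Fold (prefix, name, value) triples for one matched rule into a node.

    Hypothetical semantics for the proposed annotations:
      "@" -> store value as an attribute under `name`
      "." -> discard the match entirely
      "-" -> splice the value's items directly into children (no extra level)
      ""  -> append the value as a single child (one extra level)
    """
    node: dict[str, Any] = {"type": rule, "attributes": {}, "children": []}
    for prefix, name, value in pieces:
        if prefix == "@":
            node["attributes"][name] = value
        elif prefix == ".":
            continue
        elif prefix == "-":
            node["children"].extend(value)
        else:
            node["children"].append(value)
    return node

# `x = 42` matched by: assignment: @name .(EQUAL) -(value | variablename)
node = build_node("assignment", [
    ("@", "name", "x"),
    (".", "EQUAL", " = "),
    ("-", "(value | variablename)",
     [{"type": "value", "attributes": {}, "children": ["42"]}]),
])
# node["attributes"] == {"name": "x"}; the EQUAL match is dropped, and the
# value node sits directly in node["children"] rather than in a nested list.
```

Without the `-`, the same group result would be appended as one nested list, which is the "extra level of child" the annotation is meant to remove.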

Details still need to be worked out, such as the exact data shape in JSON terms. One slight challenge is that this effectively makes Pegex return an XML document, which would probably only require each node to be a hash with up to three keys: type, attributes, children.

  • attributes would be a hash, obviously
  • type would be a string: the rule name, in this situation
  • children would be an array of nodes
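The three-key node shape can be sketched as a small Python dataclass. This is a hypothetical illustration (the `Node` class and its `to_json`/`to_xml` methods are not Pegex API); it shows that the same tree maps directly onto both a JSON-style structure and an XML element, which is the equivalence the paragraph above relies on. String children stand in for matched text.

```python
from dataclasses import dataclass, field
import json
import xml.etree.ElementTree as ET

@dataclass
class Node:
    type: str                                   # the rule name
    attributes: dict[str, str] = field(default_factory=dict)
    children: list["Node | str"] = field(default_factory=list)

    def to_json(self) -> dict:
        """Plain dict/list form, ready for json.dumps."""
        return {
            "type": self.type,
            "attributes": dict(self.attributes),
            "children": [c.to_json() if isinstance(c, Node) else c
                         for c in self.children],
        }

    def to_xml(self) -> ET.Element:
        """Element named after the rule; text children become text/tail."""
        elem = ET.Element(self.type, self.attributes)
        last = None  # most recent child element, for attaching tail text
        for child in self.children:
            if isinstance(child, Node):
                last = child.to_xml()
                elem.append(last)
            elif last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        return elem

tree = Node("assignment", {"name": "x"}, [Node("value", {}, ["42"])])
print(ET.tostring(tree.to_xml(), encoding="unicode"))
# <assignment name="x"><value>42</value></assignment>
print(json.dumps(tree.to_json()))
```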

This need not interfere with existing Pegex receiver classes. The benefit of this feature is that, given only the grammar, a simple call to Pegex with the input text would return an AST, with no receiver class to write at all. This would make parsing completely language-independent, since no host-language code would be needed.

This seems useful not just for parsing a language into an AST, but in any parsing situation, since I believe every parse can be viewed as creating an AST and then using it. (I'm aware this doesn't address the choice between a tree and a stream of parsing events, but since XML itself can be processed as such a stream, that doesn't seem fatal.)

@mohawk2
Collaborator Author

mohawk2 commented Apr 27, 2018

@ingydotnet points out that +(...) may help here. Also, having just reminded myself of Pegex syntax: -(...) already exists, and may already do what is being discussed here.

@mohawk2
Collaborator Author

mohawk2 commented Sep 23, 2018

It turns out I was able to bodge together a receiver that uses + and -: https://github.com/mohawk2/xml-invisible

@mohawk2
Collaborator Author

mohawk2 commented Sep 23, 2018

Note to self: to avoid having to add a capture to every terminal, I think simply overriding perl_regexes in a trivial subclass, so that it surrounds each regex with (), would suffice: https://metacpan.org/source/INGY/Pegex-0.67/lib/Pegex/Compiler.pm#L140
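A rough Python analogue of that note, for illustration only. The real hook is the Perl method `perl_regexes` in `Pegex::Compiler`, linked above; `wrap_in_capture` and the `terminals` dict below are hypothetical stand-ins used only to show the effect of surrounding each regex with `()`: a terminal that previously captured nothing now yields its matched text.

```python
import re

def wrap_in_capture(regexes: dict[str, str]) -> dict[str, str]:
    """Return a copy of `regexes` with each pattern wrapped in a capture group."""
    return {name: f"({pattern})" for name, pattern in regexes.items()}

# Hypothetical terminal regexes with no explicit captures.
terminals = {"name": r"[A-Za-z_]\w*", "EQUAL": r"\s*=\s*"}

plain = re.match(terminals["name"], "total = 3")
wrapped = re.match(wrap_in_capture(terminals)["name"], "total = 3")
print(plain.groups())    # () - nothing captured
print(wrapped.groups())  # ('total',)
```

One caveat with this approach: the added group becomes group 1, so any captures a pattern already had shift up by one.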
