Small Pegex language extension allowing production of ASTs without receiver class #69

mohawk2 · 2018-04-26T23:44:53Z

I propose that an extra annotation be available in Pegex grammars, so that the result returned by an appropriate receiver class, possibly the default one, is a sensible Abstract Syntax Tree (AST).

This is based on "invisible XML" ideas by Steven Pemberton: https://homepages.cwi.nl/~steven/ixml/

This would operate such that this rule:

assignment: @name .(EQUAL) -(value | variablename)

would produce, on matching, this data structure:

{
  "name": "the variable name",
  "children": [
    # whatever the value and variablename productions returned
  ]
}

There are two additions above to Pegex as-is:

* the @ means do not make a "child" node, but instead a hash entry (like an XML attribute)
* the - means include the annotated entity directly, rather than making an extra level of child
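
To make the two annotations concrete, here is a minimal Python sketch of how a tree builder might treat them. It is not Pegex code: the function `build_node`, the `(mark, name, result)` tuples, and the treatment of unannotated parts are all assumptions made for illustration. It produces the same shape as the data structure shown above.

```python
def build_node(parts):
    """Assemble one node from the matched parts of a rule.

    Each part is a hypothetical (mark, name, result) tuple, where mark is
    '@' (attribute), '-' (splice), '.' (skip) or '' (ordinary child).
    """
    node = {"children": []}
    for mark, name, result in parts:
        if mark == "@":
            # Hash entry on the node itself, like an XML attribute.
            node[name] = result
        elif mark == "-":
            # Include the results directly, with no extra level of child.
            if isinstance(result, list):
                node["children"].extend(result)
            else:
                node["children"].append(result)
        elif mark == ".":
            # Matched but discarded, as with .(EQUAL) in the rule above.
            continue
        else:
            # Assumption: an unannotated part becomes its own nested child.
            node["children"].append({name: result})
    return node


# assignment: @name .(EQUAL) -(value | variablename)
assignment = build_node([
    ("@", "name", "x"),
    (".", "EQUAL", "="),
    ("-", "value", ["42"]),
])
```

Here `assignment` ends up with a `name` entry and a flat `children` list, with nothing recorded for the skipped `EQUAL`.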

Details need to be figured out, such as the actual data shape in JSON terms. A slight challenge is that this makes Pegex return what is effectively an XML document, which would probably only require each node to be a hash with up to three keys: type, attributes, children.

* attributes would be a hash, obviously
* type would be a string, and would be the rulename in this situation
* children would be an array of nodes
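
As a sketch of why this three-key shape is "effectively an XML document", the following Python turns such a node into XML using the standard library. The helper name `to_element` and the convention that a plain string in `children` is text content are assumptions, not part of the proposal.

```python
import xml.etree.ElementTree as ET


def to_element(node):
    """Convert a {type, attributes, children} hash into an XML element."""
    elem = ET.Element(node["type"], node.get("attributes", {}))
    last = None  # most recently appended child element, if any
    for child in node.get("children", []):
        if isinstance(child, str):
            # Assumption: a string child is text content.
            if last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            last = to_element(child)
            elem.append(last)
    return elem


ast = {
    "type": "assignment",
    "attributes": {"name": "x"},
    "children": [{"type": "value", "children": ["42"]}],
}
xml_text = ET.tostring(to_element(ast), encoding="unicode")
# xml_text is '<assignment name="x"><value>42</value></assignment>'
```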

This need not interfere with Pegex receiver classes as currently made. The benefit of this additional feature would be that, given the grammar, a simple call to Pegex with the input text would return an AST, without having to write a receiver class at all. This would make parsing completely language-independent, since the grammar alone would determine the output.

This seems like it would be useful not just in parsing a language into an AST, but in any parsing situation, since I believe they can all be considered as first creating an AST, then using it. (I'm aware this doesn't deal with the choice of a tree versus a stream of parsing events, but since XML itself can be operated on as such a stream, that doesn't seem fatal.)
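
On the tree-versus-stream point, a tree of this kind can be replayed as a stream of events with a short recursive walk. The Python sketch below assumes the type/attributes/children node shape described earlier. The event tuples and the function name `events` are hypothetical.

```python
def events(node):
    """Yield SAX-like events for a {type, attributes, children} node."""
    yield ("start", node["type"], node.get("attributes", {}))
    for child in node.get("children", []):
        if isinstance(child, str):
            # Assumption: a string child is text content.
            yield ("text", child)
        else:
            yield from events(child)
    yield ("end", node["type"])


ast = {
    "type": "assignment",
    "attributes": {"name": "x"},
    "children": [{"type": "value", "children": ["42"]}],
}
stream = list(events(ast))
```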


mohawk2 · 2018-04-27T02:49:41Z

@ingydotnet points out that +(...) may help out here. Also, having just reminded myself of the Pegex syntax, I see that -(...) already exists, and may already do what is under discussion here.

mohawk2 · 2018-09-23T05:15:43Z

Turns out that I was able to bodge together a receiver to use + and -: https://github.com/mohawk2/xml-invisible

mohawk2 · 2018-09-23T05:20:52Z

Note to self: in order to eliminate having to capture on all terminals, I think just overriding perl_regexes in a trivial subclass to surround the regexes with () would suffice: https://metacpan.org/source/INGY/Pegex-0.67/lib/Pegex/Compiler.pm#L140
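
The idea of surrounding every terminal regex with () can be illustrated outside Pegex. The Python sketch below is only an analogue, not the Pegex::Compiler code: the `terminals` table and `wrap_in_captures` are hypothetical names, and it assumes the terminals are available as a mapping from rule name to regex source.

```python
import re

# Hypothetical terminal table: rule name -> regex source, without captures.
terminals = {
    "EQUAL": r"=",
    "variablename": r"[A-Za-z_]\w*",
}


def wrap_in_captures(regexes):
    """Return a copy in which every regex is enclosed in a capturing group."""
    return {name: f"({source})" for name, source in regexes.items()}


capturing = wrap_in_captures(terminals)
match = re.match(capturing["variablename"], "total = 1")
captured = match.group(1)  # the text matched by the terminal
```

One caveat with this approach is that a regex that already contains groups has them renumbered, since the new outer group becomes group 1.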
