Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define end-of-file trivia as part of the language definition #1020

Open
Xanewok opened this issue Jun 21, 2024 · 2 comments
Open

Define end-of-file trivia as part of the language definition #1020

Xanewok opened this issue Jun 21, 2024 · 2 comments

Comments

@Xanewok
Copy link
Contributor

Xanewok commented Jun 21, 2024

I remember that we did discuss what to do with trailing trivia and that it's not so obvious, as for example

// ...snip
}

/* some multi
line comment */

would mean that this conceptually would belong to the } token and this prompts the question of how to format it and adds inconsistency as our trailing trivia in general cannot span multiple lines.

The way we have this setup is that the trailing trivia has to be a conceptually a subset of the leading trivia and thus the leading trivia is the trivia parser, which means if we want to accept all of the trivia items at EOF, then the leading trivia definition is correct for that use case and not a stretch, from what I understand (but, to be clear, we do have a "hack" of implicitly parsing the leading trivia rather than it having a dedicated definition in the definition.rs).

If we want to use the DSLv2's definition of (as added by this PR)

        ZeroOrMore(Choice([
            Trivia(Whitespace),
            Trivia(SingleLineComment),
            Trivia(MultilineComment)
        ])),
        Choice([Trivia(EndOfLine), EndOfInput])

then for it to be correct, we'd have to change it to something like:

    trailing_trivia = Choice([
        // A single line comment
        Sequence([
            Optional(Trivia(Whitespace)),
            Optional(Choice([
                Trivia(SingleLineComment),
                Trivia(MultilineCommentThatDoesNotContainAnEndOfLine)
            ])),
            Trivia(EndOfLine),
        ])
        // De facto end-of-file trivia
        // (separated in order to disallow multi-line trivias as trailing in the general case)
        Sequence([
            ZeroOrMore(Choice([
                /* trivias listed in leading_trivia */
            ])),
            Trivia(EndOfFile),
        ])
    ]),

At which point we're de facto reintroducing end_of_file_trivia, now that I think about it. Given that it's different enough from the regular trailing trivia, should we just bite the bullet and be more explicit about it to not confuse the user and be clear about expectations?

Introducing a dedicated notion of end_of_file trivia would also mean we don't need to introduce this virtual EndOfFile trivia/token, which we all agreed at some point to not have magical/synthesized token in the definition that can't be directly parsed.


@AntonyBlakey @OmarTawfik I don't think we did write down our different conclusions and rationale while we were discussing this on multiple occasions. If you're fine with it, I'd propose that we separate an issue/thread dedicated to it as I feel we had already too many items bundled in this issue and the decision is not clear-cut at this moment.

Originally posted by @Xanewok in #638 (comment)

@Xanewok
Copy link
Contributor Author

Xanewok commented Jun 21, 2024

I understand that the underlying rationale for not implicitly parsing the leading trivia in the runtime parser ("rolling back the hack") is to keep it explicit as part of the definition and so that's why I named this issue this way; feel free to rename it.

@OmarTawfik
Copy link
Contributor

Adding your comment from #1021 (comment)

Anything that will touch the trivia as part of the #1020 will probably have to adapt this but doesn't strictly have to, so there's no point in keeping this TODO as it's optional and tracked elsewhere.

Moreover, we need to keep using OneOrMore for the current valid definition of leading_trivia and ZeroOrMore is unused but might be as part of #1020; I'm happy to remove it as part of this PR per YAGNI.

I think the current trailing_trivia behavior (without multiline comments) provides a reasonable structure for users. I'm not proposing doing any behavioral changes to the parser. I'm only wondering if we should go back to use ZeroOrMore and Optional, given that we no longer create trivia nonterminals. Now both parsers can produce nothing. This also means we can just keep reusing leading_trivia for end-of-file trivia, as it matches our current design.

    leading_trivia = ZeroOrMore(Choice([
        Trivia(Whitespace),
        Trivia(EndOfLine),
        Trivia(SingleLineComment),
        Trivia(MultiLineComment),
        Trivia(SingleLineNatSpecComment),
        Trivia(MultiLineNatSpecComment)
    ])),
    trailing_trivia = Optional(Sequence([
        Optional(Trivia(Whitespace)),
        Optional(Trivia(SingleLineComment)),
        Trivia(EndOfLine)
    ])),

Definitely not high-pri, but it is something we discussed in a few instances already, and it would be good to have a consensus written down here.

github-merge-queue bot pushed a commit that referenced this issue Jun 22, 2024
Closes #638 

The trivia item from #638 is now separately tracked in #1020.
@OmarTawfik OmarTawfik moved this to Todo in Slang - Backlog Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants