Disentangle grammar resolution and related PG code #1018

Xanewok · 2024-06-20T21:44:43Z

Ticks the box in #638 (Keyword trie inclusion should be reworked to not require synthetic rules over all keywords)

This exposes the resolution step rather than treating it as an implementation detail, and it no longer tries so hard to shoehorn the DSL v2 items into the old Grammar model that the PG was based on.

Moreover, this breaks up the existing grammar visitor and tries to collect or calculate more properties upfront, directly from DSL v2, to make it explicit that they do not need to depend on the collector state that previously tracked everything.

I didn't submit it initially because I felt I could polish it slightly further (see the TODO note), but since I'm focused on EDR now and this ticks the last important box in #638 (the other one is simply reverting some rules for string literals), I think it's worth reviewing in its current state.

changeset-bot · 2024-06-20T21:44:50Z

⚠️ No Changeset found

Latest commit: 234c924

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Xanewok · 2024-06-20T21:52:24Z

crates/codegen/runtime/generator/src/parser/grammar/resolver.rs

+}
+
+impl Resolution {
+    pub fn original(&self, name: &Identifier) -> &(Identifier, Item) {


Ah, I forgot that I still had it here. I need it to extract the lexical context for a given item.

I can't strictly replace it with a "get lexical context for an item" function because some items may be reachable from different lexical contexts (e.g. the pragma keywords), as required by our current lexer codegen/setup.

I'm happy to rename it, but are we fine with keeping this function for now?

Sounds good. I suggest renaming it to reflect its true purpose and adding this reasoning as a comment for the future.

Done in 234c924
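To illustrate why a single "lexical context for an item" lookup is not enough, here is a minimal sketch of the one-to-many relation described above. It is not the code from this PR: the `Resolution` struct below, its `record` and `lex_ctxs` methods, the `Identifier` alias, and the context names in the usage note are all hypothetical stand-ins.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the DSL v2 identifier type.
type Identifier = String;

/// Hypothetical, simplified view of a resolved grammar: for each item it
/// records every lexical context from which the item is reachable.
#[derive(Default)]
pub struct Resolution {
    lex_ctxs: HashMap<Identifier, BTreeSet<Identifier>>,
}

impl Resolution {
    /// Records that `item` is reachable from the lexical context `ctx`.
    pub fn record(&mut self, item: &str, ctx: &str) {
        self.lex_ctxs
            .entry(item.to_owned())
            .or_default()
            .insert(ctx.to_owned());
    }

    /// Returns all lexical contexts for `item`, in sorted order.
    /// An unknown item yields an empty list.
    pub fn lex_ctxs(&self, item: &str) -> Vec<&str> {
        self.lex_ctxs
            .get(item)
            .map(|set| set.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

Recording the same keyword under two contexts (say, `record("PragmaKeyword", "Pragma")` and `record("PragmaKeyword", "Default")`) makes `lex_ctxs("PragmaKeyword")` return both, which is the case a single-valued lookup could not express.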

OmarTawfik · 2024-06-21T01:29:40Z

crates/codegen/runtime/generator/src/parser/grammar/resolver.rs

                    ctx.resolved.insert(ident.clone(), resolved.clone());
                    return resolved;
                }
+                Item::Token { item } => item as Rc<_>,
+                Item::Trivia { item } => item as Rc<_>,
+                Item::Fragment { item } => item as Rc<_>,
                _ => unreachable!("Only terminals can be resolved here"),


nit: here and in other panics, I suggest adding the provided ident to the error message to make it easier to debug what went wrong.

Done in 2ffeb5e. I agree with the spirit; however, IMO there's no real value in doing that for these unreachables specifically: the match above already checks whether the branch is concerned with a parser thunk (so a nonterminal) or not, the code does not change often, and the assertion seems fairly fundamental (what we consider a nonterminal).

For things like panic!("Empty error_recovery") it makes more sense, but ideally we should validate or model that better in the DSL v2 instead.
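For reference, the suggestion amounts to something like the sketch below. It is only an illustration under assumptions: the `Item` enum here is a heavily simplified, hypothetical stand-in for the DSL v2 one, and `resolve_terminal` is a made-up helper, not a function from this PR.

```rust
use std::rc::Rc;

/// Hypothetical, heavily simplified stand-in for the DSL v2 `Item` enum.
pub enum Item {
    Token { item: Rc<str> },
    Trivia { item: Rc<str> },
    Struct { item: Rc<str> },
}

/// Returns the terminal's definition. If `item` is not a terminal, panics
/// with a message that names the offending identifier.
pub fn resolve_terminal(ident: &str, item: &Item) -> Rc<str> {
    match item {
        Item::Token { item } | Item::Trivia { item } => Rc::clone(item),
        Item::Struct { .. } => {
            // Explicit format arguments keep this working on every edition.
            unreachable!(
                "Only terminals can be resolved here, got nonterminal `{}`",
                ident
            )
        }
    }
}
```

The extra argument costs nothing on the happy path and only shows up in the panic message if the invariant is ever broken.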

OmarTawfik · 2024-06-21T01:31:40Z

crates/codegen/runtime/generator/src/parser/mod.rs

        self.scanner_contexts
            .get_mut(self.current_context_name.as_ref().unwrap())
            .expect("context must be set with `set_current_context`")
    }

-    fn into_model(self) -> ParserModel {
+    // TODO: Separate it to a function that combines the accumulated state with the parser_fns


nit: I suggest attaching an issue number to these TODO comments for future upkeep. If it is a new issue and not a high priority, we can put it in the Backlog project for now.

I figured that the overhead of filing an issue is bigger than the fix itself. Now we combine it in ParserModel::from_language, and ScannerContextCollector::into_model returns only the relevant fields rather than immediately combining them with the ones in ParserFunctions.
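The shape of that split can be sketched roughly as follows. This is a simplified illustration, not the actual code: the field names, the `ScannerModel` type, `visit_context`, and `ParserModel::combine` (standing in for the combining step that happens in `ParserModel::from_language`) are all assumptions made for the example.

```rust
/// Hypothetical lexer-side state accumulated while walking the grammar.
#[derive(Default)]
pub struct ScannerContextCollector {
    scanner_contexts: Vec<String>,
}

/// Hypothetical parser-side output, collected separately.
pub struct ParserFunctions {
    pub parser_fns: Vec<String>,
}

/// Only the fields the collector itself is responsible for.
pub struct ScannerModel {
    pub scanner_contexts: Vec<String>,
}

/// The final model, assembled in exactly one place.
pub struct ParserModel {
    pub scanner_contexts: Vec<String>,
    pub parser_fns: Vec<String>,
}

impl ScannerContextCollector {
    /// Records a lexical context seen during the walk.
    pub fn visit_context(&mut self, name: &str) {
        self.scanner_contexts.push(name.to_owned());
    }

    /// Hands back the collected fields without touching parser state.
    pub fn into_model(self) -> ScannerModel {
        ScannerModel {
            scanner_contexts: self.scanner_contexts,
        }
    }
}

impl ParserModel {
    /// Combines the two independently collected halves.
    pub fn combine(scanners: ScannerModel, fns: ParserFunctions) -> Self {
        ParserModel {
            scanner_contexts: scanners.scanner_contexts,
            parser_fns: fns.parser_fns,
        }
    }
}
```

Keeping `into_model` free of parser state means each collector can be tested on its own, and the only place that knows about both halves is the combining function.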

OmarTawfik

Left a few suggestions. Otherwise, LGTM!
Thanks

Xanewok · 2024-06-21T11:24:29Z

Thanks @OmarTawfik for a swift review!

Xanewok added 7 commits June 20, 2024 23:19

refactor: Simplify initialization of ResolveCtx

95e24fe

Attempt no. 2512 at simplifying grammar resolution

1154a52

Break down resolve_grammat_element a bit

be0c800

refactor: Derive all_scanners from the grammar elements directly

64b8566

refactor: Do not emit synthetic parser to account for keywords

6fc41a4

Reduce as much as possible state when walking the grammar in PG

9e6420a

Separate the collectors in the PG

bd5caef

Xanewok requested a review from a team as a code owner June 20, 2024 21:44

Xanewok mentioned this pull request Jun 20, 2024

Migrate the parser to the language definition v2 #638

Closed

14 tasks

Xanewok commented Jun 20, 2024


OmarTawfik reviewed Jun 21, 2024


OmarTawfik approved these changes Jun 21, 2024


Xanewok added 3 commits June 21, 2024 12:55

Add more details to the panicking functions

2ffeb5e

Separate combining the final PG model from the lexer model collection

a56b35f

Rename Resolution::original to Resolution::lex_ctx

234c924

Xanewok added this pull request to the merge queue Jun 21, 2024

Merged via the queue into NomicFoundation:main with commit 83cfe98 Jun 21, 2024
4 checks passed

Xanewok deleted the use-v2-types-lexer-impl branch June 21, 2024 11:32
