Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disentangle grammar resolution and related PG code #1018

Merged
merged 10 commits into from
Jun 21, 2024

Conversation

Xanewok
Copy link
Contributor

@Xanewok Xanewok commented Jun 20, 2024

Ticks the box in #638 (Keyword trie inclusion should be reworked to not require synthetic rules over all keywords)

This exposes the resolution step rather than treating it as an implementation detail and doesn't try to shoehorn the DSL v2 items to the old Grammar model as much that the PG was based on.

Moreover, this breaks up the existing grammar visitor and tries to collect or calculate more properties upfront directly from DSL v2 in order to be more explicit that they not need to depend on the collector state that tracked everything.

I didn't submit it initially because I felt I could polish it slightly further (see the TODO note) but since I'm focused on EDR now and this cleans up the last important box in #638 (the other one is simply reverting some rules for string literals), I think it's worth reviewing in the current state.

@Xanewok Xanewok requested a review from a team as a code owner June 20, 2024 21:44
Copy link

changeset-bot bot commented Jun 20, 2024

⚠️ No Changeset found

Latest commit: 234c924

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

}

impl Resolution {
pub fn original(&self, name: &Identifier) -> &(Identifier, Item) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot that I still had it here. I need it to extract the lexical context for a given item.

I can't strictly replace it with a "get lexical context for an item" because some item may be reachable from different lexical contexts (i.e. pragma keyword) as required by our current lexer codegen/setup.

I'm happy with renaming it but are we fine with keeping this function for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I suggest renaming it to its true purpose, and add this reasoning as a commment for the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 234c924

ctx.resolved.insert(ident.clone(), resolved.clone());
return resolved;
}
Item::Token { item } => item as Rc<_>,
Item::Trivia { item } => item as Rc<_>,
Item::Fragment { item } => item as Rc<_>,
_ => unreachable!("Only terminals can be resolved here"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for here and other panics: I suggest adding the provided ident to the error message, to make it easier to debug what went wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2ffeb5e, I agree with the spirit, however, there's no real value IMO to do that for these unreachables specifically - it's checked above in the match whether the branch is concerned with a parser thunk (so nonterminal) or not; the code does not change often and the assertion seems fairly fundamental (what we consider a non-terminal).

For things like panic!("Empty error_recovery") it makes more sense but ideally we should validate or model that better in the DSL v2, instead.

self.scanner_contexts
.get_mut(self.current_context_name.as_ref().unwrap())
.expect("context must be set with `set_current_context`")
}

fn into_model(self) -> ParserModel {
// TODO: Separate it to a function that combines the accumulated state with the parser_fns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I suggest attaching an issue number to these TODO comments, for future upkeeping. If it is a new issue, and it is not a high priority, we can put it in the Backlog project for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that the overhead of the issue is bigger than the fix - now, we combine it in the ParserModel::from_language and ScannerContextCollector::into_model returns only the relevant fields rather than immediately combining them with the ones in ParserFunctions.

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions. Otherwise, LGTM!
Thanks

@Xanewok
Copy link
Contributor Author

Xanewok commented Jun 21, 2024

Thanks @OmarTawfik for a swift review!

@Xanewok Xanewok added this pull request to the merge queue Jun 21, 2024
Merged via the queue into NomicFoundation:main with commit 83cfe98 Jun 21, 2024
4 checks passed
@Xanewok Xanewok deleted the use-v2-types-lexer-impl branch June 21, 2024 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants