Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse regular expressions #4334

Merged
merged 4 commits into from
May 15, 2024
Merged

Parse regular expressions #4334

merged 4 commits into from
May 15, 2024

Conversation

Scott-Guest
Copy link
Contributor

@Scott-Guest Scott-Guest commented May 9, 2024

Closes #4295

  • Implements a parser for the new regex syntax as given by the grammar in Specify and parse permitted regex syntax #4295, then re-emits for Flex.
  • Fix the regression-new test suite to also strip out absolute paths to domains.md from test outputs

All downstream semantics have already been updated to the new syntax, and if anything was missed, the new error messages should make the fix obvious.

In a follow up PR, I still plan to:

  • Update the documentation with the grammar in Specify and parse permitted regex syntax #4295.
  • Validate that all named lexical elements actually exist in the K definition. We still don't check this, and Flex will just error if the identifier doesn't exist.
  • Properly handle Unicode characters in regex
    • Flex is an 8-bit scanner, so any Unicode codepoint is treated as its sequence of bytes, and some translation is needed to get Unicode-aware semantics
    • Outside of a character class, parenthesize so r"😊*" becomes r"(\xF0\x9F\x98\x8A)*"
    • Inside a character class, convert to an explicit | so r"[😊ab]" becomes r"(\xF0\x9F\x98\x8A)|[ab]"
    • Character ranges and negated character classes don't have a straightforward translation, so just report an error

@Scott-Guest Scott-Guest self-assigned this May 9, 2024
@Scott-Guest Scott-Guest marked this pull request as ready for review May 14, 2024 19:30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who needs Scala? 😛

Copy link
Contributor

@Baltoli Baltoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design here is really nice; great work!

@rv-jenkins rv-jenkins merged commit 632e570 into develop May 15, 2024
17 checks passed
@rv-jenkins rv-jenkins deleted the parse-regex branch May 15, 2024 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants