grammar/case_rule: extend the testcase to check non-ASCII tokens
The logic of case/match lexing rules may be complex when working on
source buffers encoded with variable-length encodings such as UTF-8.
Extend this testcase so that the "backwards codepoint lookup" behavior
is exercised with a multi-byte codepoint.
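To illustrate the difficulty the commit message refers to: in a UTF-8 buffer, stepping back one byte does not necessarily land on a codepoint boundary, so a backwards lookup must skip continuation bytes first. A minimal sketch (not Langkit's actual implementation; `prev_codepoint_start` is a hypothetical helper):

```python
def prev_codepoint_start(buf: bytes, index: int) -> int:
    """Return the byte offset where the codepoint before ``index`` starts.

    UTF-8 continuation bytes match the pattern 0b10xxxxxx, so we walk
    backwards until we hit a lead byte (or the start of the buffer).
    """
    i = index - 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:  # continuation byte?
        i -= 1
    return i

buf = "é'🙂'".encode("utf-8")  # é is 2 bytes, 🙂 is 4 bytes in UTF-8
# The "'" before 🙂 sits at byte 2; one codepoint back is é at byte 0.
assert prev_codepoint_start(buf, 2) == 0
# 🙂 spans bytes 3..6; stepping back from byte 7 skips 3 continuation bytes.
assert prev_codepoint_start(buf, 7) == 3
```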
pmderodat committed Mar 13, 2024
1 parent dbad162 commit 4048175
Showing 3 changed files with 11 additions and 1 deletion.
@@ -2,7 +2,7 @@ lexer foo_lexer {

char
dot <- "."
id <- p"[a-zA-Z]+"
id <- p"[a-zA-Zé🙂]+"
tick <- "'"
newline <- p"\n"

1 change: 1 addition & 0 deletions testsuite/tests/grammar/case_rule/main.py
@@ -10,6 +10,7 @@
('simple-attr', "a'b"),
('char-dot', "'a'.b"),
('id-char', "a'b'"),
('unicode-id-char', "\xe9'\U0001f642'"),
):
print('== {} =='.format(label))
u = ctx.get_from_buffer('{}.txt'.format(label), text)
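For reference, the escape sequences in the new test entry decode to exactly the non-ASCII characters added to the lexer's `id` rule, and their UTF-8 encodings have different byte lengths even though each is a single codepoint:

```python
text = "\xe9'\U0001f642'"
assert text == "é'🙂'"
# 4 codepoints total, matching the column offsets 1:1-1:5 in test.out...
assert len(text) == 4
# ...but 8 bytes in UTF-8 (é takes 2 bytes, 🙂 takes 4).
assert text.encode("utf-8") == b"\xc3\xa9'\xf0\x9f\x99\x82'"
```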
9 changes: 9 additions & 0 deletions testsuite/tests/grammar/case_rule/test.out
@@ -24,5 +24,14 @@ main.py: Running...
<Token Tick "'" at 1:4-1:5>
<Token Termination at 1:5-1:5>

== unicode-id-char ==
1:5-1:5: Expected Id, got Termination
--
<Token Id 'é' at 1:1-1:2>
<Token Tick "'" at 1:2-1:3>
<Token Id '🙂' at 1:3-1:4>
<Token Tick "'" at 1:4-1:5>
<Token Termination at 1:5-1:5>

main.py: Done.
Done
