Closes #4295

- Implements a parser for the new regex syntax as given by the grammar in #4295, then re-emits for Flex.
- Fix the regression-new test suite to also strip out absolute paths to `domains.md` from test outputs

All downstream semantics have already been updated to the new syntax, and if anything was missed, the new error messages should make the fix obvious.

In a follow up PR, I still plan to:
- Update the documentation with the grammar in #4295.
- Validate that all named lexical elements actually exist in the K definition. We still don't check this, and Flex will just error if the identifier doesn't exist.
- Properly handle Unicode characters in regex
  - Flex is an 8-bit scanner, so any Unicode codepoint is treated as its sequence of bytes, and some translation is needed to get Unicode-aware semantics
  - Outside of a character class, parenthesize so `r"😊*"` becomes `r"(\xF0\x9F\x98\x8A)*"`
  - Inside a character class, convert to an explicit `|` so `r"[😊ab]"` becomes `r"(\xF0\x9F\x98\x8A)|[ab]"`
  - Character ranges and negated character classes don't have a straightforward translation, so just report an error

---------

Co-authored-by: Bruce Collie <[email protected]>
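The Unicode translation planned for the follow-up could look roughly like the sketch below. It is not code from this commit: the class `FlexUnicodeSketch` and its methods are hypothetical names, the character class is assumed to be non-negated and free of ranges, and ASCII metacharacters are not escaped.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the byte-level translation described in the commit message.
class FlexUnicodeSketch {
  // UTF-8 bytes of one code point as \xHH escapes.
  static String utf8Escapes(int codePoint) {
    byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("\\x%02X", b & 0xFF));
    }
    return sb.toString();
  }

  // Outside a character class: ASCII stays as-is (assumed not to be a
  // metacharacter), multi-byte sequences are parenthesized so that a
  // following quantifier applies to the whole code point.
  static String outsideClass(int codePoint) {
    return codePoint < 0x80
        ? Character.toString(codePoint)
        : "(" + utf8Escapes(codePoint) + ")";
  }

  // Inside a non-negated, range-free character class: non-ASCII members
  // become explicit alternatives, ASCII members stay in a bracket expression.
  static String charClass(List<Integer> codePoints) {
    List<String> alternatives = new ArrayList<>();
    StringBuilder ascii = new StringBuilder();
    for (int cp : codePoints) {
      if (cp < 0x80) {
        ascii.appendCodePoint(cp);
      } else {
        alternatives.add("(" + utf8Escapes(cp) + ")");
      }
    }
    if (ascii.length() > 0) {
      alternatives.add("[" + ascii + "]");
    }
    return String.join("|", alternatives);
  }

  public static void main(String[] args) {
    System.out.println(outsideClass(0x1F60A) + "*");
    System.out.println(charClass(List.of(0x1F60A, (int) 'a', (int) 'b')));
  }
}
```

A real implementation would presumably also have to parenthesize the resulting union when it is embedded in a larger expression, and report an error for ranges and negated classes as the commit message says.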
1 parent 320540f, commit 632e570
Showing 37 changed files with 862 additions and 113 deletions.
@@ -0,0 +1,31 @@
+SHELL=/bin/bash
+# path to the current makefile
+MAKEFILE_PATH := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
+# path to builtin include directory
+BUILTIN_DIR=$(abspath $(MAKEFILE_PATH)/../../target/release/k/include/kframework/builtin)
+# path to binary directory of this distribution
+K_BIN=$(abspath $(MAKEFILE_PATH)/../../bin)
+# path to the kompile binary of this distribuition
+KOMPILE=${K_BIN}/kompile
+# and krun
+KRUN=${K_BIN}/krun
+# and kdep
+KDEP=${K_BIN}/kdep
+# and kprove
+KPROVE=${K_BIN}/kprove
+# and kast
+KAST=${K_BIN}/kast
+# and kparse
+KPARSE=${K_BIN}/kparse
+# and kserver
+KSERVER=${K_BIN}/kserver
+# and ksearch
+KSEARCH:=$(KRUN) --search-all
+# and kprint
+KPRINT=${K_BIN}/kprint
+# and llvm-krun
+LLVM_KRUN=${K_BIN}/llvm-krun
+# and kdep
+KDEP=${K_BIN}/kdep
+# command to strip paths from test outputs
+REMOVE_PATHS=| sed 's!\('`pwd`'\|'${BUILTIN_DIR}'\|/nix/store/.\+/include/kframework/builtin\)/\(\./\)\{0,2\}!!g'
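To see what the `REMOVE_PATHS` filter does to a test output, here is a rough Java equivalent of the `sed` expression. It is only an illustration: the class name `PathStripSketch` and the example directories are made up, and unlike the `sed` command it quotes the two directory names instead of treating them as regex fragments.

```java
import java.util.regex.Pattern;

// Hypothetical Java rendering of the REMOVE_PATHS sed filter.
class PathStripSketch {
  static String stripPaths(String output, String workDir, String builtinDir) {
    // One of the three known prefixes, a slash, then up to two "./" segments.
    Pattern prefix = Pattern.compile(
        "(?:" + Pattern.quote(workDir)
            + "|" + Pattern.quote(builtinDir)
            + "|/nix/store/.+/include/kframework/builtin)/(?:\\./){0,2}");
    return prefix.matcher(output).replaceAll("");
  }

  public static void main(String[] args) {
    // Example paths are invented for the demonstration.
    String out = "Source(/home/user/k/tests/./invalidPrec.k)\n"
        + "Source(/nix/store/abc-k/include/kframework/builtin/domains.md)";
    System.out.println(stripPaths(out, "/home/user/k/tests", "/opt/k/include/kframework/builtin"));
    // prints:
    // Source(invalidPrec.k)
    // Source(domains.md)
  }
}
```

Stripping the prefixes this way is what lets expected outputs refer to `domains.md` without an absolute path.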
@@ -1,5 +1,5 @@
 // Copyright (c) Runtime Verification, Inc. All Rights Reserved.
 module CHECKGROUP
-syntax Int ::= r"[\\+-]?[0-9]+" [prefer, token, prec(2)]
+syntax Int ::= r"[\\+\\-]?[0-9]+" [prefer, token, prec(2)]
 | Int "+" Int [group(fun,)]
 endmodule
k-distribution/tests/regression-new/checks/checkMIntLiteral.k.out (2 changes: 1 addition & 1 deletion)
@@ -1,5 +1,5 @@
[Error] Compiler: Inconsistent token precedence detected.
Source(invalidPrec.k)
Location(4,17,4,34)
4 | syntax Foo ::= r"[0-9]+" [token]
. ^~~~~~~~~~~~~~~~~
Source(domains.md)
Location(1199,18,1199,52)
1199 | syntax Int ::= r"[0-9]+" [prefer, token, prec(2)]
. ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
k-distribution/tests/regression-new/issue-3647-debugTokens/a.test.kast.out (10 changes: 5 additions & 5 deletions)
@@ -1,15 +1,15 @@
 |"Match" | (location) | Terminal |
 |---------------------------------------------------------------------------------|---------------|---------------------|
-|"1" | (1,1,1,2) | r"[\\+-]?[0-9]+" |
+|"1" | (1,1,1,2) | r"[0-9]+" |
 |"+" | (1,3,1,4) | "+" |
-|"2" | (1,5,1,6) | r"[\\+-]?[0-9]+" |
+|"2" | (1,5,1,6) | r"[0-9]+" |
 |"+" | (1,7,1,8) | "+" |
 |"aaaaaaaaaaaa" | (1,9,1,21) | r"[a-z][a-zA-Z0-9]*"|
 |"+" | (12,1,12,2) | "+" |
-|"10000000" | (12,3,12,11) | r"[\\+-]?[0-9]+" |
+|"10000000" | (12,3,12,11) | r"[0-9]+" |
 |"+" | (13,1,13,2) | "+" |
-|"\"str\"" | (13,3,13,8) | r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]"|
+|"\"str\"" | (13,3,13,8) | r"[\"]([^\"\\n\\r\\\\]|([\\\\][nrtf\"\\\\]|([\\\\][x][0-9a-fA-F]{2}|([\\\\][u][0-9a-fA-F]{4}|[\\\\][U][0-9a-fA-F]{8}))))*[\"]"|
 |"+" | (14,1,14,2) | "+" |
-|"\"long str that breaks alighnment\"" | (14,3,14,103) | r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]"|
+|"\"long str that breaks alighnment\"" | (14,3,14,103) | r"[\"]([^\"\\n\\r\\\\]|([\\\\][nrtf\"\\\\]|([\\\\][x][0-9a-fA-F]{2}|([\\\\][u][0-9a-fA-F]{4}|[\\\\][U][0-9a-fA-F]{8}))))*[\"]"|
 |"" | (15,1,15,1) | "<eof>" |
k-distribution/tests/regression-new/issue-3647-debugTokens/b.test.kast.out (4 changes: 2 additions & 2 deletions)
@@ -1,5 +1,5 @@
 // Copyright (c) Runtime Verification, Inc. All Rights Reserved.
 module TEST
-syntax Int ::= r"[\\+-]?[0-9]+" [prefer, token, prec(2), badAtt(10)]
+syntax Int ::= r"[\\+\\-]?[0-9]+" [prefer, token, prec(2), badAtt(10)]
 | Int "+" Int [group(badAttButOkay),badAtt,function]
 endmodule
k-distribution/tests/regression-new/pedanticAttributes/test.k.out (6 changes: 3 additions & 3 deletions)
k-frontend/src/main/java/org/kframework/definition/regex/Regex.java (10 changes: 10 additions & 0 deletions)
@@ -0,0 +1,10 @@
+// Copyright (c) Runtime Verification, Inc. All Rights Reserved.
+package org.kframework.definition.regex;
+
+import java.io.Serializable;
+
+public record Regex(boolean startLine, RegexBody reg, boolean endLine) implements Serializable {
+  public Regex(RegexBody reg) {
+    this(false, reg, false);
+  }
+}
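As a small illustration of the convenience constructor, the sketch below re-declares `Regex` next to a stand-in `RegexBody` so that it compiles on its own. The wrapper class `RegexRecordSketch` and the stand-in interface are not part of the commit, and reading `startLine`/`endLine` as line anchors around the body is an assumption based only on the field names.

```java
import java.io.Serializable;

// Self-contained sketch; only the Regex record mirrors the diff.
class RegexRecordSketch {
  // Stand-in for the real RegexBody interface, reduced to one node type.
  interface RegexBody extends Serializable {}

  record AnyChar() implements RegexBody {}

  record Regex(boolean startLine, RegexBody reg, boolean endLine) implements Serializable {
    // Convenience constructor: both flags default to false.
    Regex(RegexBody reg) {
      this(false, reg, false);
    }
  }

  public static void main(String[] args) {
    Regex plain = new Regex(new AnyChar());
    Regex anchored = new Regex(true, new AnyChar(), true);
    // Records compare by component values.
    System.out.println(plain.equals(new Regex(false, new AnyChar(), false))); // true
    System.out.println(plain.equals(anchored)); // false
  }
}
```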
k-frontend/src/main/java/org/kframework/definition/regex/RegexBody.java (37 changes: 37 additions & 0 deletions)
@@ -0,0 +1,37 @@
+// Copyright (c) Runtime Verification, Inc. All Rights Reserved.
+package org.kframework.definition.regex;
+
+import java.io.Serializable;
+import java.util.List;
+
+public sealed interface RegexBody extends Serializable {
+  record Char(int codePoint) implements RegexBody {}
+
+  record AnyChar() implements RegexBody {}
+
+  record Named(String name) implements RegexBody {}
+
+  record CharClassExp(boolean negated, List<CharClass> charClasses) implements RegexBody {}
+
+  sealed interface CharClass extends Serializable {
+    record Char(int codePoint) implements CharClass {}
+
+    record Range(CharClass.Char start, CharClass.Char end) implements CharClass {}
+  }
+
+  record Union(RegexBody left, RegexBody right) implements RegexBody {}
+
+  record Concat(List<RegexBody> members) implements RegexBody {}
+
+  record ZeroOrMoreTimes(RegexBody reg) implements RegexBody {}
+
+  record ZeroOrOneTimes(RegexBody reg) implements RegexBody {}
+
+  record OneOrMoreTimes(RegexBody reg) implements RegexBody {}
+
+  record ExactlyTimes(RegexBody reg, int exactly) implements RegexBody {}
+
+  record AtLeastTimes(RegexBody reg, int atLeast) implements RegexBody {}
+
+  record RangeOfTimes(RegexBody reg, int atLeast, int atMost) implements RegexBody {}
+}
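To give a feel for how such an AST might be built and re-emitted, the sketch below copies a handful of the node types and walks them with a toy printer. The class `RegexAstSketch`, the `emit`, `atom`, `classChar` and `signedInt` helpers, and the escaping rules are all hypothetical; the commit's actual emitter is not shown on this page and may well differ (for instance, it may escape `+` inside a class).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a trimmed copy of some RegexBody nodes plus a toy printer.
class RegexAstSketch {
  sealed interface RegexBody {
    record Char(int codePoint) implements RegexBody {}
    record CharClassExp(boolean negated, List<CharClass> charClasses) implements RegexBody {}
    sealed interface CharClass {
      record Char(int codePoint) implements CharClass {}
      record Range(CharClass.Char start, CharClass.Char end) implements CharClass {}
    }
    record Concat(List<RegexBody> members) implements RegexBody {}
    record ZeroOrOneTimes(RegexBody reg) implements RegexBody {}
    record OneOrMoreTimes(RegexBody reg) implements RegexBody {}
  }

  // Assumed escaping rule: backslash, ']', '^' and '-' are escaped inside a class.
  static String classChar(RegexBody.CharClass.Char c) {
    String s = Character.toString(c.codePoint());
    return "\\]^-".contains(s) ? "\\" + s : s;
  }

  // Emits a regex string for the node; metacharacters in Char nodes are not escaped.
  static String emit(RegexBody r) {
    if (r instanceof RegexBody.Char c) {
      return Character.toString(c.codePoint());
    } else if (r instanceof RegexBody.CharClassExp cls) {
      StringBuilder sb = new StringBuilder(cls.negated() ? "[^" : "[");
      for (RegexBody.CharClass member : cls.charClasses()) {
        if (member instanceof RegexBody.CharClass.Range range) {
          sb.append(classChar(range.start())).append('-').append(classChar(range.end()));
        } else if (member instanceof RegexBody.CharClass.Char single) {
          sb.append(classChar(single));
        }
      }
      return sb.append(']').toString();
    } else if (r instanceof RegexBody.Concat cat) {
      return cat.members().stream().map(RegexAstSketch::emit).collect(Collectors.joining());
    } else if (r instanceof RegexBody.ZeroOrOneTimes opt) {
      return atom(opt.reg()) + "?";
    } else if (r instanceof RegexBody.OneOrMoreTimes plus) {
      return atom(plus.reg()) + "+";
    }
    throw new IllegalArgumentException("unhandled node: " + r);
  }

  // Parenthesizes anything that is not a single character or character class.
  static String atom(RegexBody r) {
    boolean simple = r instanceof RegexBody.Char || r instanceof RegexBody.CharClassExp;
    return simple ? emit(r) : "(" + emit(r) + ")";
  }

  // AST for an optionally signed integer: a sign class, then one or more digits.
  static RegexBody signedInt() {
    RegexBody sign = new RegexBody.CharClassExp(false,
        List.of(new RegexBody.CharClass.Char('+'), new RegexBody.CharClass.Char('-')));
    RegexBody digits = new RegexBody.CharClassExp(false,
        List.of(new RegexBody.CharClass.Range(
            new RegexBody.CharClass.Char('0'), new RegexBody.CharClass.Char('9'))));
    return new RegexBody.Concat(
        List.of(new RegexBody.ZeroOrOneTimes(sign), new RegexBody.OneOrMoreTimes(digits)));
  }

  public static void main(String[] args) {
    System.out.println(emit(signedInt())); // [+\-]?[0-9]+
  }
}
```

Keeping the interface sealed means the set of node types is closed, so a printer like this one can be checked against every case when a new node is added.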