kast --debug-tokens #3660

radumereuta · 2023-09-25T14:30:43Z

Add an option to kast to show the list of matched tokens, their locations, and the terminal used to match each one.
Review with hidden whitespace.
@dkcumming how does this look?

k-distribution/tests/regression-new/issue-3647-debugTokens/a.test.kast.out

dwightguth · 2023-09-25T19:06:12Z

kernel/src/main/java/org/kframework/parser/inner/ParseInModule.java

+     * Print the list of tokens matched by the scanner, the location and the Regex Terminal
+     */
+    public String tokenizeString(String input, Source source) {
+        StringBuilder sb = new StringBuilder("`Match`    (location), Regex Terminal\n");


It doesn't look like this table will line up in the first column either if the matched token is longer than 10 characters.

True. If the token is longer than 10 characters, it won't align.
This is a best-effort display. Tokens can be hundreds of characters long and can contain newlines.
It's not worth trying to align everything.

kernel/src/main/java/org/kframework/parser/inner/kernel/EarleyParser.java

kernel/src/main/java/org/kframework/parser/inner/kernel/Scanner.java

dkcumming · 2023-09-26T00:59:47Z

Wow, wonderful. I built this and tested it on kmir, and it already looks really helpful from the first test. This feedback about terminals is great for orienting yourself to where the lexer is! I wonder whether a trace of the non-terminals seen on the way to the matched terminal would be possible in the future; this would provide a complete picture. That isn't a criticism, just a thought. This is great and will be very helpful :)

remove some stray publics

Robertorosmaninho

LGTM! Just a quick question: is the Match column limited to about 10 characters in this (beautiful) table format, or will the column width adjust to the longest token?

dwightguth · 2023-09-26T18:43:25Z

The getters are still janky, and we still need to format the table better. I know you don't seem to view formatting as important, but a token longer than ten characters is far from unheard of, and incorrect alignment will seriously hurt the readability of this feature. The same problem will occur in the location field for extremely large files. It's not hard to compute the longest width needed and format accordingly, and it's worth it because it will have a significant impact on readability. We can cap the total width at 80 characters and make the token field narrower if needed to stay under that.

By the way, the header of the third column is misaligned.
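A minimal sketch of the width computation suggested above (widest entry per column, total capped at 80). Everything here is hypothetical and is not the code that was eventually merged: the TokenTable class, the MIN_MATCH_WIDTH floor of 10, the two-space separator, and the assumption that rows arrive as already-quoted {location, match, terminal} strings, as in the test output further down.

```java
import java.util.List;

/**
 * Hypothetical sketch, not the code merged in #3660: align the location and
 * match columns to their widest entries, shrinking the match column when a
 * line would otherwise exceed maxWidth.
 */
class TokenTable {
    static final String[] HEADER = {"(location)", "\"Match\"", "Terminal"};
    static final String SEP = "  ";
    static final int MIN_MATCH_WIDTH = 10; // assumed floor for the match column

    /** Each row is {location, match, terminal}, already quoted/escaped. */
    static String format(List<String[]> rows, int maxWidth) {
        int[] w = new int[3];
        for (int c = 0; c < 3; c++) {
            w[c] = HEADER[c].length();
        }
        for (String[] row : rows) {
            for (int c = 0; c < 3; c++) {
                w[c] = Math.max(w[c], row[c].length());
            }
        }
        // Width of every column except the match column, plus separators.
        int fixed = w[0] + w[2] + 2 * SEP.length();
        if (fixed + w[1] > maxWidth) {
            // Shrink (never grow) the match column; matches wider than the
            // column overflow on their own row instead of widening every row.
            w[1] = Math.min(w[1], Math.max(MIN_MATCH_WIDTH, maxWidth - fixed));
        }
        StringBuilder sb = new StringBuilder();
        appendRow(sb, HEADER, w);
        sb.append("-".repeat(Math.min(maxWidth, fixed + w[1]))).append('\n');
        for (String[] row : rows) {
            appendRow(sb, row, w);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, String[] row, int[] w) {
        // %-Ns left-justifies and pads to N characters; longer values are kept whole.
        sb.append(String.format("%-" + w[0] + "s", row[0])).append(SEP)
          .append(String.format("%-" + w[1] + "s", row[1])).append(SEP)
          .append(row[2]) // the last column is never padded
          .append('\n');
    }
}
```

Because the longest terminal counts toward the limit in this sketch, a single long regex (such as the string-literal terminal) pushes the match column down to the floor. A variant could leave the last column out of the budget, since it is never padded anyway.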

For the getters of the int[], the methods you want are Ints.asList and Collections.unmodifiableList.

radumereuta · 2023-09-27T15:19:11Z

@dwightguth please have another look.
I've fixed the getter issues. List.of(words) already returns an unmodifiable list.
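For reference, a small sketch of what List.of(array) gives a getter. WordsHolder is a hypothetical stand-in for the parser class, with String standing in for Scanner.Token.

```java
import java.util.List;

/**
 * Hypothetical illustration of a getter returning List.of(words) when
 * `words` is an array (String stands in for Scanner.Token).
 */
class WordsHolder {
    private final String[] words;

    WordsHolder(String[] words) {
        this.words = words;
    }

    /**
     * List.of(E...) copies the array into an unmodifiable list: mutators such
     * as add/set throw UnsupportedOperationException, later writes to `words`
     * are not visible through the result, and a null element makes the call
     * throw NullPointerException.
     */
    List<String> getWords() {
        return List.of(words);
    }
}
```

Unlike a view over the array, the result is a copy, so it cannot be used to alias or modify the parser's internal array. The trade-off is an O(n) copy per call.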

I've also reordered the columns, making the location the first column since its width doesn't vary much.
The matched tokens and the terminals, though, can vary greatly in length. I don't think it's worth trying to do anything smarter here.
This output is intended to be read on the command line when it's small; otherwise, dump it to a file and view it without line wrapping.

I could try to engineer something perfect, but I don't think it's worth the effort. I'm also trying to keep the code simple.
Please have a look at the updated example.

dwightguth · 2023-09-27T20:06:38Z

kernel/src/main/java/org/kframework/parser/inner/kernel/EarleyParser.java

@@ -587,11 +588,11 @@ public List<Scanner.Token> getWords() {
    }

    public List<Integer> getLines() {
-      return Arrays.stream(lines).boxed().collect(Collectors.toList());
+      return Ints.asList(lines);


This needs to be wrapped in Collections.unmodifiableList because Ints.asList returns a mutable list: its set() writes through to the backing array.
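To illustrate the point without pulling in Guava, here is a JDK-only sketch. IntView is a hypothetical class that behaves like Ints.asList (a boxed, fixed-size view whose set() writes through to the int[]); LineTable and getLinesMutable are also hypothetical. With Guava available, the suggested getter corresponds to Collections.unmodifiableList(Ints.asList(lines)).

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.List;
import java.util.RandomAccess;

/**
 * Hypothetical stand-in for a class that owns an int[] of line numbers,
 * using only the JDK. IntView mimics Guava's Ints.asList: a boxed,
 * fixed-size, write-through view of the array.
 */
class LineTable {
    private final int[] lines;

    LineTable(int[] lines) {
        this.lines = lines;
    }

    /** Leaks write access: set() on the result modifies `lines`. */
    List<Integer> getLinesMutable() {
        return new IntView(lines);
    }

    /** Read-only: the wrapper rejects set/add/remove, and no array copy is made. */
    List<Integer> getLines() {
        return Collections.unmodifiableList(new IntView(lines));
    }

    private static final class IntView extends AbstractList<Integer> implements RandomAccess {
        private final int[] array;

        IntView(int[] array) {
            this.array = array;
        }

        @Override
        public Integer get(int index) {
            return array[index];
        }

        @Override
        public int size() {
            return array.length;
        }

        @Override
        public Integer set(int index, Integer value) {
            int old = array[index];
            array[index] = value; // writes through to the backing array
            return old;
        }
    }
}
```

The wrapper copies nothing, so getLines() stays cheap while still preventing callers from editing the backing array.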

dwightguth · 2023-09-27T20:06:58Z

We have already spent more time on this back and forth than it would have taken to just compute the max length of each column and do some basic arithmetic. This is now actually worse than it was before because it's much /less/ readable with the location field first. I agree this is not something that makes or breaks this feature, but we are going to end up using this again in the future and having seriously misaligned columns and difficult to read tables is going to have a significant impact on readability. I genuinely don't understand the resistance to something that should be at most fifteen minutes of work.

As a side note, see my comment on the getters, which are still not right.

ehildenb · 2023-09-28T15:03:59Z

k-distribution/tests/regression-new/issue-3647-debugTokens/a.test.kast.out

+(location)    "Match"      Terminal
+------------------------------------
+(1,1,1,2),    "1"          r"[\\+-]?[0-9]+"
+(1,3,1,4),    "+"          "+"
+(1,5,1,6),    "2"          r"[\\+-]?[0-9]+"
+(1,7,1,8),    "+"          "+"
+(1,9,1,21),   "aaaaaaaaaaaa" r"[a-z][a-zA-Z0-9]*"
+(12,1,12,2),  "+"          "+"
+(12,3,12,11), "10000000"   r"[\\+-]?[0-9]+"
+(13,1,13,2),  "+"          "+"
+(13,3,13,8),  "\"str\""    r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]"
+(14,1,14,2),  "+"          "+"
+(14,3,14,36), "\"long str that breaks alighnment\"" r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]"
+(15,1,15,1),  ""           "<eof>"


Can we output this as a valid Markdown table?

https://www.markdownguide.org/extended-syntax/

And with dynamic column widths.
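A minimal sketch of the requested output, assuming GitHub-flavored Markdown tables and the location, match, terminal column order from the sample above. MarkdownTokenTable and its cell helper are hypothetical names, not the merged implementation. Pipes are escaped because the string-literal terminal contains |, and newlines are escaped because a table row cannot span lines.

```java
import java.util.List;

/**
 * Hypothetical sketch, not the implementation merged in #3660: print the
 * token list as a GitHub-flavored Markdown table whose columns are padded
 * to the widest cell, so the raw text is aligned as well as the rendering.
 */
class MarkdownTokenTable {
    static final String[] HEADER = {"Location", "Match", "Terminal"};

    /** Escape a value so it stays inside a single table cell. */
    static String cell(String s) {
        return s.replace("|", "\\|")    // an unescaped | would end the cell
                .replace("\r", "\\r")
                .replace("\n", "\\n");  // a table row must fit on one line
    }

    /** Each row is {location, match, terminal} as raw strings. */
    static String format(List<String[]> rows) {
        int[] w = new int[HEADER.length];
        for (int c = 0; c < w.length; c++) {
            // Keep at least three hyphens, which some renderers expect.
            w[c] = Math.max(3, HEADER[c].length());
        }
        for (String[] row : rows) {
            for (int c = 0; c < w.length; c++) {
                w[c] = Math.max(w[c], cell(row[c]).length());
            }
        }
        StringBuilder sb = new StringBuilder();
        appendRow(sb, HEADER, w);
        sb.append('|');
        for (int width : w) {
            sb.append(' ').append("-".repeat(width)).append(" |");
        }
        sb.append('\n');
        for (String[] row : rows) {
            appendRow(sb, row, w);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, String[] row, int[] w) {
        sb.append('|');
        for (int c = 0; c < w.length; c++) {
            String v = cell(row[c]);
            // Pad every cell to its column width so the pipes line up.
            sb.append(' ').append(v).append(" ".repeat(w[c] - v.length())).append(" |");
        }
        sb.append('\n');
    }
}
```

Backslashes are left as-is in this sketch, so a renderer may treat sequences such as \+ in the regex terminals as Markdown escapes. Wrapping each cell in a code span is one way to preserve them, at the cost of handling backticks inside tokens.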

radumereuta added 3 commits September 25, 2023 16:42

Show scanner tokens

2d47e48

Add option to kast

2552184

Add test

7a9b1ac

radumereuta requested review from dwightguth and Robertorosmaninho September 25, 2023 14:42

radumereuta linked an issue Sep 25, 2023 that may be closed by this pull request

Add kast option to show the list of tokens after scanner #3647

Closed

dwightguth requested changes Sep 25, 2023

radumereuta added 3 commits September 26, 2023 17:45

Code review

fd5d0b1

More unmodifiable collections

81a8dff

Update test output

c473a29

radumereuta requested a review from dwightguth September 26, 2023 15:15

radumereuta and others added 4 commits September 26, 2023 19:18

Refactor code

c1d257d

remove some stray publics

Remove imports

8b2b24e

Update Nix lock files

31ceb50

Merge branch 'develop' into showTokens

d2e62d7

Robertorosmaninho approved these changes Sep 26, 2023

Fix output

1601fad

dwightguth reviewed Sep 27, 2023

ehildenb reviewed Sep 28, 2023

radumereuta added 4 commits October 2, 2023 16:33

Merge branch 'develop' into showTokens

a7b84d8

Unmodifiable list

b87588e

Output Markdown table

c275067

and dynamic column widths.

Update cmd option description

237eb52

radumereuta requested a review from dwightguth October 2, 2023 15:11

dwightguth approved these changes Oct 4, 2023

radumereuta added the automerge label Oct 4, 2023

Merge branch 'develop' into showTokens

2b07cd1

rv-jenkins merged commit a49c3a2 into develop Oct 4, 2023

rv-jenkins deleted the showTokens branch October 4, 2023 21:06

radumereuta mentioned this pull request Oct 23, 2023

K version bump feature list (6.1.0) #3706

Closed

Baltoli mentioned this pull request Dec 12, 2023

2023 Goals #3098

Closed