-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kast --debug-tokens #3660
kast --debug-tokens #3660
Conversation
k-distribution/tests/regression-new/issue-3647-debugTokens/a.test.kast.out
Outdated
Show resolved
Hide resolved
* Print the list of tokens matched by the scanner, the location and the Regex Terminal | ||
*/ | ||
public String tokenizeString(String input, Source source) { | ||
StringBuilder sb = new StringBuilder("`Match` (location), Regex Terminal\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't look like this table will line up in the first column either if the token matched is longer than 10 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. If the token is longer than 10 chars, it's not going to align.
This is a best-effort display. Tokens can be hundreds of chars long and can contain new line.
Not worth trying to align everything.
kernel/src/main/java/org/kframework/parser/inner/kernel/EarleyParser.java
Outdated
Show resolved
Hide resolved
kernel/src/main/java/org/kframework/parser/inner/kernel/Scanner.java
Outdated
Show resolved
Hide resolved
Wow wonderful. I build this and tested it on |
remove some stray publics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a quick question: the size of the Match
is limited (in this beautiful table format) to 10 characters (+-), or the size of the table will be adjusted by the most extended token?
The getters are still janky and we still need to format the table better. I know you don't seem to view formatting as important, but a token longer than ten characters is far from unheard of and it will seriously impact the readability of this feature to be aligned incorrectly. The same problem will occur with extremely large files in the location field. It's not hard to just compute the longest width needed and format appropriately, and it's worth it because it will have a significant impact on readability. We can cap the total width at 80 characters and make the width of the token field smaller if needed to avoid going over that. By the way, the header of the third column is misaligned. The methods you want for the getters of the |
@dwightguth please have another look. I've also reordered the columns. I made the location the first column since it doesn't vary in size that much. I can try and engineer something perfect. But I don't think it's worth the effort. I'm also trying to keep the code simple. |
@@ -587,11 +588,11 @@ public List<Scanner.Token> getWords() { | |||
} | |||
|
|||
public List<Integer> getLines() { | |||
return Arrays.stream(lines).boxed().collect(Collectors.toList()); | |||
return Ints.asList(lines); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be wrapped in Collections.unmodifiableList because Ints.asList returns a mutable list.
We have already spent more time on this back and forth than it would have taken to just compute the max length of each column and do some basic arithmetic. This is now actually worse than it was before because it's much /less/ readable with the location field first. I agree this is not something that makes or breaks this feature, but we are going to end up using this again in the future and having seriously misaligned columns and difficult to read tables is going to have a significant impact on readability. I genuinely don't understand the resistance to something that should be at most fifteen minutes of work. As a side note, see my comment on the getters, which are still not right. |
(location) "Match" Terminal | ||
------------------------------------ | ||
(1,1,1,2), "1" r"[\\+-]?[0-9]+" | ||
(1,3,1,4), "+" "+" | ||
(1,5,1,6), "2" r"[\\+-]?[0-9]+" | ||
(1,7,1,8), "+" "+" | ||
(1,9,1,21), "aaaaaaaaaaaa" r"[a-z][a-zA-Z0-9]*" | ||
(12,1,12,2), "+" "+" | ||
(12,3,12,11), "10000000" r"[\\+-]?[0-9]+" | ||
(13,1,13,2), "+" "+" | ||
(13,3,13,8), "\"str\"" r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]" | ||
(14,1,14,2), "+" "+" | ||
(14,3,14,36), "\"long str that breaks alighnment\"" r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]" | ||
(15,1,15,1), "" "<eof>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we output this as a valid markdown table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and dynamic column widths.
Add an option to kast to show the list of matched tokens, their location and the token used to match it.
Review with hidden whitespace.
@dkcumming how does this look?