Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kast --debug-tokens #3660

Merged
merged 16 commits into from
Oct 4, 2023
Merged

kast --debug-tokens #3660

merged 16 commits into from
Oct 4, 2023

Conversation

radumereuta
Copy link
Contributor

Add an option to kast to show the list of matched tokens, their location and the token used to match it.
Review with hidden whitespace.
@dkcumming how does this look?

* Print the list of tokens matched by the scanner, the location and the Regex Terminal
*/
public String tokenizeString(String input, Source source) {
StringBuilder sb = new StringBuilder("`Match` (location), Regex Terminal\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like this table will line up in the first column either if the token matched is longer than 10 characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. If the token is longer than 10 chars, it's not going to align.
This is a best-effort display. Tokens can be hundreds of chars long and can contain new line.
Not worth trying to align everything.

@dkcumming
Copy link

Wow wonderful. I build this and tested it on kmir and it is already looking really helpful from the first test. This feedback of terminals is great to orient where the lexer is! I wonder if some idea of a trace of non-terminals seen on the way to the matched terminal would be possible in the future - this would provide a complete picture. However that isn't a criticism just a thought - this is great and will be very helpful :)

Copy link
Collaborator

@Robertorosmaninho Robertorosmaninho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a quick question: the size of the Match is limited (in this beautiful table format) to 10 characters (+-), or the size of the table will be adjusted by the most extended token?

@dwightguth
Copy link
Collaborator

The getters are still janky and we still need to format the table better. I know you don't seem to view formatting as important, but a token longer than ten characters is far from unheard of and it will seriously impact the readability of this feature to be aligned incorrectly. The same problem will occur with extremely large files in the location field. It's not hard to just compute the longest width needed and format appropriately, and it's worth it because it will have a significant impact on readability. We can cap the total width at 80 characters and make the width of the token field smaller if needed to avoid going over that.

By the way, the header of the third column is misaligned.

The methods you want for the getters of the int[] are Ints.asList and Collections.unmodifiableList

@radumereuta
Copy link
Contributor Author

@dwightguth please have another look.
I've fixed the getter issues. List.of(words); already uses unmodifiable lists underneath.

I've also reordered the columns. I made the location the first column since it doesn't vary in size that much.
The matched tokens and the Terminal though, can vary greatly. I don't think it's worth trying to do smarter things here.
This output is intended to be used in cmd line if small, if not dump the output in a file and navigate w/o wrap lines.

I can try and engineer something perfect. But I don't think it's worth the effort. I'm also trying to keep the code simple.
Please have a look at the updated example.

@@ -587,11 +588,11 @@ public List<Scanner.Token> getWords() {
}

public List<Integer> getLines() {
return Arrays.stream(lines).boxed().collect(Collectors.toList());
return Ints.asList(lines);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be wrapped in Collections.unmodifiableList because Ints.asList returns a mutable list.

@dwightguth
Copy link
Collaborator

We have already spent more time on this back and forth than it would have taken to just compute the max length of each column and do some basic arithmetic. This is now actually worse than it was before because it's much /less/ readable with the location field first. I agree this is not something that makes or breaks this feature, but we are going to end up using this again in the future and having seriously misaligned columns and difficult to read tables is going to have a significant impact on readability. I genuinely don't understand the resistance to something that should be at most fifteen minutes of work.

As a side note, see my comment on the getters, which are still not right.

Comment on lines 1 to 14
(location) "Match" Terminal
------------------------------------
(1,1,1,2), "1" r"[\\+-]?[0-9]+"
(1,3,1,4), "+" "+"
(1,5,1,6), "2" r"[\\+-]?[0-9]+"
(1,7,1,8), "+" "+"
(1,9,1,21), "aaaaaaaaaaaa" r"[a-z][a-zA-Z0-9]*"
(12,1,12,2), "+" "+"
(12,3,12,11), "10000000" r"[\\+-]?[0-9]+"
(13,1,13,2), "+" "+"
(13,3,13,8), "\"str\"" r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]"
(14,1,14,2), "+" "+"
(14,3,14,36), "\"long str that breaks alighnment\"" r"[\\\"](([^\\\"\\n\\r\\\\])|([\\\\][nrtf\\\"\\\\])|([\\\\][x][0-9a-fA-F]{2})|([\\\\][u][0-9a-fA-F]{4})|([\\\\][U][0-9a-fA-F]{8}))*[\\\"]"
(15,1,15,1), "" "<eof>"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we output this as a valid markdown table?

Copy link
Member

@ehildenb ehildenb Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radumereuta radumereuta requested a review from dwightguth October 2, 2023 15:11
@rv-jenkins rv-jenkins merged commit a49c3a2 into develop Oct 4, 2023
@rv-jenkins rv-jenkins deleted the showTokens branch October 4, 2023 21:06
@Baltoli Baltoli mentioned this pull request Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add kast option to show the list of tokens after scanner
6 participants