JPlag is unable to handle 'ć' character #1612
Comments
This error is not caused by JPlag directly. It is the same issue as #1427, and there is already a related ANTLR issue: antlr/grammars-v4#3952. The character seems to occur only in comments, so as a workaround you could write a script that deletes all comments and then run JPlag on the result.
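For anyone trying the comment-removal workaround, here is a minimal Python sketch. It is not part of JPlag; the function name `strip_cpp_comments` is made up for illustration. It assumes ordinary C++ sources and deliberately ignores raw string literals, digit separators such as `1'000`, and line comments continued with a trailing backslash, so treat it as a starting point rather than a full lexer.

```python
def strip_cpp_comments(src: str) -> str:
    """Remove // and /* */ comments from C++ source text (hypothetical helper).

    String and character literals are copied verbatim so that comment
    markers inside them are left alone. Newlines are preserved so line
    numbers stay stable.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        nxt = src[i + 1] if i + 1 < n else ""
        if ch == "/" and nxt == "/":
            # Line comment: skip up to (not including) the newline.
            while i < n and src[i] != "\n":
                i += 1
        elif ch == "/" and nxt == "*":
            # Block comment: replace with one space plus its newlines.
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append(" " + "\n" * src.count("\n", i, end))
            i = end
        elif ch in "\"'":
            # String or character literal: copy it, honoring backslash escapes.
            j = i + 1
            while j < n and src[j] != ch:
                j += 2 if src[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(src[i:j])
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to each submission file before running JPlag, this should drop any 'ć' that lives only in comments while leaving the code itself unchanged.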
If you want to keep comments, another workaround might be to remove all non-ASCII characters from the comments via a script.
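A rough sketch of such a script in Python is below. The helper names `strip_non_ascii` and `clean_file` are hypothetical, and for simplicity it strips non-ASCII characters from the whole file rather than from comments only. It also assumes the files are UTF-8 or close to it; bytes that do not decode are replaced and then dropped along with everything else outside ASCII.

```python
from pathlib import Path


def strip_non_ascii(text: str) -> str:
    """Return text with every character outside the ASCII range removed."""
    return "".join(ch for ch in text if ord(ch) < 128)


def clean_file(path: Path) -> None:
    """Rewrite one file in place with non-ASCII characters removed."""
    raw = path.read_bytes()
    # Undecodable bytes become U+FFFD, which is non-ASCII and gets stripped too.
    text = raw.decode("utf-8", errors="replace")
    # Write bytes directly so existing line endings are kept as they are.
    path.write_bytes(strip_non_ascii(text).encode("ascii"))
```

To process a whole dataset you could loop over something like `Path("submissions").rglob("*.cpp")` and call `clean_file` on each path; the directory name here is only a placeholder.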
After removing every non-ASCII character, I get this error:
Even after this preprocessing, those files cannot be parsed:
This is probably caused by an encoding issue. I think your files might be encoded in UTF-16 or a similar encoding, but there is very little for the heuristic to go by. We might want to include an encoding flag in the future. I will look into the second error later, but it also seems to be caused by ANTLR.
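Until such a flag exists, one option is to normalize the files to UTF-8 before running JPlag. The Python sketch below is only an illustration and says nothing about how JPlag's own detection works: `guess_encoding` and `to_utf8` are hypothetical names, the check looks at byte order marks first and then falls back to a crude NUL-byte count, and UTF-32 is not handled.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess a text encoding from a BOM, else from where NUL bytes sit."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec consumes the BOM and picks the byte order itself.
        return "utf-16"
    # No BOM: ASCII-heavy source stored as UTF-16 has a NUL in every other byte.
    if raw.count(b"\x00") > len(raw) // 4:
        odd = raw[1::2].count(0)
        even = raw[0::2].count(0)
        return "utf-16-le" if odd >= even else "utf-16-be"
    return "utf-8"


def to_utf8(raw: bytes) -> bytes:
    """Decode with the guessed encoding and re-encode as UTF-8 without a BOM."""
    return raw.decode(guess_encoding(raw), errors="replace").encode("utf-8")
```

The NUL-byte fallback is a heuristic and can misfire on binary or very short files, so it is worth spot-checking a few converted submissions by hand.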
I looked a little more into the first line of the files, and I don't know how it ever came to be. It certainly does not look like valid C++ code. The null pointer issue should be fixed in #1613.
Got the dataset from here:
With the fix and the first line removed, JPlag runs on my machine.
Awesome, the fix works.
When I try to check a Spanish C++ dataset, almost all submissions are discarded for containing the character 'ć'.
I am using 5.1.0 from the dev branch.
The dataset is linked below:
z5z5.zip