JPlag is unable to handle 'ć' character #1612

uuqjz · 2024-02-26T07:57:38Z

When I try to check a Spanish C++ dataset, almost all submissions are discarded because they contain the character 'ć'.
I am using 5.1.0 from the dev branch.
The dataset is linked below:
z5z5.zip

TwoOfTwelve · 2024-02-26T08:19:15Z

This error is not caused by JPlag directly. This is the same issue as #1427. There is already an ANTLR issue related to this: antlr/grammars-v4#3952.

It seems that the character only occurs in comments, so as a workaround you could write a script that deletes all comments and run JPlag after it.
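
A minimal sketch of such a comment-stripping script, in Python. This is not part of JPlag; the function name `strip_cpp_comments` is made up for illustration, and the approach (a single regex that skips string and character literals) is an assumption that ignores raw string literals and line continuations inside `//` comments.

```python
import re

# Matches either a literal (which is kept) or a comment (which is dropped).
# Assumption: no raw string literals and no backslash-continued // comments.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|//[^\n]*"              # line comment
    r"|/\*.*?\*/",            # block comment
    re.DOTALL,
)


def strip_cpp_comments(source: str) -> str:
    """Return the source with // and /* */ comments removed."""

    def replace(match: re.Match) -> str:
        text = match.group(0)
        if text.startswith("//"):
            return ""
        if text.startswith("/*"):
            # Keep the newlines so line numbers in later error messages
            # still match; a one-line block comment becomes a single space.
            return "\n" * text.count("\n") or " "
        return text  # string or character literal, left untouched

    return _TOKEN.sub(replace, source)
```

Each submission would be read, passed through the function, and written back before JPlag is run on the directory.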

tsaglam · 2024-02-26T08:49:15Z

If you want to keep comments, another workaround might be to remove all non-ASCII characters from the comments via a script.
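
A rough sketch of that kind of script, in Python. It is simpler than what is described above: it drops every byte outside the ASCII range from the whole file, not only from comments, which should be equivalent if the non-ASCII characters really only occur in comments. The function names and the `*.cpp` pattern are assumptions for illustration.

```python
from pathlib import Path


def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte >= 0x80; works for UTF-8 and single-byte code pages."""
    return bytes(b for b in data if b < 0x80)


def clean_tree(root: str) -> None:
    """Rewrite all .cpp files below root in place (assumed file layout)."""
    for path in Path(root).rglob("*.cpp"):
        path.write_bytes(strip_non_ascii(path.read_bytes()))
```

Working on bytes avoids having to guess the file encoding first. For example, `strip_non_ascii("// Zadaća 5".encode("utf-8"))` returns `b"// Zadaa 5"`.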

uuqjz · 2024-02-26T08:54:53Z

After removing every non-ASCII character, I get this error:
line 1:3 token recognition error at: ''
The line in question is
/�B�2017/2018: Zadaa 5, Zadatak 4
It seems like these are start-of-heading control codes, which have to be removed too.
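
A possible extra preprocessing step for this, sketched in Python. It assumes the offending bytes are ASCII control characters (such as start of heading, 0x01) and removes all of them except tab, newline, and carriage return. The function name is made up for illustration.

```python
import re

# ASCII control characters, excluding \t (0x09), \n (0x0a) and \r (0x0d).
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Remove control characters that a lexer is unlikely to accept."""
    return _CONTROL.sub("", text)
```

Note that this also removes form feed and vertical tab, which should be harmless for this dataset but is a deliberate simplification.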

uuqjz · 2024-02-26T09:13:13Z

Even after this preprocessing, those files cannot be parsed:
failed to parse 'student9307.cpp'Cannot invoke "de.jplag.cpp.grammar.CPP14Parser$DeclaratorContext.pointerDeclarator()" because the return value of "de.jplag.cpp.grammar.CPP14Parser$ParameterDeclarationContext.declarator()" is null

TwoOfTwelve · 2024-02-26T10:54:50Z

> After removing every non-ASCII character, I get this error:
> line 1:3 token recognition error at: ''
> The line in question is
> /�B�2017/2018: Zadaa 5, Zadatak 4
> It seems like these are start-of-heading control codes, which have to be removed too.

This is probably caused by an encoding issue. I think your files might be encoded in UTF-16 or something, but there is very little for the heuristic to actually go by. We might want to include an encoding flag in the future.
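
Until such a flag exists, one workaround is to normalise the files to UTF-8 before running JPlag. The following Python sketch is not how JPlag detects encodings; it is a hypothetical helper that checks for a byte order mark, then tries strict UTF-8, then falls back to a caller-supplied code page. The fallback default is only a guess and should be set to whatever the dataset actually uses.

```python
import codecs


def decode_guess(data: bytes, fallback: str = "latin-1") -> str:
    """Decode bytes using a BOM if present, else UTF-8, else the fallback."""
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")  # strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")     # BOM selects the byte order
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: without a BOM and with invalid UTF-8, the file uses
        # a single-byte code page chosen by the caller.
        return data.decode(fallback)
```

The decoded text can then be written back with `encoding="utf-8"`. UTF-16 files without a BOM and UTF-32 files are not handled by this sketch.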

I will look into the second error later, but it also seems to be caused by ANTLR.

TwoOfTwelve · 2024-02-26T12:40:05Z

I looked a little more into the first line of the files, and I don't know how it ever came to be. It certainly does not look like valid C++ code.

The null pointer issue should be fixed in #1613

uuqjz · 2024-02-26T12:41:50Z

Got the dataset from here:
https://ieee-dataport.org/open-access/programming-homework-dataset-plagiarism-detection

TwoOfTwelve · 2024-02-26T12:51:51Z

With the fix and the first line removed JPlag runs on my machine.

uuqjz · 2024-02-26T12:52:49Z

Awesome, the fix works

uuqjz added the enhancement and minor labels Feb 26, 2024

tsaglam added the duplicate label Feb 26, 2024

TwoOfTwelve mentioned this issue Feb 26, 2024

Fixed a bug, where the cpp listener could throw a null pointer exception #1613

Merged

uuqjz linked a pull request Feb 26, 2024 that will close this issue

Fixed a bug, where the cpp listener could throw a null pointer exception #1613

Merged

tsaglam closed this as completed Feb 28, 2024
