Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slight changes to the report json format #1189

Merged
merged 21 commits into from
Aug 18, 2023
Merged

Conversation

Kr0nox
Copy link
Member

@Kr0nox Kr0nox commented Jul 13, 2023

This PR applies some slight changes to the format of the overview.json file amd comparison files to make it easier to extend with new Metrics (#1134) and make it fit the format used inside the report viewer.

Changes to the overview.json

The metric attribute is removed.
A new attribute distributions is added. This is a map of each metric as a key (3 Letter abbreviation in upper case e.g. "AVG") and as a key it as a List of numbers as the distribution. This List is identical to the one previously saved in each metric.
A new attribute top_comparisons is added. This is a list of objects. Each object saves the ids of the two submissions. It also saves a map with each metric as a key and the similarity for this metric as a value.

Here is an exampl of how the new overview.json looks compared to the old one. All unchanged attributes values are replaced by ...:
Old:

{
    "jplag_version": ...,
    "submission_folder_path": ...,
    "base_code_folder_path": ...,
    "language": ...,
    "file_extensions": ...,
    "submission_id_to_display_name": ...,
    "submission_ids_to_comparison_file_name": ...,
    "failed_submission_names": ...,
    "excluded_files": ..,
    "match_sensitivity": ...,
    "date_of_execution": ...,
    "execution_time": ...,
    "metrics": [
        {
            "name": "AVG",
            "distribution": [1, 0, 2, 0, 0, 0, 0, 3, 0, 0],
            "topComparisons": [
                {
                    "first_submission": "A",
                    "second_submission": "C",
                    "similarity": 0.9959390862944163
                },
                {
                    "first_submission": "B",
                    "second_submission": "A",
                    "similarity": 0.23864648263579696
                },
                {
                    "first_submission": "B",
                    "second_submission": "C",
                    "similarity": 0.23864648263579696
                }
            ],
            "description": "Average of both program coverages. This is the default similarity which works in most cases: Matches with a high average similarity indicate that the programs work in a very similar way."
        },
        {
            "name": "MAX",
            "distribution": [5, 1, 0, 0, 0, 0, 0, 0, 0, 0
            ],
            "topComparisons": [
                {
                    "first_submission": "A",
                    "second_submission": "C",
                    "similarity": 0.9959390862944163
                },
                {
                    "first_submission": "B",
                    "second_submission": "A",
                    "similarity": 0.9710144927536232
                },
                {
                    "first_submission": "B",
                    "second_submission": "C",
                    "similarity": 0.9710144927536232
                }
            ],
            "description": "Maximum of both program coverages. This ranking is especially useful if the programs are very different in size. This can happen when dead code was inserted to disguise the origin of the plagiarized program."
        }
    ],
    "clusters": ...,
    "total_comparisons": ...
}

New:

{
    "jplag_version": ...,
    "submission_folder_path": ...,
    "base_code_folder_path": ...,
    "language": ...,
    "file_extensions": ...,
    "submission_id_to_display_name": ...,
    "submission_ids_to_comparison_file_name": ...,
    "failed_submission_names": ...,
    "excluded_files": ..,
    "match_sensitivity": ...,
    "date_of_execution": ...,
    "execution_time": ...,
    "distributions": {
        "AVG": [1, 0, 2, 0, 0, 0, 0, 3, 0, 0],
        "MAX": [5, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    },
    "top_comparisons": [
        {
          "first_submission": "A",
          "second_submission": "C",
          "similarities": { "AVG": 0.9960435212660732, "MAX": 0.9960435212660732 }
        },
        {
          "first_submission": "B",
          "second_submission": "A",
          "similarities": { "AVG": 0.2378472222222222, "MAX": 0.9716312056737588 }
        },
        {
          "first_submission": "B",
          "second_submission": "C",
          "similarities": { "AVG": 0.2378472222222222, "MAX": 0.9716312056737588 }
        }
      ],
    "clusters": ...,
    "total_comparisons": ...
}

Changes to the comparison files

similarity gets replaced by similarities and is now a map of similarities.

Old:

{
  "id1": ...
  "id2": ...
  "similarity": 0.3291991495393338,
  "matches":  ...
}

New:

{
  "id1": ...
  "id2": ...
  "similarities": { "AVG": 0.9960435212660732, "MAX": 0.9960435212660732 },
  "matches":  ...
}

@Kr0nox
Copy link
Member Author

Kr0nox commented Jul 13, 2023

One thing I would also like to put to discussion is modifying the distributions to contain a value for each percentile and not for cummulative 10 percentiles. This would enable us the display more detailed distribution graphs

@Kr0nox Kr0nox changed the title Feature/rework json Slight changes to the report json format Jul 13, 2023
@tsaglam
Copy link
Member

tsaglam commented Jul 14, 2023

One thing I would also like to put to discussion is modifying the distributions to contain a value for each percentile and not for cummulative 10 percentiles. This would enable us the display more detailed distribution graphs

I would say we could play around with some more fine-grained options for the distribution (1 or 5, for example). In the UI we should always offer an option to toggle back to the old 10-based distribution.

@tsaglam
Copy link
Member

tsaglam commented Jul 14, 2023

@Alberth289346 just FYI, this PR will be part of a major release sometimes in the next quarter, where the overhauled report viewer will be released. This might break things for you.

@sebinside
Copy link
Member

How much effort would it be to encapsulate the deserialization in the report viewer and also to support legacy reports, i.e., reports generated with the current version prior to this PR? If it is reasonable I would vote for supporting this as the current JSON format was at least somewhat stable throughout the past.

@Kr0nox
Copy link
Member Author

Kr0nox commented Jul 14, 2023

How much effort would it be to encapsulate the deserialization in the report viewer and also to support legacy reports

Not much work. Would be as easy as testing if the json has an attribute called metric and then extracting the values the old way and if it does not exists we would use the new way.
So if that is wanted I an implement that.

But there might be some issues with switching to 100 values in the distribution since we can not extract that amount of detail from the old reports. So any visualization with this detail will not be possible and it will have to be checked whether the old or new format was used

@sebinside
Copy link
Member

Also looking at the fact that the current version of the Report Viewer will be the legacy one deployed online (maybe for a long time), I would opt for that. However, only if this does not significantly degrade the code quality and structure of the viewer. Any thoughts on this @tsaglam?

@tsaglam
Copy link
Member

tsaglam commented Jul 14, 2023

Also looking at the fact that the current version of the Report Viewer will be the legacy one deployed online (maybe for a long time), I would opt for that.

Good point, I agree.

For the distribution we could proceed similarly, if it only has ten values, you only see the old histogram.
If you have more values, you see the new one and can toggle between old and new.

@tsaglam tsaglam added enhancement Issue/PR that involves features, improvements and other changes minor Minor issue/feature/contribution/change report-viewer PR / Issue deals (partly) with the report viewer and thus involves web-dev technologies labels Jul 14, 2023
@Alberth289346
Copy link
Contributor

@Alberth289346 just FYI, this PR will be part of a major release sometimes in the next quarter, where the overhauled report viewer will be released. This might break things for you.

Thanks for the head-up.

I had similar thoughts about this issue as sebinside, why is data moved ? It breaks compatibility without getting much in return (data isn't becoming more useful by moving it as far as I can see). I pondered about it, and wrote #1190 to move the discussion away from here.

@Kr0nox Kr0nox marked this pull request as ready for review July 28, 2023 10:41
Copy link
Member

@tsaglam tsaglam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the Java side, this looks good.

@Kr0nox Kr0nox requested a review from TwoOfTwelve August 14, 2023 07:53
@sonarqubecloud
Copy link

[JPlag Report Viewer] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@sonarqubecloud
Copy link

[JPlag Plagiarism Detector] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@sebinside sebinside merged commit 5c39647 into develop Aug 18, 2023
@sebinside sebinside deleted the feature/rework-json branch August 18, 2023 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Issue/PR that involves features, improvements and other changes minor Minor issue/feature/contribution/change report-viewer PR / Issue deals (partly) with the report viewer and thus involves web-dev technologies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants