JPlag is a system that finds similarities among multiple sets of source code files. This way it can detect software plagiarism and collusion in software development. JPlag currently supports various programming languages, EMF metamodels, and natural language text.
The following table lists all supported languages with their supported language versions. A language can be selected from the command line using subcommands (jplag [jplag options] <language name> [language options]). Alternatively, you can use the legacy "-l" argument.
Language | Version | CLI Argument Name | State | Parser |
---|---|---|---|---|
Java | 17 | java | mature | JavaC |
C/C++ | 11 | cpp | legacy | JavaCC |
C/C++ | 14 | cpp2 | beta | ANTLR 4 |
C# | 6 | csharp | beta | ANTLR 4 |
Go | 1.17 | golang | beta | ANTLR 4 |
Kotlin | 1.3 | kotlin | beta | ANTLR 4 |
Python | 3.6 | python3 | legacy | ANTLR 4 |
R | 3.5.0 | rlang | beta | ANTLR 4 |
Rust | 1.60.0 | rust | beta | ANTLR 4 |
Scala | 2.13.8 | scala | beta | Scalameta |
Scheme | ? | scheme | unknown | JavaCC |
Swift | 5.4 | swift | beta | ANTLR 4 |
EMF Metamodel | 2.25.0 | emf | beta | EMF |
EMF Model | 2.25.0 | emf-model | alpha | EMF |
Text (naive) | - | text | legacy | CoreNLP |
You need Java SE 17 to run or build JPlag.
- Download a released version.
- If you depend on the legacy version of JPlag, see the legacy release v2.12.1 and the legacy branch.
JPlag is released on Maven Central and can be included as follows:
<dependency>
  <groupId>de.jplag</groupId>
  <artifactId>jplag</artifactId>
</dependency>
- Download or clone the code from this repository.
- Run "mvn clean package" from the root of the repository to compile and build all submodules. Run "mvn clean package assembly:single" instead if you need the full jar, which includes all dependencies.
- You will find the generated JARs in the subdirectory cli/target.
JPlag can either be used via the CLI or directly via its Java API. For more information, see the usage information in the wiki. If you are using the CLI, you can display your results via jplag.github.io. No data will leave your computer!
Note that the legacy CLI differs slightly.
The language can be set either with the -l parameter or as a subcommand. If both a subcommand and the -l option are specified, the subcommand takes priority. When using a subcommand, language-specific arguments can be set. A list of language-specific options can be obtained by requesting the help page of a subcommand (e.g. "jplag java -h").
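The precedence rule can be pictured with a small, self-contained Java sketch. The class LanguageSelection and its resolve method are hypothetical names used only for illustration and are not part of JPlag. The fallback to java mirrors the documented default of the -l option.

```java
/** Hypothetical helper that mimics the documented precedence; not JPlag code. */
final class LanguageSelection {
    // Documented default of the -l option.
    static final String DEFAULT_LANGUAGE = "java";

    /**
     * @param subcommand language given as a subcommand, or null if none was given
     * @param dashL      language given via -l, or null if the option was omitted
     * @return the language name that would be used
     */
    static String resolve(String subcommand, String dashL) {
        if (subcommand != null) {
            return subcommand; // the subcommand takes priority
        }
        if (dashL != null) {
            return dashL; // legacy -l argument
        }
        return DEFAULT_LANGUAGE;
    }

    public static void main(String[] args) {
        System.out.println(resolve("cpp2", "java"));  // prints "cpp2"
        System.out.println(resolve(null, "python3")); // prints "python3"
        System.out.println(resolve(null, null));      // prints "java"
    }
}
```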
Usage: jplag [OPTIONS] [root-dirs[,root-dirs...]...] [COMMAND]
      [root-dirs[,root-dirs...]...]
                        Root-directory with submissions to check for plagiarism
      -bc, --bc, --base-code=<baseCode>
                        Path of the directory containing the base code
                          (common framework used in all submissions)
  -h, --help            display this help and exit
  -l, --language=<language>
                        Select the language to parse the submissions (default:
                          java). The language names are the same as the
                          subcommands.
  -n, --shown-comparisons=<shownComparisons>
                        The maximum number of comparisons that will be shown
                          in the generated report, if set to -1 all comparisons
                          will be shown (default: 100)
      -new, --new=<newDirectories>[,<newDirectories>...]
                        Root-directory with submissions to check for plagiarism
                          (same as the root directory)
      -old, --old=<oldDirectories>[,<oldDirectories>...]
                        Root-directory with prior submissions to compare against
  -r, --result-directory=<resultFolder>
                        Name of the directory in which the comparison results
                          will be stored (default: result)
  -t, --min-tokens=<minTokenMatch>
                        Tunes the comparison sensitivity by adjusting the
                          minimum token required to be counted as a matching
                          section. A smaller <n> increases the sensitivity but
                          might lead to more false-positives
Advanced
  -d, --debug           Debug parser. Non-parsable files will be stored
                          (default: false)
  -m, --similarity-threshold=<similarityThreshold>
                        Comparison similarity threshold [0.0-1.0]: All
                          comparisons above this threshold will be saved
                          (default: 0.0)
  -p, --suffixes=<suffixes>[,<suffixes>...]
                        comma-separated list of all filename suffixes that are
                          included
  -s, --subdirectory=<subdirectory>
                        Look in directories <root-dir>/*/<dir> for programs
  -x, --exclusion-file=<exclusionFileName>
                        All files named in this file will be ignored in the
                          comparison (line-separated list)
Clustering
      --cluster-alg, --cluster-algorithm=<algorithm>
                        Which clustering algorithm to use. Agglomerative merges
                          similar submissions bottom up. Spectral clustering is
                          combined with Bayesian Optimization to execute
                          the k-Means clustering algorithm multiple times,
                          hopefully finding a "good" clustering
                          automatically. (default: spectral)
      --cluster-metric=<metric>
                        The metric used for clustering. AVG is intersection
                          over union, MAX can expose some attempts of
                          obfuscation. (default: MAX)
      --cluster-skip    Skips the clustering (default: false)
Commands:
  cpp
  cpp2
  csharp
  emf
  emf-model
  go
  java
  kotlin
  python3
  rlang
  rust
  scala
  scheme
  scxml
  swift
  text
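To illustrate how the -m and -n options interact as described in the help text above, here is a minimal sketch. It is not JPlag's actual implementation: ReportFilter, Comparison, and select are hypothetical names. Treating the threshold as inclusive and listing the most similar pairs first are assumptions.

```java
import java.util.List;

/** Hypothetical illustration of -m (threshold) and -n (shown comparisons); not JPlag code. */
final class ReportFilter {
    /** A compared pair of submissions with a similarity in [0.0, 1.0]. */
    record Comparison(String first, String second, double similarity) {}

    /**
     * @param all       every pairwise comparison
     * @param threshold value of -m; comparisons below it are dropped (inclusive bound is an assumption)
     * @param shown     value of -n; -1 means "show all"
     */
    static List<Comparison> select(List<Comparison> all, double threshold, int shown) {
        var stream = all.stream()
                .filter(c -> c.similarity() >= threshold)
                // Assumption: most similar pairs are listed first.
                .sorted((a, b) -> Double.compare(b.similarity(), a.similarity()));
        if (shown >= 0) {
            stream = stream.limit(shown);
        }
        return stream.toList();
    }

    public static void main(String[] args) {
        List<Comparison> all = List.of(
                new Comparison("a", "b", 0.9),
                new Comparison("a", "c", 0.2),
                new Comparison("b", "c", 0.5));
        System.out.println(select(all, 0.0, -1).size());         // prints 3
        System.out.println(select(all, 0.4, -1).size());         // prints 2
        System.out.println(select(all, 0.0, 1).get(0).second()); // prints "b"
    }
}
```

With the documented defaults (-m 0.0, -n 100), no comparison is dropped by the threshold and at most 100 comparisons are shown.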
The new API makes it easy to integrate JPlag's plagiarism detection into external Java projects:
Language language = new de.jplag.java.Language();
language.getOptions(); // Use the returned object to set language options (same as the language-specific arguments above).
Set<File> submissionDirectories = Set.of(new File("/path/to/rootDir"));
File baseCode = new File("/path/to/baseCode");
JPlagOptions options = new JPlagOptions(language, submissionDirectories, Set.of()).withBaseCodeSubmissionDirectory(baseCode);
try {
    JPlagResult result = JPlag.run(options);
    // Optional: create and save a report
    ReportObjectFactory reportObjectFactory = new ReportObjectFactory();
    reportObjectFactory.createAndSaveReport(result, "/path/to/output");
} catch (ExitException e) {
    // error handling here
}
We're happy to incorporate all improvements to JPlag into this codebase. Feel free to fork the project and send pull requests. Please consider our guidelines for contributions.
If you encounter bugs or other issues, please report them here. For other purposes, you can contact us at jplag@ipd.kit.edu. If you are doing research related to JPlag, we would love to know what you are doing. Feel free to contact us!