Plagiarism detection, as introduced in the paper by Khan H. et al.:
Plagiarism is becoming a notorious problem in academic community. It occurs when someone uses the work of another person without proper acknowledgement to the original source. The plagiarism problem poses serious threats to academic integrity and with the advent of the Web, manual detection of plagiarism has become almost impossible. Over past two decades, automatic plagiarism detection has received significant attention in developing small- to large-scale plagiarism detection systems as a possible countermeasure. Given a text document, the task of a plagiarism detection system is to find if the document is copied partially or fully from other documents from the Web or any other repository of documents.
In this project I try to detect plagiarism in the AraPlagDet corpus (Arabic External Plagiarism Detection Corpus) by following these steps:
- Preprocessing the corpus text files.
- Preprocessing the input text file (the one the user wants to classify as plagiarised or not).
- Query generation and submission.
- Similarity computation.
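
The four steps above can be sketched end to end as follows. This is only a minimal illustration, not the project's actual code: every function name is hypothetical, the in-memory inverted index stands in for whatever search backend the queries are really submitted to, stripping Arabic diacritics is an assumed normalization step, and n-gram containment is just one possible similarity measure.

```python
import re
from collections import Counter

# Assumption: normalization strips Arabic diacritics (tashkeel) and tatweel.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
_TOKEN = re.compile(r"\w+")


def preprocess(text):
    """Steps 1 and 2: strip diacritics, lowercase, split into word tokens."""
    return _TOKEN.findall(_DIACRITICS.sub("", text).lower())


def build_index(corpus):
    """Map each term to the ids of the corpus documents that contain it."""
    index = {}
    for doc_id, tokens in corpus.items():
        for term in set(tokens):
            index.setdefault(term, set()).add(doc_id)
    return index


def generate_queries(tokens, size=5):
    """Step 3a: cut the input into non-overlapping chunks, one query each."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def submit_query(query, index, min_hits=3):
    """Step 3b: return documents matching at least `min_hits` query terms."""
    counts = Counter(
        doc_id for term in set(query) for doc_id in index.get(term, ())
    )
    return {doc_id for doc_id, hits in counts.items() if hits >= min_hits}


def ngrams(tokens, n=3):
    """Set of word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=3):
    """Step 4: share of the suspicious n-grams that also occur in the source."""
    a, b = ngrams(suspicious, n), ngrams(source, n)
    return len(a & b) / len(a) if a else 0.0


def detect(input_text, raw_corpus, threshold=0.5):
    """Run all four steps; return {doc_id: score} for likely sources."""
    corpus = {doc_id: preprocess(t) for doc_id, t in raw_corpus.items()}
    index = build_index(corpus)
    tokens = preprocess(input_text)
    candidates = set()
    for query in generate_queries(tokens):
        candidates |= submit_query(query, index)
    scores = {d: containment(tokens, corpus[d]) for d in candidates}
    return {d: s for d, s in scores.items() if s >= threshold}
```

Calling `detect(input_text, raw_corpus)` with a dictionary that maps document ids to raw text returns the candidate source documents whose containment score reaches the threshold. The chunk size, `min_hits`, and `threshold` values are arbitrary placeholders and would need tuning on the corpus.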