Detailed Analysis of Extrinsic Plagiarism Detection System Using Machine Learning Approach (Naive Bayes and SVM)

Zakiy Firdaus Alfikri, Ayu Purwarianti

Abstract


In this report we propose a detailed analysis method for a plagiarism detection system using a machine learning approach. We use Naive Bayes and Support Vector Machine (SVM) as the learning algorithms. The learning features are word similarity, fingerprint similarity, latent semantic analysis (LSA) similarity, and word pairs. These features were selected to draw on the state-of-the-art detailed analysis methods (word similarity, fingerprinting, and LSA) and so combine the strengths of each method in detecting plagiarism. Several experiments were conducted to test how well the proposed method detects different kinds of plagiarism. The test data contains cases of literal plagiarism, partial literal plagiarism, paraphrased plagiarism, plagiarism with changed sentence structure, and translated plagiarism. It also contains non-plagiarism cases, both on different topics and on the same topic. Experiments using SVM showed an average accuracy of 92.86% (reaching 95.71% without the word similarity feature), while experiments using Naive Bayes showed an average accuracy of 54.29% (reaching 84.29% without the word pair features).
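To illustrate how a pair of passages might be turned into similarity features for a classifier, here is a minimal sketch of two of the features named above. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the function names, the use of Jaccard overlap, the character 5-gram window, and the SHA-256 hashing. The LSA and word pair features are omitted.

```python
import hashlib


def word_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (assumed definition)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def fingerprints(text: str, k: int = 5) -> set:
    """Hash every character k-gram of the whitespace-normalized text.

    Keeping all k-gram hashes is a simplification; practical
    fingerprinting schemes usually keep only a selected subset.
    """
    normalized = " ".join(text.lower().split())
    return {
        hashlib.sha256(normalized[i:i + k].encode("utf-8")).hexdigest()[:8]
        for i in range(len(normalized) - k + 1)
    }


def fingerprint_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard overlap of the two fingerprint sets (assumed definition)."""
    fp_a, fp_b = fingerprints(a, k), fingerprints(b, k)
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def feature_vector(suspicious: str, source: str) -> list:
    """Build a feature row for one (suspicious, source) passage pair."""
    return [
        word_similarity(suspicious, source),
        fingerprint_similarity(suspicious, source),
    ]
```

Rows produced by `feature_vector`, each labeled as plagiarism or non-plagiarism, could then be passed to any Naive Bayes or SVM implementation for training and prediction.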


Keywords


Plagiarism Detection; Machine Learning; Bayes; Support Vector Machine



DOI: http://doi.org/10.11591/tijee.v12i11.3995



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License