Penerapan Transfer Learning Berbasis Model BERT untuk Mendeteksi Plagiarisme pada Skripsi Berbahasa Indonesia
Implementation of Transfer Learning Based on the BERT Model for Detecting Plagiarism in Indonesian-Language Theses
Abstract
Plagiarism threatens academic integrity in scholarly work, and the challenge grows more complex as the volume of digital documents increases. Manually verifying the originality of long documents such as theses is no longer efficient or consistent. This research proposes a solution: the design and implementation of an automated, web-based plagiarism detection system. The system applies transfer learning by fine-tuning the IndoBERT model to recognize several plagiarism patterns in Indonesian-language text, including direct copying, mosaic plagiarism, and paraphrasing. To train the model, a structured dataset was built from 130 thesis documents, which were processed and synthetically augmented to 30,524 rows balanced between the plagiarized and non-plagiarized classes. Evaluation shows that the developed model, particularly in the configuration with a dropout rate of 0.3, achieves 88.30% accuracy and an 88.29% F1-score on the test data. Functional testing of the resulting prototype confirms that it accepts PDF documents as input and produces a detection report comprising the similarity percentage, the flagged paragraphs together with their sources, and an automatically highlighted PDF. Although the system has limitations, such as a corpus limited in scope to internal documents, this research demonstrates the effectiveness of the transfer learning model and produces a tool with the potential to make the enforcement of academic originality more efficient and objective.
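The two reported metrics can be illustrated with a minimal sketch in plain Python. It assumes a binary labeling (1 = plagiarized, 0 = non-plagiarized) and shows both a per-class and a macro-averaged F1; the abstract does not state which averaging was used, and the function names and the toy labels below are hypothetical, not taken from the thesis.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1-score treating `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy labels (hypothetical): 1 = plagiarized, 0 = non-plagiarized.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1_plagiarized = f1_for_class(y_true, y_pred, positive=1)
f1_macro = macro_f1(y_true, y_pred)
```

On this toy input the classes are balanced and the errors are split evenly between them, so accuracy and F1 coincide; the near-identical accuracy and F1-score reported in the abstract are consistent with a similarly balanced test set, though the abstract itself does not give the confusion matrix.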