Implementasi Model DeBERTa untuk Prediksi Kompleksitas Kata Berbahasa Inggris
Implementation of the DeBERTa Model for English Word Complexity Prediction

Date
2025
Author
Sihombing, Johansen
Advisor(s)
Arisandi, Dedy
Purnamawati, Sarah
Abstract
Word complexity in English texts poses a significant challenge in the field of Natural
Language Processing (NLP), particularly for the development of automatic text
simplification systems and effective second language learning support tools. Language
learners' comprehension is often hindered by highly complex words. This study aims to
develop and evaluate an English word complexity prediction system using DeBERTa
(Decoding-enhanced BERT with Disentangled Attention), a Transformer model
known for its strong contextual representations. The model was trained and tested
on a dataset comprising 8,554 word entries, compiled from the Complex dataset and
augmented with data from the Oxford Dictionary. Evaluation results demonstrated
excellent predictive performance, achieving a Mean Squared Error (MSE) of 0.0036, a
Mean Absolute Error (MAE) of 0.0402, and a Pearson correlation of 0.9770 on the test
set. These findings indicate that the DeBERTa model assesses word complexity with
high accuracy and generalizes robustly across diverse text domains, underscoring its
potential for NLP applications that analyze and process word complexity.
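As a rough illustration of the approach described in the abstract, the sketch below shows how a DeBERTa checkpoint can be given a single-output regression head for word complexity prediction and how the three reported metrics (MSE, MAE, Pearson correlation) can be computed. This is not the thesis code: the Hugging Face Transformers library, the microsoft/deberta-v3-base checkpoint, the word-plus-sentence input format, and the scikit-learn/SciPy metric calls are all assumptions made for illustration, and the model would still need fine-tuning on the 8,554-entry dataset before its scores are meaningful.

# Minimal sketch (not the thesis implementation) of DeBERTa as a
# regression model for lexical complexity prediction.
import numpy as np
import torch
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/deberta-v3-base"  # assumed checkpoint; the thesis may use another variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=1 with problem_type="regression" turns the classification
# head into a single-output regression head (complexity score).
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=1, problem_type="regression"
)

def predict_complexity(word: str, sentence: str) -> float:
    """Score a target word in its sentence context (e.g., on a 0-1 scale)."""
    # Pairing the target word with its sentence is one common input format
    # for lexical complexity prediction; the thesis's exact format is not stated.
    inputs = tokenizer(word, sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the three metrics reported in the abstract."""
    return {
        "mse": mean_squared_error(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "pearson": pearsonr(y_true, y_pred)[0],
    }

In this setup, fine-tuning would minimize a mean-squared-error loss between predicted and annotated complexity scores, after which evaluate() on the held-out test set would yield figures comparable to the MSE, MAE, and Pearson correlation values reported above.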
Collections
- Undergraduate Theses