Coreference Resolution untuk Teks Bahasa Indonesia Menggunakan Random Forest Classifier
Coreference Resolution for Indonesian Text Using Random Forest Classifier

Date
2024
Author
Sari, Nia Ulan
Advisor(s)
Purnamawati, Sarah
Rahmat, Romi Fadillah
Abstract
Coreference resolution is a subtask of Natural Language Processing (NLP) that focuses on identifying and resolving references between two or more mentions of the same entity in a text. In Indonesian texts, especially novels, coreference resolution is crucial because the language is complex and the variety of entities and references is rich. Characters and entities in novels often interact, and references to a character may appear repeatedly. A further problem is that possessive pronouns, which are widely used in novel texts, often appear as affixes rather than as complete words, which can cause confusion when determining which entity a reference points to. This research therefore performs coreference resolution for Indonesian texts by detecting affixed possessive pronouns using the Random Forest Classifier method. It also uses Part-of-Speech Tagging (POS Tag) and Named Entity Recognition (NER) to maximize the detection of person entities. After preprocessing, the 18 novel texts used as training data yield a total of 109,306 entity-pronoun pairs, and the 10 novel texts used as test data yield 4,938 pairs. The research uses RandomizedSearchCV to find the best hyperparameters for the Random Forest Classifier during training. Evaluated with a confusion matrix over all test data, the model achieves an accuracy of 85.5%, a precision of 85%, a recall of 82.2%, and an F1-score of 83.6%.
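The four reported metrics all derive from the counts in a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the thesis's own evaluation code. The function names `confusion_metrics` and `f1_from` are hypothetical, and the counts in the second example are illustrative placeholders, since the abstract does not give the actual confusion matrix. The first example uses only the precision and recall stated in the abstract, to show that the reported F1-score is consistent with them.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from binary confusion-matrix counts.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives and true negatives over the entity-pronoun pairs.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_from(precision, recall)


# Consistency check using only the precision (85%) and recall (82.2%)
# reported in the abstract.
print(f"{f1_from(0.85, 0.822):.3f}")  # prints 0.836

# Hypothetical counts, for illustration only (NOT taken from the thesis).
acc, prec, rec, f1 = confusion_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"{acc:.2f} {prec:.2f} {rec:.2f} {f1:.2f}")  # prints 0.80 0.80 0.80 0.80
```

The first printed value matches the 83.6% F1-score stated in the abstract, so the reported precision, recall, and F1-score are mutually consistent.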
Collections
- Undergraduate Theses [765]