dc.contributor.advisor: Purnamawati, Sarah
dc.contributor.advisor: Rahmat, Romi Fadillah
dc.contributor.author: Sari, Nia Ulan
dc.date.accessioned: 2024-08-30T06:40:25Z
dc.date.available: 2024-08-30T06:40:25Z
dc.date.issued: 2024
dc.identifier.uri: https://repositori.usu.ac.id/handle/123456789/96434
dc.description.abstract: Coreference resolution is a subtask of Natural Language Processing (NLP) that identifies which expressions in a text refer to the same entity. In Indonesian texts, especially novels, coreference resolution is crucial because the language is complex and the variety of entities and references is rich. Characters and entities in novels often interact, and references to a character may appear repeatedly. A further problem is that possessive pronouns, which are widely used in novel texts, often appear as affixes rather than as complete words, which can cause confusion when determining which entity a reference points to. This research therefore performs coreference resolution on Indonesian texts by detecting affixed possessive pronouns with a Random Forest Classifier. It also uses Part-of-Speech Tagging (POS Tag) and Named Entity Recognition (NER) to maximize the detection of person entities. After pre-processing, the 18 novel texts used as training data yield 109,306 entity-pronoun pairs, and the 10 novel texts used as test data yield 4,938 pairs. RandomSearchCV is used to find the best hyperparameters for the Random Forest Classifier during training. Evaluated with a confusion matrix over all test data, the model achieves an accuracy of 85.5%, a precision of 85%, a recall of 82.2%, and an F1-score of 83.6%. [en_US]
dc.language.iso: id [en_US]
dc.publisher: Universitas Sumatera Utara [en_US]
dc.subject: coreference resolution [en_US]
dc.subject: random forest classifier [en_US]
dc.subject: natural language processing [en_US]
dc.subject: novel [en_US]
dc.subject: indonesian [en_US]
dc.subject: randomsearchcv [en_US]
dc.subject: SDGs [en_US]
dc.title: Coreference Resolution untuk Teks Bahasa Indonesia Menggunakan Random Forest Classifier [en_US]
dc.title.alternative: Coreference Resolution for Indonesian Text Using Random Forest Classifier [en_US]
dc.type: Thesis [en_US]
dc.identifier.nim: NIM171402045
dc.identifier.nidn: NIDN0026028304
dc.identifier.nidn: NIDN0003038601
dc.identifier.kodeprodi: KODEPRODI59201#Teknologi Informasi
dc.description.pages: 94 Pages [en_US]
dc.description.type: Undergraduate Thesis (Skripsi Sarjana) [en_US]
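
Note on the reported metrics: the F1-score is the harmonic mean of precision and recall, so the figures in the abstract can be cross-checked against each other. The short Python sketch below is illustrative only and is not taken from the thesis; the function name `f1_from_precision_recall` is a hypothetical helper introduced here.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_precision = 0.85
reported_recall = 0.822

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 83.6%"
```

The computed value agrees with the F1-score of 83.6% stated in the abstract.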

