dc.contributor.advisor: Nababan, Erna Budhiarti
dc.contributor.advisor: Budiman, Mohammad Andri
dc.contributor.author: Sitopu, Widya Astuti
dc.date.accessioned: 2025-07-29T04:11:09Z
dc.date.available: 2025-07-29T04:11:09Z
dc.date.issued: 2025
dc.identifier.uri: https://repositori.usu.ac.id/handle/123456789/107784
dc.description.abstract: The Makan Bergizi Gratis (MBG) program is one of the Indonesian government's priority initiatives and has received significant coverage in online media. To identify the main themes in this coverage, this study applies topic modeling with Latent Dirichlet Allocation (LDA). Topic modeling results, however, depend heavily on the preprocessing stage, particularly on how multiword expressions (MWEs) such as named entities, collocations, and compound words are handled. This study compares two preprocessing approaches, basic and extended, where the extended approach additionally masks MWEs. In the experiments, the extended preprocessing model achieved the highest coherence score, 0.5149 at K = 22 (K being the number of topics), with four other scores also exceeding 0.496, whereas the basic preprocessing model reached a maximum of only 0.3932 at K = 10. In addition, inter-topic cosine similarity in the extended model was lower (maximum 0.7406) than in the basic model (maximum 0.8244), indicating that the extended model produced more diverse and less overlapping topics. These findings highlight the importance of preprocessing strategies that preserve phrase-level meaning in order to reduce semantic distortion and improve topic coherence and representation, particularly when analyzing media discourse on public policy programs such as MBG. [en_US]
dc.language.iso: id [en_US]
dc.publisher: Universitas Sumatera Utara [en_US]
dc.subject: Multiword Expression [en_US]
dc.subject: Text Preprocessing [en_US]
dc.subject: Topic Modeling [en_US]
dc.subject: Latent Dirichlet Allocation [en_US]
dc.subject: Topic Coherence [en_US]
dc.title: Reduksi Distorsi Makna Multiword Expression dengan IndoBERT untuk Pemodelan Topik [en_US]
dc.title.alternative: Reducing Semantic Distortion of Multiword Expressions with IndoBERT for Topic Modeling [en_US]
dc.type: Thesis [en_US]
dc.identifier.nim: NIM237056002
dc.identifier.nidn: NIDN0026106209
dc.identifier.nidn: NIDN0008107507
dc.identifier.kodeprodi: KODEPROD49302#Sains Data dan Kecerdasan Buatan
dc.description.pages: 61 Pages [en_US]
dc.description.type: Tesis Magister (Master's thesis) [en_US]
dc.subject.sdgs: SDG 9: Industry, Innovation and Infrastructure [en_US]
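The abstract relies on two simple mechanisms: masking MWEs so that a phrase survives tokenization as a single unit, and comparing topics by cosine similarity to judge how much they overlap. The Python sketch below illustrates both under stated assumptions and is not the thesis's implementation. The lexicon `MWE_LEXICON` and all function names are hypothetical; masking here is plain lexicon lookup, not the IndoBERT-based handling named in the title; and topics are assumed to be rows of a topic-word probability matrix, such as the array returned by gensim's `LdaModel.get_topics()`.

```python
import math
import re
from itertools import combinations

# Hypothetical MWE lexicon for illustration only; the thesis's actual MWE
# inventory and its IndoBERT-based handling are not described in this record.
MWE_LEXICON = ["makan bergizi gratis", "sekolah dasar"]


def mask_mwes(text, lexicon):
    """Lowercase, join each known MWE into one underscore token, then tokenize."""
    text = text.lower()
    # Longest expressions (by character count) first, so a longer MWE is
    # masked before any shorter expression it contains.
    for mwe in sorted(lexicon, key=len, reverse=True):
        words = mwe.split()
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        text = re.sub(pattern, "_".join(words), text)
    # \w+ keeps underscores, so masked MWEs stay single tokens; punctuation is dropped.
    return re.findall(r"\w+", text)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_topic_similarity(topic_word):
    """Largest pairwise cosine similarity between topic vectors.

    Lower values mean the most similar pair of topics overlaps less.
    Requires at least two topics.
    """
    return max(cosine(u, v) for u, v in combinations(topic_word, 2))


if __name__ == "__main__":
    print(mask_mwes("Program Makan Bergizi Gratis dimulai di sekolah dasar.", MWE_LEXICON))
    toy_topics = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]]
    print(round(max_topic_similarity(toy_topics), 4))
```

For example, `mask_mwes("Program Makan Bergizi Gratis dimulai di sekolah dasar.", MWE_LEXICON)` yields `['program', 'makan_bergizi_gratis', 'dimulai', 'di', 'sekolah_dasar']`, so LDA would see the program name as one term rather than three unrelated words. Reporting the maximum pairwise similarity, as the abstract does, is a conservative diversity check: it reflects the single most redundant pair of topics.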