Reduksi Distorsi Makna Multiword Expression dengan IndoBERT untuk Pemodelan Topik

Sitopu, Widya Astuti

dc.contributor.advisor	Nababan, Erna Budhiarti
dc.contributor.advisor	Budiman, Mohammad Andri
dc.contributor.author	Sitopu, Widya Astuti
dc.date.accessioned	2025-07-29T04:11:09Z
dc.date.available	2025-07-29T04:11:09Z
dc.date.issued	2025
dc.identifier.uri	https://repositori.usu.ac.id/handle/123456789/107784
dc.description.abstract	Makan Bergizi Gratis (MBG), the Free Nutritious Meals program, is one of the Indonesian government’s priority initiatives and has received significant coverage in online media. To identify the main themes in this coverage, this study applies topic modeling using Latent Dirichlet Allocation (LDA). However, topic modeling results are strongly influenced by the preprocessing stage, particularly by how multiword expressions (MWEs) such as named entities, collocations, and compound words are handled. This study compares two preprocessing approaches, basic and extended, with the latter involving the masking of MWEs. Experimental results show that the extended preprocessing model achieved the highest coherence score of 0.5149 at K = 22, with four other scores also exceeding 0.496, whereas the basic preprocessing model only reached a maximum of 0.3932 at K = 10. Furthermore, cosine similarity scores between topics in the extended model were lower (maximum 0.7406) than in the basic model (maximum 0.8244), indicating that the topics produced were more diverse and less overlapping. These findings highlight the importance of preprocessing strategies that preserve phrase-level meaning to reduce semantic distortion and improve topic coherence and representation, particularly in analyzing media discourse on public policy programs such as MBG.	en_US
dc.language.iso	id	en_US
dc.publisher	Universitas Sumatera Utara	en_US
dc.subject	Multiword Expression	en_US
dc.subject	Text Preprocessing	en_US
dc.subject	Topic Modeling	en_US
dc.subject	Latent Dirichlet Allocation	en_US
dc.subject	Topic Coherence	en_US
dc.title	Reduksi Distorsi Makna Multiword Expression dengan IndoBERT untuk Pemodelan Topik	en_US
dc.title.alternative	Reducing Semantic Distortion of Multiword Expressions with IndoBERT for Topic Modeling	en_US
dc.type	Thesis	en_US
dc.identifier.nim	NIM237056002
dc.identifier.nidn	NIDN0026106209
dc.identifier.nidn	NIDN0008107507
dc.identifier.kodeprodi	KODEPROD49302#Sains Data dan Kecerdasan Buatan
dc.description.pages	61 Pages	en_US
dc.description.type	Master's Thesis	en_US
dc.subject.sdgs	SDGs 9. Industry Innovation And Infrastructure	en_US
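
Illustrative sketch (not part of the record): the abstract describes masking MWEs before topic modeling and comparing topics by cosine similarity. The Python below is a minimal illustration of those two ideas only, under stated assumptions. The phrase list, the underscore-joining scheme, and the function names mask_mwes and cosine_similarity are hypothetical and not taken from the thesis; the sketch uses a fixed phrase list rather than IndoBERT, and the record does not say how the thesis detects or masks MWEs.

```python
import math
import re

# Hypothetical MWE inventory for illustration; the thesis's own list is not in this record.
MWES = ["makan bergizi gratis", "kepala sekolah"]


def mask_mwes(text, mwes):
    """Lowercase the text and join each listed multiword expression into one token."""
    out = text.lower()
    # Process longer phrases first so a short phrase cannot split a longer one.
    for phrase in sorted(mwes, key=len, reverse=True):
        words = [re.escape(w) for w in phrase.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        out = re.sub(pattern, phrase.replace(" ", "_"), out)
    return out


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors, e.g. topic-word weights."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


masked = mask_mwes("Program Makan Bergizi Gratis dimulai", MWES)
tokens = masked.split()  # the phrase now survives tokenization as a single unit
```

With the phrase kept as one token, a bag-of-words model such as LDA counts it as a single vocabulary item instead of three unrelated words. Lower pairwise cosine similarity between topic vectors would then suggest less overlap between topics, which is how the abstract interprets its reported maxima.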

Files in this item

Name: Reduksi Distorsi Makna Multiword ...
Size: 558.9Kb
Format: PDF
Description: Cover

Name: Widya Astuti Sitopu_Reduksi ...
Size: 1.479Mb
Format: PDF
Description: Fulltext

This item appears in the following Collection(s)

Master Theses [24]
Master's Thesis