• Login
    View Item 
    •   USU-IR Home
    • Faculty of Computer Science and Information Technology
    • Data Science and Artificial Intelligence
    • Master Theses
    • View Item
    •   USU-IR Home
    • Faculty of Computer Science and Information Technology
    • Data Science and Artificial Intelligence
    • Master Theses
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Reduksi Distorsi Makna Multiword Expression dengan IndoBERT untuk Pemodelan Topik

    Reducing Semantic Distortion of Multiword Expressions with IndoBERT for Topic Modeling

    Thumbnail
    View/Open
    Cover (558.9Kb)
    Fulltext (1.479Mb)
    Date
    2025
    Author
    Sitopu, Widya Astuti
    Advisor(s)
    Nababan, Erna Budhiarti
    Budiman, Mohammad Andri
    Metadata
    Show full item record
    Abstract
    The Makan Bergizi Gratis (MBG) is one of the Indonesian government’s priority initiatives that has received significant coverage in online media. To understand the main themes within these narratives, this study applies topic modeling using Latent Dirichlet Allocation (LDA). However, the results of topic modeling are highly influenced by the preprocessing stage, particularly in handling multiword expressions (MWEs) such as named entities, collocations, and compound words. This study compares two preprocessing approaches: basic and extended, with the latter involving the masking of MWEs. Experimental results show that the extended preprocessing model achieved the highest coherence score of 0.5149 at K=22K = 22K=22, with four other scores also exceeding 0.496, whereas the basic preprocessing model only reached a maximum of 0.3932 at K=10K = 10K=10. Furthermore, cosine similarity scores between topics in the extended model were lower (maximum 0.7406) than in the basic model (maximum 0.8244), indicating that the topics produced were more diverse and less overlapping. These findings highlight the importance of preprocessing strategies that preserve phrase-level meaning to reduce semantic distortion and improve topic coherence and representation-particularly in analyzing media discourse on public policy programs such as MBG.
    URI
    https://repositori.usu.ac.id/handle/123456789/107784
    Collections
    • Master Theses [18]

    Repositori Institusi Universitas Sumatera Utara - 2025

    Universitas Sumatera Utara

    Perpustakaan

    Resource Guide

    Katalog Perpustakaan

    Journal Elektronik Berlangganan

    Buku Elektronik Berlangganan

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     

    Browse

    All of USU-IRCommunities & CollectionsBy Issue DateTitlesAuthorsAdvisorsKeywordsTypesBy Submit DateThis CollectionBy Issue DateTitlesAuthorsAdvisorsKeywordsTypesBy Submit Date

    My Account

    LoginRegister

    Repositori Institusi Universitas Sumatera Utara - 2025

    Universitas Sumatera Utara

    Perpustakaan

    Resource Guide

    Katalog Perpustakaan

    Journal Elektronik Berlangganan

    Buku Elektronik Berlangganan

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV