    Faculty of Computer Science and Information Technology › Department of Computer Science › Doctoral Dissertations

    Query Expansion pada Retrieval Berbasis Corpus Content Base Retrieval (CBR) di Media Sosial untuk Meningkatkan Hasil Retrieval

    Query Expansion in Corpus Content-Based Retrieval (CBR) on Social Media to Improve Retrieval Performance

    View/Open
    Cover (1.062Mb)
    Fulltext (4.056Mb)
    Date
    2025
    Author
    Kaban, Roberto
    Advisor(s)
    Sihombing, Poltak
    Efendi, Syahril
    Lydia, Maya Silvi
    Abstract
    The rapid growth of social media has produced a vast and diverse volume of data. Such data are typically unstructured, written in informal language, full of non-standard abbreviations, and subject to rapid change. These characteristics make it difficult for Information Retrieval (IR) systems to return relevant and accurate results. This study focuses on improving IR performance on social media, specifically for e-government-related queries about Indonesia's National Health Insurance (BPJS Kesehatan) collected from the Twitter (X) platform. Conventional IR models often struggle with unstructured content containing informal language and abbreviations, which leads to low retrieval accuracy. To address this issue, this research proposes a hybrid Query Expansion (QE) model called ROCBERT-QE, which integrates Corpus Content-Based Retrieval (CBR) with Bidirectional Encoder Representations from Transformers (BERT). ROCBERT-QE introduces a dual expansion mechanism: corpus-based co-occurrence captures lexical relationships, while BERT embeddings preserve semantic meaning and contextual information. A domain-specific corpus of 5,017 preprocessed tweets related to BPJS was constructed, containing 6,215 unique terms that reflect the linguistic variation and informality of public discourse. Experimental results show that ROCBERT-QE outperforms baseline retrieval methods such as TF-IDF, BM25, and standard BERT. For single-word queries, the model achieved a Recall of 0.8574 and a Precision of 0.8807; for sentence-based queries, Recall reached 0.8932 and Precision 0.9175. These improvements are attributed to the synergy between frequency-based expansion and deep contextual embeddings, which enables the model to handle lexical noise and semantic ambiguity effectively.
The findings highlight the scientific potential of combining corpus-based and transformer-based approaches in IR tasks involving unstructured content. In practice, ROCBERT-QE can be applied to real-time analysis of public discourse in e-government contexts, such as service evaluation, policy feedback, and early detection of public issues. The framework is scalable and adaptable to other domains with informal or multilingual data.
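The dual expansion idea described in the abstract (lexical co-occurrence plus semantic similarity, evaluated with Precision and Recall) can be illustrated with a toy example. This is a minimal sketch, not ROCBERT-QE itself: it assumes a linear interpolation with a hypothetical weight `ALPHA` between a normalized document-level co-occurrence score and cosine similarity, and it uses a hypothetical five-tweet corpus, hand-made 3-dimensional vectors standing in for BERT embeddings, Boolean OR matching for retrieval, and made-up relevance labels. All names (`expand`, `retrieve`, `precision_recall`, `CORPUS`, `EMBED`, `RELEVANT`) are illustrative.

```python
"""Toy hybrid query expansion: co-occurrence + embedding similarity.

Illustrative assumptions only (not ROCBERT-QE): the corpus, the 3-d vectors
standing in for BERT embeddings, ALPHA, top-k selection, Boolean OR retrieval,
and the relevance labels are all hypothetical.
"""
import math
from collections import Counter
from itertools import combinations

# Hypothetical preprocessed tweets (Indonesian tokens): antrian = queue,
# lama = slow/long, klaim = claim, ditolak = rejected,
# faskes = health facility, penuh = full.
CORPUS = [
    ["bpjs", "antrian", "lama"],
    ["bpjs", "klaim", "ditolak"],
    ["antrian", "faskes", "lama"],
    ["bpjs", "antrian", "faskes"],
    ["faskes", "penuh", "lama"],
]

# Hypothetical stand-ins for contextual embeddings (a real system would use BERT).
EMBED = {
    "bpjs": [0.9, 0.1, 0.0],
    "antrian": [0.2, 0.9, 0.1],
    "lama": [0.1, 0.8, 0.3],
    "faskes": [0.6, 0.5, 0.2],
    "klaim": [0.7, 0.0, 0.6],
    "ditolak": [0.3, 0.1, 0.9],
    "penuh": [0.3, 0.7, 0.4],
}

RELEVANT = {0, 2, 3, 4}  # hypothetical labels: tweets about queue complaints
ALPHA = 0.5  # hypothetical weight of lexical vs. semantic evidence


def cooccurrence_counts(corpus):
    """Count the number of documents in which each unordered term pair co-occurs."""
    pairs = Counter()
    for doc in corpus:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def expand(query, corpus, k=2):
    """Append the k candidate terms with the highest hybrid score to the query."""
    pairs = cooccurrence_counts(corpus)
    max_co = max(pairs.values())
    vocab = {t for doc in corpus for t in doc}
    scores = {}
    for cand in vocab - set(query):
        # Lexical evidence: co-occurrence with the query terms, scaled to [0, 1].
        co = sum(pairs[tuple(sorted((q, cand)))] for q in query) / (max_co * len(query))
        # Semantic evidence: mean cosine similarity to the query terms.
        sem = sum(cosine(EMBED[q], EMBED[cand]) for q in query) / len(query)
        scores[cand] = ALPHA * co + (1 - ALPHA) * sem
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return list(query) + ranked[:k]


def retrieve(terms, corpus):
    """Boolean OR matching: indices of documents containing any query term."""
    wanted = set(terms)
    return [i for i, doc in enumerate(corpus) if wanted & set(doc)]


def precision_recall(retrieved, relevant):
    """Set-based Precision and Recall of a retrieved list against relevant ids."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    for q in (["antrian"], expand(["antrian"], CORPUS)):
        docs = retrieve(q, CORPUS)
        print(q, docs, precision_recall(docs, RELEVANT))
```

In this toy run, `lama`, `faskes`, and `bpjs` tie on co-occurrence with `antrian`, and the embedding term breaks the tie in favor of `lama` and `faskes`. The expanded query then also matches tweet 4, which never mentions `antrian`, so recall rises from 0.75 to 1.0 while precision stays at 1.0. The abstract does not say whether ROCBERT-QE combines the two signals by linear interpolation; that choice is only an assumption made to keep the sketch short.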
    URI
    https://repositori.usu.ac.id/handle/123456789/111687
    Collections
    • Doctoral Dissertations [67]

    Related items

    Showing items related by title, author, creator and subject.

    • Content Based Video Retrieval Menggunakan Metode Haar Wavelet Transform 

      Febriyanti, Dian Rahayu (Universitas Sumatera Utara, 2019)
      Content-Based Video Retrieval (CBVR) is used to describe the process of capturing the desired video from a large collection based on features extracted from the video. The extracted feature is used to index, classify and ...
    • Implementasi Content Based Video Retrieval Menggunakan Metode Block Truncation Algorithm 

      Ananda, Meisy Putri (Universitas Sumatera Utara, 2020)
      Content-Based Video Retrieval (CBVR) is a method of capturing video documents that are used to find video information based on video content, not only based on the name or description of the video. Content-Based Video ...
    • Optimasi Sistem Pencarian Karya Ilmiah dari Repositori Institusi USU Berbasis Large Language Model (LLM) dengan Retrieval-Augmented Generation (RAG) 

      Ramadan, Andrian Putra (Universitas Sumatera Utara, 2026)
      The Institutional Repository of Universitas Sumatera Utara (USU) currently uses a keyword-based search system that has limitations in understanding semantic context and variations of natural language queries, which often ...

    Repositori Institusi Universitas Sumatera Utara - 2025

    Universitas Sumatera Utara

    Perpustakaan

    Resource Guide

    Katalog Perpustakaan

    Journal Elektronik Berlangganan

    Buku Elektronik Berlangganan

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
