Unjuk Kerja Term Frequency – Inverse Document Frequency dan K-Means dalam Identifikasi Layanan Pemerintah
Performance of Term Frequency – Inverse Document Frequency and K-Means in Government Service Identification
Abstract
Term Frequency-Inverse Document Frequency (TF-IDF) is used to assess the
importance of words in a document relative to the rest of the document set, while
K-Means clusters documents based on content similarity. Using a text dataset
covering various government services, this study measures how effectively these
methods identify and cluster those services. Text pre-processing reduced
the word count from 30,753 to 15,783, a reduction of about 49% that reflects the
removal of irrelevant words. A scatter plot of the TF-IDF components shows a negative
relationship between how frequently a word occurs (TF) and its uniqueness (IDF).
Clustering performance was evaluated using the Silhouette Index (SI) and the
Davies-Bouldin Index (DBI), both of which indicated consistent, good-quality
clusters. A stable SI of about 0.620 and a consistent DBI of about 0.551
indicate that the K-Means algorithm, with either Euclidean or Manhattan
distance, is effective in grouping comments into clusters
representing negative, neutral, and positive sentiments. The results of this research
contribute to the development of information systems that are
more efficient and responsive to public needs, and to stronger text data
management in government.
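
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the textbook, unsmoothed TF-IDF variant (relative term frequency times log(N / df)), a plain-Python Silhouette Index and Davies-Bouldin Index with a pluggable distance function, and made-up toy data. All function names, the sample documents, and the sample points are hypothetical. Higher SI and lower DBI indicate better-separated clusters.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n_docs / df[t])
         for t, c in Counter(doc).items()}
        for doc in docs
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def _group(points, labels):
    """Map each cluster label to the list of its points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette_index(points, labels, dist=euclidean):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = _group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of p's cluster
        # (dist(p, p) is 0, so summing over the whole cluster is safe).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(dist(p, q) for q in members) / len(members)
                for k, members in clusters.items() if k != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels, dist=euclidean):
    """Mean, over clusters, of the worst scatter-to-separation ratio."""
    clusters = _group(points, labels)
    centroids = {k: [sum(axis) / len(m) for axis in zip(*m)]
                 for k, m in clusters.items()}
    # Scatter: mean distance from a cluster's points to its centroid.
    scatter = {k: sum(dist(p, centroids[k]) for p in m) / len(m)
               for k, m in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


# Toy data (hypothetical): "tax" occurs in every document, so its IDF is 0.
docs = [["tax", "service", "online"],
        ["tax", "permit"],
        ["tax", "health", "service"]]
weights = tfidf(docs)

# Two well-separated toy clusters, scored under both distance functions.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
for dist in (euclidean, manhattan):
    print(dist.__name__,
          silhouette_index(points, labels, dist),
          davies_bouldin_index(points, labels, dist))
```

In this sketch a term that appears in every document receives a weight of zero, which is the mechanism behind the negative TF/IDF relationship the abstract describes. The Davies-Bouldin Index is conventionally defined with Euclidean distance; passing Manhattan distance here is only an illustration of how the two approaches could be compared.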