    •   USU-IR Home
    • Faculty of Mathematics and Natural Sciences
    • Department of Mathematics
    • Undergraduate Theses

    Perbandingan Performa Algoritma K-Means dan DBSCAN dalam Clustering pada Data Berdimensi Tinggi

    Performance Comparison of K-Means and DBSCAN Algorithms in Clustering High-Dimensional Data

    Date
    2025
    Author
    Wulandari, Rati
    Advisor(s)
    Yanti, Maulida
    Abstract
    Clustering is one of the essential techniques in data analysis for discovering hidden patterns and grouping data based on certain similarities. However, the application of clustering algorithms to high-dimensional data often encounters challenges, particularly due to the presence of noise and irrelevant features. This study aims to analyze and compare the performance of K-Means and DBSCAN algorithms on high-dimensional data under various conditions. Two datasets were used: the Kaggle diabetes dataset with eight medical variables and the human liver gene expression dataset from ARCHS4 consisting of 35,238 gene features. To reduce dimensional complexity, Principal Component Analysis (PCA) was applied, ensuring that the cumulative variance retained was not less than 80%. Performance evaluation was carried out using two metrics, namely the Davies-Bouldin Index (DBI) and the Silhouette Score (SS). The results indicate that K-Means demonstrates more stable performance in most scenarios, particularly when the data is clean or relatively homogeneous, with consistently positive Silhouette Scores. On the other hand, DBSCAN performs better in scenarios with high levels of noise as it can explicitly identify outliers, although it tends to classify a large portion of the data as noise under other conditions. Overall, K-Means is more suitable for data with spherical and evenly distributed clusters, whereas DBSCAN is more appropriate for data with varying densities and the presence of noise.
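    The abstract names three computations: choosing how many principal components retain at least 80% of the variance, the Silhouette Score, and the Davies-Bouldin Index. The record does not show the thesis code, so the following is only a minimal plain-Python sketch of the textbook definitions. All function names are hypothetical. It also assumes that DBSCAN noise points carry the label -1 (the scikit-learn convention) and are excluded before scoring, which may differ from what the thesis did.

```python
import math
from collections import defaultdict


def components_for_variance(explained_ratios, threshold=0.80):
    """Smallest number of leading components whose cumulative
    explained-variance ratio reaches the threshold."""
    total = 0.0
    for k, ratio in enumerate(explained_ratios, start=1):
        total += ratio
        if total >= threshold:
            return k
    return len(explained_ratios)


def _group(points, labels, noise_label=-1):
    """Group points by cluster label, dropping noise (assumed label -1)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        if label != noise_label:
            clusters[label].append(point)
    return clusters


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all non-noise points (higher is better)."""
    clusters = _group(points, labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for label, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other points of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Davies-Bouldin Index over non-noise points (lower is better).
    Assumes no two cluster centroids coincide."""
    clusters = _group(points, labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids, scatter = {}, {}
    for label, members in clusters.items():
        c = tuple(sum(col) / len(members) for col in zip(*members))
        centroids[label] = c
        scatter[label] = sum(math.dist(p, c) for p in members) / len(members)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Toy data: two tight clusters plus one point flagged as noise.
points = [(0, 0), (0, 2), (10, 0), (10, 2), (50, 50)]
labels = [0, 0, 1, 1, -1]

print(components_for_variance([0.5, 0.2, 0.15, 0.1, 0.05]))  # 3
print(silhouette_score(points, labels))      # roughly 0.80
print(davies_bouldin_index(points, labels))  # 0.2
```

    On the toy data the two clusters are well separated, so the Silhouette Score is close to 1 and the Davies-Bouldin Index is close to 0. Whether noise points are dropped or counted changes both scores noticeably, so that choice should be stated whenever DBSCAN and K-Means are compared.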
    URI
    https://repositori.usu.ac.id/handle/123456789/108486
    Collections
    • Undergraduate Theses [1471]

    Repositori Institusi Universitas Sumatera Utara - 2025

    Universitas Sumatera Utara

    Library

    Resource Guide

    Library Catalog

    Subscribed Electronic Journals

    Subscribed Electronic Books

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
