dc.contributor.advisor	Yanti, Maulida
dc.contributor.author	Wulandari, Rati
dc.date.accessioned	2025-09-18T07:40:27Z
dc.date.available	2025-09-18T07:40:27Z
dc.date.issued	2025
dc.identifier.uri	https://repositori.usu.ac.id/handle/123456789/108486
dc.description.abstract	Clustering is one of the essential techniques in data analysis for discovering hidden patterns and grouping data based on certain similarities. However, the application of clustering algorithms to high-dimensional data often encounters challenges, particularly due to the presence of noise and irrelevant features. This study aims to analyze and compare the performance of the K-Means and DBSCAN algorithms on high-dimensional data under various conditions. Two datasets were used: the Kaggle diabetes dataset with eight medical variables and the human liver gene expression dataset from ARCHS4 consisting of 35,238 gene features. To reduce dimensional complexity, Principal Component Analysis (PCA) was applied, retaining at least 80% of the cumulative variance. Performance evaluation was carried out using two metrics, namely the Davies-Bouldin Index (DBI) and the Silhouette Score (SS). The results indicate that K-Means demonstrates more stable performance in most scenarios, particularly when the data is clean or relatively homogeneous, with consistently positive Silhouette Scores. On the other hand, DBSCAN performs better in scenarios with high levels of noise, as it can explicitly identify outliers, although it tends to classify a large portion of the data as noise under other conditions. Overall, K-Means is more suitable for data with spherical and evenly distributed clusters, whereas DBSCAN is more appropriate for data with varying densities and the presence of noise.	en_US
dc.language.iso	id	en_US
dc.publisher	Universitas Sumatera Utara	en_US
dc.subject	Clustering	en_US
dc.subject	K-Means	en_US
dc.subject	DBSCAN	en_US
dc.subject	Principal Component Analysis	en_US
dc.subject	Davies-Bouldin Index	en_US
dc.subject	Silhouette Score	en_US
dc.subject	High-Dimensional Data	en_US
dc.title	Perbandingan Performa Algoritma K-Means dan DBSCAN dalam Clustering pada Data Berdimensi Tinggi	en_US
dc.title.alternative	Performance Comparison of K-Means and DBSCAN Algorithms in Clustering High-Dimensional Data	en_US
dc.type	Thesis	en_US
dc.identifier.nim	NIM210803045
dc.identifier.nidn	NIDN0024109003
dc.identifier.kodeprodi	KODEPRODI44201#Matematika
dc.description.pages	98 Pages	en_US
dc.description.type	Skripsi Sarjana	en_US
dc.subject.sdgs	SDGs 4. Quality Education	en_US
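
The workflow summarized in the abstract (PCA retaining at least 80% of the variance, then K-Means and DBSCAN scored with the Silhouette Score and the Davies-Bouldin Index) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: it assumes scikit-learn, uses synthetic blob data in place of the diabetes and ARCHS4 datasets, and the helper `evaluate` as well as the values `n_clusters=3`, `eps=3.0`, and `min_samples=5` are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate(X, labels):
    """Return (silhouette, DBI, noise fraction); metrics are None when undefined."""
    mask = labels != -1  # DBSCAN marks noise points with the label -1
    noise_fraction = 1.0 - mask.mean()
    n_clusters = len(set(labels[mask]))
    # Both metrics need at least two clusters and fewer clusters than samples.
    if n_clusters < 2 or n_clusters >= mask.sum():
        return None, None, noise_fraction
    return (
        silhouette_score(X[mask], labels[mask]),
        davies_bouldin_score(X[mask], labels[mask]),
        noise_fraction,
    )


# Synthetic stand-in data (assumption): 300 samples, 50 features, 3 blobs.
X, _ = make_blobs(n_samples=300, n_features=50, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components reaching that variance share.
pca = PCA(n_components=0.80, svd_solver="full")
X_reduced = pca.fit_transform(X)
retained = pca.explained_variance_ratio_.sum()

# Hypothetical hyperparameters; real data would need these tuned.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)
dbscan_labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(X_reduced)

kmeans_result = evaluate(X_reduced, kmeans_labels)
dbscan_result = evaluate(X_reduced, dbscan_labels)

print(f"components kept: {pca.n_components_}, variance retained: {retained:.2f}")
print("K-Means (SS, DBI, noise):", kmeans_result)
print("DBSCAN  (SS, DBI, noise):", dbscan_result)
```

In this sketch the DBSCAN metrics are computed on non-noise points only, and the noise fraction is reported separately, since a high share of points labeled as noise is itself one of the behaviors the abstract describes. A higher Silhouette Score and a lower Davies-Bouldin Index both indicate better-separated clusters.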

