Penanganan Imbalance Data pada Hasil Klaster dengan SMOTE untuk Prediksi Permintaan Perusahaan Ekspedisi Menggunakan XGBoost
Handling Imbalanced Data in Clustering Results Using SMOTE for Demand Prediction in Logistics Companies with XGBoost
Abstract
The rapid growth of online shopping has driven the increasing need for accurate
demand prediction in logistics and courier service companies. However, such prediction
is hampered by imbalanced data, which biases predictive models
toward the majority class. This study proposes a combined approach using the
Synthetic Minority Over-sampling Technique (SMOTE) and K-Means clustering to
address data imbalance, along with the Extreme Gradient Boosting (XGBoost)
algorithm as the predictive model. A historical dataset consisting of 45,684 entries was
used, including features such as quantity, unit, weight, and destination. The research
stages included preprocessing, normalization, clustering, cluster evaluation (using the
silhouette score, Davies-Bouldin index, and Calinski-Harabasz score), and oversampling of
minority clusters. Applying SMOTE to the imbalanced data improved model
performance, although the improvement was modest because the initial model
already performed well. Nevertheless, in the context of imbalanced data, the
improvement is meaningful because it indicates that the model gives the minority
class more balanced attention.
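
The oversampling step named in the abstract can be illustrated with a minimal sketch of the interpolation idea behind SMOTE. The abstract does not say which implementation the study used, so this is not the author's code: the function name `smote_oversample`, its parameters, and the toy rows are all hypothetical, and the sketch uses only the Python standard library. Each synthetic row is placed on the line segment between a randomly chosen minority row and one of its k nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority rows.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up rows standing in for normalized (quantity, weight) features
# of one minority cluster; these are not values from the study's dataset.
minority_cluster = [[0.10, 0.20], [0.15, 0.25], [0.12, 0.30], [0.20, 0.22]]
new_rows = smote_oversample(minority_cluster, n_new=6, k=3)
balanced_minority = minority_cluster + new_rows
```

Because every synthetic row lies between two existing minority rows, the new rows stay inside the region the minority cluster already occupies. In a full pipeline, the balanced data would then be passed to the predictive model.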
Collection: Master Theses