| dc.description.abstract | Linear regression models the relationship between independent variables and
dependent variables as a straight line. Linear
regression models are used to predict the values of unknown variables. These predictions
use values found in the dataset. Because the dataset is large, it is prone
to containing outliers. An outlier is a value that deviates from the overall pattern of the
data in the dataset. Outliers cause the data to depart from a normal distribution and
distort the resulting regression model. However, outliers cannot simply be removed.
Therefore, a method is needed to address outliers in the data without having to remove
them. Using two types of data, synthetic and real-world, this study
examines three types of outliers, namely vertical outliers, bad leverage
points, and influential points, using three regression methods to address outliers,
namely the Ordinary Least Squares (OLS) method, the Least Median of Squares (LMS)
method, and Huber regression. This study also uses the Mean Absolute Error (MAE),
Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and
R-squared (R²) metrics to compare the three methods. This
study aims to determine which method is appropriate for addressing each type of
outlier. For both the synthetic and real-world data tested, the effects of the
three types of outliers and the evaluation results are presented graphically and numerically. | en_US |