Research Article

Missing Values and Optimal Selection of an Imputation Method and Classification Algorithm to Improve the Accuracy of Ubiquitous Computing Applications

Table 11

Factors influencing accuracy (RMSE) for each algorithm (standardized beta coefficients): k-MEANS_CLUSTERING.

Data characteristic   trees.J48   BayesNet   SMO        Regression   Logistic   IBk
N_attributes          −.080**     −.078**    −.181**    −.068**      .117**     .009
N_cases               −.079**     −.049**    .012       −.017        −.033      −.047**
C_imbalance           .136**      .240**     .263**     .524**       .145**     .206**
R_missing             .057*       .079**     .041       .084**       .079**     .057*
SE_HS                 .236**      .289**     .183**     .271**       .315**     .264**
SE_VS                 −.009       −.013      −.006      −.013        −.014      −.011
Spread                −.362**     −.439**    −.262**    −.440**      −.474**    −.363**
P_missing_dum1        −.037       −.042      −.036      −.032        −.038      −.046
P_missing_dum2        .002        .013       .001       .014         .009       .004

Note 1: N_attributes: number of attributes; N_cases: number of cases; C_imbalance: degree of class imbalance; R_missing: missing data ratio; SE_HS: horizontal scatteredness; SE_VS: vertical scatteredness; Spread: missing data spread. Missing data patterns are dummy coded: univariate (P_missing_dum1 = 1, P_missing_dum2 = 0), monotone (P_missing_dum1 = 0, P_missing_dum2 = 1), and arbitrary (P_missing_dum1 = 1, P_missing_dum2 = 1).
Note 2: RMSE indicates error; therefore, lower values are better.
Note 3: *p < 0.05, **p < 0.01.
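
The dummy coding in Note 1 and the significance markers in Note 3 can be illustrated with a minimal Python sketch. The names PATTERN_DUMMIES, encode_pattern, and significance_stars are hypothetical and are not taken from the study; the mapping simply restates the coding given in the notes.

```python
# Dummy coding of the missing-data pattern, restating Note 1.
# All names here are hypothetical, chosen for illustration only.
PATTERN_DUMMIES = {
    "univariate": (1, 0),
    "monotone": (0, 1),
    "arbitrary": (1, 1),
}


def encode_pattern(pattern):
    """Return (P_missing_dum1, P_missing_dum2) for a pattern name."""
    key = pattern.strip().lower()
    if key not in PATTERN_DUMMIES:
        raise ValueError(f"unknown missing-data pattern: {pattern!r}")
    return PATTERN_DUMMIES[key]


def significance_stars(p_value):
    """Return the marker used in Note 3 for a given p-value."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    for name in ("univariate", "monotone", "arbitrary"):
        print(name, encode_pattern(name))
    print(repr(significance_stars(0.03)))
```

A coefficient without a marker, such as SE_VS in every column, did not reach the 0.05 level under this convention.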