Research Article

Missing Values and Optimal Selection of an Imputation Method and Classification Algorithm to Improve the Accuracy of Ubiquitous Computing Applications

Table 7

Factors influencing accuracy (RMSE) for each algorithm (standardized beta coefficients): group mean imputation.

Data characteristic   trees.J48   BayesNet   SMO        Regression   Logistic   IBk
N_attributes          −.068**     −.072**    −.179**    −.068**      .115**     .010
N_cases               −.082**     −.050**    .011       −.018        −.034*     −.047**
C_imbalance           .115**      .228**     .260**     .517**       .156**     .197**
R_missing             .050**      .085**     .043       .084**       .095**     .066**
SE_HS                 .230**      .268**     .178**     .273**       .300**     .248**
SE_VS                 −.008       −.012      −.006      −.013        −.013      −.010
Spread                −.296**     −.439**    −.264**    −.443**      −.476**    −.382**
P_missing_dum1        −.043       −.032      −.034      −.035        −.035      −.041
P_missing_dum2        .002        .024       .004       .016         .021       .013

Note 1: N_attributes: number of attributes; N_cases: number of cases; C_imbalance: degree of class imbalance; R_missing: missing data ratio; SE_HS: horizontal scatteredness; SE_VS: vertical scatteredness; Spread: missing data spread. Missing patterns are dummy coded: univariate (P_missing_dum1 = 1, P_missing_dum2 = 0), monotone (P_missing_dum1 = 0, P_missing_dum2 = 1), and arbitrary (P_missing_dum1 = 1, P_missing_dum2 = 1).
Note 2: RMSE indicates error; therefore, lower values are better.
Note 3: *p < 0.05, **p < 0.01.
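
For readers unfamiliar with the imputation method that Table 7 refers to, the following is a minimal sketch of group mean imputation, assuming it means replacing each missing value with the mean of the same attribute among cases of the same class. The function name group_mean_impute, the use of None as the missing-value marker, and the toy data are illustrative assumptions and are not taken from the article.

```python
from collections import defaultdict


def group_mean_impute(rows, labels):
    """Fill None entries with the per-class mean of the same attribute.

    Illustrative sketch only: the name, the None marker, and the
    interface are assumptions, not the article's implementation.
    """
    n_attrs = len(rows[0])
    sums = defaultdict(lambda: [0.0] * n_attrs)
    counts = defaultdict(lambda: [0] * n_attrs)

    # First pass: accumulate observed values per class and attribute.
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                sums[label][j] += value
                counts[label][j] += 1

    # Second pass: replace each missing value with its class mean.
    imputed = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                if counts[label][j] == 0:
                    raise ValueError(
                        f"no observed values for attribute {j} in class {label!r}"
                    )
                new_row.append(sums[label][j] / counts[label][j])
            else:
                new_row.append(value)
        imputed.append(new_row)
    return imputed


# Toy data: two classes, two attributes, one missing value per class.
rows = [[1.0, None], [3.0, 4.0], [None, 10.0], [7.0, 20.0]]
labels = ["a", "a", "b", "b"]
result = group_mean_impute(rows, labels)
```

In this toy example, the missing second attribute of the first case is filled with the class "a" mean of that attribute, and the missing first attribute of the third case with the class "b" mean. The sketch raises an error when a class has no observed value for an attribute; a practical implementation might instead fall back to the overall mean.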