Research Article

Missing Values and Optimal Selection of an Imputation Method and Classification Algorithm to Improve the Accuracy of Ubiquitous Computing Applications

Table 6

Factors influencing accuracy (RMSE) for each algorithm (standard beta coefficient): mean imputation.

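| Data characteristic | trees.J48 | BayesNet | SMO | Regression | Logistic | IBk |
|---|---|---|---|---|---|---|
| N_attributes | −.076** | −.075** | −.178** | −.072** | .115** | .007 |
| N_cases | −.079** | −.049** | .012 | −.017 | −.032 | −.048** |
| C_imbalance | .117** | .239** | .264** | .525** | .163** | .198** |
| R_missing | .051* | .078** | .040 | .080** | .076** | .068** |
| SE_HS | .249** | .285** | .186** | .277** | .335** | .245** |
| SE_VS | −.009 | −.013 | −.006 | −.013 | −.016 | −.010 |
| Spread | −.382** | −.430** | −.261** | −.436** | −.452** | −.363** |
| P_missing_dum1 | −.049 | −.038 | −.038 | −.037 | −.045 | −.038 |
| P_missing_dum2 | −.002 | .014 | .002 | .011 | .001 | .011 |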

Note 1: N_attributes: number of attributes; N_cases: number of cases; C_imbalance: degree of class imbalance; R_missing: missing data ratio; SE_HS: horizontal scatteredness; SE_VS: vertical scatteredness; Spread: missing data spread. Missing patterns are dummy coded: univariate (P_missing_dum1 = 1, P_missing_dum2 = 0), monotone (P_missing_dum1 = 0, P_missing_dum2 = 1), and arbitrary (P_missing_dum1 = 1, P_missing_dum2 = 1).
Note 2: RMSE indicates error; therefore, lower values are better.
Note 3: *p < 0.05, **p < 0.01.
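
For readers unfamiliar with standardized beta coefficients, the following is a minimal sketch of how such values can be obtained: every predictor and the response (here, RMSE) are z-scored, and an ordinary least-squares fit is run on the standardized variables. It is not necessarily the procedure used to produce Table 6. The function names `standardized_betas` and `encode_missing_pattern` are hypothetical, and the dummy coding simply mirrors Note 1.

```python
import numpy as np

def standardized_betas(X, y):
    """Standardized regression coefficients of y on the columns of X.

    Illustrative sketch only: z-score all variables, then fit OLS.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score predictors and response (sample standard deviation, ddof=1)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column is needed because every variable is centered
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return betas

def encode_missing_pattern(pattern):
    """Return (P_missing_dum1, P_missing_dum2) using the coding in Note 1."""
    codes = {
        "univariate": (1, 0),
        "monotone": (0, 1),
        "arbitrary": (1, 1),
    }
    return codes[pattern]
```

With a single predictor, the standardized beta returned by this sketch equals the Pearson correlation between that predictor and the response, which is a convenient sanity check.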