Research Article

Missing Values and Optimal Selection of an Imputation Method and Classification Algorithm to Improve the Accuracy of Ubiquitous Computing Applications

Table 8

Factors influencing accuracy (RMSE) for each algorithm (standardized beta coefficients); imputation method: Predictive_Mean_Imputation.

Data characteristic  trees.J48   BayesNet    SMO         Regression  Logistic    IBk
N_attributes         −.076**     −.076**     −.178**     −.063**     .123**      .016
N_cases              −.084**     −.049**     .012        −.017       −.034*      −.047**
C_imbalance          .117**      .242**      .263**      .523**      .153**      .198**
R_missing            .050*       .079**      .043        .085**      .080**      .068**
SE_HS                .223**      .279**      .182**      .268**      .322**      .242**
SE_VS                −.008       −.013       −.006       −.013       −.015       −.009
Spread               −.328**     −.432**     −.262**     −.434**     −.465**     −.361**
P_missing_dum1       −.042       −.035       −.034       −.028       −.044       −.036
P_missing_dum2       .008        .012        .004        .018        .007        .011

Note 1: N_attributes: number of attributes; N_cases: number of cases; C_imbalance: degree of class imbalance; R_missing: missing data ratio; SE_HS: horizontal scatteredness; SE_VS: vertical scatteredness; Spread: missing data spread; P_missing_dum1 and P_missing_dum2: dummy variables for the missing data pattern, coded as univariate (P_missing_dum1 = 1, P_missing_dum2 = 0), monotone (P_missing_dum1 = 0, P_missing_dum2 = 1), and arbitrary (P_missing_dum1 = 1, P_missing_dum2 = 1).
Note 2: RMSE indicates error; therefore, lower values are better.
Note 3: *p < 0.05, **p < 0.01.
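The coefficients in Table 8 are standardized betas. These come from a multiple linear regression of RMSE on the data characteristics, fitted after every variable has been rescaled to zero mean and unit variance. The sketch below illustrates that computation in plain Python under that assumption (ordinary least squares on z-scores). It is not the authors' code. The helper names `rmse`, `missing_pattern_dummies`, `zscore`, `solve`, and `standardized_betas` are hypothetical, and the dummy coding simply mirrors Note 1 as printed. Significance tests (the asterisks) are not computed here.

```python
import math
from typing import List, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error; lower values are better (cf. Note 2)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def missing_pattern_dummies(pattern: str) -> Tuple[int, int]:
    """(P_missing_dum1, P_missing_dum2) for a pattern, as printed in Note 1."""
    coding = {
        "univariate": (1, 0),
        "monotone": (0, 1),
        "arbitrary": (1, 1),
    }
    return coding[pattern]


def zscore(values: Sequence[float]) -> List[float]:
    """Rescale to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def standardized_betas(predictors: List[Sequence[float]],
                       response: Sequence[float]) -> List[float]:
    """OLS coefficients after z-scoring every predictor and the response.

    predictors: one sequence per data characteristic (e.g. N_attributes,
    R_missing, ...), each holding one value per experimental run.
    Standardized variables have zero mean, so no intercept is fitted.
    """
    z_cols = [zscore(col) for col in predictors]
    z_y = zscore(response)
    k = len(z_cols)
    # Normal equations (Z^T Z) beta = Z^T y on the standardized data.
    xtx = [[sum(p * q for p, q in zip(z_cols[i], z_cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(z_cols[i], z_y)) for i in range(k)]
    return solve(xtx, xty)
```

Every variable is rescaled to unit variance. As a result, the coefficients within one algorithm's column can be compared directly. For example, the magnitude of Spread can be compared with the magnitude of C_imbalance, regardless of the predictors' original units.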