Missing Values and Optimal Selection of an Imputation Method and Classification Algorithm to Improve the Accuracy of Ubiquitous Computing Applications
Table 1: The characteristics of missing data.
| Variables | Meaning | Calculation |
| --- | --- | --- |
| Missing data ratio | The number of missing values in the entire dataset as compared to the number of nonmissing values | The number of empty data cells/total cells |
| Patterns of missing data | Univariate; monotone; arbitrary | Ratio of missing to complete values for an existing feature compared to the values for all features |
| Horizontal scatteredness | Distribution of missing values within each data record | Determine the number of missing cells in each record and calculate the standard deviation |
| Vertical scatteredness | Distribution of missing values for each attribute | Determine the number of missing cells in each feature and calculate the standard deviation |
| Missing data spread | Larger standard deviations indicate stronger effects of missing data | Determine the weighted average of the standard deviations of features with missing data (weight: the ratio of missing to complete data for each feature) |
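
The calculations in Table 1 can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made here for illustration: the function name `missing_data_characteristics`, the use of `None` to mark a missing cell, the use of the population standard deviation, taking each feature's standard deviation over its observed values only, and normalizing the weighted average by the sum of the weights. The pattern classification (univariate, monotone, arbitrary) is not covered.

```python
from statistics import pstdev


def missing_data_characteristics(rows):
    """Compute four Table 1 quantities for a list of equal-length records.

    Assumption: a missing cell is represented by None.
    """
    n_rows, n_cols = len(rows), len(rows[0])

    # Count missing cells per record (horizontal) and per feature (vertical).
    missing_per_row = [sum(v is None for v in row) for row in rows]
    missing_per_col = [sum(row[j] is None for row in rows) for j in range(n_cols)]

    # Missing data ratio: empty cells / total cells.
    ratio = sum(missing_per_row) / (n_rows * n_cols)

    # Scatteredness: standard deviation of the missing-cell counts.
    # Assumption: population standard deviation (pstdev).
    horizontal = pstdev(missing_per_row)
    vertical = pstdev(missing_per_col)

    # Missing data spread: weighted average of the standard deviations of
    # features that have missing data, weighted by missing/complete counts.
    weighted_sum = 0.0
    weight_total = 0.0
    for j, n_missing in enumerate(missing_per_col):
        n_complete = n_rows - n_missing
        if n_missing == 0 or n_complete == 0:
            continue  # skip complete features and fully empty features
        observed = [row[j] for row in rows if row[j] is not None]
        weight = n_missing / n_complete
        weighted_sum += weight * pstdev(observed)
        weight_total += weight
    spread = weighted_sum / weight_total if weight_total else 0.0

    return {
        "missing_ratio": ratio,
        "horizontal_scatteredness": horizontal,
        "vertical_scatteredness": vertical,
        "missing_data_spread": spread,
    }


# Hypothetical 4-record, 3-feature dataset with three missing cells.
sample = [
    [1, 2, None],
    [4, None, None],
    [7, 8, 9],
    [10, 11, 12],
]
stats = missing_data_characteristics(sample)
print(stats)
```

In this sketch the scatteredness values depend only on where the missing cells are, while the spread also depends on the observed feature values, so features on very different scales would likely need to be standardized first.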