Research Article

Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets

Table 1

Summary of imputation methods for 5% and 10% deletion.

Error metric  RMSE          LRMSE         RAE           RAEL2         Average
Deletion %    5%     10%    5%     10%    5%     10%    5%     10%    5%     10%

BPCA          2.38   2.33   5.5    5.71   5.13   5.46   5.63   5.67   4.66   4.79
KNN1          9.79   9.79   9.88   10     9.83   10     9.79   9.88   9.82   9.92
KNN5          7.83   7.83   9.08   8.88   8.33   8.75   8.42   8.79   8.42   8.56
LLS1          5.17   5.21   4.29   4.25   4.67   4.5    4.38   4.33   4.63   4.57
LLS3          3.83   3.75   2.17   2.25   2.13   2.17   2.17   2.17   2.57   2.58
LLS4          2.79   2.92   2.29   2.92   1.96   1.96   2      2.08   2.26   2.48
LSA           1      1      4.88   4.92   4.33   4.5    4.71   4.71   3.73   3.78
NIPALS        7.25   7.25   7.33   6.96   7.33   7.08   7.33   7.04   7.31   7.08
ROW           6      5.96   1.5    1.5    2      2      1.58   1.46   2.77   2.72
SVD           8.96   8.96   8.29   8.08   8.29   8.17   8.33   8.29   8.47   8.38

Rows correspond to imputation methods and columns to error measures; the last two columns show the average across the four error measures. Each entry is an average rank: for every combination of error measure and imputation method, the rank values are averaged across the 6 sites and 4 pools, as detailed in Section 2. Smaller average rank values indicate more accurate imputation methods. Under the RMSE metric, LSA performs best. Under the LRMSE and RAEL2 metrics, ROW performs best. Under the RAE metric, the LLS variant labeled LLS4 performs best. The KNN variant labeled KNN1 has the highest average rank under every error measure and is therefore the worst-performing imputation method. LLS4 has the best overall performance, with the lowest average across the four error measures. These results hold for both 5% and 10% deletion.
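
As an illustration of how the Average columns and the stated conclusions follow from the per-metric values, the following minimal Python sketch recomputes them for 5% deletion. The numbers are read from Table 1; the names METRICS, RANKS_5PCT, overall_average, and best_per_metric are hypothetical and chosen only for this example. Because the published entries are already rounded, a recomputed average may differ from the table in the last digit.

```python
# Sketch only: per-metric average ranks at 5% deletion, read from Table 1.
# Order of values in each tuple follows METRICS.
METRICS = ("RMSE", "LRMSE", "RAE", "RAEL2")

RANKS_5PCT = {
    "BPCA":   (2.38, 5.50, 5.13, 5.63),
    "KNN1":   (9.79, 9.88, 9.83, 9.79),
    "KNN5":   (7.83, 9.08, 8.33, 8.42),
    "LLS1":   (5.17, 4.29, 4.67, 4.38),
    "LLS3":   (3.83, 2.17, 2.13, 2.17),
    "LLS4":   (2.79, 2.29, 1.96, 2.00),
    "LSA":    (1.00, 4.88, 4.33, 4.71),
    "NIPALS": (7.25, 7.33, 7.33, 7.33),
    "ROW":    (6.00, 1.50, 2.00, 1.58),
    "SVD":    (8.96, 8.29, 8.29, 8.33),
}


def overall_average(ranks):
    """Mean of the per-metric average ranks for each method."""
    return {method: sum(values) / len(values) for method, values in ranks.items()}


def best_per_metric(ranks):
    """Method with the smallest average rank under each error metric."""
    return {
        metric: min(ranks, key=lambda method: ranks[method][i])
        for i, metric in enumerate(METRICS)
    }


averages = overall_average(RANKS_5PCT)
best_overall = min(averages, key=averages.get)    # smallest average rank
worst_overall = max(averages, key=averages.get)   # largest average rank

for method in sorted(averages, key=averages.get):
    print(f"{method:<7} {averages[method]:.2f}")
print("best per metric:", best_per_metric(RANKS_5PCT))
print("best overall:", best_overall, "| worst overall:", worst_overall)
```

Replacing the tuples with the 10% deletion values from the table allows the same check for the second deletion level.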