Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets
Average RMSE barplot with error bars. RMSE is plotted on the value axis; the category axis shows the 6 sites (1, 2, 3, 4, 5, and 6) and the 10 imputation tests (BPCA, KNN and LLS at their respective parameter settings, LSA, NIPALS, ROW, and SVD). The slashed bar, Mean (M), is the overall mean for each imputation method (IM), with RMSE values averaged across the 4 pools and 6 sites. The figure shows the performance of the 10 imputation tests under the RMSE metric with 5% of values deleted. 1000 simulations were performed; each simulation generated a dataset with 5% missing values by randomly removing probe set values from the complete expression matrix of probe sets. The missing values were then imputed with each of the 10 imputation tests, and the results were compared using the RMSE metric (see Section 2). The RMSE values shown for each site are averaged across the 4 pools. LSA performs best, with the lowest RMSE at each site. One KNN setting has the highest RMSE and therefore performs worst across all pools and all sites.
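The simulation described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, RMSE is assumed to be computed over the deleted entries only (the exact definition is in Section 2), and a simple row-mean fill stands in for the ROW method, whose details are not given here.

```python
import numpy as np


def mask_random(X, frac, rng):
    """Set a random fraction of entries to NaN; return the copy and the mask."""
    n_missing = int(round(frac * X.size))
    idx = rng.choice(X.size, size=n_missing, replace=False)
    mask = np.zeros(X.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(X.shape)
    X_missing = X.astype(float).copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def impute_row_mean(X_missing):
    """Fill each missing value with the mean of the observed values in its row.

    Assumption: a stand-in for a row-average imputation. A row with no
    observed values would stay NaN.
    """
    row_means = np.nanmean(X_missing, axis=1, keepdims=True)
    return np.where(np.isnan(X_missing), row_means, X_missing)


def rmse(X_true, X_imputed, mask):
    """Root mean squared error over the deleted entries (assumed definition)."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def simulate(X, frac=0.05, n_sim=1000, seed=0):
    """Repeat delete-impute-score n_sim times; return mean and SD of RMSE."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_sim)
    for i in range(n_sim):
        X_missing, mask = mask_random(X, frac, rng)
        scores[i] = rmse(X, impute_row_mean(X_missing), mask)
    return scores.mean(), scores.std(ddof=1)
```

Calling `simulate(X)` on a complete probe-set-by-sample matrix `X` returns the average RMSE and its standard deviation over the simulations, which correspond to a bar height and an error bar for one site and one method. Any other imputation routine can be substituted for `impute_row_mean` to compare methods on the same footing.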