Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets
Average LRMSE barplot with error bars. LRMSE values are shown on the y-axis. The x-axis shows the 6 sites (1, 2, 3, 4, 5, and 6) and the 10 imputation tests (BPCA, KNN with , LLS with , LSA, NIPALS, ROW, and SVD). The slashed bar, Mean (M), represents the overall mean for each imputation method (IM), with LRMSE values averaged across the 4 pools and 6 sites. The figure shows the performance of the 10 imputation tests under the LRMSE metric with 5% of values deleted. 1000 simulations were performed; each generated a dataset containing 5% missing values by randomly removing probe set values from the complete expression matrix of probe sets. Missing values were then imputed using each of the 10 imputation tests, and the results were compared using the LRMSE metric (see Section 2). The per-site LRMSE values are averaged across the 4 pools. ROW performs best, with the lowest LRMSE value at every site. KNN with has the highest LRMSE value and performs worst for all pools and all sites.
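The simulation loop described in the caption (random deletion, imputation, error on the deleted entries, averaging over runs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: ROW is assumed here to mean row-mean imputation, plain RMSE stands in for LRMSE because the definition in Section 2 is not reproduced in this excerpt, and all function names are hypothetical.

```python
import math
import random


def delete_at_random(matrix, fraction, rng):
    """Return a copy of matrix with a fraction of entries set to None,
    together with the list of (row, column) positions that were removed."""
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    # Always remove at least one value so the error is defined.
    n_missing = max(1, round(fraction * len(cells)))
    masked = rng.sample(cells, n_missing)
    holed = [row[:] for row in matrix]
    for i, j in masked:
        holed[i][j] = None
    return holed, masked


def impute_row_mean(holed):
    """Assumed ROW method: fill each gap with the mean of the observed
    values in the same row (probe set)."""
    filled = []
    for row in holed:
        observed = [v for v in row if v is not None]
        # Fallback of 0.0 for a fully deleted row is an arbitrary choice.
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def rmse(truth, filled, masked):
    """Root mean squared error over the deleted positions only.
    Stand-in for LRMSE, whose exact definition is given in Section 2."""
    squared = [(truth[i][j] - filled[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))


def simulate(matrix, fraction=0.05, n_sim=1000, seed=0):
    """Average the error of row-mean imputation over n_sim random deletions."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_sim):
        holed, masked = delete_at_random(matrix, fraction, rng)
        errors.append(rmse(matrix, impute_row_mean(holed), masked))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix (rows: probe sets, columns: arrays).
    data_rng = random.Random(42)
    toy = [[data_rng.gauss(8.0, 1.0) for _ in range(20)] for _ in range(200)]
    print(simulate(toy, fraction=0.05, n_sim=100))
```

Swapping `impute_row_mean` for another imputation function with the same input and output shape would give the corresponding bar for that method; repeating the run per site and pool and averaging would give the grouped means shown in the figure.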