Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets
Barplot of average RAEL2 with error bars. RAEL2 values are shown on the y-axis. The x-axis shows the 6 sites (1, 2, 3, 4, 5, and 6) and the 10 imputation tests (BPCA, KNN with , LLS with , LSA, NIPALS, ROW, and SVD). The slashed bar, Mean (M), is the overall mean for each imputation method (IM), with RAEL2 values averaged across the 4 pools and 6 sites. The figure shows the performance of the 10 imputation tests under the RAEL2 metric with 5% of values deleted. 1000 simulations were performed; each generated a dataset containing 5% missing values by randomly removing probe set values from the complete expression matrix of probe sets. Missing values were then imputed using each of the 10 imputation tests, and the results were compared using the RAEL2 error measure (see Section 2). For each site, the RAEL2 values are averaged across the 4 pools. ROW performs best, with the lowest RAEL2 value at every site. KNN with has the highest RAEL2 value and performs worst for all pools and all sites.
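The simulation loop described in this caption (randomly delete 5% of values, impute, score against the known truth, repeat) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: ROW is taken to mean row-mean imputation, the error function `relative_l2_error` is a generic relative L2 stand-in and not the RAEL2 definition from Section 2, and all function names and the toy matrix are hypothetical.

```python
import math
import random


def mask_random_entries(matrix, fraction, rng):
    """Copy `matrix` and set a random `fraction` of its entries to None."""
    positions = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    n_missing = round(fraction * len(positions))
    masked_positions = rng.sample(positions, n_missing)
    masked = [row[:] for row in matrix]
    for i, j in masked_positions:
        masked[i][j] = None
    return masked, masked_positions


def impute_row_mean(masked):
    """Assumed ROW method: fill each gap with the mean of the observed row values."""
    imputed = []
    for row in masked:
        observed = [v for v in row if v is not None]
        # Fallback of 0.0 only applies if an entire row was deleted.
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed


def relative_l2_error(truth, imputed, positions):
    """Stand-in error (NOT the paper's RAEL2): relative L2 error over deleted entries."""
    num = sum((imputed[i][j] - truth[i][j]) ** 2 for i, j in positions)
    den = sum(truth[i][j] ** 2 for i, j in positions)
    return math.sqrt(num / den)


def simulate(matrix, fraction=0.05, n_simulations=1000, seed=0):
    """Average the error over repeated random deletions of `fraction` of the values."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_simulations):
        masked, positions = mask_random_entries(matrix, fraction, rng)
        imputed = impute_row_mean(masked)
        errors.append(relative_l2_error(matrix, imputed, positions))
    return sum(errors) / len(errors)


# Hypothetical toy data: 200 "probe sets" (rows) measured on 5 arrays (columns).
toy_rng = random.Random(1)
toy_matrix = []
for _ in range(200):
    base = toy_rng.uniform(4, 14)  # log-scale expression level for this row
    toy_matrix.append([base + toy_rng.gauss(0, 0.1) for _ in range(5)])

mean_error = simulate(toy_matrix, fraction=0.05, n_simulations=20)
print(f"mean stand-in error over 20 simulations: {mean_error:.4f}")
```

Because the complete matrix is known before deletion, every imputed value can be scored against its true value. Swapping `impute_row_mean` for another method would reproduce the comparison structure, though not the paper's numbers.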