Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets
Average RAE barplot with error bars. RAE values are shown on the y-axis. The x-axis shows the 6 sites (1, 2, 3, 4, 5, and 6) and the 10 imputation tests (BPCA, KNN with k = …, LLS with k = …, LSA, NIPALS, ROW, and SVD). The mean (M), depicted by the slashed bar, is the overall mean for each imputation method (IM), with the RAE values averaged across the 4 pools and 6 sites. The figure shows the performance of the 10 imputation tests under the RAE metric with 5% of values deleted. 1000 simulations were performed; each simulation generated a dataset containing 5% missing values by randomly removing probe set values from the complete expression matrix of probe sets. The missing values were then imputed using the 10 imputation tests, and the results were compared using the RAE metric (see Section 2). The RAE values shown for each site are averaged across the 4 pools. LLS with k = … has the best performance, as it has the lowest RAE value for a given site. KNN with k = … has the highest RAE value and the worst performance for all pools and all sites.
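The simulation loop described in the caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. It assumes that RAE is the mean of |imputed - true| / |true| over the deleted entries (the actual definition is given in Section 2 and is not reproduced here), and it uses a plain row-mean fill as a stand-in for the ROW method. All function names are hypothetical.

```python
import random

def delete_at_random(matrix, fraction, rng):
    """Copy `matrix`, set a random `fraction` of its cells to None,
    and return the damaged copy plus the list of deleted positions."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    masked = rng.sample(cells, round(fraction * len(cells)))
    damaged = [row[:] for row in matrix]
    for i, j in masked:
        damaged[i][j] = None
    return damaged, masked

def impute_row_mean(matrix):
    """ROW-style stand-in: fill each gap with the mean of the observed
    values in the same row (probe set)."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Fall back to 0.0 only if every value in the row was deleted.
        row_mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([row_mean if v is None else v for v in row])
    return filled

def relative_absolute_error(truth, imputed, masked):
    """ASSUMED metric: mean of |imputed - true| / |true| over deleted cells.
    Requires nonzero true values."""
    errors = [abs(imputed[i][j] - truth[i][j]) / abs(truth[i][j])
              for i, j in masked]
    return sum(errors) / len(errors)

def simulate(truth, fraction=0.05, n_sims=1000, seed=0):
    """Repeat delete -> impute -> score, and return the average RAE."""
    rng = random.Random(seed)
    raes = []
    for _ in range(n_sims):
        damaged, masked = delete_at_random(truth, fraction, rng)
        imputed = impute_row_mean(damaged)
        raes.append(relative_absolute_error(truth, imputed, masked))
    return sum(raes) / len(raes)

if __name__ == "__main__":
    # Toy matrix of made-up positive values: 40 probe sets x 20 arrays.
    toy_rng = random.Random(42)
    toy = [[toy_rng.uniform(4.0, 14.0) for _ in range(20)] for _ in range(40)]
    print("average RAE over 100 simulations:", simulate(toy, n_sims=100))
```

To compare several methods under this scheme, each one would be applied to the same damaged matrices, so that differences in RAE reflect the imputation method rather than the random deletion pattern.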