Research Article

Robust Microarray Meta-Analysis Identifies Differentially Expressed Genes for Clinical Prediction

Figure 2

Procedure for comparing the predictive performance of six microarray meta-analysis-based feature selection (FS) methods. (a) Features are selected from microarray datasets using the rank average meta-analysis method (pink box), several other meta-analysis methods (orange boxes: mDEDS, rank products, Choi, and Wang), and a naive method (blue box) that aggregates samples into a larger dataset. For each individual dataset, rank average meta-analysis chooses, from among several basic FS methods (SAM, fold change, rank sum, t-test, mRMRD, and mRMRQ), the single method that optimizes prediction performance (via cross-validation) over the top 20 features. A simple weighted average of gene ranks from all individual datasets produces the final set of rank average meta-analysis features. The rank products, Choi, and Wang methods use one basic FS method to select features from multiple datasets, while the mDEDS method uses all six basic FS methods. (b) Features are selected from two or more datasets from each group to build a classifier (pink boxes), which is trained with samples from only one dataset (yellow boxes). The performance of the classifier is assessed using independent datasets (datasets not used for training or feature selection, green boxes). The predictive performance of a microarray meta-analysis-based FS method is the average over all permutations of training and validation datasets (blue boxes). In the example, datasets 1–4 consist of one-channel Affymetrix arrays while dataset 5 (in the case of heterogeneous data) consists of two-channel arrays.
(a) Selecting features from multiple microarray datasets using six meta-analysis-based methods
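The rank-averaging step described for panel (a) can be illustrated with a minimal sketch. It assumes that each dataset has already been scored by its chosen basic FS method (higher score meaning more differentially expressed) and that the weights are supplied by the caller; the caption does not state how the weights are defined, so the function names, the weight values, and the tie-breaking rule here are illustrative assumptions rather than the authors' implementation.

```python
def rank_genes(scores):
    """Map each gene to its rank (1 = highest score); ties broken by gene name."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def rank_average(per_dataset_scores, weights, top_k=20):
    """Return the top_k genes by weighted average rank across datasets.

    per_dataset_scores: list of {gene: score} dicts, one per dataset
                        (hypothetical input format).
    weights:            one non-negative weight per dataset (assumed to be
                        chosen by the caller, e.g. by sample count).
    """
    if len(per_dataset_scores) != len(weights):
        raise ValueError("need exactly one weight per dataset")
    total = sum(weights)
    # Only genes measured in every dataset can be ranked consistently.
    genes = set.intersection(*(set(s) for s in per_dataset_scores))
    ranks = [rank_genes({g: s[g] for g in genes}) for s in per_dataset_scores]
    avg = {
        g: sum(w * r[g] for w, r in zip(weights, ranks)) / total
        for g in genes
    }
    # A lower average rank means the gene is more consistently near the top.
    return sorted(avg, key=lambda g: (avg[g], g))[:top_k]
```

Restricting the ranking to genes shared by all datasets is one simple way to handle platforms with different probe content; other choices, such as ranking within each dataset and averaging over the datasets where a gene is present, are equally plausible.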
(b) Example of dataset permutations for evaluating meta-analysis predictive performance
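The evaluation scheme in panel (b) can be sketched as an enumeration of dataset roles. This sketch assumes that the single training dataset is drawn from the datasets used for feature selection and that every remaining dataset serves as independent validation; the caption does not spell out either detail. The names `evaluation_splits`, `mean_performance`, and `score_fn` are hypothetical.

```python
from itertools import combinations


def evaluation_splits(datasets, min_fs=2):
    """Yield (fs_sets, train_set, held_out) triples.

    fs_sets:   datasets used for feature selection (at least min_fs of them)
    train_set: the one dataset used to train the classifier
               (assumed here to be one of fs_sets)
    held_out:  datasets used for neither step, kept for validation
    """
    # Leave at least one dataset out so that validation is always possible.
    for k in range(min_fs, len(datasets)):
        for fs_sets in combinations(datasets, k):
            held_out = [d for d in datasets if d not in fs_sets]
            for train_set in fs_sets:
                yield fs_sets, train_set, held_out


def mean_performance(datasets, score_fn, min_fs=2):
    """Average score_fn(fs_sets, train_set, validation_set) over all splits."""
    scores = [
        score_fn(fs_sets, train_set, validation_set)
        for fs_sets, train_set, held_out in evaluation_splits(datasets, min_fs)
        for validation_set in held_out
    ]
    if not scores:
        raise ValueError("need at least min_fs + 1 datasets")
    return sum(scores) / len(scores)
```

Here `score_fn` stands in for the full pipeline of selecting features, training the classifier, and scoring it on one validation dataset, so any performance measure can be plugged in.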