Research Article

Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods

Figure 1

Feature selection. Eight breast cancer gene expression datasets (the feature-defining datasets), covering 32418 genes, were used to define a list of rank-significant genes. Datasets using the Affymetrix platform, spotted oligonucleotides, and the Agilent platform are colored orange, blue, and red, respectively. Genes were first ranked within each of the eight datasets according to their signal-to-noise ratio, and the mean rank of each gene across datasets was then calculated. This mean rank was tested for significance as described in Section 2, resulting in a list of 519 rank-significant genes. These 519 genes were reduced to the pool of 283 genes shared by the two training sets (AM and RO) and the testing sets (TR and MA); this pool was used in the remainder of the study.
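
The ranking and pooling steps in this caption can be illustrated with a minimal sketch. It rests on several assumptions that the caption does not confirm: the signal-to-noise ratio is taken in its common form (difference of class means divided by the sum of class standard deviations), genes are ranked by absolute score, and every dataset lists the same genes in the same row order. The function names are hypothetical, and the significance test from Section 2 is deliberately left out.

```python
import numpy as np

def signal_to_noise(expr, labels):
    """Per-gene signal-to-noise ratio (assumed form, not confirmed by the source).

    expr   : genes x samples array of expression values
    labels : boolean array over samples, True = metastasis
    """
    pos, neg = expr[:, labels], expr[:, ~labels]
    spread = pos.std(axis=1, ddof=1) + neg.std(axis=1, ddof=1)
    return (pos.mean(axis=1) - neg.mean(axis=1)) / spread

def rank_genes(scores):
    """Rank genes so that rank 1 has the largest absolute score (assumption)."""
    order = np.argsort(-np.abs(scores), kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def mean_rank(datasets):
    """Average each gene's within-dataset rank across all datasets.

    datasets : list of (expr, labels) pairs sharing the same gene row order
    """
    ranks = np.vstack([rank_genes(signal_to_noise(e, l)) for e, l in datasets])
    return ranks.mean(axis=0)

def shared_genes(selected, *gene_sets):
    """Keep only the selected genes that are present in every given gene set."""
    pool = set(selected)
    for genes in gene_sets:
        pool &= set(genes)
    return sorted(pool)
```

In this sketch, `mean_rank` would be applied to the eight feature-defining datasets, and `shared_genes` would then restrict the significant genes to those measured in all training and testing sets. Genes missing from some datasets would need additional handling that is not shown here.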