ISRN Bioinformatics
Volume 2012, Article ID 564715, 18 pages
Research Article

Nonlinear Dependence in the Discovery of Differentially Expressed Genes

¹Department of Electrical and Computer Engineering, Michigan State University, 2120 EB, East Lansing, MI 48824, USA
²Carcinogenesis Laboratory, Department of Molecular Biology and Biochemistry, Michigan State University, 341 FST, East Lansing, MI 48824, USA
³College of Computer Science and Information Engineering, Zhejiang Gongshang University, 18 Xuezheng Street, Zhejiang Province Hangzhou, 310018, China

Received 16 September 2011; Accepted 9 November 2011

Academic Editors: T. Can and S. Panni

Copyright © 2012 J. R. Deller et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Microarray data are used to determine which genes are active in response to a changing cell environment. Genes are “discovered” when they are significantly differentially expressed in the microarray data collected under the differing conditions. In one prevalent approach, all genes are assumed to satisfy a null hypothesis, H₀, of no difference in expression. A false discovery (type I error) occurs when H₀ is incorrectly rejected. The quality of a detection algorithm is assessed by estimating its number of false discoveries, 𝔉. Work involving second-moment modeling of the z-value histogram (representing gene expression differentials) has shown significantly deleterious effects of intergene expression correlation on the estimate of 𝔉. This paper suggests that nonlinear dependencies could likewise be important. With an applied emphasis, it extends the “moment framework” by including third-moment skewness corrections in an estimator of 𝔉. This estimator combines observed correlation (corrected for sampling fluctuations) with the information from easily identifiable null cases. Nonlinear-dependence modeling reduces the estimation error relative to that of linear estimation. Third-moment calculations involve empirical densities of 3 × 3 covariance matrices estimated using very few samples. The principle of entropy maximization is employed to connect estimated moments to inference about 𝔉. Model results are tested with BRCA and HIV data sets and with carefully constructed simulations.
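
To give a feel for why intergene correlation and skewness matter for the false-discovery count, the following toy simulation may help. It is only a sketch under stated assumptions and not the estimator developed in this paper: all genes are null, the z-values follow a hypothetical equicorrelated Gaussian model with pairwise correlation `rho`, and a gene is "discovered" when |z| exceeds a fixed threshold. The function names, the threshold of 2.0, and the value `rho = 0.2` are illustrative choices.

```python
import numpy as np


def null_tail_counts(n_genes, rho, threshold, n_trials, seed=0):
    """Count null z-values with |z| > threshold in each simulated experiment.

    Assumed model (illustrative only): z_i = sqrt(rho)*g + sqrt(1-rho)*e_i,
    with g and e_i independent N(0, 1). Each z_i is then marginally N(0, 1)
    and every pair of z-values has correlation rho.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_trials, 1))       # factor common to all genes
    own = rng.standard_normal((n_trials, n_genes))    # gene-specific noise
    z = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own
    # Every gene is null here, so each exceedance is a false discovery.
    return (np.abs(z) > threshold).sum(axis=1)


def sample_skewness(x):
    """Standardized third central moment of a one-dimensional sample."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


independent_counts = null_tail_counts(2000, 0.0, 2.0, 500)
correlated_counts = null_tail_counts(2000, 0.2, 2.0, 500)

for label, counts in [("independent", independent_counts),
                      ("correlated", correlated_counts)]:
    print(label,
          "mean:", counts.mean(),
          "std:", counts.std(),
          "skewness:", sample_skewness(counts))
```

In this toy model the expected number of false discoveries is the same with and without correlation, but the correlated counts are far more dispersed and right-skewed across experiments. A single experiment can therefore yield a false-discovery count well away from the nominal expectation, which is the kind of effect that second- and third-moment corrections aim to account for.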