Journal of Biomedicine and Biotechnology
Volume 2005, Issue 2, Pages 147-154
Research article

Classification and Selection of Biomarkers in Genomic Data Using LASSO

1 Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109-2029, USA
2 Departments of Pathology and Urology, University of Michigan, 1300 Catherine Road, Ann Arbor, MI 48109-1063, USA

Received 3 June 2004; Accepted 13 August 2004

Copyright © 2005 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


High-throughput gene expression technologies such as microarrays have been used in a variety of scientific applications. Most work has focused either on assessing univariate associations between gene expression profiles and clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized by the receiver operating characteristic curve. Under a specific probability model, this leads to the consideration of linear discriminant functions. We incorporate automated variable selection using LASSO. An equivalence between LASSO estimation and support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.
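
As a rough illustration of the ideas summarized above, the following Python sketch scores samples with a sparse linear combination of features and evaluates that score by the empirical area under the receiver operating characteristic curve. It is not the procedure used in the article. It assumes an L1-penalized least-squares fit of the centered class label, solved by cyclic coordinate descent, as a stand-in for the LASSO step. The simulated data, the penalty value `lam`, and all function names are hypothetical.

```python
import random


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are already centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Covariance of feature j with the partial residual (feature j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def demo(seed=0, n=60, p=10, shift=1.5, lam=0.1):
    """Simulate two classes (hypothetical data), fit, and report in-sample AUC."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    # Only the first two features differ between the classes.
    X = [[rng.gauss(0.0, 1.0) + (shift * labels[i] if j < 2 else 0.0)
          for j in range(p)] for i in range(n)]
    # Center each column and the response before fitting.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(labels) / n
    yc = [y - y_mean for y in labels]
    beta = lasso_coordinate_descent(Xc, yc, lam)
    scores = [sum(b * x for b, x in zip(beta, row)) for row in Xc]
    return beta, empirical_auc(scores, labels)


if __name__ == "__main__":
    beta, auc = demo()
    print("nonzero coefficients:", [j for j, b in enumerate(beta) if b != 0.0])
    print("in-sample AUC of the linear score:", auc)
```

The AUC reported here is computed on the same samples used for fitting, so it is optimistic. An honest assessment would use held-out data or cross-validation, and the penalty `lam` would normally be tuned rather than fixed.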