International Journal of Genomics
Volume 2017 (2017), Article ID 2354564, 13 pages
Research Article

Characteristics and Validation Techniques for PCA-Based Gene-Expression Signatures

Department of Biostatistics and Bioinformatics, Division of Population Sciences, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA

Correspondence should be addressed to Anders E. Berglund

Received 27 September 2016; Revised 15 December 2016; Accepted 4 January 2017; Published 6 February 2017

Academic Editor: Bethany Wolf

Copyright © 2017 Anders E. Berglund et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background. Many gene-expression signatures exist for describing the biological state of profiled tumors. Principal Component Analysis (PCA) can be used to summarize a gene signature into a single score. We hypothesize that the inherent properties of PCA can be used to validate gene signatures when they are applied to new datasets.

Results. This validation is based on four key concepts. Coherence: the elements of a gene signature should be correlated beyond chance. Uniqueness: because the general direction of the data being examined can drive most of the observed signal, a signature's score should reflect more than this dominant direction. Robustness: if a gene signature is designed to measure a single biological effect, this signal should be sufficiently strong and distinct compared with other signals within the signature. Transferability: the derived PCA gene-signature score should describe the same biology in the target dataset as it does in the training dataset.

Conclusions. The proposed validation procedure ensures that PCA-based gene signatures perform as expected when applied to datasets other than those on which they were trained. Complex signatures that describe multiple independent biological components are also easily identified.
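The four criteria can be made concrete in a few lines of NumPy. The sketch below is one plausible operationalization under stated assumptions, not the authors' published procedure. The function names (`pca_score`, `coherence`, `uniqueness`, `robustness`, `transferability`, `simulate`) are hypothetical, as are the statistics chosen for each criterion: mean absolute gene-gene correlation compared against random gene sets (coherence), absolute correlation between the signature score and the whole dataset's first principal component (uniqueness), the PC1/PC2 explained-variance ratio (robustness), and the correlation of PC1 gene loadings between datasets (transferability). The data are synthetic.

```python
import numpy as np

# Illustrative sketch: the statistic chosen for each criterion below is an
# assumption made for this example, not the published method.


def pca_score(X):
    """Summarize a samples x genes matrix into one score per sample via PC1.

    Returns (scores, loadings, explained_variance_ratio).
    """
    # Standardize each gene (z-scores) so no single gene dominates the PCA
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # PC signs are arbitrary; flip so loadings sum to >= 0 (a convention only)
    if loadings.sum() < 0:
        loadings = -loadings
    var_ratio = S**2 / np.sum(S**2)
    return Z @ loadings, loadings, var_ratio


def mean_abs_corr(X):
    """Mean absolute off-diagonal gene-gene correlation."""
    C = np.corrcoef(X, rowvar=False)
    return np.abs(C[~np.eye(C.shape[0], dtype=bool)]).mean()


def coherence(data, sig_idx, n_random=200, seed=None):
    """Signature correlation vs. random gene sets of equal size; returns (observed, p)."""
    rng = np.random.default_rng(seed)
    observed = mean_abs_corr(data[:, sig_idx])
    k = len(sig_idx)
    # Null distribution: random gene sets of the same size drawn from all genes
    null = np.array([
        mean_abs_corr(data[:, rng.choice(data.shape[1], size=k, replace=False)])
        for _ in range(n_random)
    ])
    p = (1 + np.sum(null >= observed)) / (1 + n_random)
    return observed, p


def uniqueness(data, sig_idx):
    """|corr| between the signature score and PC1 of the whole dataset (low = unique)."""
    sig_scores, _, _ = pca_score(data[:, sig_idx])
    all_scores, _, _ = pca_score(data)
    return abs(np.corrcoef(sig_scores, all_scores)[0, 1])


def robustness(sig_data):
    """Variance explained by PC1 divided by variance explained by PC2 within the signature."""
    _, _, var_ratio = pca_score(sig_data)
    return var_ratio[0] / var_ratio[1]


def transferability(train_sig, target_sig):
    """Correlation of PC1 gene loadings between training and target datasets."""
    _, w_train, _ = pca_score(train_sig)
    _, w_target, _ = pca_score(target_sig)
    return np.corrcoef(w_train, w_target)[0, 1]


def simulate(rng, n_samples=200, n_genes=500):
    """Hypothetical synthetic data: genes 0-9 form a signature driven by one latent factor."""
    data = rng.normal(size=(n_samples, n_genes))
    # A strong dataset-wide effect (e.g. batch) on genes 100-249 sets the data's PC1
    data[:, 100:250] += rng.normal(size=(n_samples, 1))
    # Signature genes share one latent biological factor with unequal weights
    latent = rng.normal(size=(n_samples, 1))
    data[:, :10] += latent * np.linspace(0.5, 3.0, 10)
    return data


rng = np.random.default_rng(0)
train, target = simulate(rng), simulate(rng)
sig = np.arange(10)
obs, p = coherence(target, sig, seed=1)
print(f"coherence:       mean |r| = {obs:.2f}, p = {p:.3f}")
print(f"uniqueness:      |r| with dataset PC1 = {uniqueness(target, sig):.2f}")
print(f"robustness:      PC1/PC2 variance = {robustness(target[:, sig]):.1f}")
print(f"transferability: loading r = {transferability(train[:, sig], target[:, sig]):.2f}")
```

For this simulated signature one would expect a small coherence p-value, a low correlation with the dataset-wide PC1, a PC1/PC2 ratio well above 1, and a loading correlation close to 1. A set of unrelated noise genes (for example columns 400-409) should instead give a PC1/PC2 ratio near 1. Standardizing genes before the SVD is a design choice that keeps highly expressed genes from dominating the score.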