Computational Intelligence and Neuroscience
Volume 2008 (2008), Article ID 276535, 12 pages
doi:10.1155/2008/276535
Research Article
Gene Tree Labeling Using Nonnegative Matrix Factorization on Biomedical Literature
¹Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996-3450, USA
²Department of Biology, University of Memphis, Memphis, TN 38152-3150, USA
Received 23 October 2007; Accepted 4 February 2008
Academic Editor: Rafal Zdunek
Copyright © 2008 Kevin E. Heinrich et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Identifying functional groups of genes is a challenging problem for biological applications. Text mining approaches can be used to build hierarchical clusters or trees from the information in the biological literature. In particular, nonnegative matrix factorization (NMF) is examined as one approach to labeling hierarchical trees. A generic labeling algorithm and an evaluation technique are proposed, and the effects of different NMF parameters on convergence and labeling accuracy are discussed. The primary goals of this study are to provide a qualitative assessment of NMF, its various parameters, and its initialization; to provide an automated way to classify biomedical data; and to provide a method for evaluating labeled data, assuming a static input tree. As a byproduct, a method for generating gold standard trees is proposed.
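For readers unfamiliar with NMF, the sketch below illustrates the basic idea of factoring a nonnegative term-by-document matrix A (m terms by n documents) as A ≈ WH, where the columns of W weight terms and the rows of H weight documents. The abstract does not say which update rule, initialization, or labeling procedure the study uses, so this sketch assumes the standard Lee-Seung multiplicative updates for the Frobenius-norm objective with a seeded random initialization. The helper names `nmf_multiplicative` and `top_terms`, the toy matrix, and the term list are hypothetical and used for illustration only.

```python
import numpy as np


def nmf_multiplicative(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative m x n matrix A as W @ H (W: m x k, H: k x n).

    Assumption: Lee-Seung multiplicative updates minimizing ||A - WH||_F^2.
    Starting from positive random factors, the updates keep W and H nonnegative;
    eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Update H with W fixed, then W with the new H fixed.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_terms(W, terms, feature, n_top=3):
    """Return the n_top terms with the largest weights in column `feature` of W."""
    order = np.argsort(W[:, feature])[::-1][:n_top]
    return [terms[i] for i in order]


# Toy term-by-document matrix (hypothetical): rows are terms, columns are documents.
terms = ["gene", "protein", "neuron", "synapse"]
A = np.array([[2, 2, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 3, 3],
              [0, 0, 1, 1]], dtype=float)

W, H = nmf_multiplicative(A, k=2, iters=300)
for f in range(W.shape[1]):
    print(f, top_terms(W, terms, f, n_top=2))
```

In a tree-labeling setting, one plausible use (again an assumption here, not necessarily the authors' algorithm) is to treat the highest-weighted terms in each column of W as candidate labels for the documents that load most heavily on that feature.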