Computational and Mathematical Methods in Medicine
Volume 2013, Article ID 524502, 8 pages
http://dx.doi.org/10.1155/2013/524502
Research Article

Identification of DNA-Binding Proteins Using Support Vector Machine with Sequence Information

1Golden Audit College, Nanjing Audit University, Nanjing 210029, China
2School of Geography and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
3Graduate School of Chinese Academy of Agricultural Sciences, Beijing 100081, China

Received 13 May 2013; Accepted 19 August 2013

Academic Editor: Nestor V. Torres

Copyright © 2013 Xin Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

DNA-binding proteins are fundamentally important in understanding cellular processes. Thus, the identification of DNA-binding proteins has important practical applications in various fields, such as drug design. We propose a novel method for predicting DNA-binding proteins using only sequence information. The prediction model developed in this study is constructed with the support vector machine-sequential minimal optimization (SVM-SMO) algorithm in conjunction with a hybrid feature. The hybrid feature incorporates an evolutionary information feature, a physicochemical property feature, and two novel attributes. These two attributes use the DNA-binding residues and nonbinding residues in a query protein to obtain its DNA-binding propensity and nonbinding propensity. The results demonstrate that our SVM-SMO model achieves a Matthews correlation coefficient (MCC) of 0.67 and an overall accuracy of 89.6%, with 88.4% sensitivity and 90.8% specificity. Performance comparisons across feature sets indicate that the two novel attributes contribute to the performance improvement. In addition, our SVM-SMO model outperforms state-of-the-art methods on an independent test dataset.
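
For readers unfamiliar with the evaluation measures named above, the following minimal sketch shows how sensitivity, specificity, overall accuracy, and MCC are conventionally computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from this study's datasets and do not reproduce its reported results. The sketch assumes both classes are present, so that the sensitivity and specificity denominators are nonzero.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn: DNA-binding proteins predicted correctly/incorrectly.
    tn/fp: nonbinding proteins predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)                # fraction of true binders recovered
    specificity = tn / (tn + fp)                # fraction of nonbinders rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not the paper's data).
metrics = binary_metrics(tp=90, fn=10, tn=85, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, which is why it is commonly reported alongside sensitivity and specificity.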