BioMed Research International
Volume 2014, Article ID 438341, 10 pages
http://dx.doi.org/10.1155/2014/438341
Research Article

Prediction of S-Nitrosylation Modification Sites Based on Kernel Sparse Representation Classification and mRMR Algorithm

1 Institute of Systems Biology, Shanghai University, Shanghai 200444, China
2 Department of Mathematics, Shaoyang University, Shaoyang, Hunan 422000, China
3 School of Biomedical Engineering, Shanghai Jiaotong University, Shanghai 200240, China
4 Shanghai Center for Bioinformation Technology, Shanghai 200235, China
5 Graduate School of the Chinese Academy of Sciences, Beijing 100049, China
6 State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
7 East China Normal University Software Engineering Institute, Shanghai 200062, China
8 Department of Biomedical Engineering, Tianjin University, Tianjin Key Lab of BME Measurement, Tianjin 300072, China
9 Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China

Received 19 June 2014; Accepted 23 July 2014; Published 12 August 2014

Academic Editor: Tao Huang

Copyright © 2014 Guohua Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Protein S-nitrosylation plays an important role in a wide variety of cellular biological activities, yet accurate prediction of S-nitrosylation sites remains a great challenge. In this paper, we present a framework to computationally predict S-nitrosylation sites based on kernel sparse representation classification and the minimum Redundancy Maximum Relevance (mRMR) algorithm. As many as 666 features derived from five categories of amino acid properties and one protein structure feature are used for the numerical representation of proteins. A total of 529 protein sequences collected from open-access databases and the published literature are used to train and test our predictor. Computational results show that our predictor achieves Matthews’ correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of the k-nearest neighbor, random forest, and sparse representation classification algorithms. The experimental results also indicate that 134 optimal features represent the peptides of protein S-nitrosylation better than the original 666 redundant features. Furthermore, we constructed an independent testing set of 113 protein sequences to evaluate the robustness of our predictor. The predictor also performed well on this independent testing set, with a Matthews’ correlation coefficient of 0.2239.
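
For readers unfamiliar with the evaluation metric quoted above, the following is a minimal sketch of how a Matthews' correlation coefficient can be computed from the four confusion-matrix counts. It uses the standard textbook definition; the function name `matthews_corrcoef_from_counts`, its parameters, and the convention of returning 0.0 when the denominator vanishes are illustrative assumptions, not the authors' implementation.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews' correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn: true positives, true negatives, false positives,
    false negatives. The name and signature are hypothetical, chosen
    only for this illustration.
    """
    # Denominator: square root of the product of the four marginal totals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when a marginal total is zero
        # and the coefficient is otherwise undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up counts, not taken from the paper.
    print(matthews_corrcoef_from_counts(tp=30, tn=60, fp=20, fn=10))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it is commonly preferred over plain accuracy when positive and negative sites are imbalanced.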