BioMed Research International
Volume 2013, Article ID 414327, 8 pages
http://dx.doi.org/10.1155/2013/414327
Research Article

A Novel Method of Predicting Protein Disordered Regions Based on Sequence Features

1Institute of Systems Biology, Shanghai University, Shanghai 200444, China
2Department of Mathematics, College of Science, Shanghai University, Shanghai 200444, China
3State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Jiaotong University School of Medicine and Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200025, China
4Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029, USA
5Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
6Department of Biomedical Engineering, Tianjin University, Tianjin Key Lab of BME Measurement, Tianjin 300072, China
7CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China

Received 3 March 2013; Accepted 26 March 2013

Academic Editor: Bin Niu

Copyright © 2013 Tong-Hui Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary Material

Online Supporting Information S1: The training dataset contains 43,903 ordered samples and 43,903 disordered samples of amino acid residues, extracted with a 21-residue sliding window.

Online Supporting Information S2: The testing dataset contains 54,582 ordered samples and 11,734 disordered samples of amino acid residues, extracted with a 21-residue sliding window.
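The 21-residue sliding window used for S1 and S2 can be illustrated with a minimal sketch. The function name `sliding_windows` and the padding character "X" for positions near the sequence termini are assumptions made for illustration; the descriptions above do not state how terminal residues were handled.

```python
def sliding_windows(sequence, width=21, pad="X"):
    """Return one window of `width` residues centred on each residue.

    Assumption: termini are padded with `pad` so that every residue
    gets a full-width window. The source does not specify this.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a centre residue")
    half = width // 2
    padded = pad * half + sequence + pad * half
    # Window i covers padded[i : i + width]; its centre is sequence[i].
    return [padded[i:i + width] for i in range(len(sequence))]


# Hypothetical example sequence, not taken from the datasets.
windows = sliding_windows("MKTAYIAKQRQISFVKSHFSRQ")
```

Each window then yields one sample, labelled ordered or disordered according to its centre residue.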

Online Supporting Information S3: The 175 features that remained after calculating Cramér's V coefficient between each feature and the target variable and removing the features with a coefficient smaller than 0.1.
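The filtering step described for S3 can be sketched as follows, using the standard definition of Cramér's V, V = sqrt(chi² / (n · (min(r, c) − 1))), for an r × c contingency table of n samples. The sketch assumes each feature takes categorical (or already discretized) values; the helper names `cramers_v` and `filter_features` are hypothetical, and only the 0.1 threshold comes from the description above.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical values."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


def filter_features(features, labels, threshold=0.1):
    """Keep feature names whose Cramér's V with the labels is >= threshold."""
    return [name for name, values in features.items()
            if cramers_v(values, labels) >= threshold]
```

Here `features` is assumed to map a feature name to its list of per-sample values, and `labels` holds the ordered/disordered class of each sample.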

Online Supporting Information S4: The mRMR ranks of the 175 features.

Online Supporting Information S5: The IFS results on the training set, evaluated by 10-fold cross-validation, including ACC (accuracy), MCC (Matthews correlation coefficient), SN (sensitivity), and SP (specificity).
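The four measures reported in S5 have standard definitions in terms of the confusion-matrix counts. The sketch below computes them; the function name `binary_metrics` is hypothetical, and treating one class (for example, disordered) as the positive class is an assumption, since the description above does not say which class is positive.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return ACC, SN, SP and MCC from confusion-matrix counts.

    Assumption: ratios with a zero denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / total,              # accuracy
        "SN": ratio(tp, tp + fn),              # sensitivity
        "SP": ratio(tn, tn + fp),              # specificity
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only, not results from the paper.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

MCC is often preferred over ACC on imbalanced data such as the testing set, where ordered samples far outnumber disordered ones.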

Online Supporting Information S6: The optimal feature subset includes 128 features.
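Incremental feature selection (IFS) over an mRMR-ranked list, as referred to in S4 to S6, can be sketched generically: feature subsets are grown one feature at a time in rank order, each subset is scored, and the best-scoring subset is kept. The function name and the `evaluate` callback are hypothetical; in practice the callback would run the classifier under 10-fold cross-validation, and which measure is used to pick the optimum is an assumption left to the caller.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Return (best_subset, best_score) over the top-k prefixes of a ranking.

    `evaluate` is a caller-supplied function (an assumption of this sketch)
    that maps a list of feature names to a score where higher is better.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:size])
        score = evaluate(subset)
        # Strict '>' keeps the smallest subset among equal scores.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

With 175 ranked features this evaluates 175 candidate subsets, which is consistent with an optimum being reported at a specific subset size.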

Online Supporting Information S7: The prediction outcome for each residue in the testing set. The first 5 columns are the output of the WEKA program, and the last column is the prediction result after scanning the sequence.
