Computational and Mathematical Methods in Medicine, 2013, Research Article
Identification of DNA-Binding Proteins Using Support Vector Machine with Sequence Information

Table 2: The performance of different kinds of feature descriptors with various machine learning algorithms on the main dataset, using 5-fold cross-validation.
                              Feature descriptor
Machine learning algorithm    PP      EI      BP + NBP + PP    BP + NBP + EI    PP + EI    BP + NBP + PP + EI

Accuracy (%)
SVM-SMO                       83.2    85.7    85.6             87.3             87.3       89.6
Simple logistic regression    81.8    84.2    84.2             87.0             85.7       88.3
Random forest                 81.3    84.3    83.5             86.7             85.9       88.1
Naive Bayes                   78.6    77.2    82.8             82.3             82.6       84.3
Decision tree                 80.2    82.5    82.6             84.4             84.1       86.2

Sensitivity (%)
SVM-SMO                       82.4    84.9    84.4             86.5             85.8       88.4
Simple logistic regression    80.7    83.1    82.3             84.4             85.6       86.7
Random forest                 81.1    83.6    82.8             86.0             85.3       86.2
Naive Bayes                   76.9    76.1    79.4             80.8             81.1       82.6
Decision tree                 78.6    80.4    81.7             82.7             82.5       84.7

Specificity (%)
SVM-SMO                       84.6    86.3    86.7             88.2             88.6       90.8
Simple logistic regression    82.9    85.5    86.0             88.8             85.9       90.2
Random forest                 81.6    85.2    84.1             87.5             86.3       90.0
Naive Bayes                   80.2    78.5    85.6             83.8             84.7       86.0
Decision tree                 81.8    84.7    83.5             86.2             85.7       87.7

Matthews correlation coefficient
SVM-SMO                       0.55    0.58    0.62             0.66             0.66       0.67
Simple logistic regression    0.56    0.55    0.64             0.62             0.64       0.66
Random forest                 0.55    0.56    0.60             0.62             0.63       0.66
Naive Bayes                   0.52    0.49    0.56             0.53             0.54       0.59
Decision tree                 0.53    0.55    0.61             0.63             0.62       0.64

AUC
SVM-SMO                       0.83    0.86    0.86             0.88             0.87       0.90
Simple logistic regression    0.83    0.84    0.85             0.86             0.85       0.88
Random forest                 0.81    0.84    0.84             0.86             0.85       0.87
Naive Bayes                   0.78    0.76    0.80             0.79             0.80       0.82
Decision tree                 0.80    0.82    0.83             0.84             0.84       0.86
BP: binding propensity feature; NBP: nonbinding propensity feature; PP: physicochemical property feature; EI: evolutionary information feature.
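
The four threshold-based measures reported in Table 2 (accuracy, sensitivity, specificity, and the Matthews correlation coefficient) can all be derived from the counts of a binary confusion matrix. The Python sketch below assumes the standard textbook definitions, which this page does not spell out. The function name binary_metrics and the example counts are hypothetical and are not taken from the article's data. AUC is left out because it needs ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity (in %) and MCC.

    tp/fn: DNA-binding proteins predicted correctly/incorrectly.
    tn/fp: non-binding proteins predicted correctly/incorrectly.
    Standard definitions are assumed; they are not quoted from the article.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must contain at least one sample")

    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    specificity = 100.0 * tn / (tn + fp)   # true negative rate

    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not from the article).
example = binary_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Under 5-fold cross-validation, one common choice is to sum the four counts over the held-out folds before calling such a function; averaging the per-fold metrics is another. The page does not say which convention was used.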