Research Article

Identification of DNA-Binding Proteins Using Support Vector Machine with Sequence Information

Table 2

Performance of different feature descriptors combined with various machine learning algorithms on the main dataset, evaluated using 5-fold cross-validation.

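Feature descriptors are listed across the columns.

| Metric | Machine learning algorithm | PP | EI | BP + NBP + PP | BP + NBP + EI | PP + EI | BP + NBP + PP + EI |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | SVM-SMO | 83.2 | 85.7 | 85.6 | 87.3 | 87.3 | 89.6 |
| Accuracy (%) | Simple logistic regression | 81.8 | 84.2 | 84.2 | 87.0 | 85.7 | 88.3 |
| Accuracy (%) | Random forest | 81.3 | 84.3 | 83.5 | 86.7 | 85.9 | 88.1 |
| Accuracy (%) | Naive Bayes | 78.6 | 77.2 | 82.8 | 82.3 | 82.6 | 84.3 |
| Accuracy (%) | Decision tree | 80.2 | 82.5 | 82.6 | 84.4 | 84.1 | 86.2 |
| Sensitivity (%) | SVM-SMO | 82.4 | 84.9 | 84.4 | 86.5 | 85.8 | 88.4 |
| Sensitivity (%) | Simple logistic regression | 80.7 | 83.1 | 82.3 | 84.4 | 85.6 | 86.7 |
| Sensitivity (%) | Random forest | 81.1 | 83.6 | 82.8 | 86.0 | 85.3 | 86.2 |
| Sensitivity (%) | Naive Bayes | 76.9 | 76.1 | 79.4 | 80.8 | 81.1 | 82.6 |
| Sensitivity (%) | Decision tree | 78.6 | 80.4 | 81.7 | 82.7 | 82.5 | 84.7 |
| Specificity (%) | SVM-SMO | 84.6 | 86.3 | 86.7 | 88.2 | 88.6 | 90.8 |
| Specificity (%) | Simple logistic regression | 82.9 | 85.5 | 86.0 | 88.8 | 85.9 | 90.2 |
| Specificity (%) | Random forest | 81.6 | 85.2 | 84.1 | 87.5 | 86.3 | 90.0 |
| Specificity (%) | Naive Bayes | 80.2 | 78.5 | 85.6 | 83.8 | 84.7 | 86.0 |
| Specificity (%) | Decision tree | 81.8 | 84.7 | 83.5 | 86.2 | 85.7 | 87.7 |
| Matthews correlation coefficient | SVM-SMO | 0.55 | 0.58 | 0.62 | 0.66 | 0.66 | 0.67 |
| Matthews correlation coefficient | Simple logistic regression | 0.56 | 0.55 | 0.64 | 0.62 | 0.64 | 0.66 |
| Matthews correlation coefficient | Random forest | 0.55 | 0.56 | 0.60 | 0.62 | 0.63 | 0.66 |
| Matthews correlation coefficient | Naive Bayes | 0.52 | 0.49 | 0.56 | 0.53 | 0.54 | 0.59 |
| Matthews correlation coefficient | Decision tree | 0.53 | 0.55 | 0.61 | 0.63 | 0.62 | 0.64 |
| AUC | SVM-SMO | 0.83 | 0.86 | 0.86 | 0.88 | 0.87 | 0.90 |
| AUC | Simple logistic regression | 0.83 | 0.84 | 0.85 | 0.86 | 0.85 | 0.88 |
| AUC | Random forest | 0.81 | 0.84 | 0.84 | 0.86 | 0.85 | 0.87 |
| AUC | Naive Bayes | 0.78 | 0.76 | 0.80 | 0.79 | 0.80 | 0.82 |
| AUC | Decision tree | 0.80 | 0.82 | 0.83 | 0.84 | 0.84 | 0.86 |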

BP: binding propensity feature; NBP: nonbinding propensity feature; PP: physicochemical property feature; EI: evolutionary information feature.
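
For readers unfamiliar with the threshold-based metrics reported in Table 2, the following is a minimal sketch of how accuracy, sensitivity, specificity, and the Matthews correlation coefficient are commonly computed from confusion-matrix counts. It assumes the standard textbook definitions, which this table does not restate. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion counts.

    tp/fn: DNA-binding proteins predicted correctly/incorrectly.
    tn/fp: non-binding proteins predicted correctly/incorrectly.
    Standard definitions are assumed; they are not quoted from the paper.
    """
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate, in %
    specificity = 100.0 * tn / (tn + fp)   # true negative rate, in %
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under 5-fold cross-validation, such counts would typically be accumulated over the five held-out folds, or the metrics averaged across folds. Which of the two was used for Table 2 is not stated here.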