Research Article

Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins

Table 2

Performance comparison of baseline classifiers based on the fivefold CV and the test set using the entire feature sets.

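CV columns: fivefold cross-validation on PDB1616. Test columns: blind test on PDB186.

| Feature set (#Features) | Classifier | CV ACC (%) | CV MCC | CV SP (%) | CV SN (%) | Test ACC (%) | Test MCC | Test SP (%) | Test SN (%) |
|---|---|---|---|---|---|---|---|---|---|
| PSSMR_Avg (20) | GNB | 65.53 | 0.3108 | 67.08 | 63.99 | 61.83 | 0.2406 | 52.69 | 70.97 |
| PSSMR_Avg (20) | KNN | 66.09 | 0.3229 | 70.17 | 62.00 | 58.06 | 0.1613 | 56.99 | 59.14 |
| PSSMR_Avg (20) | DT | 60.89 | 0.2178 | 60.52 | 61.26 | 60.75 | 0.2157 | 56.99 | 64.52 |
| PSSMR_Avg (20) | LR | 69.74 | 0.3950 | 71.16 | 68.32 | 63.98 | 0.2856 | 53.76 | 74.19 |
| PSSMR_Avg (20) | SVM | 70.85 | 0.4200 | 64.98 | 76.73 | 66.13 | 0.3395 | 50.54 | 81.72 |
| PSSMR_Avg (20) | RF | 69.37 | 0.3875 | 68.32 | 70.42 | 65.59 | 0.3154 | 58.06 | 73.12 |
| PSSMR_Avg (20) | GBDT | 69.25 | 0.3849 | 69.06 | 69.43 | 65.05 | 0.3041 | 58.06 | 72.04 |
| PSSMR_Avg (20) | XGB | 68.19 | 0.3640 | 66.71 | 69.68 | 60.22 | 0.2047 | 56.99 | 63.44 |
| PSSMS_Avg (20) | GNB | 68.32 | 0.3666 | 66.34 | 70.30 | 68.28 | 0.3704 | 60.22 | 76.34 |
| PSSMS_Avg (20) | KNN | 68.13 | 0.3629 | 66.34 | 69.93 | 67.74 | 0.3552 | 65.59 | 69.89 |
| PSSMS_Avg (20) | DT | 62.93 | 0.2587 | 63.37 | 62.50 | 63.44 | 0.2689 | 62.37 | 64.52 |
| PSSMS_Avg (20) | LR | 70.92 | 0.4186 | 69.06 | 72.77 | 70.97 | 0.4242 | 63.44 | 78.49 |
| PSSMS_Avg (20) | SVM | 73.21 | 0.4681 | 66.71 | 79.70 | 72.04 | 0.4563 | 59.14 | 84.95 |
| PSSMS_Avg (20) | RF | 69.80 | 0.3962 | 68.19 | 71.41 | 72.04 | 0.4494 | 62.37 | 81.72 |
| PSSMS_Avg (20) | GBDT | 71.66 | 0.4336 | 69.31 | 74.01 | 72.04 | 0.4418 | 68.82 | 75.27 |
| PSSMS_Avg (20) | XGB | 69.06 | 0.3814 | 67.45 | 70.67 | 70.97 | 0.4242 | 63.44 | 78.49 |
| PSSMR_All (580) | GNB | 64.85 | 0.2975 | 67.70 | 62.00 | 60.75 | 0.2154 | 63.44 | 58.06 |
| PSSMR_All (580) | KNN | 59.34 | 0.2078 | 81.19 | 37.50 | 59.14 | 0.2025 | 80.65 | 37.63 |
| PSSMR_All (580) | DT | 59.84 | 0.1968 | 59.28 | 60.40 | 57.53 | 0.1506 | 55.91 | 59.14 |
| PSSMR_All (580) | LR | 68.44 | 0.3692 | 70.67 | 66.21 | 67.20 | 0.3443 | 65.59 | 68.82 |
| PSSMR_All (580) | SVM | 71.47 | 0.4306 | 67.82 | 75.12 | 67.20 | 0.3465 | 61.29 | 73.12 |
| PSSMR_All (580) | RF | 70.17 | 0.4044 | 66.83 | 73.51 | 62.37 | 0.2511 | 53.76 | 70.97 |
| PSSMR_All (580) | GBDT | 70.73 | 0.4146 | 70.67 | 70.79 | 64.52 | 0.2920 | 59.14 | 69.89 |
| PSSMR_All (580) | XGB | 69.18 | 0.3837 | 68.69 | 69.68 | 65.05 | 0.3051 | 56.99 | 73.12 |
| PSSMS_All (580) | GNB | 64.98 | 0.3083 | 76.86 | 53.09 | 57.53 | 0.1610 | 75.27 | 39.78 |
| PSSMS_All (580) | KNN | 62.56 | 0.2610 | 76.11 | 49.01 | 64.52 | 0.2920 | 69.89 | 59.14 |
| PSSMS_All (580) | DT | 63.06 | 0.2612 | 64.23 | 61.88 | 54.30 | 0.0866 | 48.39 | 60.22 |
| PSSMS_All (580) | LR | 69.74 | 0.3949 | 68.81 | 70.67 | 69.35 | 0.3873 | 67.74 | 70.97 |
| PSSMS_All (580) | SVM | 72.77 | 0.4597 | 65.97 | 79.58 | 73.12 | 0.4734 | 62.37 | 83.87 |
| PSSMS_All (580) | RF | 71.60 | 0.4337 | 67.08 | 76.11 | 73.66 | 0.4812 | 64.52 | 82.80 |
| PSSMS_All (580) | GBDT | 70.30 | 0.4061 | 68.81 | 71.78 | 75.27 | 0.5073 | 70.97 | 79.57 |
| PSSMS_All (580) | XGB | 70.24 | 0.4056 | 66.96 | 73.51 | 69.89 | 0.4012 | 63.44 | 76.34 |
| ESM_Avg (1280) | GNB | 71.35 | 0.4275 | 73.89 | 68.81 | 70.97 | 0.4209 | 75.27 | 66.67 |
| ESM_Avg (1280) | KNN | 74.63 | 0.4927 | 73.64 | 75.62 | 72.58 | 0.4548 | 66.67 | 78.49 |
| ESM_Avg (1280) | DT | 63.00 | 0.2599 | 63.49 | 62.50 | 61.29 | 0.2266 | 56.99 | 65.59 |
| ESM_Avg (1280) | LR | 78.22 | 0.5646 | 76.86 | 79.58 | 78.49 | 0.5765 | 70.97 | 86.02 |
| ESM_Avg (1280) | SVM | 79.27 | 0.5908 | 72.52 | 86.01 | 79.03 | 0.5906 | 69.89 | 88.17 |
| ESM_Avg (1280) | RF | 74.32 | 0.4864 | 74.50 | 74.13 | 75.27 | 0.5055 | 76.34 | 74.19 |
| ESM_Avg (1280) | GBDT | 76.67 | 0.5339 | 74.50 | 78.84 | 74.73 | 0.4960 | 70.97 | 78.49 |
| ESM_Avg (1280) | XGB | 75.43 | 0.5090 | 73.64 | 77.23 | 77.42 | 0.5495 | 74.19 | 80.65 |
| ESM_All (37120) | GNB | 65.78 | 0.3235 | 76.73 | 54.83 | 58.60 | 0.1811 | 74.19 | 43.01 |
| ESM_All (37120) | KNN | 64.17 | 0.2972 | 79.21 | 49.13 | 66.13 | 0.3269 | 32.69 | 58.06 |
| ESM_All (37120) | DT | 60.27 | 0.2056 | 62.38 | 58.17 | 56.45 | 0.1318 | 46.24 | 66.67 |
| ESM_All (37120) | LR | 78.28 | 0.5658 | 76.86 | 79.70 | 77.96 | 0.5666 | 69.89 | 86.02 |
| ESM_All (37120) | SVM | 78.90 | 0.5843 | 71.53 | 86.26 | 79.57 | 0.6087 | 67.74 | 91.40 |
| ESM_All (37120) | RF | 72.59 | 0.4520 | 70.79 | 74.38 | 73.12 | 0.4641 | 68.82 | 77.42 |
| ESM_All (37120) | GBDT | 77.72 | 0.5550 | 75.50 | 79.95 | 75.27 | 0.5083 | 69.89 | 80.65 |
| ESM_All (37120) | XGB | 77.41 | 0.5487 | 75.37 | 79.46 | 75.27 | 0.5112 | 67.74 | 82.80 |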

Note. The number highlighted in bold is the best result corresponding to one feature set. An underlined number represents the optimal result over all feature sets.
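
For readers who want to relate the four reported metrics to one another, the following is a minimal sketch of how ACC, SN, SP, and MCC are conventionally computed from a binary confusion matrix. It assumes the standard definitions, with DNA-binding proteins as the positive class. The function name `binary_metrics` and the example counts are hypothetical: the counts assume that PDB186 holds 93 positive and 93 negative sequences, which this table does not state, and they were chosen only to be consistent with the blind-test percentages in the SVM row for ESM_All.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute ACC, SN, SP (percentages) and MCC from confusion-matrix counts.

    Standard definitions, assumed here rather than taken from the article:
      ACC = (TP + TN) / (TP + TN + FP + FN)
      SN  = TP / (TP + FN)   (sensitivity, recall on the positive class)
      SP  = TN / (TN + FP)   (specificity, recall on the negative class)
      MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    total = tp + fn + tn + fp
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": 100.0 * _ratio(tp + tn, total),
        "SN": 100.0 * _ratio(tp, tp + fn),
        "SP": 100.0 * _ratio(tn, tn + fp),
        "MCC": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts: 93 positives (85 found, 8 missed) and
# 93 negatives (63 rejected, 30 falsely accepted). Assumed, not reported.
example = binary_metrics(tp=85, fn=8, tn=63, fp=30)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the sketch gives ACC 79.57%, SN 91.40%, SP 67.74%, and MCC 0.6087 after rounding. On a balanced test set, ACC equals the mean of SN and SP, which offers a quick consistency check for any row of the blind-test columns.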