Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins
Table 2
Performance comparison of baseline classifiers under fivefold cross-validation (CV) and on the blind test set, using the entire feature sets.
                                     Fivefold cross-validation on PDB1616  Blind test on PDB186
Feature set (#Features)  Classifier  ACC (%)  MCC      SP (%)   SN (%)     ACC (%)  MCC      SP (%)   SN (%)
PSSMR_Avg (20)           GNB         65.53    0.3108   67.08    63.99      61.83    0.2406   52.69    70.97
                         KNN         66.09    0.3229   70.17    62.00      58.06    0.1613   56.99    59.14
                         DT          60.89    0.2178   60.52    61.26      60.75    0.2157   56.99    64.52
                         LR          69.74    0.3950   71.16    68.32      63.98    0.2856   53.76    74.19
                         SVM         70.85    0.4200   64.98    76.73      66.13    0.3395   50.54    81.72
                         RF          69.37    0.3875   68.32    70.42      65.59    0.3154   58.06    73.12
                         GBDT        69.25    0.3849   69.06    69.43      65.05    0.3041   58.06    72.04
                         XGB         68.19    0.3640   66.71    69.68      60.22    0.2047   56.99    63.44
PSSMS_Avg (20)           GNB         68.32    0.3666   66.34    70.30      68.28    0.3704   60.22    76.34
                         KNN         68.13    0.3629   66.34    69.93      67.74    0.3552   65.59    69.89
                         DT          62.93    0.2587   63.37    62.50      63.44    0.2689   62.37    64.52
                         LR          70.92    0.4186   69.06    72.77      70.97    0.4242   63.44    78.49
                         SVM         73.21    0.4681   66.71    79.70      72.04    0.4563   59.14    84.95
                         RF          69.80    0.3962   68.19    71.41      72.04    0.4494   62.37    81.72
                         GBDT        71.66    0.4336   69.31    74.01      72.04    0.4418   68.82    75.27
                         XGB         69.06    0.3814   67.45    70.67      70.97    0.4242   63.44    78.49
PSSMR_All (580)          GNB         64.85    0.2975   67.70    62.00      60.75    0.2154   63.44    58.06
                         KNN         59.34    0.2078   81.19    37.50      59.14    0.2025   80.65    37.63
                         DT          59.84    0.1968   59.28    60.40      57.53    0.1506   55.91    59.14
                         LR          68.44    0.3692   70.67    66.21      67.20    0.3443   65.59    68.82
                         SVM         71.47    0.4306   67.82    75.12      67.20    0.3465   61.29    73.12
                         RF          70.17    0.4044   66.83    73.51      62.37    0.2511   53.76    70.97
                         GBDT        70.73    0.4146   70.67    70.79      64.52    0.2920   59.14    69.89
                         XGB         69.18    0.3837   68.69    69.68      65.05    0.3051   56.99    73.12
PSSMS_All (580)          GNB         64.98    0.3083   76.86    53.09      57.53    0.1610   75.27    39.78
                         KNN         62.56    0.2610   76.11    49.01      64.52    0.2920   69.89    59.14
                         DT          63.06    0.2612   64.23    61.88      54.30    0.0866   48.39    60.22
                         LR          69.74    0.3949   68.81    70.67      69.35    0.3873   67.74    70.97
                         SVM         72.77    0.4597   65.97    79.58      73.12    0.4734   62.37    83.87
                         RF          71.60    0.4337   67.08    76.11      73.66    0.4812   64.52    82.80
                         GBDT        70.30    0.4061   68.81    71.78      75.27    0.5073   70.97    79.57
                         XGB         70.24    0.4056   66.96    73.51      69.89    0.4012   63.44    76.34
ESM_Avg (1280)           GNB         71.35    0.4275   73.89    68.81      70.97    0.4209   75.27    66.67
                         KNN         74.63    0.4927   73.64    75.62      72.58    0.4548   66.67    78.49
                         DT          63.00    0.2599   63.49    62.50      61.29    0.2266   56.99    65.59
                         LR          78.22    0.5646   76.86    79.58      78.49    0.5765   70.97    86.02
                         SVM         79.27    0.5908   72.52    86.01      79.03    0.5906   69.89    88.17
                         RF          74.32    0.4864   74.50    74.13      75.27    0.5055   76.34    74.19
                         GBDT        76.67    0.5339   74.50    78.84      74.73    0.4960   70.97    78.49
                         XGB         75.43    0.5090   73.64    77.23      77.42    0.5495   74.19    80.65
ESM_All (37120)          GNB         65.78    0.3235   76.73    54.83      58.60    0.1811   74.19    43.01
                         KNN         64.17    0.2972   79.21    49.13      66.13    0.3269   74.19    58.06
                         DT          60.27    0.2056   62.38    58.17      56.45    0.1318   46.24    66.67
                         LR          78.28    0.5658   76.86    79.70      77.96    0.5666   69.89    86.02
                         SVM         78.90    0.5843   71.53    86.26      79.57    0.6087   67.74    91.40
                         RF          72.59    0.4520   70.79    74.38      73.12    0.4641   68.82    77.42
                         GBDT        77.72    0.5550   75.50    79.95      75.27    0.5083   69.89    80.65
                         XGB         77.41    0.5487   75.37    79.46      75.27    0.5112   67.74    82.80
Note. A number in bold is the best result within one feature set. An underlined number is the best result over all feature sets.
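As a reading aid for Table 2, the four reported metrics can be recomputed from confusion-matrix counts. The sketch below uses the standard definitions of accuracy (ACC), sensitivity (SN), specificity (SP), and the Matthews correlation coefficient (MCC). The function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper: the counts were chosen, assuming PDB186 contains 93 DNA-binding and 93 non-binding proteins, so that they agree with the ESM_All SVM blind-test row.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return ACC, SP, SN (as percentages) and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    acc = 100.0 * (tp + tn) / total
    sn = 100.0 * tp / (tp + fn)  # sensitivity: fraction of positives recovered
    sp = 100.0 * tn / (tn + fp)  # specificity: fraction of negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal total is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "SP": sp, "SN": sn}


# Hypothetical counts (assumed for illustration, not reported in the paper).
example = binary_metrics(tp=85, fn=8, tn=63, fp=30)
print({name: round(value, 4 if name == "MCC" else 2) for name, value in example.items()})
```

Rounded to the precision used in the table, these counts give ACC 79.57, MCC 0.6087, SP 67.74, and SN 91.40. On a class-balanced set, ACC is the mean of SP and SN, which gives a quick consistency check for any row.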