Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins
Table 4
Results of RFECV feature selections.
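| Feature set | FS method | #Features | ACC (5CV on PDB1616) | MCC (5CV on PDB1616) | SP (5CV on PDB1616) | SN (5CV on PDB1616) | ACC (Test on PDB186) | MCC (Test on PDB186) | SP (Test on PDB186) | SN (Test on PDB186) |
|---|---|---|---|---|---|---|---|---|---|---|
| PSSMR_All | NFS | 580 | 71.47 | 43.06 | 67.82 | 75.12 | 67.20 | 34.65 | 61.29 | 73.12 |
| PSSMR_All | LinSVM20 | 116 | 74.20 | 48.44 | 71.91 | 76.49 | 68.82 | 37.95 | 62.37 | 75.27 |
| PSSMR_All | LinSVM20_RFE | 110 | 74.20 | 48.43 | 72.15 | 76.24 | 68.28 | 36.73 | 63.44 | 73.12 |
| PSSMR_All | LR20 | 116 | 73.64 | 47.35 | 70.92 | 76.36 | 65.59 | 31.45 | 59.14 | 72.04 |
| PSSMR_All | LR20_RFE | 114 | 73.58 | 47.22 | 70.92 | 76.24 | 66.13 | 32.58 | 59.14 | 73.12 |
| PSSMS_All | NFS | 580 | 72.77 | 45.97 | 65.97 | 79.58 | 73.12 | 47.34 | 62.37 | 83.87 |
| PSSMS_All | LinSVM30 | 174 | 75.56 | 51.36 | 70.67 | 80.45 | 73.66 | 48.12 | 64.52 | 82.80 |
| PSSMS_All | LinSVM30_RFE | 149 | 75.00 | 50.34 | 69.18 | 80.82 | 71.51 | 43.94 | 61.29 | 81.72 |
| PSSMS_All | LR30 | 174 | 75.74 | 51.93 | 69.18 | 82.30 | 74.73 | 50.31 | 65.59 | 83.87 |
| PSSMS_All | LR30_RFE | 157 | 75.74 | 52.01 | 68.69 | 82.80 | 73.66 | 48.33 | 63.44 | 83.87 |
| ESM_Avg | NFS | 1280 | 79.27 | 59.08 | 72.52 | 86.01 | 79.03 | 59.06 | 69.89 | 88.17 |
| ESM_Avg | LinSVM20 | 256 | 81.31 | 62.92 | 76.49 | 86.14 | 78.49 | 57.85 | 69.89 | 87.10 |
| ESM_Avg | LinSVM20_RFE | 253 | 81.06 | 62.38 | 76.61 | 85.52 | 78.49 | 57.85 | 69.89 | 87.10 |
| ESM_Avg | LR20 | 256 | 82.80 | 66.06 | 76.86 | 88.74 | 78.49 | 57.65 | 70.97 | 86.02 |
| ESM_Avg | LR20_RFE | 243 | 82.49 | 65.45 | 76.49 | 88.49 | 79.57 | 60.28 | 69.89 | 89.25 |
| ESM_All | NFS | 37120 | 78.90 | 58.43 | 71.53 | 86.26 | 79.57 | 60.87 | 67.74 | 91.40 |
| ESM_All | LinSVM5 | 1856 | 85.64 | 71.86 | 79.33 | 91.96 | 79.03 | 60.68 | 64.52 | 93.55 |
| ESM_All | LinSVM5_RFE | 1581 | 87.07 | 74.61 | 81.44 | 92.70 | 78.49 | 59.36 | 64.52 | 92.47 |
| ESM_All | LR4 | 1485 | 90.22 | 80.75 | 85.89 | 94.55 | 80.11 | 61.81 | 68.82 | 91.40 |
| ESM_All | LR4_RFE | 1392 | 90.66 | 81.61 | 86.39 | 94.93 | 80.65 | 63.08 | 68.82 | 92.47 |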
Note. Within each feature set, the best result is highlighted in bold; an underlined number is the best result over all feature sets. ACC: accuracy; MCC: Matthews correlation coefficient; SP: specificity; SN: sensitivity; 5CV: five-fold cross-validation.