Research Article

Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins

Table 4

Results of RFECV feature selections.

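5CV = five-fold cross-validation on PDB1616; Test = independent test on PDB186. ACC, SP and SN are percentages; MCC is multiplied by 100.

| Feature set | FS method | #Features | 5CV ACC | 5CV MCC | 5CV SP | 5CV SN | Test ACC | Test MCC | Test SP | Test SN |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| PSSMR_All | NFS | 580 | 71.47 | 43.06 | 67.82 | 75.12 | 67.20 | 34.65 | 61.29 | 73.12 |
|  | LinSVM20 | 116 | 74.20 | 48.44 | 71.91 | 76.49 | 68.82 | 37.95 | 62.37 | 75.27 |
|  | LinSVM20_RFE | 110 | 74.20 | 48.43 | 72.15 | 76.24 | 68.28 | 36.73 | 63.44 | 73.12 |
|  | LR20 | 116 | 73.64 | 47.35 | 70.92 | 76.36 | 65.59 | 31.45 | 59.14 | 72.04 |
|  | LR20_RFE | 114 | 73.58 | 47.22 | 70.92 | 76.24 | 66.13 | 32.58 | 59.14 | 73.12 |
| PSSMS_All | NFS | 580 | 72.77 | 45.97 | 65.97 | 79.58 | 73.12 | 47.34 | 62.37 | 83.87 |
|  | LinSVM30 | 174 | 75.56 | 51.36 | 70.67 | 80.45 | 73.66 | 48.12 | 64.52 | 82.80 |
|  | LinSVM30_RFE | 149 | 75.00 | 50.34 | 69.18 | 80.82 | 71.51 | 43.94 | 61.29 | 81.72 |
|  | LR30 | 174 | 75.74 | 51.93 | 69.18 | 82.30 | 74.73 | 50.31 | 65.59 | 83.87 |
|  | LR30_RFE | 157 | 75.74 | 52.01 | 68.69 | 82.80 | 73.66 | 48.33 | 63.44 | 83.87 |
| ESM_Avg | NFS | 1280 | 79.27 | 59.08 | 72.52 | 86.01 | 79.03 | 59.06 | 69.89 | 88.17 |
|  | LinSVM20 | 256 | 81.31 | 62.92 | 76.49 | 86.14 | 78.49 | 57.85 | 69.89 | 87.10 |
|  | LinSVM20_RFE | 253 | 81.06 | 62.38 | 76.61 | 85.52 | 78.49 | 57.85 | 69.89 | 87.10 |
|  | LR20 | 256 | 82.80 | 66.06 | 76.86 | 88.74 | 78.49 | 57.65 | 70.97 | 86.02 |
|  | LR20_RFE | 243 | 82.49 | 65.45 | 76.49 | 88.49 | 79.57 | 60.28 | 69.89 | 89.25 |
| ESM_All | NFS | 37120 | 78.90 | 58.43 | 71.53 | 86.26 | 79.57 | 60.87 | 67.74 | 91.40 |
|  | LinSVM5 | 1856 | 85.64 | 71.86 | 79.33 | 91.96 | 79.03 | 60.68 | 64.52 | 93.55 |
|  | LinSVM5_RFE | 1581 | 87.07 | 74.61 | 81.44 | 92.70 | 78.49 | 59.36 | 64.52 | 92.47 |
|  | LR4 | 1485 | 90.22 | 80.75 | 85.89 | 94.55 | 80.11 | 61.81 | 68.82 | 91.40 |
|  | LR4_RFE | 1392 | 90.66 | 81.61 | 86.39 | 94.93 | 80.65 | 63.08 | 68.82 | 92.47 |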

Note. Numbers in bold are the best results within each feature set; underlined numbers are the best results across all feature sets.
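The ACC, MCC, SP and SN columns follow the standard confusion-matrix definitions. As a hedged worked example, the sketch below recomputes the test-set figures of the ESM_All / LR4_RFE row. It assumes that PDB186 contains 93 DNA-binding and 93 non-DNA-binding proteins and that the table reports MCC multiplied by 100; the counts TP = 86, FN = 7, TN = 64 and FP = 29 are back-calculated from the reported SN and SP, not taken from the source, and `binary_metrics` is a hypothetical helper name.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Return ACC, MCC, SP and SN (all multiplied by 100) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)  # sensitivity: recall on DNA-binding (positive) proteins
    sp = tn / (tn + fp)  # specificity: recall on non-DNA-binding (negative) proteins
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": 100 * acc, "MCC": 100 * mcc, "SP": 100 * sp, "SN": 100 * sn}


# Assumed PDB186 split: 93 positives and 93 negatives.
# SN = 92.47 -> TP = 86 of 93; SP = 68.82 -> TN = 64 of 93.
m = binary_metrics(tp=86, fn=7, tn=64, fp=29)
print({k: round(v, 2) for k, v in m.items()})
# → {'ACC': 80.65, 'MCC': 63.08, 'SP': 68.82, 'SN': 92.47}
```

Because the assumed test set is balanced, ACC reduces to (SP + SN) / 2, which agrees with every Test ACC value in the table (for example, (68.82 + 92.47) / 2 ≈ 80.65).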
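The #Features column matches keeping a fixed percentage of the original features: 20% of 580 is 116, 30% of 580 is 174, 20% of 1280 is 256, 5% of 37120 is 1856 and 4% of 37120 rounds to 1485. This suggests, though the table does not state it, that the numeric suffix in names such as LinSVM20 or LR4 is a percentage. The sketch below shows one plausible way to obtain such a subset: rank features by the absolute weight of a fitted linear model (linear SVM or logistic regression) and keep the top share. The weights are assumed to be supplied by the caller, and `keep_count` and `top_percent_by_weight` are hypothetical names.

```python
def keep_count(n_features: int, percent: float) -> int:
    """Number of features kept when retaining `percent`% of them (rounded, at least 1)."""
    return max(1, round(n_features * percent / 100))


def top_percent_by_weight(weights: list[float], percent: float) -> list[int]:
    """Indices of the features with the largest |weight|, returned in original order.

    `weights` is assumed to hold one coefficient per feature from a fitted
    linear model; larger magnitude is treated as more important.
    """
    k = keep_count(len(weights), percent)
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return sorted(ranked[:k])


# Feature counts implied by the table's suffixes (assumed to be percentages).
print([keep_count(n, p) for n, p in [(580, 20), (580, 30), (1280, 20), (37120, 5), (37120, 4)]])
# → [116, 174, 256, 1856, 1485]
print(top_percent_by_weight([0.1, -0.9, 0.5, 0.0], 50))
# → [1, 2]
```

Ranking by coefficient magnitude is only meaningful when the features are on comparable scales, so the inputs would normally be standardized before the linear model is fitted.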
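The *_RFE rows come from recursive feature elimination with cross-validation (RFECV), which shrinks each preselected subset further (for example, 1485 to 1392 features for ESM_All with LR). The following minimal sketch shows the procedure under stated assumptions. `importance_fn` and `cv_score_fn` are hypothetical callbacks that stand in for refitting the linear model and running 5-fold cross-validation, and the toy scores are illustrative only. It is a simplified variant: scikit-learn's `RFECV` runs the elimination path inside each fold and averages the scores per subset size, whereas this sketch follows a single elimination path.

```python
from typing import Callable, Sequence


def rfecv(
    n_features: int,
    importance_fn: Callable[[list[int]], Sequence[float]],
    cv_score_fn: Callable[[list[int]], float],
    step: int = 1,
    min_features: int = 1,
) -> tuple[list[int], dict[int, float]]:
    """Recursive feature elimination scored by cross-validation.

    importance_fn(subset) refits a model on `subset` and returns one weight per
    feature in `subset`; cv_score_fn(subset) returns a cross-validated score.
    Returns the best-scoring subset and the score recorded for each subset size.
    """
    current = list(range(n_features))
    scores: dict[int, float] = {}
    best_subset, best_score = list(current), float("-inf")
    while True:
        score = cv_score_fn(current)
        scores[len(current)] = score
        # ">=" breaks ties in favour of the smaller subset, which is reached later.
        if score >= best_score:
            best_subset, best_score = list(current), score
        if len(current) <= min_features:
            break
        weights = importance_fn(current)
        # Drop the `step` features with the smallest |weight|, never going below min_features.
        order = sorted(range(len(current)), key=lambda i: abs(weights[i]))
        dropped = {current[i] for i in order[: min(step, len(current) - min_features)]}
        current = [f for f in current if f not in dropped]
    return best_subset, scores


# Toy usage: fixed importances and an integer score that rewards features 0, 2 and 4.
base = [0.9, 0.1, 0.7, 0.05, 0.5]
useful = {0, 2, 4}
best, scores = rfecv(
    n_features=5,
    importance_fn=lambda subset: [base[f] for f in subset],
    cv_score_fn=lambda subset: 10 * len(useful & set(subset)) - len(subset),
)
print(best, scores)
# → [0, 2, 4] {5: 25, 4: 26, 3: 27, 2: 18, 1: 9}
```

Refitting inside `importance_fn` at every iteration is what makes the elimination "recursive": feature weights are re-estimated after each removal instead of being ranked once, which distinguishes RFE from the one-shot top-percentage filter.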