An Intelligent System for Identifying Acetylated Lysine on Histones and Nonhistone Proteins
Table 3
Fivefold cross-validation results for the histone and nonhistone models trained with various features.
| Dataset | Training features | Sn | Sp | Acc | MCC |
|---|---|---|---|---|---|
| Histone | 20D binary code | 0.725 | 0.743 | 0.740 | 0.370 |
| | BLOSUM62 | 0.745 | 0.758 | 0.756 | 0.400 |
| | Amino acid composition (AAC) | 0.750 | 0.761 | 0.386 | 0.407 |
| | Amino acid pair composition (AAPC) | 0.790 | 0.802 | 0.800 | 0.483 |
| | Accessible surface area (ASA) | 0.645 | 0.663 | 0.660 | 0.236 |
| | Position weight matrix (PWM) | 0.700 | 0.721 | 0.718 | 0.329 |
| | Position-specific scoring matrix (PSSM) | 0.710 | 0.724 | 0.721 | 0.339 |
| Nonhistone | 20D binary code | 0.698 | 0.714 | 0.710 | 0.366 |
| | BLOSUM62 | 0.697 | 0.723 | 0.712 | 0.375 |
| | Amino acid composition (AAC) | 0.619 | 0.640 | 0.635 | 0.226 |
| | Amino acid pair composition (AAPC) | 0.628 | 0.660 | 0.652 | 0.253 |
| | Accessible surface area (ASA) | 0.562 | 0.620 | 0.605 | 0.159 |
| | Position weight matrix (PWM) | 0.606 | 0.625 | 0.602 | 0.200 |
| | Position-specific scoring matrix (PSSM) | 0.665 | 0.695 | 0.688 | 0.319 |
| | BLOSUM62 + AAPC | 0.706 | 0.718 | 0.715 | 0.377 |
A total of 3525 lysine sequences were used as positive and negative data. Sn, sensitivity; Sp, specificity; Acc, accuracy; MCC, Matthews correlation coefficient.
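For readers who want to reproduce the four measures named in the footnote, the following is a minimal sketch that uses the standard confusion-matrix definitions of sensitivity, specificity, accuracy, and the Matthews correlation coefficient. The function name `binary_metrics` and the example counts are hypothetical illustrations, not values or code from the study, and the sketch assumes that both classes contain at least one sample.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (Sn, Sp, Acc, MCC) from confusion-matrix counts.

    tp/fn: acetylated sites predicted as positive/negative
    tn/fp: non-acetylated sites predicted as negative/positive
    Assumes tp + fn > 0 and tn + fp > 0 (both classes present).
    """
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


# Hypothetical counts for illustration only (not taken from the study).
sn, sp, acc, mcc = binary_metrics(tp=80, fn=20, tn=70, fp=30)
print(f"Sn={sn:.3f} Sp={sp:.3f} Acc={acc:.3f} MCC={mcc:.3f}")
```

Because accuracy is a weighted average of Sn and Sp (weighted by the fraction of positive and negative samples), it always falls between the two, which is a quick sanity check when reading a row of the table.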