Research Article

Predicting Interactions between Virus and Host Proteins Using Repeat Patterns and Composition of Amino Acids

Table 2

Comparison of different feature combinations in 10-fold cross-validation of the SVM model.

Features      Sn (%)  Sp (%)  Acc (%)  PPV (%)  NPV (%)  MCC    AUC

F1            81.66   97.02   89.34    96.48    84.10    0.796  0.916
F2            69.75   85.11   77.43    82.41    73.78    0.555  0.849
F3            87.78   97.81   92.79    97.56    88.89    0.860  0.965
F1 + F2       80.88   95.61   88.24    94.85    83.33    0.773  0.925
F1 + F3       88.56   97.34   92.94    97.08    89.48    0.862  0.966
F2 + F3       87.46   96.87   92.16    96.54    88.54    0.847  0.961
F1 + F2 + F3  88.24   97.34   92.79    97.07    89.22    0.859  0.963

F1: sum of the squared lengths of single amino acid repeats over the entire protein sequence; F2: maximum, over windows of 6 residues, of the sum of the squared lengths of single amino acid repeats within a window; F3: amino acid composition in each of 5 partitions of the protein sequence; Sn: sensitivity; Sp: specificity; Acc: accuracy; PPV: positive predictive value; NPV: negative predictive value; MCC: Matthews correlation coefficient; AUC: area under the ROC curve.
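The footnote defines F1 to F3 only in words. The Python sketch below shows one plausible way to compute them; it is an illustration, not the authors' code, and every function name in it is hypothetical. It rests on assumptions that are also marked in the comments: a "single amino acid repeat" is taken to be any maximal run of one identical residue, with a hypothetical `min_len` cutoff that defaults to 1 (count every run); runs are clipped at window edges for F2; and the 5 partitions for F3 are assumed to be contiguous segments of near-equal length, each described by the relative frequencies of the 20 standard amino acids (100 values in total).

```python
from itertools import groupby

# The 20 standard amino acids (one-letter codes).
# Assumption: non-standard letters (e.g. X, U) are ignored in the composition features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def repeat_lengths(seq: str) -> list[int]:
    """Lengths of maximal runs of one identical residue, e.g. 'QQQ' -> 3."""
    return [len(list(run)) for _, run in groupby(seq)]


def f1_repeat_score(seq: str, min_len: int = 1) -> int:
    """F1 (hypothetical helper): sum of squared run lengths over the whole sequence.

    Assumption: runs shorter than min_len are skipped; min_len=1 counts every run.
    """
    return sum(n * n for n in repeat_lengths(seq) if n >= min_len)


def f2_window_max(seq: str, window: int = 6, min_len: int = 1) -> int:
    """F2 (hypothetical helper): largest F1-style score over all windows of `window` residues.

    Assumption: runs crossing a window boundary are clipped to the window.
    """
    if len(seq) <= window:
        return f1_repeat_score(seq, min_len)
    return max(
        f1_repeat_score(seq[i:i + window], min_len)
        for i in range(len(seq) - window + 1)
    )


def f3_partition_composition(seq: str, parts: int = 5) -> list[float]:
    """F3 (hypothetical helper): amino acid composition of `parts` contiguous segments.

    Returns parts * 20 relative frequencies.
    Assumption: segment boundaries are placed by integer division of the length.
    """
    n = len(seq)
    features: list[float] = []
    for k in range(parts):
        segment = seq[k * n // parts:(k + 1) * n // parts]
        total = sum(segment.count(aa) for aa in AMINO_ACIDS)
        features.extend(
            segment.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS
        )
    return features
```

For the toy sequence `"MAQQQQKLLG"`, the runs have lengths 1, 1, 4, 1, 2, 1, so F1 is 24; the best 6-residue window holds the whole QQQQ run plus two single residues, giving F2 = 18.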
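The metrics reported in Table 2 follow the standard confusion-matrix definitions. The Python sketch below computes them from true/false positive and negative counts (TP, FP, TN, FN), plus a rank-based AUC (the Mann-Whitney formulation) from classifier scores. It is a sketch under assumptions: the source does not say whether the 10-fold results were pooled across folds or averaged per fold, and the function names are hypothetical. Sn, Sp, Acc, PPV and NPV are returned as fractions; multiplying by 100 gives the percentages used in the table, while MCC and AUC are reported as fractions.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary metrics from confusion-matrix counts (hypothetical helper)."""
    sn = tp / (tp + fn)                    # sensitivity: recall on positives
    sp = tn / (tn + fp)                    # specificity: recall on negatives
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    npv = tn / (tn + fn)                   # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when a marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "PPV": ppv, "NPV": npv, "MCC": mcc}


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. Quadratic in the number of examples; fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With TP = 80, FN = 20, TN = 90 and FP = 10, this gives Sn = 0.80, Sp = 0.90, Acc = 0.85 and MCC ≈ 0.704.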