Research Article

Predicting Interactions between Virus and Host Proteins Using Repeat Patterns and Composition of Amino Acids

Table 5

Training (TR) and test (TS) datasets used to assess the applicability of the SVM model to new viruses and to new hosts. The average sequence similarity between the proteins in each TR and those in each TS was computed with the EMBOSS Needle tool [20].

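Proteins in training datasets   Target proteins in test datasets           Average sequence similarity (%)
------------------------------  -----------------------------------------  -------------------------------
25 virus proteins in TR1        11 HCV proteins in TS1                     5.03
25 virus proteins in TR1        12 SARS virus proteins in TS2              5.20
25 virus proteins in TR1        10 H1N1 virus proteins in TS3              5.03
25 virus proteins in TR1        11 HPV-16 proteins in TS4                  3.12
25 virus proteins in TR1        46 HIV-1 proteins in TS5                   3.56
522 human proteins in TR2       141 Mus musculus proteins in TS6           9.20
522 human proteins in TR2       87 Bos taurus proteins in TS7              9.07
522 human proteins in TR2       79 Rattus norvegicus proteins in TS8       9.76
522 human proteins in TR2       38 Sus scrofa proteins in TS9              8.70
522 human proteins in TR2       64 Escherichia coli K-12 proteins in TS10  8.04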
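
The last column of Table 5 is an average over pairwise global alignments between training and test proteins. The following is a minimal sketch of that kind of calculation, not the authors' pipeline. It makes several assumptions: it scores alignments with a simple match/mismatch/linear-gap scheme rather than the substitution matrix and affine gap penalties that EMBOSS Needle uses, it reports percent identity rather than Needle's similarity measure, and it averages over every TR x TS pair, which the caption does not state explicitly. The function names `global_identity` and `average_similarity` are hypothetical, and the values it produces would not reproduce the numbers in the table.

```python
from itertools import product


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of one optimal Needleman-Wunsch global alignment.

    Assumption: simple match/mismatch/linear-gap scoring, not the
    matrix-based scoring used by EMBOSS Needle.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns.
    i, j = n, m
    identical = columns = 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def average_similarity(train: list[str], test: list[str]) -> float:
    """Mean pairwise percent identity over all (train, test) pairs.

    Assumption: every training protein is compared with every test protein.
    """
    pairs = list(product(train, test))
    if not pairs:
        raise ValueError("both datasets must contain at least one sequence")
    return sum(global_identity(a, b) for a, b in pairs) / len(pairs)
```

For example, `average_similarity(["ACDE"], ["ACDE", "ACDF"])` averages one perfect alignment and one alignment with a single mismatch in four columns. For real datasets of the sizes listed in Table 5, calling the Needle tool itself would be the more faithful choice, since the pure-Python dynamic programming here is quadratic per pair and uses a different scoring scheme.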