Research Article

Selecting Negative Samples for PPI Prediction Using Hierarchical Clustering Methodology

Table 3

Accuracy using the 6 most relevant features for three RBF-SVM models.

| Datasets | Acc. (%) our proposal RBF-SVM | Acc. (%) Rand. RBF-SVM | Acc. (%) “balanced” RBF-SVM | % relative difference, our proposal versus “Rand.” model | % relative difference, our proposal versus “balanced” model |
|---|---|---|---|---|---|
| Binary-GS | 94.111 | 95.411 | 92.401 | −1.381 | 1.817 |
| Ito-core | 72.059 | 81.195 | 52.571 | −12.678 | 27.045 |
| LC-multiple | 93.750 | 95.924 | 93.517 | −2.319 | 0.249 |
| Uetz-screen | 76.857 | 80.822 | 54.882 | −5.159 | 28.592 |
| Random negative dataset 1 | 72.211 | 38.353 | 6.537 | 46.888 | 90.947 |
| Random negative dataset 2 | 71.951 | 37.937 | 37.444 | 47.274 | 47.959 |
| R_test_1^3 | 58.184 | 29.349 | 1.883 | 49.558 | 96.764 |
| R_test_2^3 | 63.596 | 30.150 | 1.882 | 52.591 | 97.041 |
| R_test_3^3 | 96.469 | 69.365 | 1.683 | 28.096 | 98.255 |
| R_test_4^3 | 62.221 | 31.061 | 1.522 | 50.080 | 97.554 |
| R_test_5^3 | 61.248 | 29.862 | 1.364 | 51.244 | 97.773 |
| R_test_6^3 | 64.992 | 33.120 | 1.702 | 49.040 | 97.381 |
| R_test_7^3 | 64.441 | 31.454 | 1.824 | 51.189 | 97.170 |
| R_test_8^3 | 94.705 | 67.821 | 1.702 | 28.387 | 98.203 |
| R_test_9^3 | 64.334 | 31.237 | 1.061 | 51.446 | 98.351 |

Acc. is the accuracy (%) of the RBF-SVM model. “Our proposal RBF-SVM” is the SVM model trained on the GSP set together with the GSN set obtained using the proposed hierarchical clustering method. “Rand. RBF-SVM” is the SVM model trained on a training set whose GSN set was selected at random. “Balanced” RBF-SVM is the SVM model trained on the GSP set together with a GSN set built following the “balanced” negative set approach of Yu et al. [38]. % relative difference is the difference in accuracy between “our proposal RBF-SVM” and the compared model, expressed as a percentage of the accuracy of “our proposal RBF-SVM”, that is, 100 × (Acc. of our proposal − Acc. of the compared model) / Acc. of our proposal; a negative value means the compared model was more accurate on that dataset.
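
The % relative difference columns of Table 3 can be recomputed from the accuracy columns. The following is a minimal sketch, not the authors' code: the function name `relative_difference` and the dictionary `rows` are hypothetical names introduced here, and the accuracies are copied from the first two rows of the table and entered with a decimal point.

```python
def relative_difference(acc_proposal: float, acc_other: float) -> float:
    """Percentage relative difference, using the proposal's accuracy as the basis."""
    return 100.0 * (acc_proposal - acc_other) / acc_proposal


# Accuracies (%) copied from Table 3: (our proposal, Rand., "balanced").
rows = {
    "Binary-GS": (94.111, 95.411, 92.401),
    "Ito-core": (72.059, 81.195, 52.571),
}

for name, (acc_ours, acc_rand, acc_balanced) in rows.items():
    versus_rand = round(relative_difference(acc_ours, acc_rand), 3)
    versus_balanced = round(relative_difference(acc_ours, acc_balanced), 3)
    print(name, versus_rand, versus_balanced)
```

For Binary-GS this gives −1.381 and 1.817, matching the tabulated values. Dividing by the proposal's accuracy rather than the compared model's accuracy is what the footnote means by using “our proposal RBF-SVM” as the basis.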