Research Article

HKC: An Algorithm to Predict Protein Complexes in Protein-Protein Interaction Networks

Table 1

Comparison with MCODE. P, R, and F stand for precision, recall, and F-measure, respectively; their definitions are given in Section 3.1. The MIPS data set contains 4,554 proteins and 12,526 interactions, and the SGD-MC data set contains 4,448 proteins and 29,068 interactions. AC is the number of all clusters predicted by the algorithm; EC is the number of effective clusters (those with at least one matching complex above an overlap ratio of 0.4) found by the algorithm; MC is the number of matched complexes in the benchmark set. The complexcat and Gavin benchmarks contain 217 and 204 complexes, respectively. For HKC, three optimized parameter values are listed; for MCODE, the optimized parameters are NodeScoreCutoff, fluff (T for true, F for false), and haircut (T for true, F for false), with all other parameters left at their default values.

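| Algorithm | Data set | Benchmark | P | R | F | AC | EC | MC | Optimized parameters |
|---|---|---|---|---|---|---|---|---|---|
| MCODE | MIPS | complexcat | 0.455 | 0.194 | 0.271 | 66 | 30 | 42 | 0.05, F, F |
| HKC | MIPS | complexcat | 0.380 | 0.429 | 0.403 | 237 | 90 | 93 | 0.6, 10, 0.5 |
| MCODE | SGD-MC | complexcat | 0.213 | 0.221 | 0.217 | 197 | 42 | 48 | 0.05, F, T |
| HKC | SGD-MC | complexcat | 0.275 | 0.580 | 0.373 | 498 | 137 | 126 | 0.6, 10, 0.8 |
| MCODE | MIPS | Gavin | 0.303 | 0.098 | 0.148 | 66 | 20 | 20 | 0.05, F, T |
| HKC | MIPS | Gavin | 0.237 | 0.235 | 0.236 | 245 | 58 | 48 | 0.6, 20, 0.5 |
| MCODE | SGD-MC | Gavin | 0.283 | 0.152 | 0.198 | 106 | 30 | 31 | 0, F, T |
| HKC | SGD-MC | Gavin | 0.271 | 0.402 | 0.324 | 487 | 132 | 82 | 0.5, 5, 0.5 |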
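
The formal definitions of P, R, and F are in Section 3.1 and are not reproduced here. As a minimal sketch, assuming precision is EC/AC, recall is MC divided by the benchmark size, and the F-measure is their harmonic mean, the reported values can be recomputed from the counts. These formulas are an assumption inferred from the counts in the table, and the function name `prf` is hypothetical. The example uses the HKC result on the MIPS data set with the complexcat benchmark (AC = 237, EC = 90, MC = 93, benchmark size 217).

```python
def prf(ac, ec, mc, benchmark_size):
    """Return (precision, recall, F-measure) from cluster counts.

    Assumed definitions (not quoted from Section 3.1):
      precision = EC / AC
      recall    = MC / benchmark_size
      F         = harmonic mean of precision and recall
    """
    precision = ec / ac
    recall = mc / benchmark_size
    total = precision + recall
    # Guard against division by zero when nothing matches.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# HKC on MIPS, complexcat benchmark (217 complexes).
p, r, f = prf(ac=237, ec=90, mc=93, benchmark_size=217)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Under these assumptions the recomputed values agree with the reported 0.380, 0.429, and 0.403 to within rounding.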