Review Article

Biomedical Relation Extraction: From Binary to Complex

Table 5

Performance of existing PPI extraction methods on the data corpora used.

Category | Recall (%) | Precision (%) | Corpus | Reference
Rule-based | 86.8 | 94.3 | 834 and 752 sentences obtained by a MEDLINE search using these keywords, “protein binding,” “yeast,” “E. coli,” “protein,” and “interaction.” | [21]
Rule-based | 60 | 87 | 550 sentences were retained containing at least one of four keywords “interact,” “bind,” “associate,” “complex,” or one of their inflections from 3343 abstracts retrieved from MEDLINE with the following keywords: “Saccharomyces cerevisiae,” “protein,” and “interaction.” | [22]
Rule-based | 80.0 | 80.5 | About 1200 sentences were kept from the top 50 biomedical papers retrieved from the Internet by querying using the keyword “protein-protein interaction.” | [4]
ML methods | 57 | 90 | Training set consists of 500 abstracts from MEDLINE. Evaluation set consists of 56 abstracts collected using search strings “protein” and “inhibit.” | [48]
ML methods | 21 | 91 | 3.4 million sentences from approximately 3.5 million MEDLINE abstracts dated after 1988 containing at least one notation of a human protein. | [49]
ML methods | 71.9 | 60 | AIMed | [38]
ML methods | 87.2 | 72.5 | LLL | [39]
ML methods | 76 | 70 | The test corpus consists of 300 randomly selected sentences. | [24]
ML methods | 70.7 | 70.3 | LLL | [10]
ML methods | 71.9 | 60 | AIMed | [30]
ML methods | 59.26 | 63.37 | LLL | [9]
ML methods | 89 | 73 | LLL | [11]
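
Table 5 reports recall and precision separately, so a single combined figure can make the rows easier to scan. The sketch below computes the balanced F-score (the harmonic mean of the two) for a few rows of the table. The function name `f1_score` and the choice of rows are illustrative assumptions and not part of the original article. Because the methods were evaluated on different corpora, the resulting values are not directly comparable across rows.

```python
def f1_score(recall: float, precision: float) -> float:
    """Balanced F-score: harmonic mean of recall and precision (both in percent)."""
    if recall + precision == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * recall * precision / (recall + precision)


# (reference, recall %, precision %) copied from three rows of Table 5
rows = [
    ("[21]", 86.8, 94.3),
    ("[49]", 21.0, 91.0),
    ("[39]", 87.2, 72.5),
]

for ref, recall, precision in rows:
    print(f"{ref}: F1 = {f1_score(recall, precision):.1f}%")
```

The harmonic mean penalizes imbalance: a method with very high precision but low recall, such as the row for [49], receives a much lower combined score than its precision alone would suggest.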