Research Article

Chemical Entity Recognition and Resolution to ChEBI

Table 4

Evaluation of entity resolution on the subset of the gold standard composed of the 9,696 chemical entities that have a mapping to ChEBI. The table shows the entity resolution results for each assessment and method. Only the entities successfully recognized by both methods were considered in this evaluation. For the exact matching assessment, 3,668 entities were successfully recognized by both methods; for the left, right, left/right, and partial matching assessments, the counts were 4,022, 4,082, 4,455, and 5,286 entities, respectively. True positives (TP) is the number of those entities for which the resolution was correct, that is, the mapping agrees with the gold standard. Precision, recall, and F-measure are given as percentages.

Assessment           Method            TP     Precision  Recall  F-measure

Exact matching       Dictionary        3,079  83.94      31.76   46.08
Exact matching       Machine learning  3,206  87.40      33.07   47.98
Left matching        Dictionary        3,215  79.94      33.16   46.87
Left matching        Machine learning  3,381  84.06      34.87   49.29
Right matching       Dictionary        3,191  78.17      32.91   46.32
Right matching       Machine learning  3,467  84.93      35.76   50.33
Left/right matching  Dictionary        3,327  74.68      34.31   47.02
Left/right matching  Machine learning  3,650  81.93      37.64   51.59
Partial matching     Dictionary        3,861  73.04      39.82   51.54
Partial matching     Machine learning  4,273  80.84      44.07   57.04
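
The scores in Table 4 can be recomputed from the TP counts and the entity counts given in the caption. The following is a minimal sketch, not the authors' evaluation code. It assumes that precision is TP divided by the number of entities recognized by both methods for the given assessment, and that recall is TP divided by the 9,696 gold-standard entities with a ChEBI mapping. These denominators are inferred from the caption and the reported values rather than stated explicitly, and the function name `resolution_scores` is hypothetical.

```python
# Gold-standard entities that have a ChEBI mapping (from the caption).
GOLD_ENTITIES = 9696


def resolution_scores(tp, recognized_by_both, gold_total=GOLD_ENTITIES):
    """Return (precision, recall, F-measure) as percentages.

    Assumed denominators (inferred, not stated in the source):
      precision = TP / entities recognized by both methods
      recall    = TP / gold-standard entities with a ChEBI mapping
    """
    precision = tp / recognized_by_both
    recall = tp / gold_total
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(100 * value for value in (precision, recall, f_measure))


# Exact matching, dictionary method: TP = 3,079 out of 3,668 recognized.
p, r, f = resolution_scores(3079, 3668)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # prints "83.94 31.76 46.08"
```

Under these assumptions the same call reproduces the other rows, for example `resolution_scores(4273, 5286)` for partial matching with the machine learning method.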