Research Article

Chemical Entity Recognition and Resolution to ChEBI

Table 3

Evaluation of entity identification on the subset of the gold standard comprising the 9,696 chemical entities that have a mapping to ChEBI. The table shows the results of entity identification (named entity recognition and resolution) for each alignment and method. The dictionary method recognized and mapped a total of 18,683 entities, while the machine-learning method recognized and mapped 10,681 entities. True positives (TP) is the number of recognized entities that agree with the gold standard and whose mapping also agrees with the gold standard. Precision, recall, and F-measure are given as percentages.

Assessment           Method            TP     Precision  Recall  F-measure

Exact matching       Dictionary        4,530  24.25      46.72   31.93
Exact matching       Machine learning  4,783  44.79      49.33   46.95
Left matching        Dictionary        4,559  24.40      47.02   35.13
Left matching        Machine learning  4,972  46.56      51.28   48.81
Right matching       Dictionary        4,592  24.58      47.36   32.36
Right matching       Machine learning  4,885  45.75      50.38   47.95
Left/right matching  Dictionary        4,621  24.73      47.67   32.57
Left/right matching  Machine learning  5,074  47.52      52.33   49.81
Partial matching     Dictionary        5,185  27.75      53.48   36.54
Partial matching     Machine learning  5,202  48.72      53.65   51.07
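
The caption gives the counts behind each row but not the formulas. As a minimal sketch, assuming the usual definitions (precision = TP / entities recognized and mapped, recall = TP / gold-standard entities, F-measure = harmonic mean of the two), the exact-matching dictionary row can be recomputed from those counts. The function name `precision_recall_f` is hypothetical and is not taken from the article.

```python
def precision_recall_f(tp: int, recognized: int, gold: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) as percentages.

    Assumes the standard definitions; the article does not spell them out.
    """
    precision = tp / recognized   # share of recognized-and-mapped entities that are correct
    recall = tp / gold            # share of gold-standard entities that were found
    if precision + recall == 0:
        return 0.0, 0.0, 0.0      # avoid dividing by zero when nothing is correct
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return 100 * precision, 100 * recall, 100 * f_measure


# Exact matching, dictionary method: counts taken from the caption and table.
p, r, f = precision_recall_f(tp=4530, recognized=18683, gold=9696)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

Under these assumptions the result rounds to the 24.25, 46.72, and 31.93 reported for that row.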