Research Article
ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition
Table 8
Class-based performance (F-score in %) on the BioCreative corpus using various tokenizers.
| Algorithm | Entity type | Development set: ChemSpot | Development set: tmVar | Development set: ChemTok | Test set: ChemSpot | Test set: tmVar | Test set: ChemTok |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CRF | Abbreviation | 68.14 | 66.58 | 68.68 | 67.20 | 65.42 | 68.86 |
| CRF | Family | 69.22 | 67.59 | 72.67 | 71.94 | 70.60 | 74.91 |
| CRF | Formula | 76.57 | 69.70 | 80.12 | 75.29 | 69.81 | 79.37 |
| CRF | Identifier | 63.03 | 59.45 | 74.60 | 63.88 | 61.55 | 74.28 |
| CRF | Multiple | 32.50 | 26.86 | 41.35 | 32.77 | 30.50 | 35.12 |
| CRF | Systematic | 79.41 | 78.09 | 82.85 | 79.95 | 78.33 | 83.11 |
| CRF | Trivial | 85.62 | 84.11 | 88.36 | 85.52 | 83.69 | 87.66 |
| SVM | Abbreviation | 72.59 | 72.12 | 74.50 | 72.42 | 71.65 | 73.75 |
| SVM | Family | 69.82 | 69.69 | 71.99 | 71.81 | 71.57 | 74.31 |
| SVM | Formula | 82.667 | 81.61 | 84.68 | 82.15 | 81.68 | 85.28 |
| SVM | Identifier | 72.08 | 69.76 | 72.52 | 74.60 | 74.76 | 77.18 |
| SVM | Multiple | 36.06 | 34.31 | 39.26 | 26.89 | 20.96 | 30.46 |
| SVM | Systematic | 82.33 | 81.51 | 84.73 | 82.10 | 81.49 | 84.44 |
| SVM | Trivial | 86.73 | 85.85 | 88.84 | 86.50 | 86.14 | 88.51 |