Research Article

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

Table 8

Class-based performance (F-score, %) on the BioCreative corpus with three tokenizers (ChemSpot, tmVar, and ChemTok).

Algorithm  Entity type   Development set               Test set
                         ChemSpot  tmVar     ChemTok   ChemSpot  tmVar     ChemTok

CRF        Abbreviation  68.14     66.58     68.68     67.20     65.42     68.86
           Family        69.22     67.59     72.67     71.94     70.60     74.91
           Formula       76.57     69.70     80.12     75.29     69.81     79.37
           Identifier    63.03     59.45     74.60     63.88     61.55     74.28
           Multiple      32.50     26.86     41.35     32.77     30.50     35.12
           Systematic    79.41     78.09     82.85     79.95     78.33     83.11
           Trivial       85.62     84.11     88.36     85.52     83.69     87.66

SVM        Abbreviation  72.59     72.12     74.50     72.42     71.65     73.75
           Family        69.82     69.69     71.99     71.81     71.57     74.31
           Formula       82.667    81.61     84.68     82.15     81.68     85.28
           Identifier    72.08     69.76     72.52     74.60     74.76     77.18
           Multiple      36.06     34.31     39.26     26.89     20.96     30.46
           Systematic    82.33     81.51     84.73     82.10     81.49     84.44
           Trivial       86.73     85.85     88.84     86.50     86.14     88.51
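To summarize Table 8 at a glance, the Python sketch below transcribes its values (the SVM/Formula development-set entry is kept exactly as printed, 82.667) and computes two illustrative statistics for each algorithm and split: an unweighted macro-average F-score per tokenizer over the seven entity classes, and ChemTok's per-class margin over the stronger of the other two tokenizers. Both statistics are introduced here for illustration and are not figures reported in the article; the macro-average weights every entity class equally regardless of how many mentions it contains, and the helper names `macro_average` and `chemtok_margin` are hypothetical.

```python
from statistics import mean

# Values transcribed from Table 8 (F-score, %).
# Within each split the tuple order is (ChemSpot, tmVar, ChemTok).
TOKENIZERS = ("ChemSpot", "tmVar", "ChemTok")

TABLE_8 = {
    "CRF": {
        "Abbreviation": {"dev": (68.14, 66.58, 68.68), "test": (67.20, 65.42, 68.86)},
        "Family":       {"dev": (69.22, 67.59, 72.67), "test": (71.94, 70.60, 74.91)},
        "Formula":      {"dev": (76.57, 69.70, 80.12), "test": (75.29, 69.81, 79.37)},
        "Identifier":   {"dev": (63.03, 59.45, 74.60), "test": (63.88, 61.55, 74.28)},
        "Multiple":     {"dev": (32.50, 26.86, 41.35), "test": (32.77, 30.50, 35.12)},
        "Systematic":   {"dev": (79.41, 78.09, 82.85), "test": (79.95, 78.33, 83.11)},
        "Trivial":      {"dev": (85.62, 84.11, 88.36), "test": (85.52, 83.69, 87.66)},
    },
    "SVM": {
        "Abbreviation": {"dev": (72.59, 72.12, 74.50), "test": (72.42, 71.65, 73.75)},
        "Family":       {"dev": (69.82, 69.69, 71.99), "test": (71.81, 71.57, 74.31)},
        # Kept as printed in the table (three decimals).
        "Formula":      {"dev": (82.667, 81.61, 84.68), "test": (82.15, 81.68, 85.28)},
        "Identifier":   {"dev": (72.08, 69.76, 72.52), "test": (74.60, 74.76, 77.18)},
        "Multiple":     {"dev": (36.06, 34.31, 39.26), "test": (26.89, 20.96, 30.46)},
        "Systematic":   {"dev": (82.33, 81.51, 84.73), "test": (82.10, 81.49, 84.44)},
        "Trivial":      {"dev": (86.73, 85.85, 88.84), "test": (86.50, 86.14, 88.51)},
    },
}


def macro_average(algorithm, split):
    """Unweighted mean F-score over all entity classes, per tokenizer."""
    rows = TABLE_8[algorithm].values()
    return {
        name: mean(row[split][i] for row in rows)
        for i, name in enumerate(TOKENIZERS)
    }


def chemtok_margin(algorithm, split):
    """Per-class gain of ChemTok over the better of ChemSpot and tmVar."""
    return {
        entity: round(scores[split][2] - max(scores[split][:2]), 2)
        for entity, scores in TABLE_8[algorithm].items()
    }


if __name__ == "__main__":
    for algorithm in TABLE_8:
        for split in ("dev", "test"):
            averages = macro_average(algorithm, split)
            print(algorithm, split, {k: round(v, 2) for k, v in averages.items()})
            print("  ChemTok margin:", chemtok_margin(algorithm, split))
```

Run as a script, it prints one line of macro-averages and one line of per-class margins for each algorithm/split pair. Every margin is positive, which reflects that ChemTok has the highest F-score in every cell of the table, for both CRF and SVM and on both the development and test sets.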