Research Article
ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition
Table 2
Details of the BioCreative data set.
| Data set | Number of abstracts | Number of sentences | Systematic | Abbreviation | Family | Formula | Identifier | Multiple | Trivial | Total number of NEs |
|---|---|---|---|---|---|---|---|---|---|---|
| Train | 3500 | 30418 | 6656 | 4538 | 4090 | 4448 | 672 | 202 | 8832 | 29438 |
| Development | 3500 | 30445 | 6816 | 4521 | 4223 | 4137 | 639 | 188 | 8970 | 29494 |
| Test | 3000 | 8655 | 5666 | 4059 | 3622 | 3443 | 513 | 199 | 7808 | 25310 |

The columns from Systematic to Trivial give the number of named entities (NEs) in each class.
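In each split of Table 2, the total number of NEs is the sum of the seven per-class counts. For the training set, for example, 6656 + 4538 + 4090 + 4448 + 672 + 202 + 8832 = 29438. The following minimal Python sketch copies the table into code and checks this sum for all three splits. It also reports each class's share of a split. The names `TABLE_2`, `check_totals` and `class_shares` are hypothetical helpers chosen for illustration. They are not part of ChemTok.

```python
# Per-class named-entity (NE) counts transcribed from Table 2.
# The helper names below are hypothetical and exist only for illustration.
CLASSES = ["Systematic", "Abbreviation", "Family", "Formula",
           "Identifier", "Multiple", "Trivial"]

TABLE_2 = {
    "Train":       {"counts": [6656, 4538, 4090, 4448, 672, 202, 8832], "total": 29438},
    "Development": {"counts": [6816, 4521, 4223, 4137, 639, 188, 8970], "total": 29494},
    "Test":        {"counts": [5666, 4059, 3622, 3443, 513, 199, 7808], "total": 25310},
}


def check_totals(table):
    """Return {split: (sum of per-class counts, reported total)}."""
    return {split: (sum(row["counts"]), row["total"]) for split, row in table.items()}


def class_shares(counts):
    """Return each class's share of a split's NEs as a percentage (1 decimal)."""
    total = sum(counts)
    return {name: round(100 * c / total, 1) for name, c in zip(CLASSES, counts)}


if __name__ == "__main__":
    for split, (computed, reported) in check_totals(TABLE_2).items():
        print(f"{split}: sum={computed}, reported={reported}, match={computed == reported}")
    print(class_shares(TABLE_2["Train"]["counts"]))
```

Running the script reports `match=True` for all three splits. The Trivial class is the largest in every split and accounts for about 30% of the training NEs.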