Research Article

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

Table 2

Details of the BioCreative data set.

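The columns Systematic through Trivial give the number of NEs in each class.

| Data set | Number of abstracts | Number of sentences | Systematic | Abbreviation | Family | Formula | Identifier | Multiple | Trivial | Total number of NEs |
|---|---|---|---|---|---|---|---|---|---|---|
| Train | 3500 | 30418 | 6656 | 4538 | 4090 | 4448 | 672 | 202 | 8832 | 29438 |
| Development | 3500 | 30445 | 6816 | 4521 | 4223 | 4137 | 639 | 188 | 8970 | 29494 |
| Test | 3000 | 8655 | 5666 | 4059 | 3622 | 3443 | 513 | 199 | 7808 | 25310 |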
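The per-class counts in Table 2 can be checked against the reported totals with simple arithmetic. The following is a minimal sketch of that check, with the counts transcribed from the table; the names `CLASSES`, `TABLE_2`, `totals_match`, and `class_shares` are illustrative assumptions and not part of ChemTok.

```python
# Per-class NE counts transcribed from Table 2.
# All names here (CLASSES, TABLE_2, ...) are hypothetical, chosen for illustration.
CLASSES = [
    "Systematic", "Abbreviation", "Family", "Formula",
    "Identifier", "Multiple", "Trivial",
]

TABLE_2 = {
    "Train": {"counts": [6656, 4538, 4090, 4448, 672, 202, 8832], "total": 29438},
    "Development": {"counts": [6816, 4521, 4223, 4137, 639, 188, 8970], "total": 29494},
    "Test": {"counts": [5666, 4059, 3622, 3443, 513, 199, 7808], "total": 25310},
}


def totals_match():
    """Return, per data set, whether the class counts sum to the reported total."""
    return {
        name: sum(row["counts"]) == row["total"]
        for name, row in TABLE_2.items()
    }


def class_shares(name):
    """Return each class's fraction of the total NEs in the given data set."""
    row = TABLE_2[name]
    return {cls: n / row["total"] for cls, n in zip(CLASSES, row["counts"])}


if __name__ == "__main__":
    for name, ok in totals_match().items():
        print(f"{name}: class counts sum to reported total: {ok}")
    # Show the most frequent class in the training set.
    shares = class_shares("Train")
    top = max(shares, key=shares.get)
    print(f"Most frequent class in Train: {top} ({shares[top]:.1%})")
```

In all three data sets the seven class counts add up exactly to the reported total, and Trivial is the most frequent class while Multiple is the rarest.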