Research Article

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

Table 3

Details of the DDI corpus.

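| Data set | Number of documents | Number of sentences | NEs: Drug | NEs: Group | NEs: Brand | NEs: Drug_n | Total number of named entities |
|---|---|---|---|---|---|---|---|
| Train, DrugBank | 572 | 5675 | 8197 | 3206 | 1423 | 103 | 12929 |
| Train, Medline | 142 | 1301 | 1228 | 193 | 14 | 401 | 1836 |
| Test, DrugBank | 54 | 145 | 180 | 65 | 53 | 5 | 303 |
| Test, Medline | 58 | 520 | 171 | 90 | 6 | 115 | 382 |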