Research Article

Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts

Table 1

Basic information for each dataset.

Dataset           | Domain   | Annotated | No. of texts | No. of sentences | No. of entities | No. of words | No. of vocabularies
Wikipedia Chinese | General  | False     | N/A          | 3,745,841        | N/A             | 337,063,331  | 14,261
TCM-HN            | Clinical | True      | 29,636       | 155,566          | 318,337         | 14,009,494   | 3,008
COVID-19          | Clinical | True      | 6,105        | 29,663           | 201,567         | 1,726,665    | 2,248
TCM-HB            | Clinical | True      | 18,555       | 105,075          | 247,291         | 6,394,902    | 2,778