Research Article

A Novel Approach towards Medical Entity Recognition in Chinese Clinical Text

Table 2

Name annotation rules of core drugs.

NumberRule description

1Drug information should be recorded in the ANs, including the names of disease-treating or symptom-relieving drugs (e.g., TCM, WM, and biological agents). Drug name defined here describes one drug, one drug combination, or one medical product
2The modifiers indicating a change of drug use or a patient’s drug use duration should not be included in the annotated drug name
3Drug name entity is annotated by 1 phrase. For instance, the drug “nifedipine GITS” is usually annotated by two phrases: nifedipine and GITS, while here we annotate the whole drug name phrase as one drug name entity or namely the whole entity should be ascribed as one phrase
4For TCM, Chinese characters indicating drug forms, such as “丸” (pill), “粉” (powder), and “汤” (decoction), cannot be annotated as single characters, because they are usually placed as the last characters within certain TCM drug names, and thus, the drug names should be annotated as a whole. For example, in the TCM drug name “大青龙汤” (da qin long decoction), “da qin long” is the pinyin to Chinese characters “大青龙,” while Chinese character “汤” is a drug form meaning “decoction”
5The explicit negative modifiers around the drug names are not included in the annotated drug name entity
6When Chinese drug name and the corresponding English name coexist in one short description without other words between them, they are jointly annotated as one drug name entity
7When Chinese drug name and the corresponding English name coexist in one short description with simple symbols such as “/” or “-” between them, they are jointly annotated as one drug name entity
8We also have seen parallel construction or ellipsis construction in some drug names. If two drug names are connected by one conjunction, the two drug names should be annotated as two separate drug entities
9In some situations, certain words and punctuations in a drug name entity are ignored. Then, the following rules are used:
(a) Conjunctive words and “、” and “,” are ignored and excluded in drug name entities
(b) To reserve the functionality and readability of drug name entities, we should annotate these descriptions as a separation of two independent drug name entities
10If two or more valid drug names end with the same characters and are combined together, then, the last drug name with the ending characters is taken as one complete drug name. For instance, in a description of two drug names ofloxacin and vitamin C injection, vitamin C injection is recognized as one complete drug name entity
11Drug names usually contain figures, letters, and other symbols. Since these symbols represent drug-related information (e.g., (), <>), they are included in one drug name entity
12When the TCM name and the description of the producing area coexist in the drug name, the information of the producing area is ignored. For instance, in Chuan Bei Mu (Zhejiang), Zhejiang is ignored
13Specification may follow a drug name that does not belong to a drug name entity and may not need separate annotation. For example, in the drug name Cold Clear Capsule (a capsule with 24 mg of paracetamol), the specification is in brackets
14Maximum annotation length of drug name entities should be set and followed, except when such a limitation of annotation length destroys the validity of the grammar structure. Especially, when modifiers of a drug name contain special information about a brand and pattern and form an agglutinate structure within the drug name, then, these modifiers should be included in the drug name entity. For example, in the drug name “苗泰小儿柴桂退烧颗粒” (pinyin translation is “miao tai xiao er chai gui tui sao ke li”), “苗泰” (pinyin: “miao tai”) is a drug brand and should not be excluded