Research Article

A Novel Approach towards Medical Entity Recognition in Chinese Clinical Text

Table 4

List of various features for the drug name recognizer.

Feature set | Features | Description
F1-1 | CWS = 1 | The 1-gram, 2-gram, and 3-gram of the character text at CWS = 1
F1-2 | CWS = 2 | The 1-gram, 2-gram, and 3-gram of the character text at CWS = 2
F1-3 | CWS = 3 | The 1-gram, 2-gram, and 3-gram of the character text at CWS = 3
F1-4 | CWS = 1 | The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 1
F1-5 | CWS = 2 | The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 2
F1-6 | CWS = 3 | The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 3
F2-1 | InDictTCM | Are the current character and the surrounding characters contained in the TCM dictionary?
F2-2 | InDictTCMPinyin | Is the pinyin of the current character and the surrounding characters contained in the TCM dictionary?
F2-3 | InDictWM | Are the current character and the surrounding characters contained in the WM dictionary?
F2-4 | InDictWMPinyin | Is the pinyin of the current character and the surrounding characters contained in the WM dictionary?
F3-1 | CurCx-HasTCMDoseUnit | Do the current character and subsequent characters contain a TCM dosage unit at CWS = 3?
F3-2 | CurCx-HasWMDoseUnit | Do the current character and subsequent characters contain a WM dosage unit at CWS = 3?
F3-3 | PreCx-HasTCMDoseUnit | Do the characters before the current character contain a TCM dosage unit at CWS = 3?
F3-4 | PreCx-HasWMDoseUnit | Do the characters before the current character contain a WM dosage unit at CWS = 3?
F3-5 | CurCx-HasTCMRoute | Do the current character and subsequent characters contain a TCM usage term at CWS = 3?
F3-6 | CurCx-HasWMRoute | Do the current character and subsequent characters contain a WM usage term at CWS = 3?
F3-7 | PreCx-HasTCMRoute | Do the characters before the current character contain a TCM usage term at CWS = 3?
F3-8 | PreCx-HasWMRoute | Do the characters before the current character contain a WM usage term at CWS = 3?
F3-9 | CurCx-HasTCMFormUnit | Do the current character and subsequent characters contain a TCM drug form unit at CWS = 3?
F3-10 | CurCx-HasWMFormUnit | Do the current character and subsequent characters contain a WM drug form unit at CWS = 3?
F3-11 | PreCx-HasTCMFormUnit | Do the characters before the current character contain a TCM drug form unit at CWS = 3?
F3-12 | PreCx-HasWMFormUnit | Do the characters before the current character contain a WM drug form unit at CWS = 3?
F3-13 | CurCx-HasTCMFrequency | Do the current character and subsequent characters contain a TCM frequency description at CWS = 3?
F3-14 | CurCx-HasWMFrequency | Do the current character and subsequent characters contain a WM frequency description at CWS = 3?
F3-15 | PreCx-HasTCMFrequency | Do the characters before the current character contain a TCM frequency description at CWS = 3?
F3-16 | PreCx-HasWMFrequency | Do the characters before the current character contain a WM frequency description at CWS = 3?
F4-1 | HasNum9 | Do the current character and the surrounding characters include the figure “9”?
F4-2 | HasToken@ | Do the current character and the surrounding characters include the symbol “@”?
F4-3 | HasEnglishAlphabets | Do the current character and the surrounding characters include English letters?
F4-4 | HasTime | Do the current character and the surrounding characters contain a time description such as hour, week, date, or year?
F5 | InListSectionName | Does the name of the section containing the current character and the surrounding characters appear in the predefined section list?
F6 | Classx = [BIO] | These three features indicate the type labels (B, I, or O) of the 3 characters before the current character
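
To make the feature definitions in Table 4 more concrete, the following is a minimal Python sketch of how character-level features of the F1, F2, and F4 kind might be computed. It is an illustration under stated assumptions, not the authors' implementation: CWS is assumed to denote a context window size counted in characters on each side of the current character, the dictionary check is assumed to test any substring of the window that covers the current character, and the names `window`, `ngram_features`, `in_dict`, `surface_features`, and `PAD` are hypothetical.

```python
from typing import Dict, List, Set

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def window(chars: List[str], i: int, cws: int) -> List[str]:
    """Characters from i - cws to i + cws, padded at sentence boundaries."""
    return [chars[j] if 0 <= j < len(chars) else PAD
            for j in range(i - cws, i + cws + 1)]


def ngram_features(chars: List[str], i: int, cws: int, prefix: str) -> Dict[str, str]:
    """1-, 2-, and 3-grams over the window (F1-style features)."""
    win = window(chars, i, cws)
    feats = {}
    for n in (1, 2, 3):
        for start in range(len(win) - n + 1):
            offset = start - cws  # position relative to the current character
            feats[f"{prefix}:{n}g:{offset}"] = "".join(win[start:start + n])
    return feats


def in_dict(chars: List[str], i: int, cws: int, dictionary: Set[str]) -> bool:
    """F2-style check (assumed reading): does any substring of the window
    that covers position i appear in the dictionary?"""
    lo, hi = max(0, i - cws), min(len(chars), i + cws + 1)
    return any("".join(chars[s:e]) in dictionary
               for s in range(lo, i + 1)
               for e in range(i + 1, hi + 1))


def surface_features(chars: List[str], i: int, cws: int) -> Dict[str, bool]:
    """F4-style checks for the symbol "@" and English letters in the window."""
    win = [c for c in window(chars, i, cws) if c != PAD]
    return {
        "HasToken@": any(c == "@" for c in win),
        "HasEnglishAlphabets": any(c.isascii() and c.isalpha() for c in win),
    }


# Usage with a made-up sentence and a one-entry hypothetical WM dictionary.
sentence = list("口服阿司匹林片")
wm_dictionary = {"阿司匹林"}
char_feats = ngram_features(sentence, 3, 1, "char")
hit = in_dict(sentence, 3, 3, wm_dictionary)
```

The pinyin variants (F1-4 to F1-6, F2-2, F2-4) could reuse the same functions on a list of pinyin syllables in place of characters; the conversion step itself is not shown here because the source does not specify how it was done.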