Research Article

Identification and Progression of Heart Disease Risk Factors in Diabetic Patients from Longitudinal Electronic Health Records

Table 2

Features used by the smoking history, sectionizer, and time attribute assigner classifiers.

| Component | Classification | Classifier | Classes | List of features |
|---|---|---|---|---|
| Smoking history | Sentence level | Naïve Bayes | Current, past, and never | Bag of words, POS tags |
| Sectionizer | Sentence level | Conditional random fields | Section heading, section heading with text, and text | First word uppercased, all words uppercased, all words lowercased, dictionary match, first word, second word, previous sentence features, next sentence features, full stop, and containing colon |
| Time attribute assigner | Phrase level | Naïve Bayes | Before DCT, during DCT, after DCT, and continuing | Identified risk factor spans, previous word, previous word POS tag, next word, next word POS tag, section information, and indicator attribute |
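
As an illustration of the sectionizer row, the sentence-level surface features could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the feature keys, the small `SECTION_DICTIONARY`, and the reading of "first word uppercased" as "first word starts with a capital letter" are all assumptions made for the example. The article does not specify how the dictionary match or the neighbouring-sentence features were encoded.

```python
# Hypothetical dictionary of known section headings (illustrative only,
# not taken from the article).
SECTION_DICTIONARY = {"medications", "allergies", "social history", "family history"}


def sentence_features(sentence):
    """Surface features for one sentence, loosely following Table 2."""
    words = sentence.split()
    first = words[0] if words else ""
    second = words[1] if len(words) > 1 else ""
    # Only words containing letters are considered for the case checks.
    alpha = [w for w in words if any(c.isalpha() for c in w)]
    # Assumption: the candidate heading is the text before the first colon.
    heading = sentence.split(":", 1)[0].strip().lower()
    return {
        "first_word_uppercased": first[:1].isupper(),
        "all_words_uppercased": bool(alpha) and all(w.isupper() for w in alpha),
        "all_words_lowercased": bool(alpha) and all(w.islower() for w in alpha),
        "dictionary_match": heading in SECTION_DICTIONARY,
        "first_word": first.strip(".,:;").lower(),
        "second_word": second.strip(".,:;").lower(),
        "full_stop": sentence.rstrip().endswith("."),
        "contains_colon": ":" in sentence,
    }


def document_features(sentences):
    """Add previous- and next-sentence features to each sentence's own features."""
    base = [sentence_features(s) for s in sentences]
    rows = []
    for i, feats in enumerate(base):
        row = dict(feats)
        if i > 0:
            row.update({"prev_" + k: v for k, v in base[i - 1].items()})
        if i + 1 < len(base):
            row.update({"next_" + k: v for k, v in base[i + 1].items()})
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in document_features(["MEDICATIONS:", "Aspirin 81 mg daily."]):
        print(row)
```

Each resulting dictionary describes one sentence in its context. A sequence of such dictionaries is the kind of input a conditional random field toolkit typically expects when labelling sentences as section heading, section heading with text, or text.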