Research Article

Distant Supervision with Transductive Learning for Adverse Drug Reaction Identification from Electronic Medical Records

Figure 2

Our ADR identification framework consists of three main tasks. (1) In relation generation, drug-event pairs (d, e) are extracted from a corpus together with their patterns p, using named entity recognition (NER), sentence boundary detection (SBD), and parsing. (2) In automatic data labeling, distant supervision assigns a relation label y to each drug-event pair (d, e) obtained from relation generation, together with its pattern p, if the knowledge base contains a relation for that pair. This step yields two output data sets: a set of labeled data composed of tuples (d, p, e, y) extracted from the corpus (EMR texts), whose labels y are defined for the drug-event pairs (d, e) in the knowledge base, and a set of unlabeled data composed of tuples (d, p, e) extracted from the corpus, whose drug-event pairs (d, e) have no label in the knowledge base. The labeled set is a silver-standard data set and serves as the labeled data in the experiments. (3) In relation classification, this work proposes three generative models built on independent or dependent expectation-maximization (EM) models (iEM/dEM): (i) transductive learning with iEM (the baseline), (ii) supervised learning with dEM, and (iii) transductive learning with dEM.
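As a rough illustration of the labeling rule in step (2), the sketch below assumes that the knowledge base can be represented as a dictionary keyed by (drug, event) pairs and that each extracted instance is a (d, p, e) tuple of strings. The function name `distant_label`, the label strings "ADR" and "non-ADR", and the example drugs and patterns are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A relation instance from relation generation: (drug d, pattern p, event e).
Instance = Tuple[str, str, str]
# A distantly labeled instance: (d, p, e, y).
LabeledInstance = Tuple[str, str, str, str]


def distant_label(
    instances: List[Instance],
    knowledge_base: Dict[Tuple[str, str], str],
) -> Tuple[List[LabeledInstance], List[Instance]]:
    """Split instances into labeled (d, p, e, y) and unlabeled (d, p, e) sets.

    Assumption: the knowledge base is a plain dict mapping (d, e) -> label y.
    A pair found in the knowledge base inherits its label y; every other
    pair stays unlabeled.
    """
    labeled: List[LabeledInstance] = []
    unlabeled: List[Instance] = []
    for d, p, e in instances:
        y = knowledge_base.get((d, e))
        if y is None:
            unlabeled.append((d, p, e))
        else:
            labeled.append((d, p, e, y))
    return labeled, unlabeled


# Hypothetical knowledge base and extracted instances, for illustration only.
kb = {("warfarin", "bleeding"): "ADR", ("aspirin", "headache"): "non-ADR"}
instances = [
    ("warfarin", "induced", "bleeding"),
    ("aspirin", "for", "headache"),
    ("amoxicillin", "followed by", "rash"),
]
labeled, unlabeled = distant_label(instances, kb)
print(labeled)    # [('warfarin', 'induced', 'bleeding', 'ADR'), ('aspirin', 'for', 'headache', 'non-ADR')]
print(unlabeled)  # [('amoxicillin', 'followed by', 'rash')]
```

Note that the same (d, e) pair receives the same label wherever it occurs, regardless of its pattern p; this is the usual source of noise in distantly supervised (silver-standard) data.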
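The iEM and dEM models are not specified in this passage, so the sketch below swaps in a standard, simpler technique to show what transductive learning with EM does in step (3): semi-supervised EM for a multinomial naive Bayes classifier over the words of each pattern p (in the style of Nigam et al.). The labeled tuples keep their distant-supervision labels, the E-step estimates class posteriors for the unlabeled tuples, and the M-step re-estimates the model from hard plus soft counts. The function name `transductive_em`, the smoothing parameter `alpha`, the tokenized patterns, and the label names are hypothetical; this is not the paper's iEM or dEM model.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def transductive_em(
    labeled: List[Tuple[List[str], str]],
    unlabeled: List[List[str]],
    n_iter: int = 10,
    alpha: float = 1.0,
) -> List[str]:
    """Assign a label to each unlabeled pattern via naive Bayes trained with EM.

    Assumption: each pattern p is represented as a list of word tokens, and
    the generative model is multinomial naive Bayes with Laplace smoothing.
    """
    classes = sorted({y for _, y in labeled})
    vocab = sorted({w for toks, _ in labeled for w in toks}
                   | {w for toks in unlabeled for w in toks})
    # Soft labels for unlabeled items; all zero, so the first M-step uses labeled data only.
    posteriors: List[Dict[str, float]] = [{c: 0.0 for c in classes} for _ in unlabeled]

    for _ in range(n_iter):
        # M-step: priors and word likelihoods from hard labels plus soft counts.
        class_count = {c: alpha for c in classes}
        word_count = {c: defaultdict(float) for c in classes}
        for toks, y in labeled:
            class_count[y] += 1.0
            for w in toks:
                word_count[y][w] += 1.0
        for toks, q in zip(unlabeled, posteriors):
            for c in classes:
                class_count[c] += q[c]
                for w in toks:
                    word_count[c][w] += q[c]
        total = sum(class_count.values())
        log_prior = {c: math.log(class_count[c] / total) for c in classes}
        log_lik = {}
        for c in classes:
            denom = sum(word_count[c].values()) + alpha * len(vocab)
            log_lik[c] = {w: math.log((word_count[c][w] + alpha) / denom) for w in vocab}

        # E-step: posterior P(c | pattern) for each unlabeled item (log-sum-exp for stability).
        for i, toks in enumerate(unlabeled):
            scores = {c: log_prior[c] + sum(log_lik[c][w] for w in toks) for c in classes}
            m = max(scores.values())
            z = sum(math.exp(s - m) for s in scores.values())
            posteriors[i] = {c: math.exp(scores[c] - m) / z for c in classes}

    # Transductive output: a label for each of the given unlabeled items.
    return [max(q, key=q.get) for q in posteriors]


# Hypothetical tokenized patterns and labels, for illustration only.
labeled = [
    (["induced"], "ADR"),
    (["caused", "by"], "ADR"),
    (["prescribed", "for"], "non-ADR"),
    (["treat"], "non-ADR"),
]
unlabeled = [["induced", "by"], ["prescribed", "to", "treat"]]
print(transductive_em(labeled, unlabeled))  # ['ADR', 'non-ADR']
```

Keeping the labeled instances fixed while re-estimating posteriors for the unlabeled ones is what makes this procedure transductive: its output is a label for each given unlabeled tuple rather than a classifier evaluated on separate held-out data. A purely supervised variant would run only the first M-step on the labeled set.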