Review Article

Biomedical Relation Extraction Using Distant Supervision

Table 2

Overview of the relations targeted by each method, together with the resources used and the results obtained.

| Paper | Relation type | Knowledge base | Corpora | NER results | RE results |
| --- | --- | --- | --- | --- | --- |
| [12] | Protein-residue | Protein Data Bank (PDB) [25] | PubMed abstracts | Evaluated on 3 gold corpora, for amino acid/mutation entities only: Nagel et al. (F-measure = 93.28%); MutationFinder development (F-measure = 89.32%) and test (F-measure = 88.04%) corpora; LEAP-FS corpus (F-measure = 86.56%) | 0.84 F-measure (silver corpus); 0.79 F-measure (gold corpus) |
| [13] | Drug-drug, protein-protein | IntAct database [26], KUPS database [27], DrugBank [28] | The five corpora of Pyysalo et al. [29]; the corpus of Segura-Bedmar et al. [30] | Not mentioned | Drug-drug (DDI): F-score = 61.19; PPI: F-score = 78.0 on the LLL corpus |
| [14] | Gene-brain regions | UMLS Semantic Network [31] | 10,000 randomly selected full-text articles from the Elsevier Neuroscience corpus | F1 = 0.8 (for 300 manually examined examples) | F1-score = 0.468, recall = 0.459, precision = 0.477 (for 259 manually labelled sentences out of 30,000) |
| [15] | Protein-location | UniProtKB (Swiss-Prot) [32] | 43,000 full-text articles from the Journal of Biological Chemistry | Not mentioned | F1 = 0.61, R = 0.49, P = 0.81 (sentence level); accuracy = 0.57 (RL instance level) |
| [16] | microRNA-gene | TransmiR database (nonhuman entries) [33] | IBRel-miRNA corpus | Evaluated on 3 corpora: Bagewadi corpus [34] (F = 0.919 miRNA/F = 0.677 gene), miRTex [35] (F = 0.941 miRNA/F = 0.795 gene), and TransmiR (F = 0.687 miRNA/F = 0.361 gene) | Evaluated on 3 corpora: Bagewadi corpus (F = 0.532), miRTex (F = 0.383), and TransmiR (F = 0.413) |
| [17] | Related symptoms, related diseases, related examination, complications, and related treatment | Not mentioned | Medical websites | Not mentioned | Accuracy = 91.87%, recall = 91.58%, F1-score = 0.8908 |
| [18] | Gene-drug | Gene Drug Knowledge Database (GDKD) [36] | Biomedical literature from PubMed Central | Not mentioned | Automatic evaluation: best average test accuracy in fivefold cross-validation (single sentence: 88; cross sentence: 87.5). Manual evaluation: precision = 71 for single sentence and 61 for cross sentence |
| [19] | n-ary relations: Treats, ReducesRisk, Causes, Diagnoses | 474 seed facts from online medical portals uptodate.com, drugs.com | Encyclopaedic articles and PubMed scientific publications | Not mentioned | Treats avg. precision: 0.86, ReducesRisk avg. P: 0.82, Causes avg. P: 0.80, and Diagnoses avg. P: 0.89 |
| [20] | Protein-protein, protein-location | IntAct database, UniProt database | Medline, literature found in IntAct database | Not mentioned | PPI: PCNN F-score = 56.8, BiLSTM F-score = 50.4; PLOC: PCNN F-score = 54.5, BiLSTM F-score = 60.4 |
| [21] | Chemical-disease | Comparative Toxicogenomics Database (CTD) [37] | PubMed abstracts | Not mentioned | Intrasentence level: best F-score = 60.8; intersentence level: best F-score = 22.8 |
| [22] | Binary treatment relation | UMLS database, SemMedDB [38] | PubMed abstracts that carry both the therapeutic use and the therapy Medical Subject Headings (MeSH) subheadings | Not mentioned | PR-AUC: logistic regression = 82.86, BiLSTM = 81.18, BiLSTM-NLL = 81.38 |
| [23] | Human disease-gene, tissue-gene, and protein-protein in different species | Genetics Home Reference (GHR) [39], UniProtKB, KEGG maps [40], STRING [41] | PubMed, full-text articles from PMC in BioC XML format [42] | Not mentioned | Adjusted area under the precision-recall curve (AUPRC): disease-gene = 0.86; tissue-gene = 0.19 |
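
Two rows of Table 2 ([14] and [15]) report precision, recall, and F1 together, which makes it possible to read the F-measures as the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic, using only values copied from the table; the function name `f1_score` and the `reported` dictionary are illustrative assumptions and do not come from any of the reviewed methods.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the RE results column of Table 2.
reported = {
    "[14] gene-brain regions": (0.477, 0.459, 0.468),
    "[15] protein-location (sentence level)": (0.81, 0.49, 0.61),
}

for label, (p, r, f1) in reported.items():
    # Print the recomputed value next to the value given in the table.
    print(f"{label}: recomputed F1 = {f1_score(p, r):.3f}, reported F1 = {f1}")
```

For both rows the recomputed value agrees with the reported F1 once rounded to the precision used in the table. The same check cannot be applied to rows that report only one of these measures, or that report accuracy rather than precision.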