Research Article

Drug Disease Relation Extraction from Biomedical Literature Using NLP and Machine Learning

Table 2

Extracted features.

| Feature | Example |
| --- | --- |
| Frequency features: | |
| Number of named entities (NEs) | 2 |
| Number of drugs | 1 |
| Number of diseases | 1 |
| Number of verbs between two NEs | 1 |
| Number of words between NEs | 4 |
| Bag-of-words | Preliminary = 1; evidence = 1; suggests = 1; that = 1; interferons = 1; beta = 1; may = 1; also = 1; induce = 1; regression = 1; of = 1; metastatic = 1; renal = 1; cell = 1; carcinoma = 1 |
| Lexical features: | |
| Sequence of words of the NEs | Interferons beta_metastatic renal cell carcinoma |
| Sequence of words between every two NEs | May_also_induce_regression |
| Sequence of 3 words before each NE | Preliminary_evidence_suggests; Also_induce_regression |
| Sequence of 3 words after each NE | May_also_induce; null |
| Morphologic features: | |
| Sequence of lemmas of the words between every two NEs | May_also_induce_regression |
| Sequence of lemmas of the 3 words before each NE | Preliminary_evidence_suggest; Also_induce_regression |
| Sequence of lemmas of the 3 words after each NE | May_also_induce; null |
| Syntactic features: | |
| Sequence of POS tags of the NEs | NNS_NN_JJ_JJ_NN_NN |
| Sequence of POS tags of words between every two NEs | MD_RB_VB_NN |
| Sequence of POS tags of 3 words before each NE | JJ_NN_VBZ; RB_VB_NN |
| Sequence of POS tags of 3 words after each NE | MD_RB_VB; null |
| Verb sequence between every two NEs | Induce |
| First verb preceding every NE | Suggest; induce |
| First verb after every NE | Induce; null |
| Semantic features: | |
| Semantic type sequence | TREAT_DIS |
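
As a rough illustration of how the frequency and lexical rows of Table 2 could be computed, the following Python sketch derives them from a tokenized sentence and two entity spans. It is not the authors' implementation. The example sentence is reconstructed from the bag-of-words row, and the entity offsets, the `extract_features` helper, and the two-word stopword list are all assumptions made for illustration. The stopword list was chosen only because the table's examples appear to skip "that" and "of". The morphologic and syntactic rows would additionally need a lemmatizer and a POS tagger, which are left out here.

```python
from collections import Counter

# Example sentence reconstructed from the bag-of-words row (an assumption).
TOKENS = ("Preliminary evidence suggests that interferons beta may also "
          "induce regression of metastatic renal cell carcinoma").split()

# Hypothetical entity annotations: (start, end_exclusive, type).
ENTITIES = [(4, 6, "drug"), (11, 15, "disease")]

# Illustrative stopword list (an assumption, not taken from the article).
STOPWORDS = {"that", "of"}


def content_words(tokens):
    """Drop stopwords while keeping the original token order."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def join(words):
    """Join words with underscores; an empty context becomes 'null'."""
    return "_".join(words) if words else "null"


def extract_features(tokens, entities):
    """Compute frequency and lexical features for a sentence with two NEs."""
    (s1, e1, _), (s2, e2, _) = entities  # assumes exactly two NEs, in order
    between = content_words(tokens[e1:s2])
    return {
        "n_entities": len(entities),
        "n_drugs": sum(1 for _, _, t in entities if t == "drug"),
        "n_diseases": sum(1 for _, _, t in entities if t == "disease"),
        "n_words_between": len(between),
        "bag_of_words": Counter(t.lower() for t in tokens),
        "ne_words": [" ".join(tokens[s:e]) for s, e, _ in entities],
        "words_between": join(between),
        # Up to three content words immediately before / after each NE.
        "words_before": [join(content_words(tokens[:s])[-3:])
                         for s, _, _ in entities],
        "words_after": [join(content_words(tokens[e:])[:3])
                        for _, e, _ in entities],
    }


features = extract_features(TOKENS, ENTITIES)
for name, value in features.items():
    print(name, "=", value)
```

Under these assumptions the sketch reproduces the counts and word sequences of the frequency and lexical rows, up to capitalization. A different stopword list or tokenizer would change the context windows.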