Research Article

Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords

Table 2

Features obtained from parsing information.

Feature: Height of protein pair and keyword (three types)
Definition/remarks: The heights, in the parse tree, of the protein names constituting the protein pair and of the keyword. These heights differ from word distances. Features height_P1, height_P2, and height_K are defined for the heights of P1, P2, and K, respectively.
Values: Integer value
Example: In Figure 1, height_P1, height_P2, and height_K are 2, 5, and 3, respectively.

Feature: Part-of-speech information of protein pair and keyword (three types)
Definition/remarks: The part-of-speech information along PATH (the path from the root) in the parse tree, for the protein names constituting the protein pair and for the keyword. This represents the syntactic structure and allows classifiers to learn a pseudo-grammatical structure. Features POS_P1, POS_P2, and POS_K are defined for the part-of-speech information along the PATH to the leaf representing P1, P2, and K, respectively.
Values: List of part-of-speech information along PATH
Example: In Figure 1, POS_P1, POS_P2, and POS_K are “NP, NN,” “VP, VP, PP, NP, NN,” and “VP, VP, VBN,” respectively.
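The two parse-tree features in Table 2 can be illustrated with a minimal sketch. It assumes that the PATH excludes the root label and ends at the part-of-speech tag of the leaf, and that the height equals the length of this PATH; both assumptions are inferred from the example values in the table and are not stated in the definitions. The tree `TOY_TREE`, the tokens `PROT1` and `PROT2`, and the helper names are hypothetical, and the tree is not the one shown in Figure 1.

```python
from typing import List, Optional, Tuple, Union

# A tree is either a leaf token (str) or a tuple (label, child, child, ...).
Tree = Union[str, Tuple]

# Hypothetical parse of "PROT1 is phosphorylated by PROT2".
# Illustration only; this is NOT the tree from Figure 1.
TOY_TREE: Tree = (
    "S",
    ("NP", ("NN", "PROT1")),
    ("VP",
        ("VBZ", "is"),
        ("VP",
            ("VBN", "phosphorylated"),
            ("PP", ("IN", "by"), ("NP", ("NN", "PROT2"))))),
)


def path_to_leaf(tree: Tree, token: str) -> Optional[List[str]]:
    """Return the node labels from the root down to the POS tag of
    the first leaf equal to `token`, or None if it is absent."""
    if isinstance(tree, str):
        return [] if tree == token else None
    label, *children = tree
    for child in children:
        sub_path = path_to_leaf(child, token)
        if sub_path is not None:
            return [label] + sub_path
    return None


def parse_features(tree: Tree, token: str) -> Tuple[int, List[str]]:
    """Return (height, POS path) for `token`.

    Assumption: the root label is dropped from the path, and the
    height is the number of labels that remain.
    """
    full_path = path_to_leaf(tree, token)
    if full_path is None:
        raise ValueError(f"token {token!r} not found in tree")
    pos_path = full_path[1:]  # drop the root label
    return len(pos_path), pos_path


if __name__ == "__main__":
    for name, token in [("P1", "PROT1"), ("P2", "PROT2"), ("K", "phosphorylated")]:
        height, pos_path = parse_features(TOY_TREE, token)
        print(f"height_{name} = {height}, POS_{name} = {', '.join(pos_path)}")
```

Under these assumptions the toy tree yields paths of the same form as the examples in the table. In practice the tree would come from a syntactic parser rather than being written by hand, and repeated tokens would need to be located by position instead of by string match.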