Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords
Table 2: Features obtained from parsing information.
| Features | Definitions/remarks | Values | Examples |
|---|---|---|---|
| Height of protein pair and keyword (three types) | The heights, in the parse tree, of the two protein names that form the protein pair and of the keyword; these heights differ from word distances. Features height_P1, height_P2, and height_K are defined for the heights of the first protein name (P1), the second protein name (P2), and the keyword (K), respectively. | Integer | In Figure 1, height_P1, height_P2, and height_K are 2, 5, and 3, respectively. |
| Part-of-speech information of protein pair and keyword (three types) | The labels (phrase and part-of-speech tags) along PATH, the path from the root of the parse tree to the leaf of each protein name in the pair and of the keyword. These paths encode the syntactic structure, so classifiers can learn a pseudo-grammatical structure from them. Features POS_P1, POS_P2, and POS_K are defined for the PATH of the leaf representing P1, P2, and K, respectively. | List of labels on PATH | In Figure 1, POS_P1, POS_P2, and POS_K are “NP, NN”, “VP, VP, PP, NP, NN”, and “VP, VP, VBN”, respectively. |
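In every example in Table 2, each height equals the number of labels in the corresponding PATH (2, 5, and 3). The following is a minimal sketch of how both feature types could be derived from one parse tree, under several assumptions not stated in the source: the parser output is a Penn Treebank-style bracketed string, the root label (here S) is excluded from PATH, and height is taken to be the length of PATH. The helper names (`tokenize`, `parse`, `path_to_word`, `parse_features`) and the tree `EXAMPLE_TREE` are hypothetical; the tree is not the sentence of Figure 1 but is built only so that its paths reproduce the example values.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical node representation: (label, children); a leaf word is a plain str.
Tree = Tuple[str, List[Union["Tree", str]]]


def tokenize(bracketed: str) -> List[str]:
    """Split a Penn Treebank-style bracketed string into '(', ')' and symbols."""
    return bracketed.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str], pos: int = 0) -> Tuple[Tree, int]:
    """Parse the subtree starting at tokens[pos] == '('; return it and the next index."""
    if tokens[pos] != "(":
        raise ValueError(f"expected '(' at token {pos}")
    label = tokens[pos + 1]
    pos += 2
    children: List[Union[Tree, str]] = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            children.append(tokens[pos])  # a leaf word
            pos += 1
    return (label, children), pos + 1


def path_to_word(node: Tree, word: str, prefix: List[str]) -> Optional[List[str]]:
    """Return labels from the root down to the tag of the FIRST leaf equal to word."""
    label, children = node
    for child in children:
        if isinstance(child, str):
            if child == word:
                return prefix + [label]
        else:
            found = path_to_word(child, word, prefix + [label])
            if found is not None:
                return found
    return None


def parse_features(bracketed: str, p1: str, p2: str, k: str) -> dict:
    """Compute POS_* (PATH without the root label) and height_* (length of PATH)."""
    tree, _ = parse(tokenize(bracketed))
    features = {}
    for name, word in (("P1", p1), ("P2", p2), ("K", k)):
        full_path = path_to_word(tree, word, [])
        if full_path is None:
            raise ValueError(f"{word!r} does not occur in the tree")
        path = full_path[1:]  # assumption: the root label is not part of PATH
        features[f"POS_{name}"] = path
        features[f"height_{name}"] = len(path)  # assumption: height == len(PATH)
    return features


# Hypothetical tree whose paths match the Table 2 example values.
EXAMPLE_TREE = (
    "(S (NP (NN PROT1)) (VP (VBZ is) (VP (VBN activated) "
    "(PP (IN by) (NP (NN PROT2))))))"
)

if __name__ == "__main__":
    print(parse_features(EXAMPLE_TREE, "PROT1", "PROT2", "activated"))
```

Matching leaves by string is a simplification: a protein name can occur more than once in a sentence, so a practical pipeline would more likely locate leaves by token index in the parser's own tree object.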