Research Article
A Novel Approach for Protein-Named Entity Recognition and Protein-Protein Interaction Extraction
Table 5
Gain ratio and ranking of the features, and distribution of entities by feature.
| Feature | Gain Ratio | Ranking | Pro_NE (%) | Not_Pro_NE (%) | Pro_NE/Not_Pro_NE |
|---|---|---|---|---|---|
| Sent_Uppercase | 0.08827 | 1 | 24.3 | 15.7 | 1.55 |
| Word_Num | 0.0839 | 2 | 10.5 | 17.8 | 0.59 |
| Word_Symbol | 0.05184 | 3 | 37.3 | 16.7 | 2.23 |
| Suffix_Letter | 0.04518 | 4 | — | — | — |
| Word_Uppercase | 0.04284 | 5 | 24.5 | 15.5 | 1.58 |
| Suffix_Word | 0.0388 | 6 | — | — | — |
| Prefix_Word | 0.03851 | 7 | — | — | — |
| Prefix_Letter | 0.03254 | 8 | — | — | — |
| Length | 0.02472 | 9 | — | — | — |
| POS | 0.02108 | 10 | — | — | — |
| Word_Alphabet | 0.01411 | 11 | 22.5 | 5.5 | 4.09 |
| Biology_Word | 0.00833 | 12 | 21.8 | 7.9 | 2.76 |
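This excerpt does not define gain ratio. The Python sketch below assumes the standard C4.5 definition: the information gain a feature provides about the Pro_NE/Not_Pro_NE label, divided by the feature's split information (the entropy of its own value distribution). The helper names (`entropy`, `gain_ratio`, `ratio`) and the toy counts are hypothetical and used only for illustration. The sketch also recovers the Ranking column by sorting the reported gain ratios, and the last column as the quotient of the two percentage columns, rounded to two decimals.

```python
import math
from typing import Dict, List

def entropy(counts: List[int]) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(table: Dict[str, List[int]]) -> float:
    """C4.5-style gain ratio of one feature (assumed definition).

    `table` maps each feature value to class counts [pro_ne, not_pro_ne];
    this layout is a hypothetical choice for the sketch.
    """
    class_totals = [sum(col) for col in zip(*table.values())]
    n = sum(class_totals)
    # Conditional entropy of the label given the feature value.
    cond = sum(sum(c) / n * entropy(c) for c in table.values())
    info_gain = entropy(class_totals) - cond
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy([sum(c) for c in table.values()])
    return info_gain / split_info if split_info > 0 else 0.0

# Gain ratios as reported in Table 5, in table order.
REPORTED = {
    "Sent_Uppercase": 0.08827, "Word_Num": 0.0839, "Word_Symbol": 0.05184,
    "Suffix_Letter": 0.04518, "Word_Uppercase": 0.04284, "Suffix_Word": 0.0388,
    "Prefix_Word": 0.03851, "Prefix_Letter": 0.03254, "Length": 0.02472,
    "POS": 0.02108, "Word_Alphabet": 0.01411, "Biology_Word": 0.00833,
}
# Ranking column: features sorted by decreasing gain ratio.
ranking = sorted(REPORTED, key=REPORTED.get, reverse=True)

def ratio(pro_pct: float, not_pro_pct: float) -> float:
    """Pro_NE/Not_Pro_NE column: quotient of the two percentages."""
    return round(pro_pct / not_pro_pct, 2)

if __name__ == "__main__":
    # Toy binary feature: present on 30 of 50 protein tokens, 10 of 50 others.
    print(round(gain_ratio({"yes": [30, 10], "no": [20, 40]}), 3))  # 0.128
    print(ranking[:3])
    print(ratio(24.3, 15.7))  # Sent_Uppercase row: 1.55
```

Dividing by split information penalizes features with many distinct values. This is the usual motivation for preferring gain ratio over raw information gain when binary features (such as Word_Num) are ranked alongside multi-valued ones (such as Prefix_Word or POS).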