(1) N-grams (merging the attributes of 2 to 4 consecutive tokens); (2) individual component features for each token and edge in a path; (3) semantic node features (the attributes of the two terminal event/entity nodes of the potential event argument edge); (4) frequency features (the length of the shortest path and the number of named entities and event nodes, per type, in the sentence)
(1) Words and POS tags in a window around the trigger; (2) distances from the trigger to the nearest annotated protein on each side (left and right) and to the theme candidate
(1) Three consecutive stemmed words from the subsentence spanning the event; (2) lexical and syntactic information of triggers; (3) size of the subgraph; (4) bag of words; (5) length of the subsentence; (6) extra features for regulation events; (7) vertex walks, each consisting of two vertices and their connecting edge
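
Three of the feature types listed above (token n-grams, a word window around the trigger, and vertex walks) can be sketched as follows. This is only an illustrative sketch: the function names, the `_` and `-` separators, and the `(source, label, target)` edge representation are assumptions made for the example and are not taken from any of the described systems.

```python
from typing import List, Tuple


def token_ngrams(attrs: List[str], n_min: int = 2, n_max: int = 4) -> List[str]:
    """Merge the attributes of n_min..n_max consecutive tokens into n-gram features."""
    feats = []
    for n in range(n_min, n_max + 1):
        for i in range(len(attrs) - n + 1):
            feats.append("_".join(attrs[i:i + n]))  # separator is an assumption
    return feats


def window_words(tokens: List[str], trigger_idx: int, size: int = 2) -> List[str]:
    """Return up to `size` tokens on each side of the trigger (trigger excluded)."""
    lo = max(0, trigger_idx - size)
    left = tokens[lo:trigger_idx]
    right = tokens[trigger_idx + 1:trigger_idx + 1 + size]
    return left + right


def vertex_walks(edges: List[Tuple[str, str, str]]) -> List[str]:
    """One feature per edge: source vertex, connecting edge label, target vertex."""
    return [f"{src}-{label}-{dst}" for src, label, dst in edges]


if __name__ == "__main__":
    tokens = ["IL-2", "induces", "phosphorylation", "of", "STAT3"]
    print(token_ngrams(tokens[:3]))
    print(window_words(tokens, trigger_idx=2, size=1))
    print(vertex_walks([("phosphorylation", "Theme", "STAT3")]))
```

In practice each feature string would be mapped to an index in a sparse feature vector; the token attributes passed to `token_ngrams` could be word forms, stems, or POS tags, depending on the system.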