Table 3: Summary of classification approaches. Test instances are the marked events of each event type in the test dataset; precision (P), recall (R), and F1-score are given as percentages. The Maximum Entropy (MaxEnt) classifier has the best average precision of the three classifiers, but CRF has the best recall together with good precision, giving it the best average F1-score.

Event type            Test instances  NB + EM        MaxEnt         CRF
                      Total: 942      P    R    F1   P    R    F1   P    R    F1

Phosphorylation       38              62   42   50   97   73   83   80   83   81
Protein catabolism    17              60   47   53   97   73   83   85   86   85
Gene expression       200             60   41   49   88   58   70   75   81   78
Localization          39              39   47   43   61   69   65   67   79   72
Transcription         60              24   52   33   49   80   61   57   78   66
Binding               153             56   63   59   65   62   63   65   81   72
Regulation            90              47   69   55   52   67   58   62   73   67
Positive regulation   220             70   27   39   75   25   38   55   74   63
Negative regulation   125             42   46   44   54   38   45   68   82   74

Average                               51   48   47   71   61   63   68   80   73
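
The relationship between the P, R, and F1 columns can be checked with a short sketch. It assumes that F1 is the harmonic mean of precision and recall and that the Average row is the unweighted mean over the nine event types. Neither is stated in the caption, although both are consistent with the reported figures. The names `f1_score`, `crf`, and `macro_average` are illustrative only, and the CRF values are copied from Table 3. Because the published P and R are already rounded, a recomputed F1 can differ from the reported one by up to one point.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over event types (assumed averaging scheme)."""
    values = list(values)
    return sum(values) / len(values)


# CRF columns from Table 3: event type -> (P, R, reported F1).
crf = {
    "Phosphorylation": (80, 83, 81),
    "Protein catabolism": (85, 86, 85),
    "Gene expression": (75, 81, 78),
    "Localization": (67, 79, 72),
    "Transcription": (57, 78, 66),
    "Binding": (65, 81, 72),
    "Regulation": (62, 73, 67),
    "Positive regulation": (55, 74, 63),
    "Negative regulation": (68, 82, 74),
}

# Recompute F1 from the rounded P and R of each event type.
recomputed_f1 = {event: f1_score(p, r) for event, (p, r, _) in crf.items()}

# Largest gap between recomputed and reported F1 (rounding effects only).
max_gap = max(abs(recomputed_f1[event] - reported)
              for event, (_, _, reported) in crf.items())

# Unweighted averages of each CRF column.
avg_p = macro_average(p for p, _, _ in crf.values())
avg_r = macro_average(r for _, r, _ in crf.values())
avg_f1 = macro_average(f for _, _, f in crf.values())

if __name__ == "__main__":
    print(f"max |recomputed F1 - reported F1| = {max_gap:.2f}")
    print(f"average P = {avg_p:.1f}, R = {avg_r:.1f}, F1 = {avg_f1:.1f}")
```

The same check can be repeated for the NB + EM and MaxEnt columns by swapping in their values.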