Research Article

Automated Training for Algorithms That Learn from Genomic Data

Table 2

Cardinalities of the negative interim training sets gathered by ApicoAP-CS for the 17 apicomplexan species.

Apicomplexan Species  OrthoMCL(a)  BLAST(b)  Confirmed(c)  All Combined(d)  Conflicts Removed(e)  Non-SP Filtered(f)
B. bovis              144          136       8             161              159                   33
B. microti            142          130       0             159              156                   23
C. hominis            135          130       0             157              154                   28
C. muris              143          137       0             163              160                   34
C. parvum             130          129       10            164              161                   33
E. tenella            400          175       8             443              438                   169
N. caninum            254          220       15            288              283                   81
P. berghei            222          212       28            260              256                   101
P. chabaudi           238          223       2             258              253                   108
P. cynomolgi          259          224       0             273              269                   93
P. falciparum         284          173       156           443              439                   138
P. knowlesi           236          227       6             258              254                   91
P. vivax              261          227       13            281              277                   103
P. yoelii             242          216       16            270              266                   89
T. annulata           151          133       4             169              167                   42
T. parva              186          128       4             204              202                   71
T. gondii             194          198       131           333              327                   92

(a) Cardinality of the set gathered by ortholog search using OrthoMCL.
(b) Cardinality of the set gathered by ortholog search using the BLAST-based algorithm.
(c) Cardinality of the set containing experimentally confirmed positive/negative proteins.
(d) Cardinality of the set that is the union of the sets presented in columns 2, 3, and 4.
(e) Cardinality of the union set when conflicts with the negative/positive set are removed.
(f) Cardinality of the final training set after proteins without signal peptides have been removed.
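
The footnotes describe a chain of set operations: a union of three sources, removal of conflicts with the opposite-class set, and a signal-peptide filter. The following is a minimal sketch of that chain, not the ApicoAP-CS implementation, which is not shown here. The function name `build_interim_set`, the `has_signal_peptide` predicate, and the protein identifiers are hypothetical and used only for illustration.

```python
from typing import Callable, Dict, Iterable


def build_interim_set(
    orthomcl: Iterable[str],
    blast: Iterable[str],
    confirmed: Iterable[str],
    opposite_set: Iterable[str],
    has_signal_peptide: Callable[[str], bool],
) -> Dict[str, int]:
    """Return the six cardinalities reported per species in the table.

    All arguments are hypothetical stand-ins: each iterable holds protein
    identifiers, and has_signal_peptide is an assumed predicate.
    """
    orthomcl, blast, confirmed = set(orthomcl), set(blast), set(confirmed)

    # Column 5: union of the three source sets (columns 2, 3, and 4).
    combined = orthomcl | blast | confirmed

    # Column 6: drop proteins that also appear in the opposite-class set.
    conflicts_removed = combined - set(opposite_set)

    # Column 7: keep only proteins that carry a signal peptide.
    filtered = {p for p in conflicts_removed if has_signal_peptide(p)}

    return {
        "orthomcl": len(orthomcl),
        "blast": len(blast),
        "confirmed": len(confirmed),
        "all_combined": len(combined),
        "conflicts_removed": len(conflicts_removed),
        "non_sp_filtered": len(filtered),
    }


# Toy data with made-up identifiers, only to exercise the function.
with_signal_peptide = {"p1", "p2", "p5"}
counts = build_interim_set(
    orthomcl={"p1", "p2", "p3"},
    blast={"p2", "p3", "p4"},
    confirmed={"p5"},
    opposite_set={"p4"},
    has_signal_peptide=lambda p: p in with_signal_peptide,
)
print(counts)
```

Because the sources overlap, the combined cardinality is at most the sum of the three source columns and at least the largest of them, which is consistent with every row of the table.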