Research Article

Automated Training for Algorithms That Learn from Genomic Data

Table 2

Cardinalities of the negative interim training sets gathered by ApicoAP-CS for each of the 17 apicomplexan species.

Apicomplexan Species  OrthoMCL (a)  BLAST (b)  Confirmed (c)  All Combined (d)  Conflicts Removed (e)  Non-SP Filtered (f)

B. bovis              144           136        8              161               159                    33
B. microti            142           130        0              159               156                    23
C. hominis            135           130        0              157               154                    28
C. muris              143           137        0              163               160                    34
C. parvum             130           129        10             164               161                    33
E. tenella            400           175        8              443               438                    169
N. caninum            254           220        15             288               283                    81
P. berghei            222           212        28             260               256                    101
P. chabaudi           238           223        2              258               253                    108
P. cynomolgi          259           224        0              273               269                    93
P. falciparum         284           173        156            443               439                    138
P. knowlesi           236           227        6              258               254                    91
P. vivax              261           227        13             281               277                    103
P. yoelii             242           216        16             270               266                    89
T. annulata           151           133        4              169               167                    42
T. parva              186           128        4              204               202                    71
T. gondii             194           198        131            333               327                    92

(a) Cardinality of the set gathered by ortholog search using OrthoMCL.
(b) Cardinality of the set gathered by ortholog search using the BLAST-based algorithm.
(c) Cardinality of the set containing experimentally confirmed positive/negative proteins.
(d) Cardinality of the set that is the union of the sets presented in columns 2, 3, and 4.
(e) Cardinality of the union set when conflicts with the negative/positive set are removed.
(f) Cardinality of the final training set after proteins without signal peptides have been removed.
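
The last three columns of Table 2 are described as a union, a conflict removal, and a signal-peptide filter. A minimal Python sketch of that sequence of set operations follows. It is an illustration only, not the ApicoAP-CS implementation: the function name, its parameters, and the protein identifiers are hypothetical, conflicts are assumed to mean membership in the opposite-class set, and the signal-peptide check is a stand-in predicate, since the table does not say how signal peptides are detected.

```python
def build_interim_training_set(orthomcl_ids, blast_ids, confirmed_ids,
                               opposite_class_ids, has_signal_peptide):
    """Return the three staged sets behind the 'All Combined', 'Conflicts
    Removed', and 'Non-SP Filtered' columns (hypothetical helper)."""
    # 'All Combined': union of the OrthoMCL, BLAST, and confirmed sets.
    combined = set(orthomcl_ids) | set(blast_ids) | set(confirmed_ids)
    # 'Conflicts Removed': drop proteins that also appear in the
    # opposite-class set (assumed interpretation of "conflicts").
    conflicts_removed = combined - set(opposite_class_ids)
    # 'Non-SP Filtered': keep only proteins that have a signal peptide.
    filtered = {p for p in conflicts_removed if has_signal_peptide(p)}
    return combined, conflicts_removed, filtered


# Toy identifiers, invented for illustration (not data from Table 2).
orthomcl = {"p1", "p2", "p3"}
blast = {"p2", "p3", "p4"}
confirmed = {"p5"}
opposite_class = {"p4"}
with_signal_peptide = {"p1", "p3", "p5"}

combined, conflicts_removed, filtered = build_interim_training_set(
    orthomcl, blast, confirmed, opposite_class,
    lambda protein_id: protein_id in with_signal_peptide,
)
print(len(combined), len(conflicts_removed), len(filtered))
```

Because the three source sets can overlap, the combined cardinality is at most the sum of the first three columns, and each later stage can only shrink the set. Every row of Table 2 is consistent with this.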