Research Article

Automated Training for Algorithms That Learn from Genomic Data

Table 1

Cardinalities of the positive interim training sets gathered by ApicoAP-CS for the 17 apicomplexan species.

Apicomplexan species  OrthoMCL(a)  BLAST(b)  Confirmed(c)  All combined(d)  Conflicts removed(e)  Non-SP filtered(f)
B. bovis              46           45        4             61               59                    18
B. microti            51           50        0             61               58                    23
C. hominis            17           24        0             28               25                    1
C. muris              19           27        0             32               29                    0
C. parvum             17           24        0             29               26                    2
E. tenella            82           61        1             89               84                    30
N. caninum            78           68        0             82               77                    21
P. berghei            72           73        0             77               73                    49
P. chabaudi           72           73        0             77               72                    51
P. cynomolgi          70           72        0             77               73                    31
P. falciparum         45           60        40            89               85                    52
P. knowlesi           72           72        0             77               73                    49
P. vivax              69           72        0             75               71                    51
P. yoelii             70           68        3             77               73                    41
T. annulata           45           47        0             56               54                    23
T. parva              49           42        0             59               57                    25
T. gondii             53           59        45            102              96                    42

(a) Cardinality of the set gathered by ortholog search using OrthoMCL.
(b) Cardinality of the set gathered by ortholog search using the BLAST-based algorithm.
(c) Cardinality of the set containing experimentally confirmed positive/negative proteins.
(d) Cardinality of the set that is the union of the sets presented in columns 2, 3, and 4.
(e) Cardinality of the union set after conflicts with the negative/positive set are removed.
(f) Cardinality of the final training set after proteins without signal peptides have been removed.
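
The column definitions above amount to a short chain of set operations, which can be sketched in Python as follows. This is only an illustrative sketch, not the ApicoAP-CS implementation: the function name `stage_cardinalities`, the predicate `has_signal_peptide`, and the protein identifiers are hypothetical, and a "conflict" is assumed here to mean a candidate positive that also appears in the negative set.

```python
def stage_cardinalities(orthomcl_hits, blast_hits, confirmed_positive,
                        negative_set, has_signal_peptide):
    """Return the set sizes that correspond to the six table columns.

    All arguments except the last are collections of protein identifiers;
    has_signal_peptide is a caller-supplied predicate (hypothetical here).
    """
    orthomcl_hits = set(orthomcl_hits)
    blast_hits = set(blast_hits)
    confirmed_positive = set(confirmed_positive)

    # Column "All combined": union of the three candidate sets.
    combined = orthomcl_hits | blast_hits | confirmed_positive

    # Column "Conflicts removed": assumed to mean dropping candidates
    # that also occur in the negative set.
    conflicts_removed = combined - set(negative_set)

    # Column "Non-SP filtered": keep only proteins with a signal peptide.
    sp_filtered = {p for p in conflicts_removed if has_signal_peptide(p)}

    return {
        "orthomcl": len(orthomcl_hits),
        "blast": len(blast_hits),
        "confirmed": len(confirmed_positive),
        "all_combined": len(combined),
        "conflicts_removed": len(conflicts_removed),
        "non_sp_filtered": len(sp_filtered),
    }


# Toy identifiers invented for illustration; they are not taken from Table 1.
with_signal_peptide = {"p1", "p2", "p5"}
counts = stage_cardinalities(
    orthomcl_hits={"p1", "p2", "p3"},
    blast_hits={"p2", "p3", "p4"},
    confirmed_positive={"p5"},
    negative_set={"p4"},
    has_signal_peptide=lambda p: p in with_signal_peptide,
)
print(counts)
```

Because the three candidate sets may overlap, the combined count is generally smaller than the sum of columns 2, 3, and 4, which is consistent with the rows of the table.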