Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins
Table 1
A summary of the proteomes and gene sets.
| Domain | Species | Gene number | Ave | Med | Max | Min | IDpep% | IDres% |
|---|---|---|---|---|---|---|---|---|
| Eukaryota | H. sapiens | 20,193 | 561.0 | 417 | 34,350 | 16 | 45.2 | 49.3 |
| | D. melanogaster | 13,700 | 537.2 | 396 | 22,949 | 11 | 44.3 | 49.0 |
| | S. cerevisiae | 5917 | 494.1 | 405 | 4910 | 16 | 38.1 | 44.6 |
| | A. thaliana | 27,407 | 405.2 | 348 | 5393 | 7 | 36.8 | 43.6 |
| | P. trichocarpa | 41,434 | 385.0 | 317 | 5410 | 29 | 35.5 | 42.6 |
| | A. comosus | 29,772 | 372.6 | 288 | 5407 | 31 | 39.5 | 45.4 |
| | O. sativa | 48,788 | 376.1 | 290 | 4957 | 5 | 38.0 | 44.5 |
| | A. trichopoda | 26,460 | 317.0 | 218 | 4990 | 29 | 37.5 | 43.9 |
| | C. reinhardtii | 17,819 | 732.9 | 498 | 23,859 | 31 | 54.8 | 61.9 |
| | P. patens | 32,400 | 351.9 | 250 | 5199 | 13 | 40.2 | 45.5 |
| | G. intestinalis | 9667 | 353.8 | 147 | 8161 | 33 | 35.1 | 41.7 |
| | Monocercomonoides | 16,780 | 784.6 | 393 | 14,902 | 49 | 52.7 | 60.1 |
| Archaea | Lokiarchaeum | 5348 | 268.4 | 224 | 3592 | 20 | 20.0 | 33.0 |
| | I. hospitalis | 1434 | 278.3 | 240 | 1392 | 33 | 20.4 | 34.3 |
| | N. equitans | 540 | 280.2 | 228 | 2197 | 45 | 7.0 | 30.6 |
| Bacteria | E. coli | 4140 | 316.9 | 282 | 2358 | 14 | 17.5 | 32.2 |
| | S. elongatus | 2612 | 305.3 | 258 | 1807 | 29 | 20.8 | 34.3 |
| | Rickettsiales | 1780 | 365.2 | 251 | 2243 | 31 | 7.7 | 32.8 |
| Giruses | Mimivirus | 979 | 356.7 | 289 | 2959 | 25 | 25.0 | 36.6 |
| | Pandoravirus | 2541 | 259.2 | 178 | 2321 | 26 | 36.4 | 43.5 |
| Gene sets | Viruses | 237,463 | 251.8 | 154 | 8573 | 9 | 28.0 | 38.8 |
| | Plasmids | 95,214 | 258.9 | 206 | 16,990 | 9 | 27.2 | 38.1 |
| | Mitochondria | 88,405 | 286.1 | 261 | 2640 | 13 | 8.6 | 20.0 |
| | Plastids | 80,807 | 280.0 | 156 | 5242 | 12 | 20.5 | 32.0 |
| All proteins | | 811,600 | 325.7 | 225 | 34,350 | 5 | 32.2 | 39.8 |
Domain: proteomes from the three domains of life, followed by the giant DNA viruses (giruses) and the collective protein sets. Gene number: total number of genes. Ave, Med, Max, Min: average, median, maximal, and minimal protein lengths. IDpep%: percentage of intrinsically disordered proteins in the proteome or gene set. IDres%: average intrinsic disorder content of all residues in the proteome or gene set. All proteins: all proteins studied in the present work. The protein length statistics cover all proteins in a proteome or gene set; however, proteins containing unknown residues (X) are excluded from the intrinsic disorder calculations.
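
As an illustration of how the length columns in Table 1 could be derived, the following minimal Python sketch computes Ave, Med, Max, and Min over a list of protein sequences and separately filters out sequences containing unknown residues (X) before any disorder calculation. The function names and the toy sequences are hypothetical and are not taken from this work; the disorder prediction itself is not shown, since the predictor is not specified in this table.

```python
from statistics import mean, median


def length_summary(sequences):
    """Return the Ave, Med, Max and Min protein lengths for a set of sequences."""
    lengths = [len(seq) for seq in sequences]
    return {
        "Ave": round(mean(lengths), 1),
        "Med": median(lengths),
        "Max": max(lengths),
        "Min": min(lengths),
    }


def without_unknown_residues(sequences):
    """Keep only sequences with no unknown residue (X).

    Assumption: this mirrors the exclusion applied before the
    intrinsic disorder calculations described in the table note.
    """
    return [seq for seq in sequences if "X" not in seq.upper()]


# Toy sequences (hypothetical, for illustration only).
toy = ["MKT", "MKTAYIAK", "MXKLV", "MKTA"]

summary = length_summary(toy)              # uses all four sequences
filtered = without_unknown_residues(toy)   # drops "MXKLV"
```

Note that the length summary is computed over all sequences, whereas the filtered list is the input one would pass to a disorder predictor, in line with the distinction made in the table note.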