Abstract

The extremely radioresistant bacteria of the genus Deinococcus and the extremely thermophilic bacteria of the genus Thermus belong to a common taxonomic group. Considering the distinct living environments of Deinococcus and Thermus, different genes would have been acquired through horizontal gene transfer after their divergence from a common ancestor. Their guanine-cytosine (GC) contents are similar; however, we hypothesized that their genomic signatures would be different. Our findings indicated that the genomes of Deinococcus radiodurans and Thermus thermophilus have different tetranucleotide frequencies. This analysis showed that the genome signature of D. radiodurans is most similar to that of Pseudomonas aeruginosa, whereas the genome signature of T. thermophilus is most similar to that of Thermanaerovibrio acidaminovorans. This difference in genome signatures may be related to the different evolutionary backgrounds of the 2 genera after their divergence from a common ancestor.

1. Introduction

In the present bacterial taxonomic system, the extremely radioresistant bacteria of the genus Deinococcus and the extremely thermophilic bacteria of the genus Thermus belong to a common lineage despite their remarkably different characteristics [1, 2]. Comparative genomic analyses have shown that after their divergence from a common ancestor, Deinococcus species seem to have acquired numerous genes from various other bacteria to survive different kinds of environmental stresses, whereas Thermus species have acquired genes from thermophilic archaea and bacteria to adapt to high-temperature environments [3]. For example, the aspartate kinase gene of Deinococcus radiodurans has a different evolutionary history from that of Thermus thermophilus [4]. In addition, D. radiodurans has several unique protein families [5] and genomic characters [6], and there is no genome-wide synteny between D. radiodurans and T. thermophilus [7]. However, phylogenetic analyses based on both orthologous protein sequence comparison and gene content comparison have shown that the genomes of Deinococcus and Thermus are most closely related to each other [3, 8]. Trinucleotide usage correlations have also been used to predict the functional similarity between two RecA orthologs of bacteria including D. radiodurans and T. thermophilus [9].

If the genes acquired through horizontal gene transfers are different between Deinococcus and Thermus, then the genomic base composition (GC content) and/or genome signature can be hypothesized to also be different between these 2 genera. However, the GC content of D. radiodurans (67%) is similar to that of T. thermophilus (69.4%). The genome signature, on the other hand, is a powerful basis for comparing different bacterial genomes [11–19].

Phylogenetic analyses based on genome signature comparison have been developed, and these analyses are useful for metagenomics studies [20]. Comparison of tetranucleotide frequencies has been reported to be a powerful tool for comparing bacterial genomes [21]. In this study, we compared the relative frequencies of tetranucleotides in 89 bacterial genome sequences and determined the phylogenetic positions of D. radiodurans and T. thermophilus.

2. Methods

2.1. Construction of Phylogenetic Relationships Based on the Relative Frequencies of Tetranucleotides in 89 Genome Sequences

We compared the relative frequencies of tetranucleotides in the genome sequences. The tetranucleotide frequencies of the 89 bacterial genomes were obtained from OligoWeb (oligonucleotide frequency search, http://insilico.ehu.es/oligoweb/). The 89 bacterial species are part of a list, published in a previous report [8], that covers a wide range of bacterial species. Each frequency vector consisted of 256 elements. The Euclidean distance between 2 vectors was calculated using the software package R (language and environment for statistical computing, http://www.R-project.org). On the basis of the distance matrix, a neighbor-joining tree was constructed using the MEGA software [10].
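The frequency vectors and pairwise distances described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, in which frequencies were taken from OligoWeb and distances were computed in R. The sketch assumes that overlapping tetranucleotides are counted on a single strand and that windows containing ambiguous bases are skipped. The function names and the two toy sequences are hypothetical, and construction of the neighbor-joining tree is not covered.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so that vectors from
# different genomes are directly comparable element by element.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the 256-element vector of relative tetranucleotide frequencies.

    Assumption: overlapping windows on one strand; windows containing
    characters other than A, C, G, T (e.g. N) are ignored.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(sequence) - 3):
        word = sequence[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous tetranucleotide")
    return [counts[t] / total for t in TETRAMERS]


def euclidean_distance(u, v):
    """Euclidean distance between two frequency vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Return (names, matrix) of pairwise distances for a dict name -> vector."""
    names = sorted(vectors)
    matrix = [[euclidean_distance(vectors[a], vectors[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical toy sequences, far shorter than any real genome.
toy_genomes = {
    "toy_1": "ACGTACGTGGCCGGCCACGT",
    "toy_2": "GGCCGGCCGGCCGGCCACGT",
}
toy_vectors = {name: tetranucleotide_frequencies(seq)
               for name, seq in toy_genomes.items()}
toy_names, toy_matrix = distance_matrix(toy_vectors)
```

The resulting symmetric matrix has zeros on the diagonal and is the kind of input expected by a neighbor-joining program.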

2.2. Ranking Based on Similarities between the Relative Frequencies of Tetranucleotides according to Correspondence Analysis

Correspondence analysis [22], which is a multivariate analysis method for profile data, was performed on the relative frequencies of tetranucleotides in 89 genomes. Correspondence analysis summarizes an originally high-dimensional data matrix (rows (tetranucleotides) and columns (genomes)) into a low-dimensional projection (space) [23, 24]. Scores (coordinates) in the low-dimensional space are given to each genome. The distance between points (genomes) in the low-dimensional space theoretically depends on the degree of similarity in the relative frequencies of tetranucleotides: a short distance means similar relative frequencies of tetranucleotides between genomes, whereas a long distance means different relative frequencies. Thus, distance can be used as an index of similarity among genomes in the relative frequencies of tetranucleotides. Distances between all genome pairs were calculated, and the genomes were then ranked by distance.
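The final ranking step can be illustrated with a short Python sketch. It assumes that the correspondence analysis has already assigned each genome a tuple of scores; the correspondence analysis itself is not implemented here. The number of axes retained is not stated above, so two axes are used purely for illustration. The genome names, the score values, and the function name are hypothetical.

```python
from math import dist  # available in Python 3.8 and later


def rank_by_distance(scores, query):
    """Rank all other genomes by Euclidean distance from the query genome.

    scores: dict mapping genome name -> tuple of low-dimensional scores.
    Returns a list of (name, distance) pairs, nearest first.
    """
    origin = scores[query]
    ranking = [(name, dist(origin, coords))
               for name, coords in scores.items() if name != query]
    ranking.sort(key=lambda item: item[1])
    return ranking


# Hypothetical scores on two axes; not values from this study.
toy_scores = {
    "genome_A": (0.0, 0.0),
    "genome_B": (0.3, 0.4),
    "genome_C": (1.0, 0.0),
    "genome_D": (-0.1, 0.0),
}

for name, d in rank_by_distance(toy_scores, "genome_A"):
    print(f"{name}\t{d:.3f}")
```

A table of nearest genomes for a given query, cut off at a chosen distance, can be read directly from such a ranking.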

3. Results and Discussion

In the neighbor-joining tree (Figure 1), D. radiodurans is located in the high-GC-content cluster, whereas T. thermophilus is grouped with Thermanaerovibrio acidaminovorans, and their group is located away from the high-GC-content cluster. The neighbor-joining tree (Figure 1) was greatly influenced by the genomic GC content bias; most of the well-defined major taxonomic groups did not form a monophyletic lineage. This result indicates that each constituent of the well-defined major groups has diversified by changing its genome signature during evolution. It is also consistent with a previous report indicating that microorganisms with a similar GC content have similar genome signature patterns [25].

Phylogenetic analysis according to genome signature comparison is not based on multiple alignment data. Thus, bootstrap analysis cannot be performed. In this paper, we instead estimated the similarity between tetranucleotide frequency profiles by using correspondence analysis. The correspondence analysis showed that the genome signature of D. radiodurans is most similar to that of Pseudomonas aeruginosa (Table 1), whereas the genome signature of T. thermophilus is most similar to that of Th. acidaminovorans (Table 2). Whereas the D. radiodurans genome signature is similar to those of 18 bacterial species within a distance of 0.5, the T. thermophilus genome signature is similar only to that of Th. acidaminovorans within the same distance (Table 2). These results indicate that T. thermophilus has a different genome signature from those of bacteria included in the high-GC-content cluster (Figure 1).

Pearson’s correlation coefficient between the tetranucleotide frequencies of the D. radiodurans and T. thermophilus genomes is 0.630 (Figure 2), whereas that between D. radiodurans and Pseudomonas aeruginosa is 0.935 (Figure 3) and that between Th. acidaminovorans and T. thermophilus is 0.914 (Figure 4). These results support the results of the neighbor-joining and correspondence analyses.
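For reference, Pearson's correlation coefficient between two frequency vectors can be computed as in the following minimal Python sketch. The function name is hypothetical and the example vectors are arbitrary illustrative numbers, not data from this study; in practice each vector would hold the 256 tetranucleotide frequencies of one genome.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Arbitrary illustrative vectors (hypothetical, not genome data).
r_same_trend = pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_opposite = pearson_correlation([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

A coefficient close to 1 indicates that tetranucleotides that are frequent in one genome also tend to be frequent in the other.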

The frequency of horizontal gene transfer between different bacteria may be associated with genome signature similarity. However, the tree topology based on genome signature (Figure 1) is different from that based on gene content [8]. This difference is caused by, among other factors, the amelioration of horizontally transferred genes [26]. Our findings strongly support the previous report that Deinococcus has acquired genes from various other bacteria to survive different kinds of environmental stresses, whereas Thermus has acquired genes from thermophilic bacteria to adapt to high-temperature environments [3].

Acknowledgment

The authors thank Professor Teruhiko Beppu for his valuable comments and encouragement.