Abstract

The complete 15,599-bp mitogenome of Acrida cinerea was determined and compared with those of 20 other orthopterans. It displays the characteristic gene content, genome organization, nucleotide composition, and codon usage found in other Caelifera mitogenomes. Comparison of 21 orthopteran sequences revealed that the tRNAs encoded by the H-strand appear more conserved than those encoded by the L-strand. All tRNAs form the typical clover-leaf structure except trnS (agn), and most of the size variation among tRNAs stems from length variation in the arm and loop of TΨC and the loop of DHU. The derived secondary structure models of the rrnS and rrnL from 21 orthopteran species closely resemble those from other insects on CRW, except for a considerably enlarged loop of helix 1399 of rrnS in Caelifera, which is a potential autapomorphy of Caelifera. In the A+T-rich region, tandem repeats are not only conserved in closely related mitogenomes but also share some conserved motifs within the same subfamily. A stem-loop structure, 16 bp or longer, is likely to be involved in replication initiation in Caelifera and Grylloidea. A long T-stretch (>17 bp) with a conserved stem-loop structure next to rrnS on the H-strand, bounded by a purine at either end, exists in the three species from Tettigoniidae.

1. Introduction

Mitochondrial genomes exhibit several unique features, including strict orthology, maternal inheritance, lack of recombination, and rapid evolutionary rate. Owing to key technological advances in sequencing and the accumulation of universal primers, mitochondrial genes have been routinely used in phylogenetic studies as molecular markers [1]. In insects, the mitogenome is a double-stranded circular DNA molecule, usually composed of 13 protein coding genes (cox1-3, cob, nad1-6, nad4L, atp6, and atp8), 22 transfer RNA genes (trnX, where X refers to the corresponding amino acid), and 2 ribosomal RNA genes (rrnS and rrnL). In addition, an embedded large A+T-rich noncoding region may contain signals for control of replication and transcription. In the mtDNA of certain metazoans, all genes are transcribed from one strand, whereas in others both strands are used. Except for tRNA-encoding genes, the gene order of entire mitochondrial genomes appears to be highly conserved in insects [2, 3].

For phylogenetic reconstruction, entire mitogenome sequences contain more information than the simple collection of individual gene sequences. Examination of mitogenomes may reveal important genome-level characteristics, such as length variation, base compositional bias, codon usage, gene rearrangement, RNA secondary structures, and modes of control of replication and transcription [4]. Gene rearrangements have become a very powerful means for inferring ancient evolutionary relationships, since rearrangements appear to be unique, generally rare events that are unlikely to arise independently in separate evolutionary lineages. Rearrangements have been found in over a third of the insect orders, and in those orders where multiple representatives have been examined the phylogenetic signal in rearrangements is often very strong. Nevertheless, mitogenome rearrangements have not lived up to their early promise as phylogenetic markers for the resolution of interordinal relationships. The majority of insects have the same plesiomorphic gene arrangement that is shared by the Pancrustacea [2, 5, 6].

As the secondary structure of ribosomal RNA (rRNA) molecules is considerably conserved across distantly related taxa, structural information helps to refine the alignment of rRNA sequences in phylogenetic analyses [1, 7–11]. Although secondary structure models have proliferated over the past decades in conjunction with the increasing number of molecular phylogenetic studies based on rRNA sequences, details of mitochondrial rRNA structure are still worth investigating because they may differ in peripheral regions even among closely related taxa [8]. Likewise, advances in RNA substitution models have underlined the need for reliable secondary structure models for individual taxonomic groups [12].

In insects, the control region is called the A+T-rich region and is the major noncoding region of the mitogenome [1]. It is heavily biased toward A+T nucleotides and seems to evolve under strong directional mutation pressure. Among insects, this region is variable in both size and nucleotide sequence and may contain tandem repeats, which are often associated with heteroplasmy. In contrast, the nucleotide substitution rate in this region is likely to be much reduced due to high A+T content and directional mutation pressure [13]. Some structural elements, which have been proposed to be involved in the control of replication and transcription, have been observed to be highly conserved between phylogenetically very distant insect taxa. These observations have implications for the use of this region as a genetic marker in evolutionary studies [13–15]. Therefore, comparison of mitogenomes at various taxonomic levels may yield significant insights into the evolution of both organisms and genomes.

Orthoptera is a group of large and easily recognizable insects that includes grasshoppers, locusts, ground hoppers, crickets, bush-crickets, and mole-crickets as well as some lesser known groups. It is divided into two suborders, Caelifera and Ensifera, with ~20,000 known species distributed around the world. Most grasshoppers are herbivorous and are often regarded as agricultural pests. Acrida cinerea, commonly known as the Chinese grasshopper, belongs to the subfamily Acridinae in Acrididae. The genus Acrida comprises approximately 40 species, occurring in Africa, Europe, Asia, and Australia. In China, 8 Acrida species are found, and A. cinerea is the most widely distributed [16]. The grasshoppers of the genus Acrida are omnivorous insects, which are well known to damage sorghum, wheat, rice, cotton, weed, sweet potato, sugar cane, Chinese cabbage, and other crops.

Fifty-one sequence entries from this subfamily are listed in GenBank, and most of them are partial mtDNA sequences of Acrida. Fenn et al. [17] presented the complete mitogenomes of Acrida willemsei and four other orthopteran species. That paper reconstructed a preliminary phylogeny of Orthoptera as a vehicle to examine the phylogenetic utility of mitogenome data in resolving deep relationships within the order. The authors also explored various methods of analyzing mitogenome data in a phylogenetic framework by testing the effects of different optimality criteria, data partitioning strategies, and data transformation.

Here, we report the complete mitogenome of A. cinerea (Acrididae: Orthoptera), with emphasis on the common structural elements and variations of RNA molecules and the A+T-rich region, based on comparative sequence analyses with 20 other orthopterans. We hope these efforts will help to clarify the evolutionary characteristics of mitogenome structure in orthopterans and provide basic structural information for RNA sequence alignment in future evolutionary and phylogenetic studies.

2. Materials and Methods

2.1. Sampling

A. cinerea specimens were collected from Taibai Mountain at Xi’an, Shaanxi, China. All specimens were preserved in 100% ethanol and stored at −4°C.

2.2. DNA Extraction, PCR, and Sequencing

Total genomic DNA was isolated from a female adult A. cinerea by the phenol/chloroform method, diluted to 50 ng/μL in double-distilled water, and used as template for long and accurate polymerase chain reaction (LA-PCR).

Two pairs of LA-PCR primers [18] were used to amplify the complete mitogenome of A. cinerea as two overlapping fragments, cox1-cob (~9.5 kb) and cob-cox2 (~6 kb), as shown in Figure 1. LA-PCR amplifications were performed on a Bio-Rad MyCycler Thermal Cycler (Bio-Rad, Hercules, USA) with 150 ng of genomic DNA, 2.5 μL of 10 × LA PCR Buffer II (TaKaRa Bio Inc.), 5.0 mmol/L dNTP (2.5 mmol/L each dNTP), 62.5 mmol/L MgCl2 (25 mmol/L), 25 μmol/L each primer (10 μmol/L), 1.5 units of LA Taq polymerase (TaKaRa), and sterile distilled H2O to make up a 25 μL reaction volume. The cycling protocol consisted of an initial denaturation step at 94°C for 2 min, followed by 40 cycles of denaturation at 94°C for 10 s, annealing at 45°C for 30 s, and elongation at 68°C for 8 min during the first 20 cycles, with an additional 20 s of elongation per cycle during the last 20 cycles. The final elongation step was at 68°C for 7 min. LA-PCR products were purified with a DNA Gel Purification Kit (U-Gene) after separation by electrophoresis in a 1.0% agarose gel.

Sub-PCR primers were designed based on the comparison of twelve hemimetabolous insect sequences recorded in GenBank. The amplifications were performed with 50 ng of LA-PCR products, 2.5 μL of PCR Buffer (TaKaRa), 3.0 mmol/L dNTPs (2.5 mmol/L each dNTP), 62.5 mmol/L MgCl2, 15–50 μmol/L each primer, 1.5 units of TaKaRa Taq polymerase (TaKaRa), and sterile distilled H2O up to a 25 μL reaction volume. The cycling protocol consisted of an initial denaturation step at 94°C for 2 min, followed by 25–30 cycles of denaturation at 94°C for 10 s, annealing at 40–50°C for 30 s, and elongation at 72°C for 1–2 min. The final elongation step was at 72°C for 7 min. The Sub-PCR products were purified with the DNA Gel Purification Kit (U-Gene).

The Sub-PCR fragments were sequenced directly or cloned into TaKaRa pMD 18-T Vector (TaKaRa). All products were sequenced in both directions with the ABI PRISM 3100-Avant Genetic Analyzer with the sub-PCR primers and two vector-specific primers.

2.3. Data Analysis

We used the Staden package [19] for sequence assembly and annotation. Each gene was identified by sequence comparison with the mitochondrial sequence of Locusta migratoria migratorioides (X80245). For comparative mitogenome analysis, we downloaded 20 additional complete Orthoptera mitogenome sequences from GenBank (Table 1). Homologous sequences for each gene were initially aligned using Clustal X [20] and further analyzed with MEGA version 4.0 [21].

The initial alignments of tRNA and rRNA genes were manually corrected for obviously misaligned positions in BioEdit 7.0.0 [34]. To infer secondary structures, we used a commonly accepted comparative approach [35, 36]. Briefly, we defined a compensatory change as two substitutions occurring sequentially that maintained base pairing at a given position of a helix. The observation of two or more Watson-Crick (or G • U) interactions at the same location in a putative helix indicated selection to maintain base pairing and thus supported the helical model [7]. Evidence from consistent and compensatory substitutions (CCSs) gave a more concrete measure of the length of tRNA arms. We used the secondary structure model of the Drosophila melanogaster mitochondrial rrnL and the Chorthippus parallelus and Drosophila virilis mitochondrial rrnS molecules [35] to search for conserved sequence motifs that can be associated with conserved structural elements. The initial screening for conserved structural sequence motifs facilitated the subsequent analysis of secondary structural elements in more variable parts of the molecule. By searching for CCSs, we established the most likely secondary structures for the more variable portions of the rRNA molecules. Additionally, the inferred secondary structures were validated using the folding algorithm in the software RNAalifold [37], with default settings used to predict consensus structures. In addition to the standard Watson-Crick base pairs and noncanonical G • U interactions, the noncanonical base pairings proposed in other models were all observed in our study. The conventional numbering system established in the CRW Site [35] was used if a potential homology could be established by sequence similarity and/or structural position. Consecutive numbering was used when structural homology was ambiguous. Secondary structures were drawn using the software RnaViz 2.0 [38]. Conserved stem-loop structures in the A+T-rich region of some Orthoptera species were also established by CCSs.
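The CCS criterion described above can be illustrated with a short Python sketch. It is not the procedure used in this study (the alignments were inspected manually and checked with RNAalifold); the function name `classify_couplet`, the four category labels, and the toy alignment are hypothetical, and the distinction drawn here between "consistent" (one side of the couplet varies) and "compensatory" (both sides vary) is an assumption about how the two terms are usually applied.

```python
# Pairs treated as helix-maintaining: Watson-Crick plus the G·U wobble pair.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _rna(base):
    """Normalise one alignment character to upper-case RNA (T becomes U)."""
    return base.upper().replace("T", "U")


def classify_couplet(aligned_seqs, i, j):
    """Classify alignment columns i and j (0-based) of one putative couplet.

    Returns "unsupported" if any sequence has a non-pairing combination
    (gaps included), "invariant" if every sequence has the same pair,
    "consistent" if only one side varies while pairing is kept
    (e.g. G-C versus G-U), and "compensatory" if both sides vary.
    """
    pairs = {(_rna(s[i]), _rna(s[j])) for s in aligned_seqs}
    if not pairs <= PAIRING:
        return "unsupported"
    if len(pairs) == 1:
        return "invariant"
    left = {p[0] for p in pairs}
    right = {p[1] for p in pairs}
    return "compensatory" if len(left) > 1 and len(right) > 1 else "consistent"


# Hypothetical toy alignment: columns 0 and 4 form the candidate couplet.
toy = ["GAAAC", "AAAAU", "GAAAU"]
print(classify_couplet(toy, 0, 4))
```

A helix whose couplets are mostly "consistent" or "compensatory" across taxa would count as supported under this reading, whereas "unsupported" couplets correspond to the disrupted or noncanonical interactions discussed in the Results.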

The complete mitochondrial genome sequence of A. cinerea was deposited in GenBank under the accession number GU344100.

3. Results and Discussion

3.1. Genome Organization and Composition

The length and the average A+T content of the complete mitochondrial sequence of A. cinerea are 15,599 bp and 76.07%, respectively, well within the range of Orthoptera (Table 1). It displays the typical gene composition found in insect mitogenomes: 13 PCGs, 22 tRNA genes, 2 rRNA genes, and an A+T-rich region. Besides the A+T-rich region, 17 noncoding regions are present in the A. cinerea mitogenome, comprising a total of 80 nucleotides. Overlaps ranging from 7 to 8 bp span over 4 regions (Table 2).

The orientation and gene order of the A. cinerea mitogenome (Figure 1) are identical to those of L. migratoria [24], exhibiting a translocation from the ancestral trnK/trnD to the derived trnD/trnK. This translocation was previously proposed and subsequently confirmed as a synapomorphy for Caelifera [14, 17, 18, 23–26, 28–33]. Furthermore, the duplicated trnL (uur) initially identified in T. neglectus [17] may serve as a potential molecular synapomorphy of a subgroup within Rhaphidophoridae. The translocation of trnN-trnE-trnS to trnE-trnS-trnN in T. emma has been reported [30] and appears to be one of the most common changes in Drosophila, resulting from sequence inversion of these tRNA clusters [39]. Future research will determine whether this rearrangement is a potential autapomorphy of this cricket or occurs at a higher taxonomic level.

The highest A+T content was observed in the A+T-rich region and at the third codon position, both of which are under lower selection pressure. As expected, the first and second codon positions show less A+T bias than other mitogenome regions. Although the A+T-rich region is hypervariable, it is not necessarily the most variable region in the genome in terms of nucleotide substitution [13, 40]. In the mitogenomes compared here, the A+T content of the A+T-rich region is always lower than that of the third codon position of PCGs (Table 1 and Figure 2). The A+T content of rrnL is slightly higher than that of rrnS, the PCGs, and the whole genome. The curves representing the PCGs and the whole genome are very close. Within Orthoptera, the A+T contents of ensiferans are lower than those of caeliferans but differ more among species, especially in regions with high A+T content. Nevertheless, tRNAs and the second codon position of PCGs have relatively constant A+T content across orthopterans, indicating that they are structurally or functionally more constrained.
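The regional and codon-position A+T values compared above follow from simple base counts. The following Python sketch shows one way such values could be computed; the function names and the toy coding sequence are hypothetical, and the sketch assumes an in-frame coding sequence read 5′ to 3′ on its own coding strand, with ambiguous characters ignored.

```python
def at_content(seq):
    """Percentage of A+T among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def at_by_codon_position(cds):
    """Return A+T percentages for codon positions 1, 2, and 3 of an in-frame CDS.

    Trailing bases that do not complete a codon (e.g. an incomplete
    stop codon) are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return tuple(at_content(cds[pos:usable:3]) for pos in range(3))


# Hypothetical toy CDS with codons ATA, GCT, TTA.
first, second, third = at_by_codon_position("ATAGCTTTA")
print(round(first, 2), round(second, 2), round(third, 2))
```

Applying `at_content` to each annotated region and `at_by_codon_position` to the concatenated PCGs of each species would give per-species values of the kind plotted in Figure 2, assuming the same region boundaries as in the annotation.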

3.2. Protein Coding Genes and Codon Usage

A typical ATN start codon was observed in eleven of the A. cinerea PCGs (Table 1). We assigned Ala (GCU) and Lys (AAA) as the start codons of the nad5 and cox1 genes, respectively. Conventional termination codons (TAA and TAG) were observed in most of the putative protein-coding sequences; the exceptions are cox2, nad2, and nad5, which end with an incomplete termination codon (T or TA) adjacent to a tRNA gene (Table 1).

Excluding the termination codons, the 13 PCGs in the A. cinerea mitogenome comprise 3721 codons in total. The codon usage and the relative synonymous codon usage (RSCU) values are summarized in Table 3. The most frequent amino acids in the PCGs of A. cinerea are leucine (13.52%), isoleucine (10.70%), serine (9.87%), and phenylalanine (9.50%).
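For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below illustrates the calculation under two assumptions that are not stated in the text: that the invertebrate mitochondrial genetic code (NCBI translation table 5) applies, and that all codons for one amino acid are treated as a single synonymous family (so the two leucine and two serine codon groups are pooled). The names `CODE` and `rscu` are hypothetical.

```python
from collections import Counter
from itertools import product

# Invertebrate mitochondrial code (NCBI translation table 5), bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
               "IIMMTTTTNNKKSSSS" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds, code=CODE):
    """Return {codon: RSCU} for an in-frame coding sequence.

    RSCU = observed count * (number of synonymous codons) / (total count
    for that amino acid). Stop codons, incomplete trailing codons, and
    codons with ambiguous bases are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if code.get(c, "*") != "*")

    families = {}
    for codon, aa in code.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for members in families.values():
        total = sum(counts[c] for c in members)
        if total:
            for c in members:
                values[c] = counts[c] * len(members) / total
    return values


# Hypothetical toy CDS: two TTT and one TTC codon (both phenylalanine).
print(rscu("TTTTTTTTC"))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage; in A+T-rich mitogenomes such as this one, codons ending in A or T would typically show the higher values.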

3.3. Transfer RNA and Ribosomal RNA Genes

3.3.1. tRNA Genes

The 22 tRNA genes of A. cinerea range from 64 to 71 bp in length. The predicted secondary structures of the tRNAs are shown in Figure 4. Most of the size variation among tRNAs stems from length variation in the arm and loop of TΨC and the loop of DHU.

All tRNAs from the 21 orthopterans have the typical clover-leaf structure except for trnS (agn) [22, 25, 26, 28–33]. The percentage of conserved sites in each tRNA, the coding strand, the average A+T content of each tRNA, and the average percentage of codon usage were calculated for the 21 Orthoptera mitogenomes and are presented in Figure 3. The tRNAs encoded by the H-strand generally contain more conserved sites than those encoded by the L-strand. The conservation of tRNA genes was not associated with the frequency of codon usage or with A+T content.
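The percentage of conserved sites for one tRNA alignment can be sketched as follows. This is a minimal illustration, not the MEGA computation used for Figure 3: it assumes a gapped alignment of equal-length strings, counts a column as conserved only when every sequence carries the same non-gap base, and divides by the full alignment length. The function name and toy alignment are hypothetical.

```python
def percent_conserved(aligned_seqs):
    """Percentage of alignment columns that are identical in all sequences.

    Columns containing a gap ('-') in any sequence count as not conserved.
    """
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    conserved = 0
    for column in zip(*aligned_seqs):
        states = {base.upper() for base in column}
        if len(states) == 1 and "-" not in states:
            conserved += 1
    return 100.0 * conserved / length


# Hypothetical toy alignment: three of four columns are invariant.
print(percent_conserved(["ACGT", "ACGA", "ACGT"]))
```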

All tRNA genes contain a 7-bp amino acid acceptor (AA) stem, where most nucleotide substitutions are compensatory. However, noncanonical interactions likely contribute to the full stem structure, especially at the fifth or sixth couplet of certain tRNAs. For example, U · U or C · C pairs were found at the sixth couplet of trnQ in most Caelifera species. Likewise, in Caelifera, noncanonical A · G and A · A pairs were observed at the fifth couplet in trnW and trnD. Furthermore, U · U pairs are located at the sixth couplet in trnS (ucn) of Ensifera, and U · U or C · C pairs at the sixth couplet in trnA of orthopterans. Acrida sequences share a cytosine insertion after the fifth couplet, potentially a molecular synapomorphy for this genus. Primary sequences of this helix are highly conserved in trnM and trnT.

The anticodon (AC) stem (5 bp) and loop (7 bp) are both conserved in all tRNA genes except for trnG of T. emma, which contains a distinct loop and two A · G pairs at the second and third couplets. Noncanonical interactions are also present in the AC stem, especially at the first couplet, including in trnM, trnW, trnK, trnR, and trnL (cun). There is a conserved uracil before the anticodon in the AC loop.

Except for trnS (agn), the DHU stem is 3 or 4 bp long as established by CCSs, and its length is relatively consistent for each tRNA. Primary sequences of the DHU stem of trnI, trnM, trnW, trnD, trnE, and trnT are conserved in the referenced taxa. The loop of DHU varies among the tRNAs of orthopterans except in trnQ (5 bp) and trnA (4 bp). The second trnL (uur) copy of T. neglectus [17] differs from the others in the primary sequence of the DHU stem and loop. In addition, L. migratoria and O. chinensis have an insertion after the second couplet of trnH.

The TΨC arm ranges from 3 to 6 bp in length, and the loop also varies among the tRNAs. Among the 22 tRNAs, 14 contain a variable (V) loop of constant length, most commonly 4 bp.

Except for trnS (agn), the spacing nucleotides between the AA and DHU stems are predominantly “UR”. Only one nucleotide separates the DHU and AC stems, except in trnG of G. orientalis and trnH of Caelifera. An insertion lies between the TΨC and AA stems in trnG of T. emma and in trnL (cun) of P. albonema, whereas there is no interval between these two stems in other tRNAs.

3.3.2. rRNA Genes

We derived a secondary structure model of the rrnS and rrnL from 21 Orthoptera taxa using a comparative approach. The derived secondary structures closely resemble those from other insects on CRW, thus confirming the majority of previously proposed base pair interactions in the rRNA molecules.

The secondary structure of the A. cinerea rrnS is presented in Figure 5(a) as a representative of 21 orthopterans. It consists of 782 nucleotides and 28 helices. Similar to the secondary structure of small ribosomal RNA subunits in prokaryotes, the secondary structure of insect rrnS is subdivided into four principal domains (labeled I, II, III, and IV) with reduction of certain helices [8]. Domains I and II are less sequenced due to the use of variable and less universal primers. Domains III and IV are the most conserved regions of rrnS, routinely used in insect systematic studies as molecular markers.

Domain I contains 9 helices. The primary sequences of helix 17 and the distal part of helix 511 are conserved, whereas most of the remaining helices in domain I were established from CCSs. U · U pairs at the fifth couplet preserve a 5-bp helix 9 as proposed in other models [12, 35, 41]. Helices 27 and 39 form in all the taxa, although the hydrogen bonds are always disrupted in these two helices. Comparative analysis suggested eight couplets for helix 47 in Caelifera, and the initial two couplets are disrupted in most of the Ensifera taxa except Gryllotalpa. The single-nucleotide bulges of helices 47 and 367 are conserved and often serve as anchors in sequence alignment. The distal part of helix 511 is conserved among orthopterans; in contrast, the couplets of the proximal part are neither conserved nor covaried. Compared with the E. coli model, the region enclosed by helix 47 is significantly reduced in orthopterans and is too variable for sequence alignment and general model construction. Previously, Mfold analysis [42] suggested two helices in this region of Caelifera, numbered helices 48 and 49 in Figure 5(a). However, it is difficult to draw a similar universal structure for the referenced sequences of Ensifera.

Domain II displays five helices. Helix 567 contains three base pairs established by CCSs. Similar to the C. parallelus model, most taxa of Caelifera have a 4-bp helix 577; in comparison, there are two additional couplets at the distal end in Ensifera. Helix 673 in almost all referenced sequences has two couplets and a 6-bp loop; however, the majority of the proximal part is less conserved except within the same genus. RNAalifold analysis [43] indicated five nucleotide interactions (at positions 215 : 219 to 260 : 264 in the 12S rRNA of A. cinerea) for Caelifera. The distal part of helix 769 is the most conserved region in domain II, encompassing the universal primer SR-N-14588. Six other base pairs likely reside at the base of helix 769. Nucleotides undergo covaried substitutions at the first three base pairs of helix 885. As in the C. parallelus model, we propose four couplets for the distal extension, although there are usually noncanonical interactions at the fourth and fifth couplets (350 : 362 and 353 : 359) of helix 885.

The secondary structure of domain III has been demonstrated in many insect taxa [8, 11, 41, 44, 45]. The structure of this domain in this study is based on the C. parallelus model on CRW with minor differences, such as the two additional couplets at the end of helix 921 and another conserved base pairing at the beginning of helix 944.

Helices 1399 and 1506 at the 3′ end of rrnS molecules are both conserved, and the constructed secondary structures are highly concordant with the C. parallelus model. Previously, an enlarged loop of helix 1399 was shown in the Zygaenidae Himantopterus dohertyi and Somabrachys aegrota [12]. The loop of helix 1399 in Caelifera is substantially larger than those of moths (Figure 5(a)), potentially indicating an autapomorphy of this insect group. The enlarged region after the thirteenth couplet usually starts with a conserved motif “AU” and ends with an adenine. About six couplets and a symmetrical bulge have been proposed to make up the enlarged region in C. parallelus. However, since our data do not support this hypothesis, studies of additional sequences from Caelifera are needed to clarify this issue.

The rrnL of A. cinerea is 1316 bp in length and divided into six domains (labeled I, II, III, IV, V, and VI), each separated by a single-stranded region [41]. Domain III is absent in arthropod mitochondria (Figure 5(b)). The majority of structural and phylogenetic studies have focused on the 3′ half of the rrnL molecule [7, 46–48], corresponding to the highly conserved domains IV and V (Figure 5(b)). Owing to relatively high variability and few applicable primer sets [1], domains I, II, and VI are seldom used in secondary structure prediction and molecular phylogenetic studies [41].

Compared to the E. coli model, considerable degeneration in domain I of Orthoptera leaves only five helices. This initial region of the rrnL molecule is highly variable and difficult to align. Consistent with the D. melanogaster model [35], two stems (helices 183 and 235) are hypothesized before helix 461. Comparative sequence analysis has established the second, third, and fourth couplets of helix 235, but convincing evidence for a 2-bp helix 183 in Orthoptera is still missing. Although a few noncanonical U · U interactions are found at the second couplet of helix 461 in Caelifera, this couplet is supported by CCSs in the taxa of Ensifera. Nucleotides surrounding helices 461 and 533 are highly conserved, with helix 563 being the most conserved helix of domain I in both primary sequence and secondary structure.

Domain II is not well conserved; nevertheless, most of the helices are established by compensatory changes including the long-distance pairing helices 579 and 812. Hydrogen bonds of the last two base pairs of helix 671 and the initial two couplets of helix 946 are disrupted in Caelifera, but remain intact in Ensifera. Regions between helices 822 and 946 and helices 946 and 812 are extremely variable, exhibiting distinct shapes in different models [35, 41, 49]. A 4-bp helix 991 is predicted according to CCSs. The distal part of helix 1057 is constant in Orthoptera species. The internal bulge of helix 1087 is unstable in certain Ensifera species. The primary sequence and secondary structure of helix 1196 are extremely variable in Orthoptera except for the initial couplet as confirmed by CCSs.

Domain VI contains 3 helices. The distal part of helix 2646 is extremely conserved. Despite certain noncanonical interactions or mismatches, the 7 base pairs of helix 2646 are validated by CCSs. In most of the taxa, a 5-bp helix 2675 terminated with a variable loop is predicted, whereas the structure of helix 2735 is unclear.

3.4. A+T-Rich Region

The largest noncoding region of insect mtDNA, called the “AT-rich region” due to its high AT content, is considered to be involved in the regulation of mtDNA transcription and replication [1]. Because sequence similarity is low except among closely related animals, it is often unclear whether these “control elements” are homologous between distantly related animals or have arisen independently from various noncoding sequences in separate evolutionary lineages [2].

As in other Orthoptera species, the A+T-rich region of A. cinerea is located between rrnS and trnI (Figure 1 and Table 1). It is 784 bp long with an A+T content of 87.88%, both within the range of Orthoptera, and apparently contains no repeat region. Among the 21 orthopterans studied here, the length of the A+T-rich region ranges from 70 bp in R. dubia to 1401 bp in O. asiaticus (Table 1). The length differences among closely related taxa are mainly caused by variation in the size and copy number of repeat units [50].

The Orthoptera sequences studied here belong to four different superfamilies, including 12 Acridoidea, 1 Pyrgomorphoidea, 5 Grylloidea, and 5 Tettigoniidea. The first two groups belong to Caelifera, and the remaining groups belong to Ensifera. The control regions of the two Acrida species are highly similar, with a nucleotide identity of 97.07%. The main difference between the two subspecies of L. migratoria is the copy number of repeat units.

In Orthoptera, large repeat regions have been reported in X93574 Chorthippus parallelus [50] and X15152 Gryllus firmus [51] as well as in the mitochondrial genomes of L. migratoria [24], G. marmoratus [23], O. asiaticus [23], L. m. migratoria, T. emma [30], and G. gratiosa [32]. Most of the tandemly repeated sequences were found at the end next to rrnS, and the first repeat begins with an extension of 12 (in C. parallelus) to 64 (in G. gratiosa) nucleotides at the rrnS (Table 4). However, in O. asiaticus, two different repeat units are present, one at either end of the A+T-rich region. The final repeat at the 3′ end usually has more sequence variation than the others. In addition to being strongly conserved within the same sequence, the repeat units also show little variation within the subfamily Oedipodinae (Table 4). Although the repeat units of G. firmus and T. emma show low sequence identity (Table 4), the shared dyad symmetric sequence 5′-GGGGGCATGCCCCC-3′ may be a conserved motif in this subfamily.
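A sequence has dyad symmetry when it equals its own reverse complement, so that either strand reads the same 5′ to 3′. The shared motif quoted above can be checked with a few lines of Python; the function names are hypothetical, and the check handles only unambiguous DNA bases.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dyad_symmetric(seq):
    """True if seq reads the same on both strands (perfect dyad symmetry)."""
    return seq.upper() == reverse_complement(seq)


# The motif shared by the G. firmus and T. emma repeat units.
print(is_dyad_symmetric("GGGGGCATGCCCCC"))  # prints True
```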

A stem-loop structure, potentially involved in replication initiation, is located in the central region near the trnI gene of L. migratoria and is easily distinguished from the repeated sequence [52]. Besides the desert locust S. gregaria and the meadow grasshopper C. parallelus [50], a stem-loop structure, 16 bp or longer, also exists at the same position in all of the taxa from Caelifera. Nucleotides of this region are almost identical except for the distal three base pairs, as revealed by compensatory substitutions (Figure 6). The flanking regions, including “TATA” on the 5′ end and “G (A)nT” on the 3′ end, are also conserved in Caelifera except in O. chinensis and A. sinensis. Other conserved structural elements [13, 50] were also found in the referenced species of Caelifera, except for the long polythymine stretch, which is often interrupted by other nucleotides such as cytosine. Acrida sequences lack the >4 bp T-stretch. Rather, the motif “TATTTwATryAyAAA” adjacent to the tRNAIle is more conserved in the Caelifera taxa (Figure 6).
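The motif “TATTTwATryAyAAA” uses IUPAC ambiguity codes (w = A or T, r = A or G, y = C or T). A minimal Python sketch for locating such a degenerate motif in a sequence is given below; it only illustrates how the notation expands, the function names and the toy sequence are hypothetical, and the search covers only the strand supplied.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC-coded DNA motif into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in motif.upper()))


def find_motif(seq, motif):
    """Return 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]


# Hypothetical toy sequence containing one instance of the motif.
print(find_motif("GGTATTTAATACATAAACC", "TATTTwATryAyAAA"))
```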

Previously, it was proposed that a sequence segment in each repeat unit forms a stem-loop structure homologous to those found in Drosophila and S. gregaria/C. parallelus. If the stem-loop structure for replication initiation is included in the repeated sequence, the same structure may also exist in the closely related T. emma mtDNA sequence. However, in T. emma, the proposed stem-loop [50] in each repeat unit contains more mismatches between base pairs. In addition, M. manni, another Gryllidae species, lacks a large tandem repeat in the A+T-rich region, suggesting that additional sequences may be involved in replication initiation. Two adjacent nucleotide stretches were found in the sequences of G. firmus, T. emma, and M. manni, with a T-stretch interrupted by C located upstream of an A-stretch interrupted by G. These two stretches may form a 16-bp stem-loop structure similar to that of Caelifera, coincidentally located at the corresponding position except in G. firmus (Figure 4). In Gryllotalpa, a similar stem-loop structure was also detected. Furthermore, the structure was well established by CCSs in the superfamily Grylloidea.

In conclusion, the stem-loop predicted in this study is likely to be involved in replication initiation in the taxa of Caelifera and Grylloidea. In contrast with these two taxa, detection of the conserved stem-loop structure in Tettigoniidae is more difficult. The three available complete genomes in Tettigoniidae (A. simplex [14], D. onos, and G. gratiosa) share a common feature: a long T-stretch (>17 bp) next to rrnS on the H-strand, bounded by a purine at either end.
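The T-stretch criterion described above (an uninterrupted run of more than 17 thymines flanked by a purine on each side) translates directly into a pattern search. The Python sketch below is illustrative only: the function name and the toy sequences are hypothetical, and it assumes the input string already represents the strand on which the T-stretch is read.

```python
import re

# A purine (A or G), then at least 18 consecutive T's, then another purine.
T_STRETCH = re.compile(r"[AG](T{18,})[AG]")


def find_t_stretches(seq):
    """Return (start, length) of each purine-bounded T-run longer than 17 bp.

    start is the 0-based position of the first T in the run.
    """
    return [(m.start(1), len(m.group(1))) for m in T_STRETCH.finditer(seq.upper())]


# Hypothetical toy sequences: a 20-T run bounded by A and G, and a 17-T run.
print(find_t_stretches("CCA" + "T" * 20 + "GCC"))
print(find_t_stretches("CCA" + "T" * 17 + "GCC"))
```

Because the run must be followed by a purine, the pattern always captures the whole run rather than a part of it; T-stretches interrupted by cytosine, as reported for some Caelifera, would not be matched by this strict form.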

4. Conclusions

The mitogenome of A. cinerea displays the characteristic gene content, genome organization, nucleotide composition, and codon usage found in other Caelifera mitogenomes. Comparison of all 21 available orthopteran mitogenomes provides more information about the evolution of mitogenomes in this insect group.

Comparison of tRNA sequences from Orthoptera revealed that the conservation of tRNA genes was not associated with the frequency of codon usage but rather with the coding strand. The tRNAs encoded by the H-strand appear more conserved than those encoded by the L-strand. All tRNAs form the typical clover-leaf structure except trnS (agn). Most of the size variation among tRNAs stems from length variation in the arm and loop of TΨC and the loop of DHU.

The secondary structure models of the rrnS and rrnL from 21 Orthoptera taxa were predicted using the comparative approach. The derived secondary structures closely resemble those from other insects on CRW except a considerably enlarged loop of helix 1399 of rrnS in Caelifera, thus confirming the majority of previously proposed base pair interactions in the rRNA molecules.

In the A+T-rich region of Orthoptera, tandem repeats are not only conserved within individual mitogenomes but also show conserved sequence blocks within the same subfamily. Conserved stem-loop structures, potentially involved in replication initiation, were found at similar positions within the A+T-rich region of all Caelifera and Grylloidea mitogenomes. A long T-stretch (>17 bp) with a conserved stem-loop structure next to rrnS on the H-strand, bounded by a purine at either end, exists in the three species from Tettigoniidae.

Abbreviations

atp6 and atp8: Genes encoding ATP synthase subunits 6 and 8
cob: Gene encoding cytochrome b
cox1-3: Genes encoding cytochrome c oxidase subunits I–III
nad1-6 and nad4L: Genes encoding NADH dehydrogenase subunits 1–6 and 4L
rrnL and rrnS: Genes encoding the large and small subunits of ribosomal RNA
lrRNA and srRNA: Large and small subunits of ribosomal RNA
trnX: Genes encoding transfer RNA molecules, with the corresponding amino acid denoted by the one-letter code and the anticodon indicated in parentheses (nnn) when necessary
tRNA-X: Transfer RNA molecules, with the corresponding amino acid denoted by the one-letter code and the anticodon indicated in parentheses (NNN) when necessary
PCG: Protein coding gene
CR: Control region
NCR: Noncoding region
bp: Base pair(s)
kb: Kilobases
nt: Nucleotide(s)
aa: Amino acid(s)
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction.

Acknowledgments

The authors thank Dr. Huimeng Lu and Dr. Jing Hu for primer design and helpful discussion during the project. The study was supported by the National Natural Science Foundation of China (Grant nos. 30670279 and 30970346).