Abstract

The complete mitochondrial genome sequences of the two butterfly species Euploea mulciber (Lepidoptera: Nymphalidae: Danainae) and Libythea celtis (Lepidoptera: Nymphalidae: Libytheinae) were determined in this study, comprising 15,166 bp and 15,164 bp, respectively. The orientation and the gene order of the two mitogenomes are identical to those of most of the other lepidopteran species. All protein-coding genes of Euploea mulciber and Libythea celtis mitogenomes start with a typical ATN codon with the exception of COI gene which uses CGA as its initial codon. All tRNA genes possess the typical cloverleaf secondary structure except for tRNASer (AGN), which has a simple loop with the absence of the DHU stem. There are short microsatellite-like repeat regions, but no conspicuous macrorepeats scattered throughout the A + T-rich regions. Phylogenetic analysis among the available butterfly species suggests that Libythea celtis (Libytheinae) is closely related to Calinaga davidis (Calinaginae), indicating that the subfamily Libytheinae may not represent a basal lineage of the Nymphalidae as previously suggested, and that Euploea mulciber stands at the base of the nymphalid tree as a sister to all other nymphalids.

1. Introduction

The animal mitochondrial genome (mitogenome) is a circular molecule of 15–20 kb in length. It contains 37 conserved genes including 22 transfer RNA genes (tRNAs), 2 ribosomal RNA genes (rRNAs), and 13 protein-coding genes (PCGs) involved in electron transport and oxidative phosphorylation [1, 2]. It also contains an A + T-rich region which is the largest noncoding area involved in the initiation and regulation of replication and transcription [3]. Mitogenome studies are important for comparative and evolutionary genomics, phylogenetics, molecular evolution, and population genetics due to the genome’s unique features, such as maternal inheritance, lack of extensive recombination, and accelerated nucleotide substitution rates [4, 5].

The Lepidoptera (butterflies and moths) is probably the largest insect order, containing over 165,000 described species. The systematics of the lepidopteran higher groups has long been a matter of contention [68]. In China, Chou’s taxonomic system [9] is widely adopted, in which the Chinese butterflies are split into 12 families and 32 subfamilies based on their morphological characteristics. However, another taxonomic system proposed by Wahlberg et al. (2005) [10], which is followed herein, has been commonly accepted by most butterfly researchers. In this system, the butterflies are classified into 2 superfamilies (Hesperioidea and Papilionoidea) and 5 families (Hesperiidae, Papilionidae, Pieridae, Lycaenidae, and Nymphalidae). Thus, the taxonomic levels and the phylogenetic relationships within and among butterfly lineages of some butterfly groups, such as the danainids and libytheines, remain controversial issues among different researchers.

Euploea mulciber is a member of the subfamily Danainae (formerly treated as a family). This medium-sized butterfly species is commonly called “striped blue crow” due to the male’s bright blue shot forewing and the female’s striped hindwing with narrow white streaks; its main host plants, Ficus and Nerium, are poisonous, mimicked by other butterfly species such as Elymnias malelas, Hypolimnas anomala, and Chilasa paradoxa. E. mulciber is distributed in India, Nepal, Bhutan, Burma, Malaysia, and China [9].

Libythea celtis is a species of the subfamily Libytheinae (formerly treated as a family). This species is recorded in Southern Europe, Northern Africa, and Asia, and their larvae generally feed on leaves of Celtis (Celtidaceae). The Libytheinae include only about 10 species in the world, so far as known, 3 of which are in China. They can be easily recognized by their unusually long labial palpus, although there are some other nymphalids that have an equally long or longer labial palpus [8]; the libytheine threadlike labial palpus looks like the petiole of a dead leaf, offering camouflage when the butterfly is taking a rest [11].

Currently, data of lepidopteran mitochondrial genomes are quite limited [1219]. Among these, 20 complete and 2 nearly complete mitochondrial sequences of butterflies have been determined. In this study, the complete mitogenomes of the two representative nymphalid species mentioned previously were determined and compared with other butterflies available, in order to provide more information for the phylogenetic studies of lepidopterans and further clarify their taxonomic status within the family Nymphalidae at the mitogenomic level.

2. Materials and Methods

2.1. Specimen Collection

Adult individuals of Euploea mulciber and Libythea celtis were collected in Dali, Yunnan province, and Huangshan, Anhui province, China, respectively. The fresh material was preserved in 100% ethanol and stored at −20°C before the DNA extraction.

2.2. DNA Extraction, PCR Amplification, and Sequencing

The whole genomic DNA was extracted from thoracic muscle tissue with the DNeasy Blood & Tissue kit (Qiagen) after Hao et al. (2005) [20]. Some universal PCR primers for short fragment amplifications of srRNA, COI, CytB, ND1, and COII genes were synthesized after Simon et al. (1994) [21], Caterino and Sperling (1999) [22], and Simmons and Weller (2001) [23]. The remaining primers for the two species were designed by the multiple sequence alignments of the complete mitogenomes of all the lepidopterans available, using ClustalX1.8 [24] and Primer Premier 5.0 softwares [25]. The entire mitogenomes of Euploea mulciber and Libythea celtis were amplified in five fragments (COI-COIII, COIII-ND4, ND4-CytB, and CytB-lrRNA, lrRNA-COI) and ten fragments (COI-COII, COII-COIII, COIII-ND5, ND5-ND4, ND4-CytB, CytB-ND1, ND1-lrRNA, lrRNA-srRNA, srRNA-ND2, and ND2-COI) using long PCR technique, which was performed using TaKaRa LA Taq polymerase with the following cycling parameters: 95°C for 5 minutes, 30 cycles of 95°C for 50 seconds, 47–51°C for 50 seconds, 68°C for 2 minutes and 30 seconds, and a final extension step of 68°C for 10 minutes. The PCR products were detected via electrophoresis in 1.2% agarose gel, purified using the 3S Spin PCR Product Purification Kit and sequenced directly with ABI-377 automatic DNA sequencer. All the amplified products were sequenced directly except the ND2-COI of Libythea celtis, which was sequenced after cloning. For each long PCR product, the full double-stranded sequence was determined by primer walking.

2.3. Sequence Analysis and Annotation

Raw sequence files were proofread and assembled with BioEdit version 7.0 [26]. PCGs and rRNAs were identified by sequence comparison using ClustalX1.8 software and the NCBI Internet BLAST search function [27]. tRNA gene analysis was conducted using tRNAscan-SE software v.1.21 [28], and the putative tRNAs, which were not found by tRNAscan-SE, were confirmed by sequence comparison of Euploea mulciber and Libythea celtis with other lepidopterans. Nucleotide composition and codon usage were calculated in MEGA 4.0 software [29]. The Euploea mulciber and Libythea celtis mitogenome sequence data have been deposited into the GenBank database under the accession numbers HQ378507 and HQ378508, respectively.

2.4. Phylogenetic Analysis

The multiple alignments of the concatenated 13 PCG nucleotide sequences of the 24 butterfly mitogenomes, including two newly determined and 22 available from GenBank (Table 1), were done by the ClustalX1.8 software and then manually proofread. The genetic distances were calculated based on selected sequence evolutionary models; meanwhile, the neighbor joining (NJ) [30] and Bayesian inference (BI) [31] phylogenetic trees were reconstructed based on the PCG sequences using one moth species, Lymantria dispar (GenBank accession no. FJ617240) as the outgroup. In the NJ analysis, the Kimura 2 parameter (K2P) evolutionary model [32] was selected, and the bootstrap values with internal branch tests were obtained by 1,000 replicates to estimate the support levels for the nodes in the resultant topologies using the MEGA 4.0 software. The Bayesian analyses were performed based on the PCG nucleotide sequences using MrBayes 3.1.2 [33, 34] under the model GTR+I+G chosen by the modeltest 3.06 software [35], and the Markov’s chains were run simultaneously for 1,000,000 generations sampled every 100 generations; the first 2,500 trees were discarded as burn-in samples, and the remaining trees were used to generate a majority rule consensus tree, in which the percentage of samples recovering a particular clade represents its support measured as posterior probabilities.

3. Results and Discussion

3.1. Genome Organization and Structure

The mitogenomes of Euploea mulciber and Libythea celtis both consist of 13 PCGs, two ribosomal RNA genes for the small and large subunits (srRNA and lrRNA), and 22 tRNAs, and an A + T-rich region (see Tables 1 and 2 and Figure 1). Like many other insect mitogenomes, its major strand code for more genes (9PCGs and 14 tRNAs), whereas less genes were coded in the minor strand (4 PCGs, 8 tRNAs, and 2 rRNA genes). The mitogenome size of 15,166 bp and 15,164 bp determined here is close to most lepidopteran mitogenomes although slightly shorter than those of most completely sequenced lepidopteran insects (Table 1). The Euploea mulciber mitochondrial genes are interleaved with a total of 117 bp intergenic spacer sequences, which are spread over 13 regions, ranging from 1 to 46 bp, and only three spacers span longer than 10 bp, located between and ND2 (46 bp), between ND5 and (15 bp), and between (UCN) and ND1 (20 bp). Additionally, a total of 32 bp overlapped nucleotides are spread over 12 regions, and every region is shorter than 10 bp in length. The Libythea celtis mitochondrial genes are interleaved with a total of 97 bp intergenic spacer sequences, which are also spread over 13 regions, ranging from 1 to 52 bp, among which only two spacers span longer than 10 bp, located between and ND2 (52 bp) and between (UCN) and ND1 (17 bp). Besides, a total of 27 bp overlapped nucleotides are spread over 9 regions, and every region is shorter than 10 bp in size. The nucleotide compositions of A + T in Euploea mulciber and Libythea celtis mitogenomes are 81.5% and 81.2%, respectively, showing strong A + T bias. Correspondingly, the AT contents of the two butterfly species are up to 80.2% and 80.0% for the PCGs, 84.6% and 84.7% for the lrRNA gene, 85.3% and 85.4% for the srRNA gene, and 93.5% and 96.3% for the control region. This case is similar to those observed in other rhopaloceran mitogenomes, ranging from 79.7% in Pieris rapae and Acraea issoria to 82.7% in Coreana raphaelis (Tables 1 and 2).

3.2. Protein-Coding Genes

The total codon numbers of the 13 PCGs for Euploea mulciber and Libythea celtis are 3722 and 3705, respectively (Table 1). The nucleotide composition for each codon position and the codon usage of PCGs are shown in Tables 3 and 4. For codon usage, some codons have much more numbers, such as the TTA(L), ATT(I), TTT(F), and ATA(M), while some codons, such as CGT(L), AGG(S), and GGC(G) are not used. In addition, the synonymous codon bias, namely, the relative synonymous codon usage (RSCU) are detected, and their bias patterns roughly correspond to those of codon usage, for example, the four most frequently used codons (TTA, ATT, TTT, and ATA) account for 10.14% of all the codons. The AT and GC nucleotide skewness values for the PCGs as well as the whole genomes are listed in Table 5, and the results showed that they are slightly negative and positive for the whole PCGs and their minor strand of the two nymphalid species, respectively, indicating that more T and G are used than A and C; however, both values are slightly negative for the major strands, indicating that more T and C are used in these coding regions. In general, these codon constitutional and biased patterns are significantly congruent with those of other lepidopteran species determined to date [13, 18, 19, 3638].

Except for COI, 12 out of 13 PCGs in Euploea mulciber and Libythea celtis use standard ATN start codons: ATA for ND6; ATT for ND2, ATP8, ND3, and ND5; ATG for COII, ATP6, COIII, ND4, ND4L, CytB, and ND1 in Euploea mulciber; ATC for ND2; ATT for ATP8, ND3, ND5, and ND6; and ATG for COII, ATP6, COIII, ND4, ND4L, CytB, and ND1 in Libythea celtis.

In Figure 2, we show the alignment of the initiation site for the COI genes of 24 completely sequenced rhopaloceran mitogenomes. All the mitogenomes harbor a few canonical ATN initiators detected in the start region of the COI gene or in the immediately neighboring sequence of the precedent . However, in eleven of them (Artogeia melete, Coreana raphaelis, Argyreus hyperbius, Teinopalpus aureus, Papilio maraho, Luehdorfia chinensis, Parnassius bremeri, Protantigius superans, Troides aeacus, Libythea celtis, and Euploea mulciber), a TAG (stop codon) is present at the beginning region of the COI gene, while in the other five mitogenomes (Papilio xuthus, Sericinus montela, Ctenoptilum vasava, and two Sasakia subspecies), the proposed ATN initiators are present within the region coding for . Thus, the ATN initiators should not be considered as the start codon for the COI gene. According to these criteria, the first nonoverlapping codon in the COI gene is the CGA designating arginine existing in a highly conserved region across rhopaloceran insects [13].

Nine of the PCGs in Euploea mulciber and Libythea celtis harbor the complete termination codon TAA or TAG, that is, eight protein-coding sequences stop with TAA, one with TAG. The other four are with the incomplete termination codon T. This phenomenon of partial termination codons (i.e., T or TA) is observed in all sequenced lepidopteran insects and has been interpreted in terms of posttranscriptional polyadenylation, by which “A” residue(s) are added to create TAA terminator [13].

3.3. Ribosomal RNAs and Transfer RNAs

The lrRNA genes of the Euploea mulciber and Libythea celtis mitogenomes are 1314 and 1335 bp in length, respectively. As has been observed in other insects, this gene is located between (CUN) and genes [12, 13, 15, 16, 19, 37, 3945]. In the same way, the srRNA genes of the Euploea mulciber and Libythea celtis mitogenomes are 776 and 775 bp in length, respectively, and located between and A + T-rich region. Both of the two rRNA genes are encoded by the minor strand, and the lengths of them are well within the size range of other insects determined. The AT contents of lrRNA and srRNA genes are 84.6% and 85.3% in Euploea mulciber and 84.7% and 85.4% in Libythea celtis, respectively, and these values are also well within the range of the sequenced insects.

Like other insects, both Euploea mulciber and Libythea celtis harbor 22 tRNAs, among which both Leucine and Serine have two tRNAs, and the left tRNAs for each of 18 amino acid residues are all one in number (Figure 1). In this study, all the tRNAs of Euploea mulciber and Libythea celtis evidence the typical cloverleaf structures with the exception of (AGN), whose dihydrouridine (DHU) arm forms a simple loop (maps of secondary structure are not shown here). The case of (AGN) with the simple loop in the DHU arm has also been detected in most of insect species including the lepidopterans determined till now [1219, 36, 37, 39, 42, 4548].

3.4. A + T-Rich Region

The A + T-rich region is known for the initiation of replication in vertebrates and invertebrates and thus also called as the control region. The Euploea mulciber and Libythea celtis A + T-rich regions located between srRNA and were 399 and 328 bp in length, respectively (see Figure 1 and Table 2). Though the A + T-rich region of Libythea celtis is the shortest, its A + T content is the highest (96.3%) in all of the completely sequenced rhopaloceran mitogenomes known to date. In general, this region harbors the highest A + T content in the whole regions of the insect mitogenomes, and this feature is well presented in those of the two nymphalid species in this study. However, the regions of these two species harbor no large sequence repeats detected frequently in other insect groups, including some of the lepidopterans. Nonetheless, there are several short sequence repeats scattered throughout the whole region, for example, a poly T closed to the srRNA, a TA repeat in the middle of the A + T-rich region and a poly A closed to .

Alignment of the initiation sites for the A + T-rich regions of 22 completely sequenced rhopaloceran mitogenomes is shown in Figure 3, and the results showed that all the A + T-rich regions harbor a poly-T stretch preceded by the ATAGA motif. The Euploea mulciber and Libythea celtis harbored a 21 bp and 19 bp long poly-T stretch, respectively. To the best of our knowledge, the poly-T stretch is quite conserved in all sequenced rhopaloceran insects, ranging from 12 bp in Agehana maraho [49] to 21 bp in Euploea mulciber of this study and Apatura ilia [50] (Figure 3), and it has been suggested as a possible recognition site for the replication initiation of the mtDNA minor strand owing to the facts that the insect mitochondrial A + T-rich region has been revealed to harbor the replication originating sites for both strands in Drosophila [40, 41], and its neighboring region has been identified as the position of the minor-strand replication origin in Bombyx mori [51].

A microsatellite-like elements (AT)3(TA)6 and (TA)12 preceded by the ATTTA motif are presented in the middle of the Euploea mulciber and Libythea celtis A + T-rich region, respectively. Similar cases are commonly found in all other sequenced lepidopteran species [1215, 17, 19, 3639, 4244, 4648]. Additionally, a 12 bp poly A interrupted by T and a 7 bp poly A interrupted by G presented immediately the upstream of the . This poly A element at the end of the A + T-rich region and the short repeating sequences are probably universal features in lepidopterans, and their unknown functions await further in-depth studies.

3.5. Phylogenetic Analysis

Thirteen concatenated PCGs nucleotide sequences of complete rhopaloceran mitogenomes were used to analyze the phylogenetic relationships among the butterfly groups in this study. The resultant 14-taxon (including outgroup) data set is 11,459 characters after removal of gaps, among which 5402 are variable and 3759 are parsimony-informative. The results of neighbor-joining (NJ) and Bayesian inference (BI) analyses show that the topologies of the NJ and BI trees shown in Figure 4 are identical only with minor differences in their node support values. The trees indicates the following relationship: (Papilionidae + (Pieridae + (Nymphalidae + Lycaenidae))), which is congruent with that obtained by Walhberg et al. (2005): (Hesperiidae + (Papilionidae + (Pieridae + (Nymphalidae + (Lycaenidae + Riodinidae))))) [10], though the data from the Hesperiidae and Riodinidae is not available here. Additionally, within the Nymphalidae, Euploea mulciber stands at the base of the nymphalid tree, as a sister to all the remaining groups of the family, and Libythea celtis (Libytheinae) is the nearest sister tothe Calinaga davidis (Calinaginae).

The danaids, formerly classified as familial taxa under the family Nymphalidae, have now been relegated to the subfamily Danainae. They possess unique features, such as the brush-like modified forelegs, poisonousness to their potential predators, which are markedly distinct with other nymphalid butterflies, and have been proposed as one of the basal groups in the family Nymphalidae by Walhberg et al. (2003) based on molecular evidences [52]. Morphologically, Freitas and Brown (2004) also revealed that the danaid clade (including Danaini, Tellervini, and Ithomiini) appeared as the basal group of the Nymphalidae excluding Libytheinae based on the immature features [53]. Hao et al. (2007) reconstructed the mitochondrial lrRNA secondary structures of the main butterfly lineages, and the results showed that the morphological characteristics of the danaid species Euploea core and Parantica aglea were markedly different from other nymphalid groups [54]. In this study, both NJ and BI trees indicate that Euploea mulciber is sister to all other nymphalid species, standing at the base of the whole nymphalid tree. Therefore, we conclude that the danaid butterflies are the most primitive group of Nymphalidae.

The Libytheinae is one of the most morphologically unusual groups of butterflies, and their relationship with other butterfly groups remains controversial. Traditionally, they are considered as the basal group of Nymphalidae [16, 5557]. Some scholars even considered them as a separate family owing to their unique morphological characters such as the markedly prolonged snout and well-developed foreleg in females [58, 59]. In addition, some phylogenetic analyses based on morphological, molecular, or combined data also proposed that Libytheinae is the basal lineage of the Nymphalidae [10, 6062]. However, this study reveals that Libythea celtis is closely related to Calinaga davidis (Calinaginae) with a weak bootstrap support value (65%) in NJ tree, whereas, stands as a sister to the species grouping of Limenitidinae, Heliconiinae, Nymphalinae, and Apaturinae with a high posterior probability value (0.97) in BI tree (Figure 4). Thus, its phylogenetic position within Nymphalidae awaits further clarification in the future, on the grounds of more taxa samplings as well as more realistic analysis methods. In conclusion, it is obvious that the Danainae is the basal branch of the Nymphalidae, rather than the Libytheinae.

Acknowledgments

This work was supported by the National Science Foundation of China (Grant no. 41172004), the CAS/SAFEA International Partnership Program for Creative Research Teams, Chinese Academy of Sciences (Grant no. KZCX22YW2JC104), the Provincial Key Projects of Natural Science Foundation, Colleges of Anhui Province (Grant no. KJ2010A142), and the Opening Funds from the State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, Chinese Academy of Sciences (Grant no. 104143).