Abstract

Acer truncatum, which is a new woody oil tree species, is an important ornamental and medicinal plant in China. To assess the genetic diversity and relationships of A. truncatum, we analyzed its complete chloroplast (cp) genome sequence. The A. truncatum cp genome comprises 156,492 bp, with the large single-copy, small single-copy, and inverted repeat (IR) regions consisting of 86,010, 18,050, and 26,216 bp, respectively. The A. truncatum cp genome contains 112 unique functional genes (i.e., 4 rRNA, 30 tRNA, and 78 protein-coding genes) as well as 78 simple sequence repeats, 9 forward repeats, 1 reverse repeat, 5 palindromic repeats, and 7 tandem repeats. We analyzed the expansion/contraction of the IR regions in the cp genomes of six Acer species. A comparison of these cp genomes indicated the noncoding regions were more diverse than the coding regions. A phylogenetic analysis revealed that A. truncatum is closely related to A. miaotaiense. Moreover, a novel ycf4-cemA indel marker was developed for distinguishing several Acer species (i.e., A. buergerianum, A. truncatum, A. henryi, A. negundo, A. ginnala, and A. tonkinense). The results of the current study provide valuable information for future evolutionary studies and the molecular barcoding of Acer species.

1. Introduction

Acer truncatum Bunge, which is a member of the order Sapindales and the family Aceraceae, is a new versatile oil-producing woody tree that is widely distributed in northern China, Korea, and Japan, where it is a native species, but it has also been detected in Europe and North America [1]. This tree species represents a potential source of medicinal compounds. Many highly bioactive compounds have been extracted from Acer species, such as flavonoids, tannins, alkaloids, and terpenoids [2]. Acer truncatum seeds are processed to extract the seed oil, which was listed as a new food resource by the Ministry of Health of the People’s Republic of China in 2011. Approximately 5–6% of the A. truncatum seed oil is nervonic acid (C24:1) [3]. Nervonic acid, which is a key component of brain nerve cells and tissues, promotes the repair and regeneration of nerve cells and damaged tissues, and has been detected in the seed oil of a number of plants. Thus, A. truncatum seed oil represents a novel plant resource with potential applications for treating human cerebral and neurological problems [4].

Chloroplasts (cps) have important functions related to some essential metabolic pathways, including photosynthesis and glycometabolism [5, 6]. In plants, the DNA-replication mechanism associated with the cp genome is independent of the nuclear DNA-replication mechanism. Moreover, the cp genome is more highly conserved than the nuclear genome. In 1986, the liverwort (Marchantia polymorpha) cp genome became the first such genome to be described [7]. The subsequent emergence of rapid and cost-effective genome-sequencing technologies has led to more cp genomes being sequenced, with the resulting data deposited in the GenBank database. These sequences indicate that the angiosperm cp genomes typically form a circular DNA molecule comprising 120–170 kb that encodes 120–130 genes [8]. The circular cp genome structure consists of the following four segments: two inverted repeat (IR) regions separated by large single-copy (LSC) and small single-copy (SSC) regions [9, 10]. However, genome size variations [11], rearrangement events [12–14], and gene losses [15] have been detected in some plant species. There is also considerable diversity in the IR size, possibly because the expansion and contraction of the IR regions have been very common events during the evolution of plant species, including those belonging to Fabaceae [16] and Poaceae [17]. The complete cp genome has been used in investigations of phylogenetic relationships, molecular markers, and evolution [18].

Insertions/deletions (indels) and single-nucleotide polymorphisms (SNPs) within the cp genome have been used to rapidly distinguish species [19–21]. Additionally, cp markers have been developed to identify closely related species, including buckwheat and the species of Solanum, Angelica, and other genera [20–22]. For example, Park et al. [23] used two indel markers (trnK-trnQ and ycf1-ndhF) to differentiate three Aconitum species. Additionally, indels in the trnL-F, trnG-trnS, and trnL introns have been used to analyze the molecular evolution of the Silene species cp genome [23]. Thus, indel and SNP cp markers are important for identifying species and investigating molecular evolution.

Several Aceraceae species have recently had their cp genome sequences published, including Acer morrisonense [24], Dipteronia sinensis and Dipteronia dyeriana [8], and Acer griseum [25]. Chen et al. [26] were the first to report the complete A. truncatum cp genome; however, they focused only on the genome composition and phylogenetic relationships. Thus, the A. truncatum cp genome was not comprehensively characterized. Consistent with the results of Chen et al. [26], we found that A. truncatum is closely related to A. miaotaiense. We also analyzed the repeat sequences, the contraction and expansion of the IR regions, and the synonymous and nonsynonymous substitution rates. Highly divergent regions and potential indels were detected via a comparative analysis of six available cp genome sequences. Additionally, on the basis of the results of our comparative analysis of cp genomes, we developed the ycf4-cemA indel marker to distinguish six Acer species (i.e., A. buergerianum, A. truncatum, A. henryi, A. negundo, A. ginnala, and A. tonkinense). The data presented herein will enrich the genetic information available for the genus Acer, provide novel insights into A. truncatum evolution, and form an important theoretical basis for increasing the A. truncatum seed yield.

2. Materials and Methods

2.1. DNA Sequencing and Chloroplast Genome Assembly

We collected fresh leaves from A. truncatum plants, which were obtained from the Acer germplasm collection of the Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu, China. The leaves were frozen in liquid nitrogen and stored at −80°C. Total DNA was extracted from the frozen leaves with the DNA Isolation Kit (Aidlab, China). We prepared 350-bp shotgun libraries, which were sequenced with the paired-end method on the Illumina HiSeq X™ Ten platform.

A total of 16.30 GB of high-quality clean data (Q30 > 95.23%) were used for assembling the sequence as described by Ferrarini et al. [27]. The cp DNA reads were extracted with SMALT, using the A. buergerianum (GenBank accession NC_034744), A. miaotaiense (GenBank accession NC_030343), and A. morrisonense (GenBank accession KT970611) cp genomes as queries. Reads with 90% similarity were considered to be derived from the cp genome. The data were trimmed with Sickle (https://github.com/najoshi/sickle) (using as the threshold for trimming and as the threshold for keeping a read based on length) and assembled with the default parameters of ABySS [28]. Redundant contigs were removed with the CD-Hit program [29] (threshold of 100%), and the unique contigs were merged with the default parameters of Minimus2. The LSC/IRB, IRB/SSC, SSC/IRA, and IRA/LSC boundary regions of the completed cp genome were validated with PCR-based sequencing. Details regarding the primers are provided in Supplementary Table S1.

2.2. Annotation and Comparative Analysis

The A. truncatum cp genome was annotated with DOGMA (http://dogma.ccbb.utexas.edu/). The start and stop codons were coupled manually. All tRNA genes were identified with the default settings of tRNAscan-SE 1.21 [30]. The OGDRAW program was used to visualize the circular A. truncatum cp genome map [31]. Codon usage was analyzed with MEGA 6.0 [32]. The cp genomes of six Acer species (A. truncatum, A. buergerianum, A. davidii, A. griseum, A. miaotaiense, and A. morrisonense) were compared with mVISTA [33, 34], with the annotated A. morrisonense sequence used as the reference.

2.3. Analysis of Repeat Structures and Simple Sequence Repeats

Four types of repeat structures (i.e., forward repeat, palindromic repeat, reverse repeat, and complementary repeat) were identified with REPuter [35]. Additionally, tandem repeats were detected with the default settings of the Tandem Repeats Finder program (version 4.07b) [36]. The simple sequence repeats (SSRs) were analyzed with the MISA program. The minimum number of repeat units for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs was set to 10, 5, 4, 3, 3, and 3, respectively [37].
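The SSR thresholds described above can be illustrated with a minimal Python sketch. This is not the MISA program; it is a simplified regular-expression search for perfect repeats, and the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` as well as the toy sequence are hypothetical. It does not handle every edge case (for example, adjacent repeats of different motifs that share bases).

```python
import re

# Minimum number of repeat units per motif length (mono- to hexanucleotide),
# taken from the thresholds stated in the text: 10, 5, 4, 3, 3, and 3.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Hypothetical helper for illustration only; not the MISA implementation.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_count in min_repeats.items():
        # One motif copy followed by at least (min_count - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as "AA" or "ATAT", which belong to a shorter class.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (made up): an A(12) run, an (AT)5 repeat, and a (TTC)3 repeat
# that falls below the trinucleotide threshold of four units.
toy = "GGC" + "A" * 12 + "CCG" + "AT" * 5 + "GCA" + "TTC" * 3
print(find_ssrs(toy))
```

With these thresholds, the (TTC)3 repeat in the toy sequence is not reported because trinucleotide SSRs require at least four units.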

2.4. Analysis of Synonymous and Nonsynonymous Substitution Rates

The A. truncatum, Citrus platymamma, Dimocarpus longan, and Spondias mombin cp genome sequences were compared to determine the synonymous (Ks) and nonsynonymous (Ka) substitution rates. The protein-coding exons were separately aligned with MEGA 6.0. The Ks and Ka substitution rates were estimated with DnaSP [38].

2.5. Phylogenomic Analyses

A total of 22 whole cp genome sequences of Sapindales species (Supplementary Table S5) were used for elucidating the evolutionary status of A. truncatum, with Euonymus hamiltonianus (order Celastrales) serving as the outgroup. The 64 single-copy orthologous genes common among the 23 analyzed genomes were aligned with the default parameters of ClustalW 2.0 [39]. The maximum likelihood (ML) analyses of phylogenetic relationships were completed with RAxML using the GTRGAMMA model [40].

2.6. Estimation of the Divergence Time

To estimate the divergence time, we first removed ambiguously aligned sites in the data set of 23 whole genomes using GBLOCKS v.0.91b [41] with the following parameters: minimum sequences per conserved position, 15; minimum sequences per flank position, 20; maximum number of contiguous nonconserved positions, 8; minimum block length, 10; allowed gap positions, none. Then, the divergence time was estimated with the MCMCTree program of PAML (version 4.9a) [42], with the following parameters: burnin 100000, sampfreq 200, and nsample 10000. Moreover, E. hamiltonianus was constrained to be the outgroup, and the root age was constrained by the divergence time of E. hamiltonianus from A. truncatum (98–117 million years ago) (http://www.timetree.org/).
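As a small worked example of what the MCMC settings above imply, the following Python sketch computes the chain length, assuming the usual MCMCTree convention that the chain runs for the burn-in plus sampfreq × nsample iterations. The function name `total_mcmc_iterations` is hypothetical.

```python
def total_mcmc_iterations(burnin, sampfreq, nsample):
    """Chain length under the assumed convention: burn-in + sampfreq * nsample."""
    return burnin + sampfreq * nsample


# Settings reported in the text: burnin 100000, sampfreq 200, nsample 10000.
iterations = total_mcmc_iterations(burnin=100_000, sampfreq=200, nsample=10_000)
print(iterations)
```

Under this assumption, the reported settings correspond to 2,100,000 iterations, of which 10,000 are retained as samples.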

2.7. Development and Validation of the ycf4-cemA Indel Marker

The indel regions were selected based on the results of a similarity search with mVISTA. Additionally, primers were designed with Primer 5. The PCR amplification was performed as described by Ma et al. [43]. To confirm the accuracy of the PCR product sizes, three samples per species were sequenced by the General Biology Company (Nanjing, Jiangsu, China).

3. Results and Discussion

3.1. Features of the A. truncatum Chloroplast Genome

The A. truncatum genome sequence was submitted to the GenBank database (accession number MH638284). Chen et al. [26] were the first to describe the A. truncatum cp genomic features. Specifically, they reported that the A. truncatum cp genome comprises 156,262 bp, with an overall GC content of 37.9%. In the current study, we revealed similar structural features, with the A. truncatum cp genome consisting of 156,492 bp and forming a typical quadripartite structure (Figure 1 and Table 1). The LSC region (86,010 bp) and SSC region (18,050 bp) were separated by a pair of inverted repeats (IRA and IRB; 26,216 bp each). The GC content may be an important factor for assessing species similarity. The GC content of the complete A. truncatum cp genome was 37.90%, which matches the value reported by Chen et al. [26]. The GC contents of the LSC, SSC, and IR regions were 36.10%, 32.10%, and 42.80%, respectively, which are similar to the GC contents reported for other Acer species (Table 1) [24, 25]. Across the complete cp genome, the GC content was highest for the rRNA and tRNA genes in the IR regions, which is a phenomenon that is very common among plant species [44, 45].
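The quadripartite structure described above can be checked with simple arithmetic: the total length should equal LSC + SSC + 2 × IR. The following Python sketch performs this check with the region lengths reported in the text and shows one straightforward way to compute a GC percentage. The function names are hypothetical, and the GC helper is an illustration rather than the procedure used in this study.

```python
def quadripartite_length(lsc_bp, ssc_bp, ir_bp):
    """Total genome length for one LSC, one SSC, and two IR copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp


def gc_content(seq):
    """Percentage of G and C bases in a sequence (illustrative helper)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Region lengths (bp) reported in the text for A. truncatum.
total_bp = quadripartite_length(lsc_bp=86_010, ssc_bp=18_050, ir_bp=26_216)
print(total_bp)
```

The sum is 156,492 bp, which agrees with the reported genome size.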

We detected 134 genes in the A. truncatum cp genome, including 20 duplicated genes in the IR regions, 112 unique functional genes, and 2 pseudogenes. The 112 functional genes comprised 4 rRNA genes, 30 tRNA genes, and 78 protein-coding genes (Table 2). Among the 134 genes in the cp genome, 17 genes contained introns, of which three genes (ycf3, clpP, and rps12) contained two introns and the remaining genes contained one intron (i.e., eight protein-coding genes and six tRNA genes) (Table 2). The rps12 gene was trans-spliced, with its 3′ exon duplicated in the IRs and its 5′ exon located in the LSC region. Interestingly, trnK-UUU had the largest intron (2,487 bp) because of the presence of the matK gene. The infA and ycf1 genes were designated as pseudogenes. The infA gene contained several internal stop codons and the ycf1 gene was located at the boundary region of IR and SSC (Figure 1).

In this study, we assessed the relative synonymous codon usage (RSCU), which represents the nonuniform synonymous codon usage in coding sequences. Generally, RSCU values >1.00 and <1.00 indicate the codon is used more and less frequently than expected, respectively [46]. The codon usage frequency in the A. truncatum cp genome was estimated based on the protein-coding gene sequences (Table 3). The protein-coding genes comprised 77,796 bp encoding 25,932 codons. Leucine and cysteine were the most and least prevalent amino acids encoded by the codons, accounting for 10.82% and 1.17% of the codons, respectively. With the exception of the methionine and tryptophan codons, most of the amino acid codons had sequence biases [e.g., UUA (RSCU = 1.80) for leucine, UCU (RSCU = 1.56) for serine, and UAU (RSCU = 1.60) for tyrosine] (Table 3). Codon usage was generally biased toward A or T (U) with high RSCU values, which is a phenomenon that is very common among the cp genomes of land plant species [47, 48].
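The RSCU value of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal Python sketch of this calculation follows. The function name `rscu` is hypothetical, and the codon counts are made-up toy numbers, not values from Table 3.

```python
def rscu(codon_counts, synonymous_families):
    """Relative synonymous codon usage for each codon.

    codon_counts: dict mapping codon -> observed count.
    synonymous_families: iterable of tuples of codons encoding one amino acid.
    """
    values = {}
    for family in synonymous_families:
        total = sum(codon_counts.get(codon, 0) for codon in family)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(family)  # count under equal usage
        for codon in family:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Toy counts (hypothetical) for the two tyrosine codons.
toy_counts = {"UAU": 8, "UAC": 2}
toy_rscu = rscu(toy_counts, [("UAU", "UAC")])
print(toy_rscu)
```

In this toy case, UAU is used more often than expected (RSCU > 1.00) and UAC less often (RSCU < 1.00). Single-codon amino acids such as methionine and tryptophan always have an RSCU of 1.00, which is why they show no bias.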

3.2. Analysis of the Repeats in the A. truncatum Chloroplast Genome

An analysis of the repeats in the A. truncatum cp genome revealed 22 long repeats (i.e., one reverse, nine forward, five palindromic, and seven tandem repeats). The only reverse repeat was 35 bp long. The forward and palindromic repeats were mainly longer than 30 bp (Supplementary Table S2 and Figure 2), whereas the tandem repeats were mainly 13–28 bp long (Supplementary Table S3). Most repeats were located in the intergenic spacers, with the rest located in protein-coding regions and introns. Short dispersed repeats are important for promoting cp genome rearrangements [49].

Simple sequence repeats are useful molecular markers for studying genetic diversity and identifying species [43]. In the current study, we detected 78 perfect microsatellites in the A. truncatum cp genome, including 67, 6, 1, and 4 mono-, di-, tri-, and tetranucleotide repeats, respectively; no hexanucleotide repeats were identified (Figure 3(a) and Supplementary Table S4). Most of these repeats were located in noncoding regions. Additionally, A or T accounted for 94.03% of the mononucleotide repeats, whereas all of the dinucleotide repeats were AT. An examination of the distribution of the SSRs in the A. truncatum cp genome indicated that 73.08%, 21.79%, 3.85%, and 1.28% of the SSRs were in the intergenic spacer, protein-coding, intron, and tRNA regions, respectively (Figure 3(b)). Moreover, our data suggest that the A. truncatum cp genome contains fewer SSRs than the A. miaotaiense cp genome [24]. However, in both of these Acer species, the SSRs generally comprise A or T, which contributes to the A/T richness of their cp genomes. These results represent useful information regarding the cp SSR markers that can be applied to investigate the genetic diversity of A. truncatum as well as the relationships among species. These markers may also be relevant for selecting germplasms with high nervonic acid contents.

3.3. Contraction and Expansion of the IR Regions

The number and order of genes were highly conserved among the cp genomes of six Acer species. However, structural changes were detected in the IR boundaries (Figure 4). These changes represent a common evolutionary event and a major factor influencing the size differences among the cp genomes, implying they have an important evolutionary role in plants [50, 51]. We also compared the boundary regions of IR/LSC and IR/SSC in the cp genomes of A. buergerianum, A. davidii, A. griseum, A. miaotaiense, A. morrisonense, and A. truncatum. In the A. buergerianum, A. miaotaiense, and A. truncatum cp genomes, the rps19, ycf1, and rpl2 genes were detected at the junctions of the LSC/IRb, SSC/IR, and LSC/IRa boundary regions, respectively (Figure 4). However, the rps19 gene was located entirely in the LSC region in the A. miaotaiense cp genome, but not in the other cp genomes. Additionally, in the A. buergerianum and A. truncatum cp genomes, the ycf1 gene was located in the SSC/IRa border regions, which resulted in a pseudogene in the IRb region. The cp genomes of the other three species (i.e., A. davidii, A. griseum, and A. morrisonense) exhibited a similar trend regarding the IR contraction and expansion. The rpl22 and ndhF genes were located in the LSC/IRb and SSC/IRb regions, respectively. The rpl22 gene extended 376 bp into the IRb region. In all cp genomes, the trnH gene was located in the LSC region. Overall, we detected the contraction and expansion of the IR regions in all six analyzed Acer cp genomes.

3.4. Comparative Analysis of Six Acer Chloroplast Genomes

A comparative analysis of cp genomes is important for elucidating phylogenetic relationships and identifying species [52, 53]. With the annotated A. morrisonense cp genome as the reference, the overall sequence identities among the six analyzed Acer cp genomes were determined and visualized with mVISTA (Figure 5). The comparative cp genome analysis indicated that the noncoding regions were more diverse than the coding regions, which is consistent with the findings in other plant species [54]. The IR regions were more conserved than the LSC and SSC regions, and the four rRNA genes were essentially identical in the six Acer species. The intergenic spacers were relatively diverse (e.g., trnH-psbA, matK-rps16, petN-psbM, petA-psbJ, and ycf4-cemA). The most diverse coding regions were the matK, rps2, rpoC2, rpoB, rps19, and ycf1 sequences. Similar results were observed in previous studies [55, 56]. The highly diverse regions identified in the current study may be relevant for developing markers or genetic barcodes useful for exploring the genetic differentiation among Aceraceae species.

3.5. Analysis of Synonymous and Nonsynonymous Substitution Rates

In a previous study, the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks) was used to evaluate the evolutionary forces on some genes [49]. In this study, the Ka/Ks ratio was determined for 78 protein-coding genes following the comparison of the A. truncatum cp genome with the cp genomes of C. platymamma, D. longan, and S. mombin (Figure 6). Nearly all of the Ka/Ks ratios were less than 1.0, implying most of the protein-coding genes were under purifying selection during evolution. However, the Ka/Ks ratio was between 0.5 and 1.0 for atpF, matK, psbD, rps16, rps18, rpl36, ndhB, and ycf1. Moreover, the ratio was greater than 1 for psaI, clpP, rps4, rpl22, and ycf2, which indicated these genes were under positive selection during evolution. High Ka/Ks ratios have been reported for some genes, including ndhC, rps16, and ycf2 [49]. These results indicate that cp genes in different plant species may be subjected to diverse selection pressures.
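The interpretation of the substitution ratio used above can be sketched in Python. The notation Ka (nonsynonymous rate) and Ks (synonymous rate) is assumed here, the function name `classify_selection` is hypothetical, and the example rates are made-up values rather than results from Figure 6.

```python
def classify_selection(ka, ks):
    """Classify selection from nonsynonymous (ka) and synonymous (ks) rates."""
    if ks == 0:
        return "undefined"  # ratio cannot be computed without synonymous changes
    ratio = ka / ks
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral"


# Toy (hypothetical) rates for illustration.
print(classify_selection(ka=0.02, ks=0.10))
print(classify_selection(ka=0.30, ks=0.20))
```

Genes with no synonymous substitutions in a pairwise comparison have no defined ratio, so a real analysis would need to decide how to report them.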

3.6. Phylogenetic Analysis

Chloroplast genome sequences are valuable genomic resources for elucidating evolutionary history and have been widely applied in phylogenetic studies [55–59]. In the current study, to determine the phylogenetic position of A. truncatum, 22 complete cp genome sequences of Sapindales species were obtained from the GenBank database (Supplementary Table S5). A set of 64 single-copy orthologous genes present in the 23 analyzed cp genomes was used to construct phylogenetic trees, with E. hamiltonianus serving as the outgroup. All Aceraceae species, including Acer and Dipteronia species, were grouped in one clade, which was consistent with the results of earlier investigations [25, 60, 61]. In a previous study, Chen et al. [25] showed that A. truncatum and A. miaotaiense are closely related. In our study, we obtained similar phylogenetic topologies; the ML trees also strongly supported the close phylogenetic relationship between A. truncatum and A. miaotaiense among the Aceraceae species, with 100% bootstrap support (Figure 7). Overall, the results of our analysis of cp genomes provide a valuable foundation for future analyses of the phylogenetic affinities of Acer species.

3.7. Divergence Estimates

Divergence time estimates were based on a single calibration point at the root node (107.2 Mya), which is the divergence time of E. hamiltonianus from A. truncatum (98–117 million years ago) (http://www.timetree.org/). The divergence dates for some of the observed clades as well as the upper and lower bounds of the 95% highest posterior density intervals are shown in Figure 8. According to the MCMCTree time estimates, the estimated divergence dates between Burseraceae and Anacardiaceae and between Meliaceae and Simaroubaceae were 75.9 (52.9–95.8) Mya and 73.2 (53.9–91.9) Mya, respectively. These results are in agreement with a recent study [62]. Additionally, Sapindaceae and Aceraceae began to split at 64.4 (42.6–87.4) Mya. Within Aceraceae, the divergence time of Acer from Dipteronia was 14.7 (9.0–24.6) Mya. The divergence of A. buergerianum from a common ancestor with the five other Aceraceae species was estimated at 13.7 (8.3–23.2) Mya. Moreover, a recent divergence event between A. truncatum and A. miaotaiense was estimated at around 1.6 (0.7–3.6) Mya. These results provide insights into the evolution of Aceraceae species.

3.8. Development of the ycf4-cemA Indel Marker

Because indel regions are relatively easy to detect, they are often used to develop markers for identifying species [63]. In the current study, the sequence variability of the large indel regions, which was revealed by sequence alignments with mVISTA, was used to develop markers. A comparison with the A. truncatum cp genome sequence detected a 91-bp deletion in the ycf4-cemA region of the A. buergerianum cp genome. The following six Acer species were selected to characterize the ycf4-cemA sequence: A. tonkinense, A. ginnala, A. negundo, A. henryi, A. truncatum, and A. buergerianum. To develop indel markers, sequence-specific primers were designed to anneal to the conserved regions flanking ycf4 and cemA (Table 4). The predicted products were successfully amplified with the ycf4-cemA-F/R primers for all 24 tested samples (Figure 9(a)). The length of the amplified ycf4-cemA sequence was similar for A. tonkinense, A. ginnala, A. negundo, A. henryi, and A. truncatum. In contrast, the corresponding sequence in A. buergerianum was shorter because of the 91-bp deletion (Figures 9(a) and 9(b)). As presented in Figure 9(a), A. tonkinense, A. ginnala, A. negundo, A. henryi, A. truncatum, and A. buergerianum yielded amplicons of 1,324, 1,320, 1,324, 1,326, 1,334, and 1,235 bp, respectively. Two poly-thymine repeats were identified in the sequenced fragments. Interestingly, A. truncatum had an 8-bp insertion that was lacking in the other species. Other deletions are listed in Supplementary Table S6. The predicted sizes of the indels were consistent with the sizes of the fragments amplified from the 24 samples analyzed in this study. Indel markers have commonly been used to distinguish closely related species in previous studies [22, 23]. However, Acer species have not previously been identified using this approach. Thus, indel markers may represent an important resource for identifying species. The ycf4-cemA indel marker developed in this study may be applicable for species classifications and the identification of Acer species.
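The amplicon sizes reported above can be compared with a short Python sketch, which computes each size relative to A. truncatum and groups species that share the same amplicon length. The sizes are those stated in the text; the names `AMPLICON_BP`, `size_differences`, and `group_by_size` are hypothetical.

```python
# ycf4-cemA amplicon sizes (bp) as reported in the text.
AMPLICON_BP = {
    "A. tonkinense": 1324,
    "A. ginnala": 1320,
    "A. negundo": 1324,
    "A. henryi": 1326,
    "A. truncatum": 1334,
    "A. buergerianum": 1235,
}


def size_differences(sizes, reference):
    """Amplicon size of each species minus that of the reference species."""
    ref_bp = sizes[reference]
    return {species: bp - ref_bp for species, bp in sizes.items()}


def group_by_size(sizes):
    """Group species by amplicon length to see which lengths are shared."""
    groups = {}
    for species, bp in sizes.items():
        groups.setdefault(bp, []).append(species)
    return groups


diffs = size_differences(AMPLICON_BP, "A. truncatum")
groups = group_by_size(AMPLICON_BP)
print(diffs)
print(groups)
```

The 99-bp difference between A. truncatum and A. buergerianum matches the 91-bp deletion plus the 8-bp insertion described above. Two species share an amplicon length of 1,324 bp, so separating them would presumably rely on the sequence differences listed in Supplementary Table S6 rather than on fragment length alone.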

Data Availability

The Acer truncatum chloroplast genome sequence was deposited in the GenBank database (accession MH638284).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Experimental design, Q.-Y.M.; collection and identification of plant materials, Y.-N.W., L.Z., S.-X.L., J.W., S.-X.L., Q.-Z.L., and K.-Y.Y.; genome analysis, Q.-Y.M., Y.-N.W., and C.-W.B.; manuscript draft preparation, Q.-Y.M.; manuscript review, Q.-Y.M., S.-X.L., and Q.-Z.L. All authors contributed to the experiments and approved the final manuscript.

Acknowledgments

This research was supported by the Natural Science Foundation of China (31700628), the Natural Science Foundation of Jiangsu Province (BK20170602), the Independent Innovation Fund Project of Agricultural Science and Technology in Jiangsu Province (CX[17]1004), and the Technology Innovation and Extension Project of Forestry Science in Jiangsu Province (LYKJ[2018]14). We thank Liwen Bianji, Edanz Editing China (http://www.liwenbianji.cn/ac) for editing the English text of a draft of this manuscript.

Supplementary Materials

Supplementary Table S1: PCR-based sequence validation of junctions between the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IRa and IRb) regions of the A. truncatum chloroplast genome. Supplementary Table S2: Long repeat sequences in the A. truncatum chloroplast genome. Supplementary Table S3: Details regarding the tandem repeats in the A. truncatum chloroplast genome. Supplementary Table S4: Distribution of SSRs in the A. truncatum chloroplast genome. Supplementary Table S5: Details regarding the chloroplast genome sequences used for the phylogenetic analysis. Supplementary Table S6: Sequence variability of the sequenced fragments.