Abstract

Chenopodium quinoa is an Andean species of great interest because of its excellent nutritional quality and great adaptability to different environmental conditions. In addition, the high phenotypic diversity has caused difficulties in the correct taxonomic identification, and there are few studies on the phylogenetic relationships of quinoa in Colombia. Therefore, the objective of this research was to determine the phylogenetic relationships of quinoa with the matK and rcbL chloroplastid genes to characterize the genetic diversity in Colombian quinoa. Evolutionary analyses were performed using nucleotide substitution rates, pattern, base composition, and phylogeny construction. The rbcL gene presented approximately 1344 bp, and matK had 646 bp, which were translated into 434 and 215 amino acids, respectively. The nucleotide composition of the genes showed high percentages of similarity and identity with the Chenopodium quinoa sequences registered in GenBank and BOLD. Similar phylogenetic trees were obtained with the rbcL and matK genes, and both concatenated sequences grouped the accessions into clades. The results showed that Colombian quinoa has low rates of genetic differentiation that may be due to the domestication processes of the species, the lack of certified seeds, and the constant exchange of seeds between farmers in the principal producing areas of the Andean region.

1. Introduction

The cultivation of quinoa (Chenopodium quinoa Willd.) is vital to feeding Andean communities; however, cultivation has increased notably in regions of North America, Europe, Africa, and Asia because of its great adaptability to different environmental conditions [1]. Additionally, the quinoa grain has a high nutritional value and contains all essential amino acids, which is why it is considered a high-quality protein food global production system [2]. Studies on the morpho-agronomic characteristics of quinoa varieties in countries such as Colombia, Peru, and Bolivia have shown high phenotypic variability [1, 35]. However, the phenotypic characterization of quinoa varieties has limitations because information is reduced, and the expression of quantitative traits is subject to strong environmental influences [6]. Morphological parameters, such as plant height, shape and size of the leaf or stem, flower characteristics, inflorescences, and grain, are not an identification method that differentiates Chenopodium species or varieties since most parameters vary with age, stage of development, and environmental conditions.

An alternative for the identification of plant species and varieties is the use of molecular markers or DNA barcodes that provide more information on inter- and intraspecific variation of the species [7, 8]. Many studies have shown that DNA barcoding is an effective tool for plant identification [912]. Likewise, given the high conservation of its sequence, compact size, lack of recombination, and maternal inheritance, the chloroplast genome has been used for the generation of genetic markers and for phylogenetic classification [13], genetic divergence [14], and DNA barcoding systems for molecular identification [15]. Commonly studied chloroplast coding genes include atpF-atpH, matK, rbcL, rpoB, rpoC1, psbK-psbL, and trnH-psbA since they meet the requirements for use as DNA barcodes in plants [16]. Hollingworth et al. (2009) recommended the combination of two loci, ribulose-1,5-bisphosphate carboxylase (rbcL) and maturase (matK), as a plant barcode based on assessments of recoverability, sequence quality, and levels [17]. For species discrimination, this two-locus barcoding can identify species and contribute to the discovery of new species [18, 19].

Studies of these two genes in Chenopodium showed the paraphyletic origin of this genus [6]. An accurate assessment of the magnitude and organization pattern of the genetic diversity of individuals or conserved material of Colombian quinoa is essential to understanding the genetic structure and reliably identifying each genotype, elucidating the variability of quinoa cultivars [20], which will allow adequate use of the potential of varieties along with their conservation. Therefore, understanding the phylogenetic relationships of Colombian quinoas could be an important tool in the search for unique adaptive production, quality, and resistance characteristics for breeding programs that seek to further explore the genetic background of this species [13, 21]. The objective of this study was to understand the phylogenetic relationships between Colombian quinoa with matK and rcbL markers to establish strategies in breeding and conservation programs that lead to more efficient selection processes and the identification of quinoa germplasm of agronomic, industrial, or pharmaceutical interest.

2. Materials and Methods

2.1. Plant Material

Thirty-two accessions from the Quinoa seed collection of the Biotecnología Vegetal de la Gobernación de Boyacá were evaluated (Table 1), of which nine seedlings were planted with a randomized complete block design (RCBD) under greenhouse conditions in the city of Tunja, located at an altitude of 2690 meters above sea level, with an average temperature of 13°C, relative humidity of 78%, and 12 : 12 photoperiod. Of the nine plants planted per accession, a random sample was taken for sequencing the chloroplastic genes.

2.2. DNA Extraction and PCR Amplification of Specific Regions of the Chloroplast

DNA was isolated from young leaves of each of the quinoa accessions using the modified method of Dellaporta et al. [22], visualized on 0.8% agarose gels in a Maxicell Primo EC-340 electrophoresis chamber. The concentration was determined by spectrophotometry in a Biotek EPOCH 2 device, followed by dilution using HPLC water to a total volume of 100 µl at 10 ng/µl, stored at −20°C. For the PCR amplification of the rbcL and matK regions of the chloroplast, the primers (5′-ATTATACTCCTGAGTATGA-3′)—jrR (5′-ACT​CCA​TTT​GCT​AGC​TTC-3′) and matKF (5′-CTATATCCACTTAGTATT-TTCAGGAGG′)—matKR (5′-AAA​GTT​CTA​GCA​CAA​GAA​AGT​CGA-3′) were used, respectively, according to [6]. The amplification cycles were initial temperature at 95°C for 1 min, followed by 35 cycles of denaturation at 94°C for 30 seconds. The annealing temperature was 54°C for both primers, and the extension was at 72°C for 1.5 min. The amplified products were separated on 1.2% agarose gels at 180 volts for 1.5 hours in a Maxicell Primo EC-340 electrophoresis gel system chamber and stained with ethidium bromide.

2.3. Sequencing and Sequence Analysis

Amplicon sequencing was carried out with capillary electrophoresis on an ABI PRISM 3500 XL sequencer. The editing, assembly, and alignment of the sequences were done with BioEdit v.7.0.9.0 and MEGA-X [23]. All sequence data were uploaded to the National Center for Biotechnology Information (NCBI), and the access codes of the 62 sequences are OQ148485-OQ148548 (https://www.ncbi.nlm.nih.gov/popset/?term=OQ148485). Sequences were individually subjected to BLAST searches for similarity with sequences in GenBank databases and were further evaluated on the Barcode of Life Data System v4 (BOLD) platform. Multiple alignments of the sequences were produced with MUSCLE and CLUSTAL using MEGA-X to identify SNPs in the aligned sequences. For the general characterization of the sequences, the conserved, variable, informative parsimony, and singleton sites were evaluated, the nucleotide composition of the sequences by accession was estimated, and the pattern and nucleotide substitution rates of the matK and rbcL regions were determined. To evaluate the evolutionary distances, the model with the best fit was determined, and three phylogenetic trees were generated with the selected model, the first using the matK region, the second using the rbcL region, and the third concatenating the two matK and rbcL regions with the neighbor-joining tree algorithm, the Jukes–Cantor sequence distance estimation model, and the bootstrap of 1000 replicates.

3. Results

The rbcL and matK chloroplast regions of 32 Chenopodium accessions from central-eastern Colombia were amplified to generate molecular profiles that discriminated between C. quinoa and other Chenopodium. The nucleotide sequences of the amplicons were deposited in the NCBI GenBank and can be accessed under the assigned accession numbers (Table 1). Multiple alignments of the nucleotide sequences representing the rbcL and matK regions of the chloroplast identified the size of the chloroplast matK gene at ∼650 bp and the size of the rbcL gene at ∼1350 bp. The 32 matK gene sequences showed that each sequence had an average of 646 bp, with 56.96% variability, with a total of 260 conserved sites, 77 informative sites by parsimony, and a transition/transversion ratio of 0.69. The rbcL gene sequences, despite the fact that they presented a low percentage of variability of 32.88%, had a higher PIS (parsimony-informative site) than matK with 85 sites and a transition/transversion ratio of 0.82 (Table 2). A total of 434 amino acids were translated from the rbcL gene sequences, while a total of 215 amino acids were translated with matK.

The sequences of the rbcL and matK genes of the Colombian quinoas evaluated in this study presented different nucleotide compositions (Table S1). The nucleotide composition showed that the matK gene, on average, consists of 36.0% thymine (T), 18.3% cytosine (C), 29.7% adenine (A), and 16.0% guanine (G) while the nucleotide composition of rbcL was on average 29.0% thymine (T), 19.8% cytosine (C), 27.0% adenine (A), and 24.2% guanine (G).

The accessions for the matK gene showed varied compositions, and the Q19-5 sample had different compositions than the other accessions with 29.7% thymine (T), 15.7% cytosine (C), 36.8% adenine (A), and 17.8% guanine (G). When the sequences obtained from the rbcL gene were compared, the composition of the nucleotides was similar, with percentages close to the average for the genes. Only accession Q9-2 had a sequence size of 518 bp, which represented approximately half the base pairs of the average sequence. The %GC was higher in rbcL than in matK, with 42.3%.

To find the best model of the evolutionary distances and phylogeny based on the two genes, the models with lower BIC (Bayesian information criterion) scores were considered the best description of the nucleotide substitution pattern during evolution, according to [22]. The analyses of the 32 nucleotide sequences showed that the best evolutionary model for the matK gene sequences was 3-parameter Tamura (T92 + G + I), with 65 parameters and the lowest BIC value, 5973.62. For the rbcL gene, the best-fit model was a 3-parameter Tamura (T92), with 63 parameters and a BIC value of 7875.96. For the concatenated matK and rbcL sequences, the 3-parameter Tamura (T92 + G + I) model had 65 parameters and a BIC value of 25590.22.

According to the Tamura 3-parameter model and matK gene sequences, the nucleotide substitution rate ranged from 0.05 (AC, TG, CG, and GC) to 0.14 (CT and GA). For the rbcL gene, the substitution rates ranged between 0.06 (AC, TG, CG, and GC) and 0.14 (CT and GA), and, for the matK and rbcL concatenated genes, it was between 0.06 (AC, TG, CG, and GC) and 0.13. (CT and GA) (Figure 1).

The nucleotide substitution pattern was evaluated at complete codon positions (1st + 2nd + 3rd nucleotide) and according to the best-fit model of the matK and rbcL sequences, which is shown in Table 3. The transitional substitution was greater than the transversional substitution in the matK and rbcL regions. However, the matK region exhibited a higher substitution rate from C to T. In contrast, the frequency of change from G to A, T to C, and A to G in the rbcL region was higher than that of matK.

The concatenated sequences of matK and rbcL were analyzed using BLAST to search for homology with the sequences reported in GenBank, where 96% of the sequences presented 99% identity with the species Chenopodium quinoa Willd. except for accession Q9-2, which presented 98.37% identity with Chenopodium desiccatum; however, this same accession presented 98.37% similarity with Chenopodium atrovirens on the BOLD platform. The BOLD results were different in three accessions, and the species returned from this database did not match those from BLAST. The Q8-3 accession showed 99.57% similarity with Chenopodium suecicum in BOLD, while, in BLAST, there was 99.83% identity with Chenopodium quinoa Willd. Accession Q23-3 presented 96.16% similarity with Chenopodium desiccatum in BOLD (Table 4).

The results of the sequence analyses were consistent with the phylogenetic reconstruction from the matK and rbcL sequences, which discriminated accessions based on the number of base substitutions per site, as evidenced by the tree topologies of the sequences obtained with the neighbor-joining method (Figures 2(a) and 2(b)), which showed that some accessions had high genetic variability and a low index of genetic divergence.

The phylogenetic trees obtained with the sequences of the two genes analyzed independently showed similar groupings in some accessions in the two trees; for example, the grouping for accessions Q30-8 and Q4-6 did not correspond to the place of origin.

Finally, the phylogenetic tree obtained by concatenating the matK and rbcL regions conformed to the tree built by the matK segment; that is, there was little variation in the conformation of the groups, except for slight changes in the branch supports (Figure 3). Comparing these results with those obtained showed similarity and coherence in the distribution of the groups.

4. Discussion

Genetic variation is important for the survival of plants since it leads to adaptation and evolution [23]. Likewise, studies on genetic diversity among cultivated plant species are essential for conservation and improvement programs [24]. This study, where 32 Chenopodium accessions were analyzed by evaluating matK and rbcL genes, showed that these regions contain higher amounts of adenine + thymine (A + T) than guanine + cytosine (G + C) and that nucleotide substitution rates were high, as were genetic distances, demonstrating that the use of chloroplast molecular markers plays an important role in elucidating evolutionary relationships and identifying species, such as quinoa [25].

Therefore, it is essential to understand the evolutionary causes of nucleotide changes and their patterns of variation between species since this generates genetic divergence [26]. In this study, the matK region exhibited a higher nucleotide substitution rate than the rbcL region. Ho et al. showed that the rbcL gene has fewer variations than other chloroplast genes [11]. In addition, this gene presents more universality in the primers than in the matK gene. The latter has combinations of several primers to achieve amplification of the region [27, 28]. However, the universality of primers is an important factor in the identification of unknown species because primers are more effective if they can be used to amplify different types of plants [29]. In this study, the sequencing of both matK and rbcL genes showed effectiveness in the identification of species.

The analysis of the nucleotide composition of the sequences showed that matK was the region with the greatest variability or polymorphic sites with respect to the rbcL gene. The latter region evolves slowly and is, therefore, used more frequently in phylogenetic analyses [30]. In addition, the rbcL gene is a more universal region and is less efficient in separating closely related taxa [16], although some studies on angiosperm species have shown that this region presents enough variation to distinguish species [31]. Meanwhile, matK, which presents a faster evolution, is widely used for phylogenetic analyses. In this study, the analysis of the two regions and the analysis of the translation revealed that the nucleotide composition was similar to Chenopodium quinoa.

Searching the BOLD and GenBank databases of the concatenated matK and rbcL genes showed 98 to 100% agreement with Chenopodium quinoa sequences; however, some sequences coincided with C. desiccatum, C. atrovirens, and C. suecicum, and this coincides with what was obtained by [32] using Bayesian and maximum parsimony analysis of the noncoding trnL-F (cpDNA) and nuclear ITS regions, where tetraploid species of the genus Chenopodium such as C. berlandieri subsp. nuttalliae and C. quinoa belong to Chenopodium sensu stricto. Therefore, the DNA barcoding used in this study was able to distinguish intraspecific and interspecific divergences because, among other things, of the fact that both regions have high amplification and sequencing success rates [33].

Studies on C. murale obtained similar results using the matK and rbcL genes, showing 100% match with C. murale using BLAST, while, in BOLD, the rbcL gene showed a high degree of similarity with several taxa, including C. ambrosioides, C. album, and C. ficifolium, ranging from 96.3 to 100% [11]. Research by Huang et al. [34] showed that the combination of the rbcL and matK genes separated 40% of the sampled species in the combined data set for the family Apiaceae. Therefore, these two regions, matK and rbcL, have been shown to be ideal for species identification.

Quinoa is currently gaining international attention given the productive and nutritional potential of its seeds, which has caused a significant increase in planting areas. Because it is characterized by its primary and secondary germplasm, the authors of [14] reported the complete mitochondrial and chloroplast genome sequences of quinoa accession PI614886 and the identification of sequence variants in additional quinoa accessions and related species. This was the first reported mitochondrial genome assembly in the genus Chenopodium. The inference of phylogenetic relationships between Chenopodium species based on mitochondrial and chloroplast variants supports the hypothesis that the ancestor of the A genome was the cytoplasmic donor in the original tetraploidization event and that highland and coastal quinoas were generated independently, producing results such as those found in this study.

Ibrahim et al. 2019 characterized seven quinoa genotypes with different provenances and identified genetic polymorphisms and unique markers in each genotype with variation at the seed color level using ISSR, SCoT markers, and two chloroplastid markers (rbcL and rpoC1) [15]. Sequence alignment revealed that rbcL recovered from the quinoa genotypes with high similarity to other rbcL genes obtained from Chenopodium species in other studies, with similarities ranging between 76 and 80%. Furthermore, the rbcL gene showed highly consistent genetic similarity with low genetic evolution and mutation. On the other hand, the sequence alignment revealed that the rpoC1 obtained from the quinoa genotypes had high similarity with other rpoC1 genes obtained from other plant species in other studies. The similarities ranged from 80% to 81%. The evolutionary divergence in Chenopodium using rbcL gene sequences for 19 accessions found 0.68% diversity in interspecific sequences [6].

Studies on the Chenopodium genus have shown the ability of a single barcoding gene (rbcL) or a combination of these genes to distinguish individuals of the Chenopodium genus [15, 35]. In addition, phylogeny factors and the identification of associated gene families are essential tools for understanding the evolution, mutation, and genetic factors associated with the main characteristics of the Chenopodium genus [25].

For the conservation or genetic improvement of plants, phylogenetic studies represent the basis for inferring the evolutionary history of the species, their delimitation, genetic differentiation, and gene flow [36]. This information provides the necessary tools for conservation and input to predict the genetic diversity of species in breeding programs [37]. The reconstruction of the phylogenetic trees that took into account the matK region, rbcL region, and both concatenated regions in this study suggested that most of the quinoa accessions are closely related, which implies a close relationship that is associated with the reproductive system of the species and its coevolution processes in the Andean productive systems. Therefore, our results support future quinoa breeding and genetic conservation programs.

5. Conclusions

The present study demonstrated the efficacy of the matK and rbcL genes as reliable markers to support Chenopodium quinoa conservation and improvement programs, locally, nationally, and internationally. Although the results of the rbcL gene were less promising, the use of both genetic markers for the molecular identification of the Chenopodium genus is recommended as an effective, modern, and widely used method.

The phylogenetic trees obtained from the matK and rbcL gene analyses provided strong evidence of phylogenetic relationships between the evaluated accessions, where similar clades were established between the phylogenetic trees obtained with the matK gene and the tree where both matK and rbcL genes were concatenated. This finding implies that genetic divergence in quinoa is phylogenetically informative at the species level. In addition, the low rates of genetic differentiation may be due to the lack of certified seeds and the constant exchange of seeds between farmers in the principal producing areas of the Andean region. Therefore, DNA barcoding in Chenopodium quinoa is an appropriate tool for establishing the correct identification of this species, although more phylogenetic studies involving more species from the Chenopodium genus are needed to fully understand the phylogeny of quinoa. The composition and sequence data provide a database for other studies on quinoa diversity.

Data Availability

The data used for the elaboration of this manuscript are available upon request. All sequence data were uploaded to the National Center for Biotechnology Information (NCBI); the access codes of the 62 sequences are OQ148485-OQ148548 (https://www.ncbi.nlm.nih.gov/popset/?term=OQ148485).

Conflicts of Interest

All authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge the “Patrimonio Autónomo, Fondo Nacional de Financiamiento para la Ciencia, la Tecnología y la Innovación Francisco José de Caldas-MinCiencias. Cód. 63924,” to the CIDE group “Competitividad, Innovación y Desarrollo Empresarial,” the Faculty of Agricultural Sciences of the Universidad Pedagógica y Tecnológica de Colombia.

Supplementary Materials

Table S1: nucleotide composition of matK and rbcL chloroplast genes of quinoa accessions from Colombia. (Supplementary Materials)