Abstract

Expressed sequence tags (ESTs) are a potential source for the development of genic microsatellite markers, gene discovery, comparative genomics, and other genomic studies. In the present study, 7630 ESTs retrieved from NCBI were examined for SSR identification and characterization. A total of 263 SSRs were identified with an average density of one SSR/4.2 kb (3.4% frequency). Trinucleotide repeats (47.52%) were the most abundant, followed by tetranucleotide (19.77%), dinucleotide (19.01%), pentanucleotide (9.12%), and hexanucleotide repeats (4.56%). Functional annotation was done through homology search and gene ontology, and 35 EST-SSRs were selected. Primer pairs were designed to evaluate cross transferability and polymorphism among 11 plants belonging to five different families. A total of 402 alleles were generated at 155 loci with an average of 2.6 alleles/locus, and the polymorphic information content (PIC) ranged from 0.15 to 0.92 with an average of 0.75. The cross transferability ranged from 34.84% to 98.06% in different plants, with an average of 67.86%. Thus, the validation of 35 annotated EST-SSR markers, each corresponding to a particular metabolic activity, revealed polymorphism and evolutionary relationships among different families of angiosperms.

1. Introduction

The flowering plants are extremely diverse in their morphology, growth habit, environmental adaptation, and nuclear genome content [1]. Plant genomes tend to be large and complex, varying in size from 125 million base pairs (Mbp) for Arabidopsis thaliana [2] to 124,852 Mbp for Fritillaria assyriaca [3]. Despite this diversity, plants exhibit conservation of both gene content and gene order [4]. This makes comparative studies that use data from smaller genomes important for accelerating the study of larger genomes. More interestingly, such studies can reveal the evolutionary relationships among diverse plant taxa. Comparisons of conserved sequences and regulatory elements can also extend genetic information from model species to more complicated species [5, 6]. Moreover, comparative genetic analyses have shown that different plant species carry homologous genes for very similar functions [1, 7–9].

DNA-based markers are routinely used in ecological, evolutionary, taxonomical, comparative biology, diversity, phylogenetic, and genetic studies [10]. Among all markers, microsatellites are preferred in plant genetics because of their hypervariability, relative abundance, multiallelic nature, high reproducibility, codominant inheritance, high polymorphism, high transferability, chromosome-specific location, extensive genome coverage, high informativeness, and wide genomic distribution [10–12]. Microsatellites or simple sequence repeats (SSRs) are sequences in which a unit of 1 to 6 base pairs (bp) is tandemly repeated; they are dispersed randomly and ubiquitously throughout the genomes of both prokaryotes and eukaryotes [13–15]. Microsatellites derived from ESTs are called EST-SSRs or genic SSRs. They represent functional molecular markers because a “putative function or particular enzymatic activity” can be deduced for them from public databases through computational approaches.

With the development of functional genomics, a huge number of expressed sequence tags (ESTs) have been deposited in the public database (NCBI) [16]. An in silico approach that retrieves EST sequences from NCBI provides a potential source of EST-SSRs, and computational methods can assign putative functions of the ESTs to various metabolic pathways. SSRs in transcribed regions are expected to be more conserved, significant, and more transferable across taxonomic boundaries than anonymous SSRs [17, 18]. Thus, the development of SSRs by searching EST databases has become a fast, efficient, and low-cost option for many studies [12, 19, 20]. EST-SSRs have been assessed for polymorphism, diversity, and transferability in different plant species, namely, rice [21], grape [22], sugarcane [23], tomato [24], loblolly pine [25], barley [26], rye [27], cereals [28], leguminous and nonleguminous plants [29], medicinal plants [30], and millet and nonmillet species [31–33]. In the present study, 7630 EST sequences were retrieved from NCBI for SSR identification and characterization. Functional annotations were assigned to the sequences to develop informative EST-SSR markers and to assess their transferability in different families.

2. Material and Methods

2.1. Plant Material and DNA Extraction and Purification

Young, disease-free, immature leaves from eleven therapeutic plants, namely, Datura metel, Datura innoxia, Withania coagulans, Withania somnifera, Capsicum annuum, Eclipta alba, Stevia rebaudiana, Citrullus colocynthis, Ocimum sanctum, Catharanthus roseus, and Moringa oleifera, were collected from the University of Rajasthan campus. These plants belong to five distinct families. DNA was extracted from leaves using the CTAB method [34]. Each DNA sample was treated with RNase for 1 h at 37°C, purified by phenol extraction (25 phenol : 24 chloroform : 1 isoamyl alcohol, v/v/v) followed by ethanol precipitation [35], and stored at −80°C for long-term use. DNA quality and concentration were checked on a 0.8% agarose gel, and final adjustments were made in 10 mM Tris HCl buffer to obtain a working concentration of 25 ng/μL.

2.2. Mining of EST Sequences, ESTs Assembling, and Microsatellites Identification

A total of 7630 EST sequences with putative or enzyme-encoding functions were retrieved in FASTA format from the National Center for Biotechnology Information (NCBI). The sequences came from different plant sources because little sequence data is available in public databases for the selected plants. ESTs were assembled with the CAP3 programme through an online web tool (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::cap3) to obtain a nonredundant sequence set. Microsatellites were identified using the MISA software tool (http://pgrc.ipk-gatersleben.de/misa/); the criteria for SSR detection were a minimum of 6, 4, 3, 3, and 3 repeat units for di-, tri-, tetra-, penta-, and hexanucleotides, respectively. SSR primer pairs (forward and reverse) were designed from the sequences flanking the identified microsatellite motifs using the online web tool BatchPrimer3 [36].
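The detection thresholds above can be illustrated with a small regular-expression search. This is only a minimal sketch of perfect-SSR detection under the stated repeat minima; it is not the MISA tool, it does not handle compound SSRs, and the function name `find_ssrs` is a hypothetical helper introduced here for illustration.

```python
import re

# Minimum repeat units per motif length (di- to hexanucleotide), as stated in the text.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq.

    Illustrative approximation only; MISA's own search logic may differ.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit of unit_len bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip units that are repeats of a shorter unit (e.g. "AGAG" is "AG" twice).
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical test sequence: six AG units flanked by non-repeat bases.
example_hits = find_ssrs("CCC" + "AG" * 6 + "TTT")
```

Start positions are 0-based. A tetranucleotide run of only two units, such as "ACGTACGT", falls below the threshold of three and is not reported.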

2.3. EST-SSR Sequences Annotation

SSR-containing ESTs were annotated by homology search using Blastn/Blastx, including searches against the nonredundant (NR) protein database at NCBI. The functional annotation pipeline FastAnnotator (http://fastannotator.cgu.edu.tw/) was also run to assign gene ontology (GO) terms; the resulting GO functional classes were displayed as a horizontal bar chart in addition to a detailed chart [37].

2.4. PCR Amplification and Electrophoresis

PCR was carried out in a total volume of 10 μL containing 25 ng template DNA, 1.0 μL each of forward and reverse primers (at a concentration of 10 pmole/μL) [31–33], 0.2 μL of 100 mM dNTPs, 0.5 U of Taq DNA polymerase, 1.0 μL of 10X PCR buffer, and 2.5 mM MgCl2. Amplification was performed in a thermal cycler (Bio Rad, UK) under the following conditions: initial denaturation at 94°C for 5 min; 30 cycles of denaturation at 94°C for 1 min, annealing for 1 min, and extension at 72°C for 2 min; and a final extension at 72°C for 7 min. The PCR conditions, particularly the annealing temperature (varying from 52°C to 58°C), were standardized for each primer (Table 1). All designed primers were surveyed in the selected plants 2-3 times, and amplified products were stored at 4°C. PCR products were separated on a 1.5% high resolution agarose gel (Merk bioscience) prepared in 0.5X TBE (Tris-Borate-EDTA) buffer at 70 V for approximately 3.5 hours. Ethidium bromide was used as the intercalating dye, and the gel was photographed under UV light.
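As a quick check of the reaction set-up, the final concentrations implied by the stated stock volumes and the total hold time of the cycling programme can be worked out as follows. This is a rough sketch using only the values given above; the variable and function names are illustrative, and ramping time between temperatures is ignored.

```python
# Values taken from the text; names are illustrative assumptions.
REACTION_VOLUME_UL = 10.0


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=REACTION_VOLUME_UL):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2 (same unit as stock_conc)."""
    return stock_conc * stock_volume_ul / total_volume_ul


primer_final = final_concentration(10.0, 1.0)  # pmol/uL for each primer
dntp_final = final_concentration(100.0, 0.2)   # mM, from 0.2 uL of 100 mM stock

# Cycling programme as (step, minutes, repeats); annealing temperature is primer specific.
program = [
    ("initial denaturation", 5, 1),
    ("denaturation", 1, 30),
    ("annealing", 1, 30),
    ("extension", 2, 30),
    ("final extension", 7, 1),
]
hold_time_min = sum(minutes * repeats for _, minutes, repeats in program)
```

With these inputs each primer ends up at 1 pmol/μL, and the programmed hold time is 132 minutes per run.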

2.5. Genetic Relationship with EST-SSR Primer

Amplified bands were scored as binary data, present (1) or absent (0). A dendrogram was constructed by the neighbor-joining method with Jaccard’s coefficient using the free software FreeTree and TreeView [38, 39]. The polymorphism information content (PIC) value was calculated for each primer using the online PIC calculator (http://www.liv.ac.uk/~kempsj/pic.html).
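The binary scoring described above leads directly to a pairwise Jaccard comparison, which is the kind of input a neighbor-joining programme works from. The sketch below uses the standard definition of the Jaccard coefficient (shared bands divided by bands present in either sample). The band profiles and plant labels are hypothetical, and the code does not reproduce the internals of the software used in the study.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient for two equal-length 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Two profiles with no bands at all are treated here as having similarity 0.
    return shared / either if either else 0.0


def distance_matrix(profiles):
    """Pairwise Jaccard distances (1 - similarity) keyed by (name, name)."""
    return {
        (p, q): 0.0 if p == q else 1.0 - jaccard_similarity(profiles[p], profiles[q])
        for p in profiles
        for q in profiles
    }


# Hypothetical band scores at five band positions for three samples.
profiles = {
    "plant_a": [1, 1, 0, 1, 0],
    "plant_b": [1, 0, 0, 1, 1],
    "plant_c": [0, 0, 1, 0, 1],
}
distances = distance_matrix(profiles)
```

Here plant_a and plant_b share two of the four bands present in either sample, giving a similarity of 0.5, whereas plant_a and plant_c share none.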

3. Results

3.1. Frequency of Microsatellites in Expressed Sequence Tags

A total of 7630 EST sequences of putative function (enzyme-encoding sequences) involved in different plant metabolic pathways were retrieved from NCBI for microsatellite (SSR) identification. Assembly yielded 1749 nonredundant sequences (1117 kb) comprising 884 contigs and 865 singlets. In these sequences, 263 SSRs were identified, of which 220 were perfect SSRs and 26 were present in compound formation; 38 sequences contained more than one SSR. The frequency of EST-SSRs was 3.4%, corresponding to a density of one SSR per 4.2 kb. Among all SSRs, trinucleotide repeats were the most abundant (47.52%), followed by tetranucleotide (19.77%), dinucleotide (19.01%), pentanucleotide (9.12%), and hexanucleotide (4.56%) repeats. A total of 58 different motif types were identified: three types of dinucleotide repeats, nine types of trinucleotide repeats, sixteen types of tetranucleotide repeats, eighteen types of pentanucleotide repeats, and twelve types of hexanucleotide repeats. The most frequent repeat motifs were AG/CT and AT/AT among dinucleotides; AAG/CTT, CCG/CGG, and AGC/CTG among trinucleotides; AAAT/ATTT and AAAG/CTTT among tetranucleotides; and AAAAC/GTTTT among pentanucleotides (Figure 1).
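The summary figures in this subsection can be re-derived from the reported totals. The sketch below assumes that the frequency is the number of SSRs divided by the number of retrieved ESTs and that the density is the nonredundant sequence length divided by the number of SSRs; this is the reading that reproduces the reported values. The per-class counts are inferred from the percentages and are not stated in the text.

```python
total_ests = 7630       # ESTs retrieved
total_ssrs = 263        # SSRs identified
nonredundant_kb = 1117  # total length of the 1749 nonredundant sequences

frequency_pct = 100 * total_ssrs / total_ests  # SSRs per 100 ESTs
density_kb = nonredundant_kb / total_ssrs      # kb of sequence per SSR

# Reported share of each repeat class (percent of all SSRs).
class_pct = {"tri": 47.52, "tetra": 19.77, "di": 19.01, "penta": 9.12, "hexa": 4.56}
# Inferred counts (an assumption: nearest whole number to pct * total / 100).
class_counts = {name: round(total_ssrs * pct / 100) for name, pct in class_pct.items()}

# Reported number of distinct motif types per class.
motif_types = {"di": 3, "tri": 9, "tetra": 16, "penta": 18, "hexa": 12}
total_motif_types = sum(motif_types.values())
```

Rounded to one decimal place, the frequency and density come out at 3.4% and 4.2 kb, the inferred class counts add up to 263, and the motif types add up to 58.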

3.2. Expressed Sequence Tags (ESTs) Annotation and Primer Designing

The EST sequences from which the SSR markers were developed were examined by functional annotation (blastn/blastx/gene ontology), and 35 EST-SSR markers were identified on the basis of their involvement in primary metabolic process, secondary metabolic process, biosynthetic process, nitrogen compound metabolic process, oxidation-reduction process, transferase activity, oxidoreductase activity, lyase activity, nucleotide binding activity, and others (Figure 2). Primer pairs were designed for these 35 functionally annotated EST-SSRs, which represent 13.30% of the total microsatellites identified (263), and were evaluated for polymorphism, cross transferability, and genetic relationships in 11 plant species of five different families. Trinucleotide repeats were the most abundant among the 35 EST-SSRs, followed by tetra- and dinucleotide repeats (Table 1). All of these were associated with common metabolic pathways such as GO:0009813 flavonoid biosynthetic process, GO:0045430 chalcone isomerase activity, GO:0016114 terpenoid biosynthetic process, GO:0004452 isopentenyl-diphosphate delta isomerase activity, GO:0046653 tetrahydrofolate metabolic process, GO:0004489 methylenetetrahydrofolate reductase (NADPH) activity, GO:0006694 steroid biosynthetic process, GO:0008483 transaminase activity, GO:0000162 tryptophan biosynthetic process, GO:0006571 tyrosine biosynthetic process, GO:0009094 L-phenylalanine biosynthetic process, GO:0006633 fatty acid biosynthetic process, GO:0009809 lignin biosynthetic process, GO:0009695 jasmonic acid biosynthetic process, GO:0004310 farnesyl-diphosphate farnesyltransferase activity, GO:0004311 farnesyltranstransferase activity, GO:0004713 protein tyrosine kinase activity, GO:0045548 phenylalanine ammonia-lyase activity, GO:0009821 alkaloid biosynthetic process, and GO:0006695 cholesterol biosynthetic process (see supplementary table available online at http://dx.doi.org/10.1155/2014/863948).

3.3. Amplification and Polymorphism of Annotated EST-SSR Markers in Selected Plants

A set of 35 primer pairs from different microsatellites in ESTs was tested for PCR optimization, characterization, and amplification in 11 plants belonging to different families. All markers produced polymorphic amplification profiles in the selected plants (Figure 3), with amplicons ranging from 50 to 1050 bp. DNA fingerprinting data for the 35 EST-SSRs in the eleven plants revealed a total of 402 alleles at 155 loci, with an average of 2.6 alleles per locus. The markers designed in this study were able to reveal polymorphism among different plants, and the polymorphic information content (PIC) of the 35 EST-SSRs ranged from 0.15 to 0.93 with an average of 0.77.
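For readers unfamiliar with PIC, a minimal sketch of the commonly used formula of Botstein et al. is given here, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus. Whether the online calculator used in this study applies exactly this form is an assumption, and the allele frequencies below are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one locus (Botstein-style formula).

    allele_freqs: allele frequencies at the locus, expected to sum to 1.
    """
    sum_sq = sum(p * p for p in allele_freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1 - sum_sq - cross


# Hypothetical loci with equally frequent alleles.
pic_two_alleles = pic([0.5, 0.5])
pic_four_alleles = pic([0.25, 0.25, 0.25, 0.25])

# Average alleles per locus from the totals reported in the text.
alleles_per_locus = 402 / 155
```

Two equally frequent alleles give a PIC of 0.375 and four give about 0.70, which shows why markers with more, evenly distributed alleles score higher. The reported totals give about 2.6 alleles per locus.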

3.4. Cross Transferability

All 35 annotated EST-SSR markers were assessed for cross transferability in the selected plants. The cross transferability of these markers was found to be 86.45% in Datura metel, 81.29% in Datura innoxia, 96.77% in Withania coagulans, 98.06% in Withania somnifera, 85.16% in Capsicum annuum, 34.84% in Stevia rebaudiana, 49.68% in Eclipta alba, 54.19% in Citrullus colocynthis, 43.23% in Ocimum sanctum, 58.71% in Catharanthus roseus, and 58.66% in Moringa oleifera, with an average of 67.86% (Table 2). These markers were found to be more transferable in Solanaceous plants (Datura metel, Datura innoxia, Withania coagulans, Withania somnifera, Capsicum annuum), ranging from 81.29% to 98.06% with an average of 89.55% as compared to other plants showing variable transfer rates. Thus, all markers showed reliable amplification pattern in different plants and were scored as transferable.
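The group averages can be recomputed from the per-species percentages listed above. This is a simple sketch; because the inputs are already rounded to two decimals, recomputed means may differ slightly from the averages reported in the text.

```python
# Cross transferability (%) per species, as listed in the text.
transferability = {
    "Datura metel": 86.45,
    "Datura innoxia": 81.29,
    "Withania coagulans": 96.77,
    "Withania somnifera": 98.06,
    "Capsicum annuum": 85.16,
    "Stevia rebaudiana": 34.84,
    "Eclipta alba": 49.68,
    "Citrullus colocynthis": 54.19,
    "Ocimum sanctum": 43.23,
    "Catharanthus roseus": 58.71,
    "Moringa oleifera": 58.66,
}

# The five Solanaceous species named in the text.
solanaceae = [
    "Datura metel",
    "Datura innoxia",
    "Withania coagulans",
    "Withania somnifera",
    "Capsicum annuum",
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


solanaceae_mean = mean(transferability[name] for name in solanaceae)
overall_mean = mean(transferability.values())
lowest = min(transferability, key=transferability.get)
highest = max(transferability, key=transferability.get)
```

The Solanaceae mean comes out at about 89.55%, and the extremes are Stevia rebaudiana (lowest) and Withania somnifera (highest).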

3.5. Genetic Diversity Analysis by EST-SSRs

Genetic relationships among the selected plants were further analyzed by constructing a dendrogram from the allelic data obtained by EST-SSR primer amplification. All the plants were grouped into two major clusters. Cluster I contained the 5 plants of the Solanaceae family and included two subgroups (Ia and Ib), each comprising plants of the same genus (Datura metel and Datura innoxia in Ia; Withania coagulans and Withania somnifera in Ib). Cluster II contained 6 plant species classified into two major subgroups (IIa and IIb). Subgroup IIa comprised the Asteraceous plants (Eclipta alba and Stevia rebaudiana) clustered together, and subgroup IIb comprised four plant species on three separate edges of the dendrogram, with the exception of one plant (Figure 4). Thus, the 35 annotated EST-SSR markers showed discriminatory potential to some extent and revealed close relationships among the Solanaceous plants and between the Asteraceous plants.

4. Discussion

The present study used publicly available EST sequences from different plant sources and functional annotation to derive informative EST-SSR markers through an in silico approach. Experimental methods to develop SSR markers are laborious, time consuming, and expensive; therefore, publicly available EST libraries, which reduce time and expense, are now being used as an alternative for marker identification [16, 20, 40, 41]. We identified 263 nonredundant microsatellites having di-, tri-, tetra-, penta-, and hexanucleotide repeats. The SSR frequency in the EST collection was 3.4%, which is close to earlier reports in other plant species, namely, 3.4% in Physcomitrella patens and 3.5% in Oryza sativa [20], 3.2% in cereals [42], and 4.1% in almond [43]. Other studies reported SSRs at various frequencies, namely, 2.5% in grapes [22], 2.88% in sugarcane [23], 4.7% in rice [44], and 2.8% in barley [45]. In general, about 5% of ESTs contain SSRs in diverse plant species [46]. The differences in the frequency of EST-SSRs could be attributed to the “search criteria” used, type of SSR motif, size of sequence data, and the mining tools used [31, 47]. An average density of one SSR per 4.2 kb was detected, which is closely comparable to that reported earlier in date palm [48] and in cereals [42].

Among the 263 microsatellites, trinucleotide repeat motifs were the most abundant, with a frequency of 47.52%, followed by tetra- (19.77%), di- (19.01%), penta- (9.12%), and hexanucleotide (4.56%) repeats. Varshney et al. [42] reported that trinucleotide repeats (TNRs) are the most common, followed by either dinucleotide repeats (DNRs) or tetranucleotide repeats (TTNRs). Our trinucleotide repeat frequency is in close agreement with previous studies reporting 48.5% in sugarcane [49] and 48% in Setaria italica [32]. Other studies also reported high TNR frequencies, namely, in cereals [50], Ricinus communis [51], Eucalyptus globulus [52], sugarcane [12], and Setaria italica [31]. The abundance of trinucleotide repeats in plants might be attributed to the absence of frameshift mutations when the number of trinucleotide units changes [53]. Among all types of trinucleotide motifs, AAG/CTT, CCG/CGG, and AGC/CTG were present in high proportion. Motifs GGA/TTC, CCT/AGG, GAA/TTC, and CCG/GGC were also detected. These motifs can form hairpin-like structures, which stabilize them and allow them to escape from repair mechanisms [15, 54]. Each trinucleotide motif encodes a particular amino acid or a stop codon, and the encoded amino acids take part in proteins involved in various metabolic activities [20, 55]. As expected, twenty different types of amino acids, including one stop codon, were detected in the trinucleotide motifs (Figure 5). The amino acids encoded by the trinucleotide motifs (leucine, serine, alanine, and arginine) are in agreement with earlier studies [20, 30, 55, 56].
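To make the link between trinucleotide motifs and amino acids concrete, the sketch below translates a repeat unit in each of its three reading frames using the standard genetic code. The reading frame and strand used for Figure 5 are not stated in the text, so treating all three frames and the reverse complement is an assumption made for illustration, and the helper names are hypothetical.

```python
# Standard genetic code in the conventional T, C, A, G ordering ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def frame_codons(motif):
    """Map each cyclic frame of a trinucleotide repeat unit to its amino acid."""
    doubled = motif + motif
    return {doubled[i:i + 3]: CODON_TABLE[doubled[i:i + 3]] for i in range(3)}


def reverse_complement(motif):
    """Reverse complement of a DNA motif (A<->T, C<->G, then reversed)."""
    return motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]


aag_frames = frame_codons("AAG")                      # frames of an (AAG)n repeat
ctt_frames = frame_codons(reverse_complement("AAG"))  # frames of the opposite strand
```

For the AAG/CTT motif, the three frames of (AAG)n read as lysine (AAG), arginine (AGA), and glutamate (GAA), while the complementary strand reads as leucine (CTT), phenylalanine (TTC), and serine (TCT). The amino acid actually encoded therefore depends on the frame in which the repeat lies within the transcript.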

On the basis of functional annotation through blastn/blastx and gene ontology (GO), 35 EST-SSRs were identified because of their direct involvement in metabolic pathways. As observed in earlier studies, relevant transcripts can be detected using functional annotation pipelines for various applications [57]. Most of these EST-SSRs were involved in biological processes and molecular functions such as primary metabolism, secondary metabolism, nitrogen compound metabolism, oxidation-reduction process, and transferase activity. The 35 EST-SSR primer pairs were designed and surveyed in different plants. All primers produced clear PCR amplification profiles in all the selected plants and produced 402 alleles at 155 loci with an average of 2.6 alleles/locus. This result is in close agreement with an earlier study in chickpea (2.6 alleles/locus) [58].

The set of 35 EST-SSR markers produced clear amplification profiles and was found to be transferable among the selected plant species. The frequency of cross transferability ranged from a high of 98.06% in W. somnifera to a low of 34.84% in S. rebaudiana, with an average of 67.86%. This result is in conformity with an earlier report on cross transferability of Medicago truncatula EST-SSRs into four leguminous and 3 nonleguminous plants [29]. Transferability of 70% was reported for castor bean SSRs in J. curcas and other Jatropha species [59]. Mishra et al. [30] reported cross transferability (31–57%) of Madagascar periwinkle EST-SSR markers in other medicinal plants. Choudhary et al. [58] also observed cross transferability (68.3% to 96.6%) of chickpea EST-SSR markers across 6 annual Cicer species and reported 29.4% to 61.7% transferability in seven legume genera. Foxtail millet derived EST-SSR markers showed cross transferability of approximately 85 to 89% in different types of millets and nonmillets [31–33]. Saha et al. [60] also reported approximately 92% transferability from tall fescue to 7 grass species. Higher levels of transferability were reported in other studies, namely, 86.6% transferability of wheat EST-SSRs to other cereal plants [28], 96.5% cross species amplification among 22 Gossypium species [61], 95.2% cross transferability between Saccharum complex and cereals [49], and 90% transferability of Vigna radiata derived EST-SSRs in other Vigna species [62]. Lower frequencies of transferability were also reported in earlier studies. Gutierrez et al. [63] reported approximately 40.6% transferability of Medicago truncatula EST-SSR markers across 3 pulse crops (faba bean, chickpea, and pea). In this study, the 35 EST-SSR markers were found to be more transferable (89.55%) among Solanaceous plant species than among other plant taxa, and these markers can support various genetic applications in Solanaceous plants.

Further, the genetic relationships among the eleven plant species were evaluated by constructing a dendrogram (neighbor-joining/Jaccard’s algorithm) from the allelic data amplified by the 35 EST-SSR markers. These markers revealed close relationships among the Solanaceous plants (D. metel, D. innoxia, W. coagulans, W. somnifera, and C. annuum) and between the Asteraceous plants (E. alba and S. rebaudiana), and they also discriminated to some extent among the other selected plants (C. colocynthis, O. sanctum, C. roseus, and M. oleifera). A similar relationship was shown by Gupta and Prasad [29], who evaluated the genetic relationships between leguminous (M. truncatula, lentil, pea, and chickpea) and nonleguminous plants (A. thaliana, tomato, wheat). Other studies also reported genetic relationships using EST-SSR markers in other plant species, such as bread wheat [50], grasses [60], sugarcane [49], and millets and nonmillets [31–33].

5. Conclusion

This study provides insight into the abundance and distribution of microsatellites in expressed sequence tags retrieved from a public database. Further, functional annotation made it feasible to develop and select informative EST-SSR markers for various genomic applications. This approach reduces cost and time, and it is an efficient way to analyze the transcribed portion of the genome without developing one's own libraries. Finally, 35 EST-SSR markers were developed and experimentally validated for their polymorphic nature, cross transferability, and genetic relationships in eleven different plant species. On the basis of amplification profiles, all these markers were found to be transferable. Genetic relationships were established to unambiguously differentiate the selected plant species.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are grateful to the Department of Biotechnology under the IPLS programme for the financial support. Facilities provided by BIF are also gratefully acknowledged. The authors are thankful to Dr. Varsha Khurana and Dr. Ritika Bhatt for their help in writing the paper.

Supplementary Materials

Supplementary Table: The complete details of the most promising gene ontology hits for the 35 EST-SSRs are given in the supplementary table.
