Research Article | Open Access
De Novo Sequencing and Characterization of the Transcriptome of Dwarf Polish Wheat (Triticum polonicum L.)
Construction and characterization of a polish wheat transcriptome is a crucial step toward studying the useful traits of polish wheat. In this study, a transcriptome comprising 76,014 unigenes was assembled from dwarf polish wheat (DPW) roots, stems, and leaves using Trinity. Among these unigenes, 61,748 (81.23%) were functionally annotated in public databases and classified into different functional types. When this transcriptome was aligned against the draft wheat genome released by the International Wheat Genome Sequencing Consortium (IWGSC), 57,331 (75.42%) unigenes, including 26,122 AB-specific and 2,622 D-specific unigenes, were mapped to the A, B, and/or D genomes. In a comparison with the transcriptome of T. turgidum, 56,343 unigenes matched 103,327 unigenes of T. turgidum. In comparisons with the genomes of barley and rice, 14,404 and 7,007 unigenes matched 14,608 genes of barley and 7,708 genes of rice, respectively. In addition, 2,148, 1,611, and 2,707 unigenes were expressed specifically in roots, stems, and leaves, respectively. Finally, 5,531 SSR sequences were observed in 4,531 unigenes, and 518 primer pairs were designed.
Because of its high thousand-kernel weight, elongated and plump kernels, high Zn, Fe, and Cu concentrations in seeds, high amylose content in seeds, and alternative dwarfing genes [3, 4], polish wheat (AABB, Triticum polonicum L.) attracts the interest of producers and breeders. However, polish wheat may be a hybrid of Triticum ispahanicum H. and T. durum (AABB) [5, 6]. The genetic background of polish wheat, especially Chinese polish wheat, has low similarity to that of T. durum, T. turgidum (AABB), and T. aestivum (AABBDD) [7, 8]. It is therefore inappropriate to infer the genetic information of polish wheat from the genome or transcriptomes of T. turgidum and T. aestivum [9–12].
With advances in next-generation sequencing technology, RNA sequencing (RNA-Seq) produces sequences at high throughput that are either mapped to a reference genome or assembled de novo to give a better depiction of the transcriptome [9, 10, 13–15]. It has been widely used in model and nonmodel organisms to study biological processes and applications such as SNP and gene discovery, SSR mining, and identification of differentially expressed genes [15–17]. Although the draft genome and transcriptome of T. aestivum and the transcriptome of tetraploid wheat have been released [9–12], a transcriptome of polish wheat has not yet been constructed or reported. Construction and characterization of a polish wheat transcriptome is therefore a crucial step toward studying useful traits in polish wheat.
Dwarf polish wheat (DPW), which carries a recessive dwarfing gene, was originally collected from Tulufan, Xinjiang province, China. Therefore, the genetic similarity between DPW and T. durum, T. turgidum, and T. aestivum should be low [7, 8]. In this study, the transcriptome of DPW was constructed and characterized. Additionally, the transcriptome was compared with the genomes of barley, rice, and common wheat and with the transcriptome of T. turgidum. Finally, SSR markers were mined.
2. Materials and Methods
2.1. Raw Reads
Ten DPW raw-read datasets containing 697.13 million 100 bp paired-end raw reads were downloaded from the NCBI Sequence Read Archive (SRA). Of these reads, 370.82, 115.51, and 210.80 million were generated from roots (SRA numbers: SRR2973581, SRR2973582, SRR2973583, and SRR2973584; unpublished data), stems (SRA numbers: SRR2969441 and SRR2969444), and leaves (SRA numbers: SRR2973592, SRR2973593, SRR2973594, and SRR2973595; unpublished data), respectively. Roots (four samples) were collected from seedlings; stems (two samples) and leaves (four samples) were collected at the booting stage. All 10 samples were sequenced in our laboratory using the 100 bp protocol on the Illumina HiSeq 2000 platform. The sequencing procedure was briefly described by Wang et al.
2.2. Transcriptome Assembly and CDS (Coding Sequence) Prediction
Reads containing adapters or poly-N and low-quality reads were removed using Perl scripts written by Novogene to produce clean reads. The GC content and sequence duplication level of the clean data were also calculated. All unigenes were assembled using Trinity (V2012-10-15) with a minimum k-mer coverage of 2; all other parameters were left at their defaults. Unigenes were defined using the methods of Zhang et al. and Krasileva et al.
2.3. Gene Functional Annotation
Unigene functions were annotated against a series of databases, including blastx against the NCBI nonredundant protein (Nr), NCBI nucleotide collection (Nt), and Swiss-Prot databases with an E-value cutoff of 10⁻⁵, and hmmscan against the protein family (Pfam) database. Unigenes were grouped into functional categories using the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/), the Clusters of Orthologous Groups of Proteins database (KOG/COG, http://www.ncbi.nlm.nih.gov/COG/), and Gene Ontology (GO; http://www.geneontology.org).
2.4. Tissue-Specific Expression Analysis
Clean reads were aligned against the assembled transcriptome using RSEM to produce read counts. The read count of each unigene was converted into an RPKM value to normalize gene expression. A unigene with an RPKM value of 0 (N/A) in a tissue was considered not expressed in that tissue. Tissue-specific unigenes were selected according to the RPKM values of the unigenes in roots, leaves, and stems.
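The normalization and selection steps above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads), and it assumes that "tissue-specific" means an RPKM above 0 in exactly one tissue, which the text implies but does not state explicitly. The function names and the toy values are hypothetical.

```python
from typing import Dict, List


def rpkm(read_count: float, length_bp: int, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def tissue_specific(rpkm_by_unigene: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Group unigenes that are expressed (RPKM > 0) in exactly one tissue.

    Assumption: expression in a single tissue is the selection criterion.
    """
    result: Dict[str, List[str]] = {}
    for unigene, values in rpkm_by_unigene.items():
        expressed = [tissue for tissue, value in values.items() if value > 0]
        if len(expressed) == 1:
            result.setdefault(expressed[0], []).append(unigene)
    return result


# Hypothetical toy values, not data from this study.
example = {
    "unigene_1": {"root": 3.2, "stem": 0.0, "leaf": 0.0},
    "unigene_2": {"root": 1.1, "stem": 4.0, "leaf": 2.5},
    "unigene_3": {"root": 0.0, "stem": 0.0, "leaf": 7.9},
}
specific = tissue_specific(example)
```

In practice a small positive RPKM threshold is often used instead of a strict zero to reduce noise from a handful of stray reads; the zero cutoff here simply mirrors the rule stated in the text.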
2.5. Comparative Genomics Analysis
All unigenes were blasted against the draft wheat genome with E-value < 10⁻⁵, coverage > 90%, and alignment length > 200 bp. All unigenes were also blasted against the transcriptome of T. turgidum with E-value < 10⁻⁵.
Peptide sequences of barley were obtained from http://plants.ensembl.org/hordeum_vulgare/Info/Index, and peptide sequences of rice were obtained from http://plants.ensembl.org/Oryza_sativa/Info/Index. Sequence alignments were performed using blastx with E-value < 10⁻⁵, alignment length > 100, and identity > 80%.
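The hit-filtering criteria in this section can be expressed as a short filter over BLAST tabular output. The sketch below is illustrative only and makes several assumptions that are not in the text: hits are in the standard 12-column tabular format (as produced by `-outfmt 6`), query lengths are supplied separately in a dictionary, and coverage is taken to be the aligned span of the query divided by the query length. The function name `filter_hits` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def filter_hits(
    lines: Iterable[str],
    query_lengths: Dict[str, int],
    max_evalue: float = 1e-5,
    min_align_len: int = 200,
    min_coverage: float = 0.90,
    min_identity: Optional[float] = None,
) -> List[str]:
    """Return query IDs (first-seen order) with at least one passing hit.

    Columns assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    kept: List[str] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id = fields[0]
        identity = float(fields[2])
        align_len = int(fields[3])
        q_start, q_end = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        # Assumed definition: fraction of the query spanned by the alignment.
        coverage = (abs(q_end - q_start) + 1) / query_lengths[query_id]
        if evalue >= max_evalue or align_len <= min_align_len:
            continue
        if coverage <= min_coverage:
            continue
        if min_identity is not None and identity <= min_identity:
            continue
        if query_id not in kept:
            kept.append(query_id)
    return kept


# Hypothetical toy hits, not data from this study.
toy_hits = [
    "u1\tchr1A\t99.0\t950\t5\t0\t1\t950\t100\t1049\t1e-50\t1700",
    "u2\tchr2B\t98.0\t300\t6\t0\t1\t300\t500\t799\t1e-40\t500",
    "u3\tchr3D\t85.0\t960\t90\t0\t1\t960\t10\t969\t1e-3\t60",
]
toy_lengths = {"u1": 1000, "u2": 1000, "u3": 1000}
mapped = filter_hits(toy_hits, toy_lengths)
```

For the barley and rice comparisons the same function could be called with `min_align_len=100`, `min_coverage=0.0`, and `min_identity=80.0`, since no coverage threshold is stated for those searches.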
2.6. SSR Mining and Primer Design
SSR sequences (SSRs) were identified using MIcroSAtellite (MISA, http://pgrc.ipk-gatersleben.de/misa/) as described by Zhang et al. SSRs were defined as motifs of one to six nucleotides with a minimum of five contiguous repeat units. Based on these SSRs, primers were designed using Primer 3.
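The SSR definition above (motifs of one to six nucleotides, at least five contiguous repeat units) can be illustrated with a regular expression. This is a simplified stand-in for MISA, not a reimplementation of it: it does not detect compound SSRs, and the function name `find_ssrs` is hypothetical.

```python
import re
from typing import List, Tuple

# A motif of 1-6 nucleotides followed by at least four more copies of itself,
# giving five or more contiguous repeat units. The lazy quantifier makes the
# regex engine try the shortest motif first, so "AAAAAA" is reported as a
# mono-nucleotide repeat rather than as a repeat of "AA" or "AAA".
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def find_ssrs(sequence: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for each SSR found."""
    seq = sequence.upper()
    hits: List[Tuple[int, str, int]] = []
    for match in SSR_PATTERN.finditer(seq):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical toy sequence: five copies of "AT" flanked by other bases.
toy_ssrs = find_ssrs("GGCATATATATATCC")
```

Each reported motif can then be classed by its length (mono- to hexa-nucleotide) to produce a tally of motif types.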
3. Results and Discussion
3.1. Sequencing and De Novo Assembly of the DPW Transcriptome
In total, 697.13 million 100 bp paired-end raw reads (370.82 million from roots, 115.51 million from stems, and 210.80 million from leaves) were generated from DPW. After cleaning and quality checks, 671.49 million clean reads (361.96 million from roots, 108.11 million from stems, and 201.32 million from leaves) were used for assembly. The assembly yielded 76,014 unigenes ranging from 201 to 19,201 bp in length, with a mean size of 872 bp (Table 1; all assembled unigenes have been deposited at GenBank under the accession GEDT00000000). This transcriptome contained fewer unigenes than the transcriptome of T. turgidum, which contained 140,118 unigenes with a mean size of 1,299 bp, but more than the transcriptome of T. turgidum cv. Langdon, which contained 40,349 unigenes.
3.2. Functional Annotation of Unigenes
Among these 76,014 unigenes, 61,748 (81.23%) were functionally annotated in at least one of the NCBI Nr, Nt, Swiss-Prot, KEGG, KOG, and COG databases using blastx with an E-value below 10⁻⁵ (GenBank accession GEDT00000000). Of the 61,748 annotated unigenes, 11,207 (18.15%) were classified into 26 COG categories; 28,104 (45.51%) were classified into three GO functional categories [molecular function (15,684), biological process (4,637), and cellular component (7,783)]; and 6,830 (11.06%), 17,877 (28.95%), 22,930 (37.13%), 44,878 (72.68%), and 58,659 (95.00%) were annotated in KEGG, KOG, Pfam, Nr, and Nt, respectively. All annotation information was also deposited at GenBank under the accession GEDT00000000.
Previous transcriptome studies have reported that many unigenes could not be functionally annotated, for example, 30% in T. turgidum, 32.12% in peanut, and 45.10% in Dendrocalamus latiflorus. In this study, 14,266 (18.77%) of the 76,014 unigenes were not functionally annotated in any database. As proposed by Krasileva et al., these unigenes might be (1) wheat-specific genes or highly divergent genes; (2) expressed pseudogenes; (3) noncoding transcribed sequences; (4) pieces of 5′ and 3′ UTRs; and (5) general assembly artifacts. Some of these unannotated unigenes, such as noncoding transcribed RNAs, may nevertheless regulate various cellular processes in wheat.
In addition, longer unigenes were annotated at higher rates. In the present study, 99.67% of unigenes longer than 2,000 bp, 98.34% of unigenes of 1,500–1,999 bp, and 95.02% of unigenes of 1,000–1,499 bp were annotated in at least one public database. In contrast, only 85.08% of unigenes of 500–999 bp and 71.39% of unigenes of 201–499 bp were annotated (Figure 1).
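The length-dependent annotation rates reported above amount to binning unigenes by length and computing the annotated fraction per bin. A minimal sketch is given below, assuming each unigene is available as a (length, annotated) pair; the bin edges follow the ranges quoted in the text, with the top bin taken as 2,000 bp and above, which is an assumption. The function names and toy values are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (lower bound, upper bound or None for open-ended), both inclusive.
LENGTH_BINS: List[Tuple[int, Optional[int]]] = [
    (201, 499),
    (500, 999),
    (1000, 1499),
    (1500, 1999),
    (2000, None),
]


def bin_label(length_bp: int) -> str:
    """Return the label of the length bin that contains length_bp."""
    for low, high in LENGTH_BINS:
        if high is None and length_bp >= low:
            return f">={low}"
        if high is not None and low <= length_bp <= high:
            return f"{low}-{high}"
    raise ValueError(f"length {length_bp} is below the shortest bin")


def annotation_rate_by_length(unigenes: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Percentage of annotated unigenes in each length bin that is present."""
    totals: Dict[str, int] = {}
    annotated: Dict[str, int] = {}
    for length_bp, is_annotated in unigenes:
        label = bin_label(length_bp)
        totals[label] = totals.get(label, 0) + 1
        annotated[label] = annotated.get(label, 0) + (1 if is_annotated else 0)
    return {label: 100.0 * annotated[label] / totals[label] for label in totals}


# Hypothetical toy input, not data from this study.
toy_rates = annotation_rate_by_length(
    [(250, True), (300, False), (750, True), (2500, True)]
)
```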
3.3. Comparison with the Genomes or Transcriptome of Wheat, T. turgidum, Barley, and Rice
When blasted against the draft wheat genome released by IWGSC, 57,331 (75.42%) unigenes were mapped to the A, B, and/or D genomes, including 26,122 AB genome-specific and 2,622 D genome-specific unigenes (SFile 1, in Supplementary Material available online at http://dx.doi.org/10.1155/2016/5781412; Figure 2). Among the 26,122 AB genome-specific unigenes, 7,785 and 11,291 were mapped specifically to the A and B genomes, respectively (Figure 2). All unigenes were also compared with the transcriptome of T. turgidum: 56,343 (74.12%) unigenes matched 103,327 (73.74%) unigenes of T. turgidum (SFile 2). Approximately 25% of the unigenes of the DPW transcriptome did not match the draft wheat genome or the transcriptome of T. turgidum, which suggests either that polish wheat has low genetic similarity to T. durum, T. turgidum, and T. aestivum [7, 8] or that the different tissues used to construct the transcriptomes might produce some tissue-specific unigenes [10, 11]. Interestingly, 2,622 unigenes were mapped specifically to the D genome (Figure 2, SFile 1). Polish wheat may also be a hybrid of T. ispahanicum and T. durum [5, 6]. This result indicated that the AB genomes might have given rise to the D genome through homoploid hybrid speciation.
All unigenes were also blasted against the published genomes of barley and rice with an E-value below 10⁻⁵ and more than 100 matched amino acids. In total, 14,404 (18.95%, SFile 3) and 7,007 (9.21%, SFile 4) unigenes matched 14,608 genes of barley and 7,708 genes of rice, respectively. These proportions are lower than the 70% of bread wheat unigenes reported to match rice and barley genes.
3.4. Tissue-Specific Unigenes
Since this transcriptome was constructed from roots, leaves, and stems, some tissue-specific unigenes were expected. Among the 76,014 unigenes, 39,083 were expressed in all tissues (Figure 3, SFile 5). These unigenes were involved in basic development and life cycles, such as translation, secondary metabolite biosynthesis, DNA replication, recombination and repair, transcription, signal transduction, carbohydrate transport and metabolism, cell cycle control, cell division, chromosome partitioning, chromatin structure and dynamics, coenzyme transport and metabolism, defense mechanisms, energy production and conversion, and RNA processing and modification. A further 5,160, 3,403, and 3,183 unigenes were shared by leaves and stems, roots and stems, and leaves and roots, respectively (Figure 3, SFile 5).
In contrast, 2,148 unigenes were specifically expressed in roots treated with Cd and Zn (Figure 3, SFile 5), including ABC transporter B and C members, high-affinity nitrate transporters, peroxidases, and glutathione S-transferases, which participate in metal tolerance [27–30]. Another 1,611 unigenes were stem-specific (Figure 3, SFile 5), including some cytochrome P450s, ABC transporter B and G members, beta-galactosidases, glucoside dioxygenases, auxin efflux carriers, and glycosyltransferases, which participate in phytohormone transport and cell wall metabolism [31–33]. Finally, 2,707 unigenes were leaf-specific (Figure 3, SFile 5), including G-type lectin S-receptor-like serine/threonine protein kinases and leucine-rich repeat receptor-like protein kinases, which are involved in abiotic stress tolerance [34–36].
3.5. SSR Mining
Because of their high level of polymorphism, locus specificity, codominance, convenience, and uniform distribution throughout the genome, SSR markers have been widely used in studies of wheat [8, 37]. In the present study, 5,531 SSRs were observed in 4,531 unigenes longer than 1,000 bp. Of these, 810 unigenes contained more than one SSR, and 241 SSRs were present in compound formation (SFile 6). The SSRs included 1,485 (26.85%) mono-nucleotide motifs, 1,113 (20.12%) di-nucleotide motifs, 2,744 (49.61%) tri-nucleotide motifs, 163 (2.95%) tetra-nucleotide motifs, 19 (0.34%) penta-nucleotide motifs, and 7 (0.13%) hexa-nucleotide motifs (Figure 4(a)). The most abundant repeat type was A/T, followed by CCG/CGG, AG/CT, AGG/CCT, AGC/CTG, AC/GT, AAG/CTT, ACC/GGT, and ACG/CGT (Figure 4(b)). Based on these 5,531 SSRs, 4,518 primer pairs were designed using Primer 3 (SFile 7).
The authors declare no conflict of interests.
Yi Wang, Chao Wang, Xiaolu Wang, Xue Xiao, and Yonghong Zhou designed the experiments. Chao Wang, Xiaolu Wang, Fan Peng, Ruijiao Wang, and Yulin Jiang performed the experiments. Yi Wang, Jian Zeng, Xing Fan, Houyang Kang, Lina Sha, and Haiqin Zhang analyzed the data. Yi Wang, Chao Wang, and Xiaolu Wang wrote the paper. Yi Wang and Yonghong Zhou supervised the entire study. Yi Wang, Chao Wang, and Xiaolu Wang contributed equally to this work.
The authors thank the National Natural Science Foundation of China (nos. 31301349, 31470305, and 31270243), Bureau of Science and Technology, and Bureau of Education of Sichuan Province, China.
SFile 1 The location of unigenes in wheat chromosomes. SFile 2 Comparative information against T. turgidum. SFile 3 Comparative information against barley. SFile 4 Comparative information against rice. SFile 5 The information of unigene expression. SFile 6 The sequences of SSRs in unigenes. SFile 7 The primers of SSRs.
- M. Wiwart, E. Suchowilska, W. Kandler, M. Sulyok, P. Groenwald, and R. Krska, “Can Polish wheat (Triticum polonicum L.) be an interesting gene source for breeding wheat cultivars with increased resistance to Fusarium head blight?” Genetic Resources and Crop Evolution, vol. 60, no. 8, pp. 2359–2373, 2013.
- M. Rodríguez-Quijano, R. Lucas, and J. M. Carrillo, “Waxy proteins and amylose content in tetraploid wheats Triticum dicoccum Schulb, Triticum durum L. and Triticum polonicum L.,” Euphytica, vol. 134, no. 1, pp. 97–101, 2003.
- H.-Y. Kang, L.-J. Lin, Z.-J. Song et al., “Identification, fine mapping and characterization of Rht-dp, a recessive wheat dwarfing (reduced height) gene derived from Triticum polonicum,” Genes and Genomics, vol. 34, pp. 509–515, 2012.
- N. Watanabe, “Triticum polonicum IC12196: a possible alternative source of GA3-insensitive semi-dwarfism,” Cereal Research Communications, vol. 32, pp. 429–434, 2004.
- V. R. Chelak, “The origin of Triticum polonicum,” in Materialy 5-go Mosk. Soveshch. Pofilogeniirast1, pp. 192–193, Nauka, Moscow, Russia, 1976.
- A. D. Gorgidze and K. M. Zhizhilashvili, “Phylogeny of the wheat Triticum polonicum L.,” Bulletin of the Academy of Sciences of the Georgian SSR, vol. 109, pp. 381–383, 1983.
- V. Michalcová, R. Dušinský, M. Sabo et al., “Taxonomical classification and origin of Kamut wheat,” Plant Systematics and Evolution, vol. 300, no. 7, pp. 1749–1757, 2014.
- Y. Wang, C. Wang, H. Zhang, Z. Yue, X. Liu, and W. Ji, “Genetic analysis of wheat (Triticum aestivum L.) and related species with SSR markers,” Genetic Resources and Crop Evolution, vol. 60, no. 3, pp. 1105–1117, 2013.
- J. Duan, C. Xia, G. Zhao, J. Jia, and X. Kong, “Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data,” BMC Genomics, vol. 13, article 392, 2012.
- K. V. Krasileva, V. Buffalo, P. Bailey et al., “Separating homeologs by phasing in the tetraploid wheat transcriptome,” Genome Biology, vol. 14, no. 6, article R66, 2013.
- A. W. Schreiber, M. J. Hayden, K. L. Forrest, S. L. Kong, P. Langridge, and U. Baumann, “Transcriptome-scale homoeolog-specific transcript assemblies of bread wheat,” BMC Genomics, vol. 13, no. 1, article 492, 2012.
- R. Brenchley, M. Spannagl, M. Pfeifer et al., “Analysis of the bread wheat genome using whole-genome shotgun sequencing,” Nature, vol. 491, no. 7426, pp. 705–710, 2012.
- A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer, and B. Wold, “Mapping and quantifying mammalian transcriptomes by RNA-Seq,” Nature Methods, vol. 5, no. 7, pp. 621–628, 2008.
- X.-M. Zhang, L. Zhao, Z. Larson-Rabin, D.-Z. Li, and Z.-H. Guo, “De novo sequencing and characterization of the floral transcriptome of Dendrocalamus latiflorus (Poaceae: Bambusoideae),” PLoS ONE, vol. 7, no. 8, article e42082, 2012.
- J. Zhang, S. Liang, J. Duan et al., “De novo assembly and characterisation of the transcriptome during seed development, and generation of genic-SSR markers in Peanut (Arachis hypogaea L.),” BMC Genomics, vol. 13, article 90, 2012.
- D. Cantu, S. P. Pearce, A. Distelfeld et al., “Effect of the down-regulation of the high Grain Protein Content (GPC) genes on the wheat transcriptome during monocarpic senescence,” BMC Genomics, vol. 12, article 492, 2011.
- M. Trick, N. M. Adamski, S. G. Mugford, C.-C. Jiang, M. Febrer, and C. Uauy, “Combining SNP discovery from next-generation sequencing data with bulked segregant analysis (BSA) to fine-map genes in polyploid wheat,” BMC Plant Biology, vol. 12, article 14, 2012.
- Y. Wang, X. Xiao, X. Wang et al., “RNA-Seq and iTRAQ reveal the dwarfing mechanism of dwarf polish wheat (Triticum polonicum L.),” International Journal of Biological Sciences, vol. 12, no. 6, pp. 653–666, 2016.
- M. G. Grabherr, B. J. Haas, M. Yassour et al., “Full-length transcriptome assembly from RNA-Seq data without a reference genome,” Nature Biotechnology, vol. 29, no. 7, pp. 644–652, 2011.
- S. Götz, J. M. García-Gómez, J. Terol et al., “High-throughput functional annotation and data mining with the Blast2GO suite,” Nucleic Acids Research, vol. 36, no. 10, pp. 3420–3435, 2008.
- B. Li and C. N. Dewey, “RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome,” BMC Bioinformatics, vol. 12, article 323, 2011.
- International Wheat Genome Sequencing Consortium, http://www.wheatgenome.org.
- The International Barley Genome Sequencing Consortium, “A physical, genetic and functional sequence assembly of the barley genome,” Nature, vol. 491, pp. 711–716, 2012.
- S. Ouyang, W. Zhu, J. Hamilton et al., “The TIGR rice genome annotation resource: improvements and new features,” Nucleic Acids Research, vol. 35, no. 1, pp. D883–D887, 2007.
- M. Xin, Y. Wang, Y. Yao et al., “Identification and characterization of wheat long non-protein coding RNAs responsive to powdery mildew infection and heat stress by using microarray analysis and SBS sequencing,” BMC Plant Biology, vol. 11, article 61, 2011.
- T. Marcussen, S. R. Sandve, L. Heier et al., “Ancient hybridizations among the ancestral genomes of bread wheat,” Science, vol. 345, no. 6194, Article ID 1250092, 2014.
- J.-Y. Li, Y.-L. Fu, S. M. Pike et al., “The Arabidopsis nitrate transporter NRT1.8 functions in nitrate removal from the xylem sap and mediates cadmium tolerance,” Plant Cell, vol. 22, no. 5, pp. 1633–1646, 2010.
- S. M. Belchik and L. Xun, “S-glutathionyl-(chloro)hydroquinone reductases: a new class of glutathione transferases functioning as oxidoreductases,” Drug Metabolism Reviews, vol. 43, no. 2, pp. 307–316, 2011.
- C.-Y. Lin, N. N. Trinh, S.-F. Fu et al., “Comparison of early transcriptome responses to copper and cadmium in rice roots,” Plant Molecular Biology, vol. 81, no. 4-5, pp. 507–522, 2013.
- P. Brunetti, L. Zanella, A. De Paolis et al., “Cadmium-inducible expression of the ABC-type transporter AtABCC3 increases phytochelatin-mediated cadmium tolerance in Arabidopsis,” Journal of Experimental Botany, vol. 66, no. 13, pp. 3815–3829, 2015.
- M. Cho and H. T. Cho, “The function of ABCB transporters in auxin transport,” Plant Physiology, vol. 159, pp. 642–654, 2012.
- R. R. Singhania, A. K. Patel, R. K. Sukumaran, C. Larroche, and A. Pandey, “Role and significance of beta-glucosidases in the hydrolysis of cellulose for bioethanol production,” Bioresource Technology, vol. 127, pp. 500–507, 2013.
- Y. X. Xu, Y. Liu, S. T. Chen et al., “The B subfamily of plant ATP binding cassette transporters and their roles in auxin transport,” Biologia Plantarum, vol. 58, no. 3, pp. 401–410, 2014.
- X.-L. Sun, Q.-Y. Yu, L.-L. Tang et al., “GsSRK, a G-type lectin S-receptor-like serine/threonine protein kinase, is a positive regulator of plant tolerance to salt stress,” Journal of Plant Physiology, vol. 170, no. 5, pp. 505–515, 2013.
- J. Zhao, Y. Gao, Z. Zhang, T. Chen, W. Guo, and T. Zhang, “A receptor-like kinase gene (GbRLK) from Gossypium barbadense enhances salinity and drought-stress tolerance in Arabidopsis,” BMC Plant Biology, vol. 13, article 110, 2013.
- L. Yang, K. Wu, P. Gao, X. Liu, G. Li, and Z. Wu, “GsLRPK, a novel cold-activated leucine-rich repeat receptor-like protein kinase from Glycine soja, is a positive regulator to cold stress tolerance,” Plant Science, vol. 215-216, pp. 19–28, 2014.
- H. Luo, X. Wang, G. Zhan et al., “Genome-wide analysis of simple sequence repeats and efficient development of polymorphic SSR markers based on whole genome re-sequencing of multiple isolates of the wheat stripe rust fungus,” PLoS ONE, vol. 10, no. 6, article e0130362, 2015.
Copyright © 2016 Yi Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.