Abstract

The wheat stripe rust fungus, Puccinia striiformis f. sp. tritici (Pst), does not have a known alternate host for sexual reproduction, which makes it impossible to study gene linkages through classic genetic and molecular mapping approaches. In this study, we compared 4,219 Pst expressed sequence tags (ESTs) to the genomic sequence of P. graminis f. sp. tritici (Pgt), the wheat stem rust fungus, using BLAST searches. The percentages of homologous genes varied greatly among the Pst libraries, with 54.51%, 51.21%, and 13.61% for the urediniospore, germinated urediniospore, and haustorial libraries, respectively, and an average of 33.94%. The 1,432 Pst genes with significant homology to Pgt sequences were grouped into physical groups corresponding to 237 Pgt supercontigs. The physical relationship was demonstrated for 12 (57%) of 21 selected Pst gene pairs through PCR screening of a Pst BAC library. The results indicate that the Pgt genome sequence is useful in constructing Pst physical maps.

1. Introduction

Puccinia striiformis f. sp. tritici (Pst) is the causal agent of stripe rust, one of the most important diseases of wheat in many countries of the world [1, 2]. The disease is a major constraint to wheat production and is a serious threat to global food security. Although the disease is economically important, only limited studies on the genome and functional genomics of the fungal pathogen have been reported [3–6]. This is an obstacle to our understanding of the pathogen’s evolution, especially changes of virulence that often overcome resistance in wheat cultivars [1, 2, 7, 8].

Pst is an obligate biotrophic fungus that completely depends upon its host plants for continuing growth and reproduction. Techniques for transformation, gene knockout, and transient expression are still to be developed. This excludes the use of molecular techniques, such as restriction enzyme-mediated insertional mutagenesis and gene transformation. Unlike P. graminis f. sp. tritici (Pgt, the wheat stem rust pathogen) and P. triticina (Pt, the wheat leaf rust pathogen), Pst is a microcyclic rust fungus and has only three spore stages, urediniospore, teliospore, and basidiospore, and does not have known pycniospore and aeciospore stages [1, 2]. Because of the lack of the pycnial sexual stage and alternate host for sexual reproduction, it is impossible to study Pst genes through a classic genetic approach and map-based cloning. Thus, gene organization and physical relationships could not be studied for Pst using the molecular mapping approach.

A physical map is useful for studying genome structures, determining gene organization, identifying important genes, and comparing related species for understanding evolutionary relationships. The discovery of conserved chromosomal segments between humans and animals in 1984 [9] led later to the construction of physical maps for human and mouse [10–13]. Interestingly, comparative gene mapping reveals that chicken, a nonmammalian vertebrate, has conserved genome sequence synteny with humans [14, 15]. Comparative genomic approaches have also been widely used to study related species in plants [16–19] and fungi [20–23]. These studies demonstrate that comparative genomic analysis is a powerful approach for studying genomes and genes in organisms that are hard to study using traditional genetic approaches.

Recently, several genetic libraries for Pst have become available, including a BAC library [3], a full-length cDNA library from urediniospores [4], a germinated urediniospore (germ-tube) EST library [5], and a haustorial EST library [6]. More than 15,000 ESTs were sequenced, from which 4,219 unisequences were characterized and their putative functions were identified through sequence comparison with other fungal genes in GenBank databases. However, the physical and genetic relationships of these genes have not been determined. Since Pst genome sequencing has just been started, here we have used the available Pgt genome sequence (http://www.broadinstitute.org/annotation/genome/puccinia_group/MultiHome.html) for constructing physical maps for Pst genes. The study was based on the assumption that Pst and Pgt share considerable sequence homology and genome synteny. The specific objectives of this study were to determine the homology of Pst EST unisequences to Pgt genomic sequences, construct physical groups for the Pst genes using the Pgt sequences as references, and verify the physical relationships of selected Pst genes using PCR screening of the Pst BAC library. Although much of the physical relationship needs to be verified by whole-genome sequence, the physical maps generated in this study should provide a basic framework for assisting Pst sequence assembly and gene annotation with Pgt sequences and also should be useful for localizing functional genes, positional cloning of full-length genes, and generating information about exons and introns for Pst genes.

2. Materials and Methods

2.1. Data

Genome-based EST mapping requires the genome map and transcript sequences. The three Pst cDNA libraries were generated from three different growth stages: urediniospores (Ured), germinated urediniospores (GermUred)/germ tubes, and haustoria (Haus). The Ured and Haus cDNA libraries were constructed from mRNA of PST-78, a typical US race [4, 6], and the GermUred library was from mRNA of CYR32, a typical Chinese race [5]. A total of 4,219 unisequences were used in this study for comparison with the Pgt genomic sequence. They were obtained from more than 15,000 clones sequenced from the three libraries after removing sequences of poor quality (<100 bp inserts), removing repetitions, and forming contigs ([4–6]; Chen and associates, unpublished). The Pgt genome sequence was downloaded from the NCBI Genome Project Puccinia graminis Database (http://www.broad.mit.edu/annotation/genome/puccinia_graminis), consisting of 392 genome supercontigs and 4,775 contigs.

2.2. Mapping Pst EST Sequences against the Pgt Genome Sequence

All Pst ESTs were mapped against the Pgt genome using the BLASTN program [24]. We used the high-speed server of the Washington State University Bioinformatics Center for BLAST and homology searches. The Pgt genome and Pst EST sequences were transferred to the server as FASTA-format files using SSH (Secure Shell) software. Alignments with low homology were filtered out using an e value of 1.00E-5 as the cut point. The alignable ESTs were assembled according to the 4,775 contigs in the 392 supercontigs of the Pgt genome sequence. Detailed alignment information was recorded in an Excel file. To show the positions of the Pst ESTs relative to the Pgt genome, physical maps were constructed. The physical maps corresponding to Pgt supercontigs illustrate the physical order of the genes, the length of each EST, and the distances between genes. Genes localized in a single contig were marked with a vertical line, and the alignment start and end positions in the Pgt genome were given in parentheses.
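The filtering and grouping step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the authors ran: it assumes the BLASTN hits are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e value, bit score), and the function names and the EST and supercontig identifiers are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # cut point stated in the text


def parse_hits(lines):
    """Yield (est_id, subject_id, s_start, s_end, e_value) from tabular BLAST lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        est_id, subject_id = fields[0], fields[1]
        # Minus-strand hits report start > end, so normalize the order.
        s_start, s_end = sorted((int(fields[8]), int(fields[9])))
        yield est_id, subject_id, s_start, s_end, float(fields[10])


def group_by_subject(lines, cutoff=E_VALUE_CUTOFF):
    """Keep hits with e value below the cutoff and order them along each subject."""
    groups = defaultdict(list)
    for est_id, subject_id, s_start, s_end, e_value in parse_hits(lines):
        if e_value < cutoff:
            groups[subject_id].append((s_start, s_end, est_id))
    for hits in groups.values():
        hits.sort()  # physical order along the supercontig
    return dict(groups)


# Hypothetical hits, for illustration only.
example_lines = [
    "EST_A\tsupercontig_1\t95.0\t200\t10\t0\t1\t200\t5000\t5199\t1e-50\t300",
    "EST_B\tsupercontig_1\t90.0\t150\t15\t0\t1\t150\t1200\t1051\t2e-20\t150",
    "EST_C\tsupercontig_2\t80.0\t40\t8\t0\t1\t40\t10\t49\t0.5\t20",
]
physical_groups = group_by_subject(example_lines)
```

In this toy input, EST_C is discarded because its e value exceeds the cutoff, and the two remaining ESTs are listed in their order along the hypothetical supercontig_1.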

Because the ESTs were derived from transcripts from which the introns had been spliced out, the ESTs represent the exon sequences. Therefore, it was important that we were able to obtain information about alternative gene splicing and intron number from the maps. If a Pst EST sequence was aligned to a location in the Pgt genome as a series of fragments, the gene was considered likely to show alternative splicing, and the number of exons was marked after the parentheses on the map. All sketch maps of Pst genes are shown in file 1 in Supplementary Material available online at doi: 10.1155/2009/302620.
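One way to turn a series of aligned fragments into an exon count per locus is sketched below. The clustering rule is an assumption made for illustration: fragments of one EST on one contig are treated as exons of the same locus when the gap between them is at most `max_gap` base pairs, a hypothetical parameter that does not come from the study.

```python
def count_exons_per_locus(fragments, max_gap=5000):
    """Cluster aligned fragments of one EST on one contig into loci.

    fragments: list of (start, end) genomic coordinates.
    max_gap:   hypothetical largest gap (bp) allowed between fragments of one locus.
    Returns a list of (locus_start, locus_end, n_fragments) tuples.
    """
    loci = []
    for start, end in sorted(fragments):
        if loci and start - loci[-1][1] <= max_gap:
            # Close enough to the previous fragment: extend the current locus.
            locus_start, locus_end, n = loci[-1]
            loci[-1] = (locus_start, max(locus_end, end), n + 1)
        else:
            # Too far away (or first fragment): start a new locus.
            loci.append((start, end, 1))
    return loci


# Hypothetical coordinates: three nearby fragments and one distant fragment.
example_loci = count_exons_per_locus(
    [(100, 250), (400, 520), (900, 1000), (50000, 50200)]
)
```

Here the first locus would be reported with three putative exons and the second with one. Overlapping fragments are counted separately in this sketch, so real data would need de-duplication first.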

2.3. Verification of Physical Relationships of Selected Pst Genes

Although Pgt is the most closely related to Pst among the fungi whose whole genomes have been sequenced so far, the sequences and locations of some genes could differ between the two. To validate the alignments, we selected 42 genes as 21 pairs. The sequences of the 42 genes were used to design primers. The 42 primer pairs (Table 1) were used to amplify BAC clones. If a single BAC clone was amplified by the primers of both genes in a pair, the two genes were concluded to be physically colocated. Because the BAC library has an average insert size of 50 Kb [3], the two genes in each pair were selected so that the distance between them was smaller than 50 Kb. For each pair of genes, the primers for one of the genes were used to screen the entire BAC library of 43,000 clones [3] using a three-dimensional approach as described by Ling and Chen [25]. To be more efficient, the primers for the second gene in the pair were used to amplify only the positive BAC clones from the screening. To speed up the PCR screening, two pairs of primers for two genes with similar annealing temperatures were used in a multiplex PCR amplification.
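The pair-selection criterion can be illustrated with a short sketch. It assumes each mapped gene is recorded as a tuple of gene identifier, supercontig, start, and end, and it measures distance as the outer span covered by both genes. That definition and all identifiers are assumptions for illustration, since the text does not state how the distance was measured.

```python
from itertools import combinations

BAC_INSERT_BP = 50_000  # average BAC insert size stated in the text


def candidate_pairs(genes, max_distance=BAC_INSERT_BP):
    """Return gene pairs on the same supercontig that span less than max_distance.

    genes: list of (gene_id, supercontig, start, end) tuples.
    Returns a list of (gene_id_a, gene_id_b, span_bp) tuples.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for a, b in combinations(ordered, 2):
        if a[1] != b[1]:
            continue  # genes on different supercontigs are never paired
        span = max(a[3], b[3]) - min(a[2], b[2])
        if span < max_distance:
            pairs.append((a[0], b[0], span))
    return pairs


# Hypothetical gene positions, for illustration only.
example_genes = [
    ("g1", "sc1", 1000, 1500),
    ("g2", "sc1", 30000, 30400),
    ("g3", "sc1", 90000, 90500),
    ("g4", "sc2", 2000, 2300),
]
example_pairs = candidate_pairs(example_genes)
```

Only g1 and g2 qualify in this toy input, because every other combination is either on a different supercontig or spans more than the insert size.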

Multiplex PCR was performed in a GeneAmp PCR System 9700 thermocycler. A 20 μL reaction mixture contained 1.0 μL (30 ng/μL) of BAC clone DNA, 4.0 μL of Mg-free 5X PCR buffer (Promega, Madison, WI, USA), 0.1 μL of 5 unit Taq DNA polymerase (Promega), 2 μL of 25 mM MgCl2, 0.5 μL of 2.5 mM dNTP (dATP, dCTP, dGTP, and dTTP) (Sigma Chemical Co., St. Louis, MO, USA), and 1.0 μL of 10 mM each primer synthesized by Operon Biotechnologies, Inc. (Huntsville, AL, USA). After an initial 2-minute denaturation, amplifications were programmed for 35 cycles, each consisting of a 30-second denaturation step, a 30-second annealing step at 45.9 °C or higher depending upon the primer pairs shown in Table 1, and a 40-second extension step, followed by a final 10-minute extension step. After PCR amplification, 5 μL of the solution for each sample was electrophoresed in a 1.5% agarose gel in 0.5x TBE buffer (0.089 M Tris-borate, 0.089 M boric acid, and 0.002 M EDTA). The 100 bp plus DNA ladder (Fermentas, Glen Burnie, MD, USA) was used to estimate the size of each amplified DNA fragment. The gel was run for 90 minutes at 100 volts, stained with ethidium bromide (0.5 μg/mL) for 30 minutes, and photographed under ultraviolet light. The genomic DNA of Pst race PST-78 was used as a positive control and autoclaved ddH2O was used as a negative control in the PCR amplification.

3. Results

3.1. Homology of Pst ESTs and Pgt Genomic Sequences

The 4,219 Pst unisequences from the Ured, GermUred, and Haus libraries were searched for homologous sequences in the Pgt genome, and 1,432 had significant homology (e value < 1.00E-5) to Pgt genomic sequences. As shown in Table 2, the three libraries had different percentages of genes homologous to Pgt. The Ured library had the highest percentage (54.51%), followed by the GermUred library (51.21%), while the Haus library had the lowest percentage (13.64%). On average, 33.94% of the 4,219 Pst unisequences had significant homology with the Pgt sequences.

3.2. Physical Groups

The 1,432 Pst genes were aligned to 237 physical groups corresponding to 237 Pgt supercontigs (Supplementary file 1). As an example, Figure 1 shows Pst genes aligned to Pgt supercontig 1. The number of genes for each supercontig from each Pst cDNA library is shown in Table 3. The 237 physical groups ranged from 2,878 to 3,081,398 bp, with most of the groups ranging from 5.0 Kb to 2.0 Mb (Figure 2(a)). Overall, the 1,432 genes matched 787,413 bp and spanned over 86.55 Mb of the Pgt genomic sequences. Because the majority of the 1,432 unigenes were aligned to more than one sequence locus, a total of 4,604 gene loci were obtained (Table 3). The number of loci per unique gene differed greatly among the three libraries, with 1.30 for the GermUred library, 1.53 for the Ured library, and 10.58 for the Haus library.
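As a quick arithmetic check, the loci-per-gene ratio can be recomputed from counts given in the paper. The sketch below uses the overall totals from this section and the haustorial counts (279 unique genes, 2,952 loci) reported in the Discussion. The function name is hypothetical.

```python
def loci_per_gene(n_loci, n_genes):
    """Average number of genomic loci matched by each unique gene."""
    return n_loci / n_genes


overall_fold = loci_per_gene(4604, 1432)  # all three libraries combined
haus_fold = loci_per_gene(2952, 279)      # haustorial library only

print(f"overall: {overall_fold:.2f} loci per gene")
print(f"Haus:    {haus_fold:.2f} loci per gene")
```

The haustorial ratio reproduces the 10.58 reported above, while the combined ratio is a little over three loci per gene.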

The number of genes varied from 1 to 153, excluding “Supercontig 392”, which contained unassembled sequences, with an average of 19 genes per supercontig (Table 3, Figure 2(b)). Over 70% of supercontigs contained 20 or fewer genes that showed homology to Pst EST sequences, while only 4 supercontigs (Supercontigs 1, 2, 3, and 17) had more than 100 genes. The genes from the three Pst libraries were unevenly aligned to the Pgt genome. A total of 712 unisequences were aligned to 134 supercontigs with an average of 5.3 genes per supercontig; 441 unisequences were aligned to 121 supercontigs with an average of 3.6 genes per supercontig; and 279 unisequences were aligned to 213 supercontigs with an average of 1.3 genes per supercontig. The gene density (the number of base pairs per gene) ranged from 1,020 to 209,493 bp with an average of 18,799 bp (Table 3, Figure 2(c)). The majority of the supercontigs had one gene per genomic region smaller than 30 Kb, which may be considered relatively gene rich. In contrast, a few supercontigs had one gene per genomic region larger than 60 Kb, which may be considered relatively gene poor. These results indicated that genes expressed in different Pst growth stages tended to be clustered in different regions of the genome.
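The gene-density figures can be reproduced with a small sketch. It uses the 30 Kb and 60 Kb thresholds from the text to label regions. The "intermediate" label for values between the two thresholds, and the function names, are assumptions added for illustration.

```python
GENE_RICH_MAX_BP = 30_000  # below this many bp per gene: relatively gene rich
GENE_POOR_MIN_BP = 60_000  # above this many bp per gene: relatively gene poor


def bp_per_gene(span_bp, n_loci):
    """Gene density expressed as base pairs per gene locus."""
    return span_bp / n_loci


def classify_density(density_bp):
    """Label a supercontig by its bp-per-gene value."""
    if density_bp < GENE_RICH_MAX_BP:
        return "gene-rich"
    if density_bp > GENE_POOR_MIN_BP:
        return "gene-poor"
    return "intermediate"  # label assumed here, not used in the paper


# Overall average: 86.55 Mb spanned by 4,604 loci.
average_density = bp_per_gene(86_550_000, 4604)
labels = [classify_density(d) for d in (1_020, 18_799, 209_493)]
```

Dividing the 86.55 Mb span by the 4,604 loci gives roughly 18,799 bp per gene, which matches the reported average and suggests that the density was computed per locus rather than per unique gene.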

3.3. Exons of Pst Genes Revealed by Comparison with Pgt Genomic Sequences

Of the 1,432 Pst genes, 911 (63.62%) had more than one exon and the remaining 521 (36.38%) had one exon. Of the 911 genes with multiple exons, 570 (62.57%) had two, 200 (21.95%) had three, 97 (10.65%) had four, 25 (2.74%) had five, 13 (1.43%) had six, 3 (0.33%) had seven, 2 (0.22%) had eight, and 1 (0.11%) had nine exons. The different numbers of exons indicate the different levels of complexity of the genes, which might reflect their variability resulting from the evolutionary process.

3.4. Validation of Physical Relationships of Selected Pst Genes

To validate the physical relationships of Pst genes, a total of 84 forward and reverse primers were designed for 42 genes to form 21 pairs (Table 1). The genes in each pair were selected based on their proximity, within 50 Kb, in the physical map. Clones that were positively amplified with the first pair of primers in the three-dimensional pooling screening were then amplified with the second pair of primers, as illustrated in Figure 3. Of the 21 pairs of genes tested, 12 pairs (57%) were successfully identified in the same BAC clones. The results showed that the genes in these pairs were truly colocated in the Pst genome.
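The decision rule used here, that two genes are colocated when at least one BAC clone is positive for both primer pairs, amounts to a set intersection. The sketch below illustrates it with hypothetical clone identifiers and then recomputes the reported success rate from the 12 confirmed pairs out of 21.

```python
def shared_clones(first_gene_positives, second_gene_positives):
    """BAC clones amplified by the primers of both genes in a pair."""
    return set(first_gene_positives) & set(second_gene_positives)


def validation_rate(n_confirmed, n_tested):
    """Percentage of tested gene pairs found in the same BAC clone."""
    return 100.0 * n_confirmed / n_tested


# Hypothetical clone identifiers, for illustration only.
pair_positive = shared_clones({"BAC_017", "BAC_204"}, {"BAC_204"})
pair_negative = shared_clones({"BAC_033"}, set())

rate = validation_rate(12, 21)
print(f"{rate:.0f}% of pairs confirmed")
```

An empty intersection, as in the second toy pair, only means that no tested clone carried both genes. It does not show that the genes are unlinked.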

4. Discussion

Until the Pst genome is completely sequenced, which is under way, it is almost impossible to study genetic and physical relationships among genes of this obligate biotrophic fungus without sexual reproduction [2]. In this study, we explored the possibility of using the whole genome sequence of Pgt, the most closely related fungus sequenced so far, as a reference to construct physical maps for Pst genes. From a total of 4,219 unique genes, we identified 1,432 genes significantly homologous to sequences in the Pgt genome. Because of their high nucleotide identities to the Pgt genome sequences, we assumed that these genes should have high levels of synteny to the corresponding genes in the Pgt genome. Thus, using the Pgt genomic sequences, we grouped the 1,432 Pst unique genes with a total of 4,604 genomic loci into 237 physical groups corresponding to Pgt supercontigs. The physical proximity was demonstrated for 12 pairs of genes using our Pst BAC library [3]. This study is the first to report the physical relationships for Pst genes and is the first to use the whole-genome sequence of a fungal species to study physical relationships of genes in a related species among the cereal rust fungi.

The homologous genes were not evenly distributed on the Pgt genome: no homologous Pst genes were found on 145 of a total of 382 Pgt supercontigs, and gene densities varied greatly from 1,020 bp to 209,493 bp. Such an uneven distribution may be partially due to the different sizes of the Pgt supercontigs. The uneven distribution also could be caused by the relatively small number of genes. The 1,432 Pst genes are only about 8% of the total estimated number of genes based on the over 20,000 genes of Pgt. It also is possible that the genes expressed in each of the three developmental stages may cluster in certain genome regions. Nevertheless, the data may indicate the existence of gene-rich and gene-poor regions in the Pst and Pgt genomes. Information on gene-rich regions and Pst/Pgt homologous gene-rich regions will be useful in understanding the evolutionary relationships of the two related but different rust fungi. This hypothesis would be more clearly tested by comparing all Pst genes after the completion of the whole-genome sequencing and sequencing of more ESTs, which are currently being undertaken.

In this study, we tested only 42 Pst genes in 21 pairs in the PCR screening of the BAC library. In contrast to the 12 pairs that were demonstrated in the same BAC clones, positive results were not obtained for 9 of the 21 pairs. However, the unsuccessful amplification of the second genes in the 9 pairs does not exclude the possibility of physical relationships for the genes in each of these pairs. As the inserts of the BAC clones were relatively short, 50 Kb on average [3], the clones might be too small to harbor both genes in a pair. It is also possible that the Pst genes in each pair are farther apart than the reference distance in the Pgt genome, but they may still be linked to each other.

The Pst genes used in this study were from three libraries. The genes from the Ured library gave the highest percentage of genes homologous to Pgt and the genes from the Haus library gave the lowest percentage of homologous genes. The GermUred clones had a similar percentage of Pgt-homologous genes to the Ured library, although the two libraries were made from different isolates while the Haus library was made with the same isolate as the Ured library [4–6]. The low proportion of the Pst genes from the Haus library similar to the Pgt sequences was surprising as we thought that two fungal species in the same genus should have higher homology than human and mouse that are in very different taxa [9]. Although this phenomenon needs more studies, we have learned from other rust fungi that genes expressed in haustoria tend to be more species specific [26, 27]. Comparisons of Pst genes expressed in different growth stages with the Pgt sequences tell us that genes expressed in urediniospores are more conserved among different Puccinia species while those expressed in haustoria are more unique. Such genetic differences may be related to their different requirements in temperature for infection of the same wheat host crop.

It is interesting that the smallest number of unique genes (279) from the Haus library produced the highest number (2,952) of genomic loci along the Pgt genome among the three libraries. The high fold (10.58x) of gene copies may compensate for the low number of homologous genes from haustoria, which may make the overall homology of Pst and Pgt genome sequences reasonably high. The genomic loci were aligned to more supercontigs than the genes from the Ured and GermUred libraries. These results indicate that haustorially expressed genes tend to have multiple copies and spread along the Puccinia genome. This phenomenon needs to be further studied using the whole genome sequence of Pst.

Although much of the physical relationship is still hypothetical and needs to be verified by the whole genome sequence of Pst, the physical groups constructed in this study can serve as references and starting points in assisting sequence assembly and gene annotation. A more detailed dissection of gene sequences, organization, structures, and clusters may allow us to pick genome regions and gene clusters to study their functions and to develop molecular markers to tag virulence groups and characterize Pst populations.

In this study, we found that some ESTs could be matched to more than one location. Also, some alignments consisted of multiple exons while others did not have introns. We included the intronless sequences in the physical maps. Intronless sequences such as pseudogenes, which share nucleotide sequences with protein-coding genes, exist ubiquitously in eukaryotic genomes [28, 29]. Although pseudogenes may be functionless DNA fragments in the genome, they have evolved from reverse transcription of mRNA and reinsertion into the genome. Thus, pseudogenes do not have introns and promoters but have poly(A) sequences. For full-scale gene mapping, they represent real gene transcription and sequence existence. Most of our EST sequences are not full-length and only have partial information of genes. This might explain why a considerable number of ESTs were aligned to regions of the Pgt genome without introns.

We found that many of the Pst ESTs that matched Pgt genomic sequences were shorter than 100 bp. These short sequences may be exons, whose lengths can vary greatly. Most vertebrate exons are between 50 and 400 bp long [30]. Using the complementary sequence feature method in humans, Arabidopsis, Cryptococcus, and Plasmodium, Saeys et al. [31] reported that one-third of all exons were smaller than 100 bp. Gudlaugsdottir et al. [32] reported significant variation in exon length for human and fission yeast, ranging from 1 to thousands of base pairs. Because exon sizes can vary from a few base pairs to thousands of base pairs, we retained even the segments smaller than 50 base pairs, which may preserve alignment information that would otherwise be lost and make it available for future Pst genome research. The number of exons in a gene may indicate its stability or variability, which may allow us to choose genes for studying various aspects of pathogen biology. Genes with only one exon may be chosen to study the genetic relationships at a higher taxonomic level, such as species and formae speciales, and those with multiple exons may be used to study genetic differences among isolates within a forma specialis. Genes with multiple exons may be better candidates for studying traits like virulence and adaptation to different environments as these traits have more variations.

In this study, we produced preliminary physical maps for Pst genes. The 4,604 genomic loci of 1,432 genes placed on the physical map account for about 8% of potential genes, if we assume that Pst and Pgt have a similar number of genes. Because we used only unique genes, some genes belonging to large families could be located at multiple genome sites. In the future, this physical map will be verified and ultimately improved by the complete set of the Pst genes and connected with nontranscribed sequences. The physical groups should provide insights into gene organization, identification of functionally related genes, positional cloning of full-length genes, and information on exons and introns, and should assist in sequence assembly and gene annotation for the Pst whole-genome sequencing.

Acknowledgments

This research was supported by the US Department of Agriculture, Agricultural Research Service (Project no. 5348-22000-014-00D), Washington Wheat Commission (Project no. 13C-3061-3923), and Vogel Foundation (Project no. 13Z-3061-3824). The authors would like to thank the support of PPNS no. 0534, Department of Plant Pathology, College of Agricultural, Human, and Natural Resource Sciences, Agricultural Research Center, Project no. WNP00823, Washington State University, Pullman, WA 99164-6430, USA. The authors thank Drs. Chuntao Yin and Scot Hulbert for providing the haustoria sequences and Drs. Axel Elling and Lee Hadwiger for critical review of the manuscript. The scholarship from China Scholarship Council to Jinbiao Ma is appreciated. The research is also part of the Northwest A&F University Plant Pathology “111” Project.

Supplementary Materials

Physical maps for Pst ESTs based on corresponding sequence positions of homologous genes of Pgt. A total of 242 physical groups are constructed. The distance in megabases (Mb) is shown on the left. The clones in a group indicated by a vertical line are in the same contig, and the start and end positions of the sequence matching the positions in the contig are shown in the “( )” following the clone identification number. The number after the “( )” indicates the number of the gene with multiple positions in the Pgt genome. An asterisk indicates that the number of matching base pairs is smaller than 100. The underlined clones were used in PCR amplification of the Pst BAC library.