Abstract

Soybean rust is caused by the obligate biotrophic fungus Phakopsora pachyrhizi, an exotic pathogen causing important yield losses in soybean production. We used an mRNA-Seq strategy to analyze the expression pattern of soybean genes and better understand molecular events occurring in soybean following the infection. cDNA libraries were constructed from RNA isolated from whole infected soybean leaves 10 days after inoculation with P. pachyrhizi and sequenced using an Illumina platform to identify soybean genes that are affected by pathogen growth. We obtained 15 million sequences corresponding to soybean genes. Forty-two percent of the genes were downregulated including genes encoding proteins involved in amino acid metabolism, carbohydrate metabolism, and transport facilitation; 31% were upregulated including genes encoding proteins involved in lipid metabolism, glycan biosynthesis, and signal transduction. Candidate host genes identified in this study will be manipulated to assay their potential to control soybean rust disease.

1. Introduction

The interaction between a pathogen and its host is complex and not completely understood, especially at the molecular level. Many aspects need to be examined beginning with how the pathogen recognizes its host and determining the differences between a susceptible and a resistant interaction. These aspects and many more can be better understood by studying gene expression of healthy versus pathogen-challenged plants. Since the discovery of Sanger sequencing [1], expressed sequence tags (ESTs) from a biological sample have been used to show how cells react to biotic and abiotic stresses [25]. However, this technology has many limitations.

Since 2004, many companies have been working in the area of next-generation sequencing to provide high throughput and deep-sequencing technology at low cost. The Illumina system has been used for multiple applications depending upon the method of sample preparation. It allows (i) DNA sequencing including sequencing of whole genome and SNP discovery, (ii) transcriptome analysis including annotation of coding SNPs and discovery of transcript isoforms, and (iii) gene regulation analysis including quantification of in vivo protein-DNA interactions and characterization of the transcriptome to single-base resolution. RNA analysis through the mRNA-Seq assay to provide transcriptome analysis has been available since late 2008. Since then, there have been numerous publications describing the utilization of the mRNA-Seq application on the Illumina platform to perform transcriptome analyses. Marioni et al. [6] compared data from mRNA-Seq using the Illumina platform to the Affymetrix chip U133 Plus 2 with RNA extracted from human liver and kidney. They concluded that the high-throughput sequencing technology is highly efficient at detecting differential expression. They identified about 3,380 more genes as significantly differentially expressed using the Illumina platform. Mortazavi et al. [7] tested this technology with a mammalian system using mRNA extracted from various mouse tissues. They found about 17,000 features that were candidates for being previously undiscovered parts of existing genes and 596 new candidate transcripts. Most of the published reports that have employed the Illumina platform have examined mammals and yeast. There are only a few reports describing the use of the Illumina platform with plant material, and most have been performed on tissue from Arabidopsis thaliana in absence of pathogen infection. Lister et al. [8] revealed altered transcript abundance of hundreds of genes, transposons, and unannotated intergenic transcripts in the A. thaliana genome. Denoeud et al. [9] were one of the first to use the Illumina platform to build de novo gene models. From approximately 175 million Illumina RNA-Seq reads from four tissues of grapevine, they identified new exons in known loci, alternative splice forms, as well as some new loci.

Our study compared transcript levels in leaves of soybean plants inoculated with soybean rust (SR) to those in healthy plants using the Illumina platform. Our goal was to identify genes involved in a susceptible reaction at the uredinium-formation stage to better understand the interaction between soybean and SR at this underevaluated late stage (10 days after inoculation) of infection. Most analyses of soybean (Glycine max) and soybean’s relative (G. tomentella) gene expression were done at early steps of infection, six to 72 hours after inoculation (hai) [1013], while van de Mortel et al. [14] examined time points up to 96, 120, and 168 hai. All these studies were done using soybean microarrays, and they focused on expression profiles of genes directly involved in the defense response, such as genes encoding WRKY transcription factors, peroxidases, heat shock proteins, and lipoxygenases, as well as genes encoding enzymes involved in production of defense-related secondary metabolites. However, there is little known about what changes occur in the plant metabolism that may help the fungus to proliferate. Since 2008, public access has been available to the first soybean chromosome-scale genome assembly (http://www.phytozome.net/). Approximately, 975 Mb of the soybean genome is annotated and encompasses 20 chromosomes and a small additional amount of mostly repetitive sequence resides in unmapped scaffolds. There are 66,153 protein-coding loci predicted including duplicated genes, gene family members, and unique genes. The accessibility of the genome sequence allows easier identification of Illumina reads. The main objective of this study was to analyze the soybean and rust transcriptomes during soybean-rust interaction and develop an efficient bioinformatics pipeline to analyze such a large dataset. The data were analyzed to determine the soybean gene expression profile at 10 days after inoculation (dai) by comparing it to a control sample. This paper presents a complete soybean transcriptome analysis 10 dai with P. pachyrhizi and provides clues to the interactions allowing fungal proliferation and provides possible targets to arrest proliferation and further sporulation.

2. Materials and Methods

2.1. Pathogen Isolation and Plant Inoculation

A Phakopsora pachyrhizi isolate (MS06-1) was obtained from urediniospores harvested from field-collected kudzu leaves in Jefferson County, Mississippi, in August 2006; its identity was confirmed by microscopy, enzyme-linked immunosorbent assay (ELISA), and polymerase chain reaction (PCR) as previously described [15]. Urediniospores were increased on a susceptible soybean cultivar, Williams 82 [16], in the Stoneville Research Quarantine Facility in Mississippi. The isolate was then purified by picking up a single uredinium using a fine needle under an Olympus SZX12 dissecting microscope and reinoculating it on leaves of Williams 82. This inoculation-isolation cycle was repeated four times. Urediniospores from this purified culture were harvested using a Cyclone Surface Sampler (Burkard Manufacturing Co. Ltd, UK) connected to a vacuum pump, beginning from 10 to 14 (dai) and continuing at weekly intervals.

Inoculum was prepared using freshly collected urediniospores from Williams 82. Spore suspensions were made using sterile distilled water containing 0.01% Tween-20 (vol/vol), mixed, and filtered through a 100-μm cell strainer (BD Biosciences, Bedford, MA) to remove any debris and clumps of urediniospores. Urediniospores were quantified using a hemocytometer and diluted to a final concentration of spores/mL. Three plants per 10 cm-pot were prepared in three replicates (pots). Primary leaves of 3-weeks-old Williams 82 seedlings were inoculated using a Preval sprayer (Younkers, NY) at a rate of one milliliter of spore suspension per plant. The same solution minus spores was used for a mock inoculation on three pots of plants as a control. After inoculation, plants were placed in a dew chamber in the dark at 22°C overnight (approximately 16 h) and then moved to Conviron growth chambers where temperatures were maintained at 23°C during the day and 20°C at night under a 16 h photoperiod with a light intensity of 280 μEm−2 s−1. SR-inoculated and mock-inoculated plants were kept in two different growth chambers. Experiments were repeated once.

2.2. RNA Extraction and Isolation

The first trifoliate leaves from nine plants of Glycine max cultivar Williams 82 were collected at 10 dai with Phakopsora pachyrhizi (SR). Leaves from an additional set of plants were collected 15 seconds after the inoculation, as a time-zero control where a minimal interaction between the plant and the fungus was expected. Leaves were immediately immersed in Farmer’s solution [17] and stored at 4°C. One hundred milligrams of leaf tissue were ground in liquid nitrogen, and RNA was extracted using 450 μL of buffer RLC (Qiagen). RNA was isolated from the sample using an RNeasy Plant Mini Kit (Qiagen) according to the manufacturer’s instructions. RNA was treated on column with DNA-free DNAse from Ambion using 80 μL of DNAse and an incubation time of 15 minutes at room temperature. Total RNA from three different plants was pooled together. Five hundred nanograms were used to evaluate the RNA quality and integrity on an agarose gel.

2.3. cDNA Preparation

From ten micrograms of total RNA, mRNA was purified using oligo(dT) Dynal magnetic beads (Invitrogen). Two rounds of purification were performed with the beads. The resulting mRNA was fragmented using RNA fragmentation reagent from Ambion. An incubation of five minutes at 70°C was performed before stopping the reaction. The fragmented mRNA was precipitated for 30 minutes at −80°C with 1/10th volume of 3 M NaOAc, 40 μg of glycogen and 3 volumes of 100% EtOH. The fragmented mRNA was used as template for cDNA synthesis. One hundred nanograms of random hexamer were used in concert with the first-strand synthesis reagents from the Superscript III first-strand synthesis system (Invitrogen) following the manufacturer’s instructions. Second strand synthesis was performed using 1X second strand buffer (Invitrogen), 0.3 mM of dNTP mix, 2 units of RNaseH, and 50 units of DNA polymerase I at 16°C for 2.5 hours. cDNA was then purified on a Qiaquick PCR purification column (Qiagen) and eluted in 30 μL of EB buffer (Qiagen). Then, Genomic DNA Sample Preparation Kit from Illumina (http://www.illumina.com/) was used to end repair, add a single A base, and ligate the adaptor to the cDNA molecules. A 200 bp band was gel purified and PCR enriched using Illumina primers.

2.4. Sequencing

Three different concentrations of a cDNA sample at 10 dai were hybridized to the flow cell surface in a cluster generation station (Illumina). One hundred and twenty microliters of 1 pM cDNA solution were applied in lane no. 1 and no. 2, 120 uL of 4 pM cDNA solution in lane no. 3 and no. 4, and, finally, 120 uL of 2 pM cDNA solution in lane no. 6, no. 7, and no. 8. Lane no. 5 was used to hybridize the PhiX control from Illumina. On a second flow cell, 8 pM of cDNA from time-zero control plants was hybridized to one lane. The libraries were sequenced as 36-mers for 10 dai and 72-mers for time-zero on the Genome Analyzer II (GAII) and GAIIx (Illumina).

2.5. Mapping Abundance and Statistical Analysis

The GAIIx Genome Analyzer produced 135,483 tiff (Tagged Image File Format) images, which were separated into 43 cycles and 100 tiles. The images were analyzed using the Illumina Genome Analyzer pipeline v1.0 for 10 dai sample and v1.4 for time-zero sample. The pipeline executed image analysis (GOAT), base-calling (BUSTARD), and aligned the reads (CASAVA) onto the Phytozome reference soybean genome (http://www.phytozome.net/soybean). Default parameters were used throughout the pipeline operation. Subsequently, CASAVA (Consensus Assessment of Sequence and Variation) output was passed through an additional filtering step where reads with two or more ambiguous characters were removed and 72 nucleotide reads were trimmed by 15 bp at the 3′-end to ensure lack of interference from sequencing errors before mapping individual reads against the twenty Glycine max chromosomes. In aligning of RNA-Seq reads to the soybean reference genome, the soybean (Glycine max) v5.0 genome build was utilized (http://www.phytozome.net/soybean).

Both experimental and control samples were single reads; therefore, all raw reads from time-zero and 10 dai were bootstrapped using Matlab (MathWorks, Natick, MA) to randomly create a new replicate for both time points. Bootstrapping was iteratively performed for both the 0 dai and 10 dai single-replicate dataset, which have 4,467,871 reads and 3,510,311 reads, respectively (7,978,182 reads total). Bootstrap sampling with-replacement generated two new datasets, D’, which are equal in size to the original replicates.

TASE [18] was used to map chromosomal coordinates from D’ to each respective chromosome resulting in twenty bins, one bin per chromosome, containing quantified expression for each functionally annotated region.

Quantified expressions for each of D and D’ were then analyzed using the two-sample Kolmogorov-Smirnov (KS) [19] test to compare distributions between both samples and calculate the significant difference between two datasets. We propose a null hypothesis such that the bootstrapped dataset, D’ shall exhibit the same distribution to that of the original dataset, D. All twenty bins passed the KS-two sample test therefore validating our null hypothesis that D and D’ are from the same distribution. The above test was performed for both time points, using a significant value of 0.05.

In addition, a two-sample t-Test using D and D’ from both time points was performed with a significance level of 0.05. For time-zero and 10 dai, both t-scores (1.98 and 3.88, resp.) exceeded the critical value of 1.96 meaning that replicates for each of the datasets were not statistically different. All lanes sequenced were analyzed using TASE [18], a tag-counting software tool for Illumina CASAVA single-end mRNA-Seq datasets. TASE quantitatively mapped mRNA-Seq reads to their respective homology-based annotation, thereby producing a tag-count per functional annotation. Oftentimes, however, a transcript maps to multiple locations or produces overlapping mappings. When this occurs, the alignment with the greater score is selected, and only that features’ tag-count is incremented.

The Glycine max v5.0 transcriptome build (ftp://ftp.jgi-psf.org/pub/JGI_data/phytozome/v5.0/Gmax/) and KOG (clusters of euKaryotic Orthologous Groups) homology-based annotations were used to facilitate TASE analysis. The tab-delimited text files produced contained the number of mRNA-Seq mapping per transcript, thereby providing transcript expression and its corresponding homology-based annotation. The resulting number of reads was normalized by dividing the number of reads for a given gene in an experiment by the total number of annotated reads in the same experiment multiplied by a constant (100,000). For both datasets, a normalized read count threshold of 3 reads was established. Doing so eliminated loci with very low tag-counts, because these can be a result of sequencing biases or noise.

Log2 fold-changes and read-count ratios were computed for both datasets for the reads passing the cutoff threshold. Transcript fold-changes were visualized by overlaying the data on KEGG metabolic pathways utilizing PAICE. PAICE is a software tool that color-codes KEGG pathways using data for gene transcript levels and Enzyme Commission (EC) accessions (http://paice.sourceforge.net). PAICE was employed for its ability to handle duplicate gene copies, to color-code genes with large variances between time points, as well as to color-code accessions according to fold-change values.

2.6. Confirmation of Differential Gene Expression Using Quantitative PCR

Three replicates were used for each time point for each quantitative qPCR. One replicate was from the same preparation used to create a cDNA library for sequencing, and two more biological replicates were used. One microgram of RNA was used to generate first-strand cDNA using the SuperScript First-Strand Synthesis System for RT-PCR (Invitrogen) following the manufacturer’s instruction. Five nanograms of cDNA were used in a 25 μL reaction containing 1X of Brilliant SYBR Green qPCR master mix (Stratagene) and 0.15 μM of primers (see Table S1 in Supplementary Material available online at doi: 10.1155/2011/827250). The cycling conditions consisted of the following steps: an initial 15 min denaturing step at 95°C; 50 cycles at 95°C for 10 sec; 65°C for 2 min followed by a dissociation curve. The expression patterns of ten soybean genes expressed in our dataset at both time-zero and 10 dai were analyzed. A soybean housekeeping gene, ubiquitine-3 (Accession D28123) [14], was used to normalize the results. Other controls for qPCR included reactions containing either no template or DNA processed with no reverse transcriptase.

Three technical replicates were performed by qPCR on each sample using all primer sets. The relative level of gene expression for each gene was determined using the Stratagene Mx3000P Real-Time PCR system (Stratagene, La Jolla, CA) as described by the manufacturer. DNA accumulation during the reaction was measured with SYBR Green. The Ct (cycle at which there is the first clearly detectable increase in fluorescence) values were calculated using software supplied with the Stratagene Mx3000P Real-Time PCR system. The SYBR green dissociation curve of the amplified products demonstrated the production of only one product per reaction. Data analysis was performed according to the sigmoidal model [20] to get absolute quantification as described in Tremblay et al. [21].

3. Results and Discussion

3.1. Identification of Soybean and SR Genes

The response of a plant to attack by a pathogen is complex. The new generation platforms for DNA sequencing allow deep transcriptomic analysis of the plant’s response to pathogen attack as indicated by this study. Table 1 depicts the number of reads generated; the raw reads are accessible through NCBI (accession number SRX100853 and SRX100854) in the Sequence Read Archive (SRA). Illumina reads were aligned to the soybean genome using Phytozome version 5.0. Three categories of reads were identified. First were reads aligning to annotated regions of the soybean genome. Second were reads aligning to regions of the soybean genome lacking annotation, indicating the presence of a transcribed gene previously unreported. Approximately 61 percent (around 15 million) of the total number of reads aligned to the soybean genome and fell into the first two categories. In spite of our efforts to identify the function of all proteins encoded, there are still regions of the genome that encode open reading frames without any functional annotation. However, of all reads aligning to the soybean genome, an average of 54 percent mapped to positionally annotated genes which represent over 18,000 protein-coding loci.

The third category of reads contains those that did not align to the soybean genome. About 39 percent (around 9 million) reads fall into this category. These may represent soybean genes encoded in gaps in the present version of the soybean genome map or they may be SR genes. Contigs (163,297) formed from these reads could be positively identified as being SR genes by matching to known ESTs of SR (76%) publicly available from NCBI; however, the sequence and assembly of the SR genome are not available. Few of these 163,297 contigs had high identity with genes encoding proteins from fungi (7%). There still 23% of these contigs did not have identity with genes encoding known protein from various organisms nor with the limited number of known SR genes. This demonstrates the lack of genomic data available for filamentous fungi, specifically SR. Many of these unidentified contigs had high expression levels and may represent newly discovered SR genes. The number of DNA sequences obtained using the Illumina platform in our study is comparable to those reported elsewhere [9, 22, 23].

3.2. Gene Expression Level

mRNA-Seq results also provide a measure of expression for each of the genes. By calculating the number of reads aligning to a positionally annotated protein-coding locus, the level of gene expression was estimated. This calculation is referred to as tag count. The tag count for genes in our dataset varied over a wide dynamic range from one to more than 17,000. Scatter plots of the tag counts for genes expressed at 10 dai and time-zero were generated (Figure S1; supplementary data). These showed that some genes have similar expression profiles across the two samples, yet others have a very large tag-count in both 10 dai and 0 dai. Genes with a tag count of three were considered as low abundance transcripts. Those genes represented about 13% of our 10 dai dataset and 16% of our time-zero dataset. These low abundance transcripts were mostly involved in metabolism, specifically phosphate and carbohydrate metabolism, and transcription (Figure 1).

3.3. Differential Gene Expression

To characterize the differential expression of genes at 10 dai, we analyzed the tag count for a specific gene at 10 dai versus the tag count at time-zero which allowed us to calculate the change in expression. Forty-two percent of our positionally annotated genes were downregulated at 10 dai, while 31% were upregulated (Table S3). Twenty-seven percent maintained comparable expression levels at 10 dai compared to time-zero. Figure 2 shows the functional category of the fifty most down- and upregulated genes, and Table S2 depicts their identification.

Included in these fifty most upregulated genes are genes encoding eight proteins that were also identified previously in our microarray experiment [24]. These genes included genes encoding copper chaperone, cytochrome P450, O-methyltransferases, and reductases with broad range of substrate specificities, transporter from the ABC superfamily, dienelactone hydrolase family protein, and proteins related to the EF-hand superfamily. Genes encoding chitinase class IV were also identified as highly upregulated by microarray and also by deep sequencing. Class IV chitinases break down glycosidic bonds of fungal cell walls and are important in fighting fungal attack. On the other hand, some of these fifty most upregulated genes identified in our deep sequencing experiment were downregulated in our previous microarray. These included genes encoding UDP-glucuronyl and UDP-glucosyl transferase, ADP/ATP carrier protein, carbonic anhydrase, and serine/threonine protein kinase.

As part of our fifty most downregulated genes were nine genes, in that were also downregulated genes in our microarray experiment. These genes included genes encoding aquaporins, beta-galactosidase, leucine-rich repeat family proteins, beta and alpha tubulin, thiamine biosynthesis family protein, asparaginase, lactate dehydrogenase, and chloroplast 30S ribosomal protein S17.

Gene expression levels were overlaid on metabolic pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) using PAICE to visually display expression data for easier understanding and interpretation. A total of 156 KEGG pathways were colorized by PAICE. Seventy more pathways and 17,363 more genes were identified using this deep sequencing technology compared to microarray analysis using similar samples [24]. Some of these new identified pathways included anthocyanin, N-glycan and brassinosteroid biosynthesis, citrate cycle, galactose, methane and thiamine metabolism. From the closest related experiment on SR [14], we found about 70% more differentially expressed enzymes at 10 dai using our deep sequencing approach than were identified at 7 dai using microarrays. Even though many probe sets on the soybean array correspond to genes encoding enzymes involved in carbohydrate metabolism, we still found 61% more that were differentially expressed using deep sequencing. This increase of information allowed us to draw better conclusions about what is happening inside the plant during sporulation of the fungus.

Taking into account that we used a time-zero sample as control sample instead of a mock-inoculated sample 10 dai, we need to consider that some changes could be the effect of biological variation in plant development and not an effect of pathogen infection. However, between the inoculation time where plants were three-week-old (21 days) and 31-day-old plants when samples were collected, no major growth steps occurred, such as flowering and seed development. Basically, during this 10-day period, the plant stem would be expected to grow in length and true leaves would emerge and expand. This growth phase involves primary and secondary metabolism including photosynthesis, carbohydrate metabolism, anthocyanin, chlorophyll, and terpene production but also transcription and protein synthesis [25, 26]. However, our analysis assumes that the changes we identified are the effect of pathogen infection.

3.4. Expression of Genes Encoding Enzymes in Carbohydrate and Amino Acid Metabolism Is Decreased

Most downregulated genes were involved in metabolism, specifically in carbohydrate and amino acid metabolism. Globally, genes encoding enzymes of carbohydrate metabolism genes were mostly downregulated while some amino acid metabolism genes were upregulated.

SR infection disrupts the highly organized primary metabolism of the host. Metabolic activities involved at the end of the infection process, when the fungus is ready to develop new spores, are still not clear. In a general plant-biotrophic pathogen interaction, the repression of photosynthesis and the induction of sink metabolism seem to be typical, but the effect on sugar levels varies considerably among different plant-pathogen interactions. Infection of tobacco with potato virus Y or with Phytophthora nicotianae, infection of wheat with Puccinia graminis, and infection of Arabidopsis with Albugo candida all result in an increase in the levels of soluble sugars [2730]. In contrast, sugar levels in Arabidopsis are not altered by infection with Pseudomonas syringae, and they decrease in tomato plants after inoculation with Botrytis cinerea [31, 32] as well as in sunflowers treated with Sclerotinia sclerotiorum [33]. In our experiment, there is a clear downregulation of most genes encoding proteins involved in most carbohydrate metabolism. In our previous microarray experiment, there were many metabolic pathways where no changes in gene expression were identified, especially in amino acid biosynthetic pathways, including those producing valine, leucine, isoleucine, lysine, phenylalanine, tyrosine, and tryptophane metabolism. In contrast, we identified changes in gene expression of enzymes involved in several amino acid biosynthetic pathways from our deep sequencing results. The pathway for arginine and proline metabolism is one of the most populated by PAICE with 30 genes identified (Figure 3). These 30 genes encoded 16 different enzymes. Of these genes, 13 were clearly downregulated, as indicated in red, while four were clearly upregulated and are colored green (Figure 3). Compared to our microarray results [24], where genes encoding for only two enzymes were identified, we clearly enhanced the amount of information related to soybean-soybean rust interaction. Thirteen more genes encoding three different enzymes were indicated in yellow (Figure 3). These genes were present in many copies in the soybean genome, and all copies encoded an enzyme represented by the same EC accession. These gene copies encode different enzyme forms that may be localized to different plant compartments (chloroplast, mitochondria, cytosol, and nucleus) and may not have the same status of regulation. For example, several genes encoded diamine oxidase (EC 1.4.3.22). On the KEGG pathway, the EC number is colored in yellow in arginine and proline metabolism (Figure 3, light arrow) as well as in other metabolic pathways. All the genes encoding the forms of diamine oxidase expressed in the peroxisome and endomembrane system were downregulated, while one gene encoding diamine oxidase expressed in some other cellular components was upregulated. Table 2 depicts three diamine oxidase enzymes encoded by genes that have different gene expression patterns in arginine and proline metabolism. Even though differential gene expression and compartmentalization cannot be related for the other two enzymes, they have been colored in yellow since they did not show the same differential expression. However, genes encoding glutamine synthetase (EC 6.3.1.2) were mostly upregulated, while genes encoding prolyl 4-hydroxylase (E.C. 1.14.11.2) were mostly downregulated bringing us to conclude that this amino acid pathway was mostly downregulated. Genes in other amino acid pathways were also mostly downregulated. Genes encoding enzymes involved in valine, leucine, isoleucine, tyrosine, and phenylalanine biosynthesis were mostly downregulated as well, while, in contrast, genes involved in degradation of these amino acids were mostly upregulated. Genes encoding enzymes involved in alanine, aspartate, glutamate, cysteine, and methionine metabolism were mostly upregulated.

Downregulation of genes encoding enzymes involved in amino acid metabolism correlates with the downregulation of genes involved in protein synthesis. Even though fungi can synthesize some amino acids such as arginine, proline, tryptophan, and histidine, nitrogen for fungal growth is mostly derived from plant sources. These sources could include nitrate, ammonia, amino acids, and other small molecules and proteins [34]. From this fact, we might expect a fungal pathogen to drain plant amino acid supply. Surprisingly, plant amino acid content seems to increase in plants infected by some pathogens such as different fungi as well as nematodes [3537]. However, Polesani et al. [38] observed that some genes involved in amino acid metabolism were prevalently repressed in grapevine infected with Plasmopara viticola except genes encoding enzymes such as cysteine synthase (EC 2.5.1.47) and gamma-glutamylcysteine synthetase (EC 6.3.2.2). The gene encoding cysteine synthase was also upregulated in our dataset, while the gene encoding gamma-glutamylcysteine synthase was not differentially expressed. In addition, the only upregulated genes encoding enzymes involved in amino acid metabolism were those for amino acids not synthesized by fungi but needed for fungal growth including alanine, aspartate, glutamate, tyrosine, and phenylalanine.

3.5. Gene Expression of Enzymes Involved in Lipid Metabolism Is Downregulated

Most genes were downregulated that encoded enzymes involved in lipid metabolism, including glycerolipid, glycerophospholipid, sphingolipid, steroid, and steroid hormone biosynthesis. In our previous microarray results, no gene encoding enzymes involved in glycerophospholipid and sphingolipid metabolism were differentially expressed. Moreover, in steroid and terpenoid backbone biosynthesis, genes encoding for only four and two enzymes, respectively, were identified compared to fifteen and six enzymes using our deep sequencing technique. Farnesyl-diphosphate (PP) is used to synthesize steroids (Figure S2), and its production was directly impaired by a major downregulation of many genes encoding enzymes involved in terpenoid pathway such as phosphomevalonate kinase (EC 2.7.4.2, light arrow) and 4-hydroxy-3-methylbut-2-enyldiphosphate reductase (EC 1.17.1.2, heavy arrow). There is a report showing a decrease in level of total lipids, neutral lipids, and phospholipids during cotton-Verticillium dahliae interaction [39]. This report is in agreement with what we observed for genes encoding enzymes in most metabolic pathways involved in lipid anabolism and catabolism. More specifically, Nes [40] found that sterol biosynthesis was suppressed in plant cell cultures in response to a pathogen or when challenged with fungal elicitors.

While downregulation of genes encoding enzymes involved in steroid biosynthesis seemed to be a direct consequence of downregulation of terpenoid biosynthesis, brassinosteroid biosynthesis using campesterol produced during steroid biosynthesis pathway showed differential expression of most of its encoding genes, but downregulation or upregulation was not clear (Figure S2). All enzymes represented in yellow in this last pathway were represented by the same EC number (EC 1.14.-.-) corresponding to different cytochrome p450 family and subfamily. Four genes encoding one of these enzymes were upregulated, while four others were downregulated. When these enzymes were activated in the endoplasmic reticulum, genes tended to be upregulated, while when the enzymes were activated in other compartments, genes were up-or downregulated. In our microarray results, we failed to identify expressed genes encoding enzymes involved in brassinosteroid biosynthesis.

Brassinosteroids (BRs) are ubiquitously distributed in the plant kingdom and are involved in many different cellular responses such as cell elongation, pollen tube growth, xylem differentiation, leaf epinasty, and root inhibition [41, 42]. Their role in plant resistance to diverse environmental stresses has also been confirmed in many studies, for example, [43, 44]. The potential of BRs to enhance plant resistance specifically against fungal pathogen infection has been documented in many studies, for example, [45, 46]. However, BRs have also been found to stimulate fungal growth and disease progression [47], which means that their role depends on the type of BR, the concentration, and where it is expressed [48]. The majority of the genes involved in BR biosynthesis were differentially expressed (up- and downregulated) in our study, suggesting that BR synthesis is clearly affected during SR infection, but its role is unclear and need to be further explored.

3.6. Changes in Gene Expression Related to Transport Facilitation

Genes involved in transport facilitation were strongly downregulated and contained 14% of the most downregulated genes. These genes mostly encoded aquaporins and a UDP-galactose transporter-related protein. In many cases, infection of plants by pathogens can lead to water stress [49]. Commonly, this stress is a consequence of alteration in stomatal behaviour, disruption of cuticular resistance, or reduction of root and xylem hydraulic conductivity [50]. In plants susceptible to Phytophthora spp., this stress was caused by a decline in stomatal conductance upon infection [51]. In esca-infected grapevine, some aquaporin encoding genes, involved in water transport, were repressed at late stages of the infection [52]. Similarly in our experiment, six aquaporin encoding genes were found among our fifty most downregulated genes. Our microarray results also revealed three genes highly downregulated that encoded aquaporins. Water losses cause by one or a combination of mechanisms previously mentioned may be directly linked to this downregulation of aquaporin genes related to water transport.

3.7. Downregulation of Transcription Factors

We observed many genes involved in transcription in our data set, mostly genes with transcription factor domains. About three-fourth of these were differentially expressed, most of them (425) were downregulated. Expression of the other fourth was stable. A high proportion of the genes containing a transcription factor (TF) domain was related to the largest family of transcription factors that are associated with the stress response that contain the MYB-DNA-binding domain. Many of them tended to be downregulated during the susceptible interaction between soybean and SR (Table S4). Other genes that are downregulated included those encoding transcription factors GT-2 and related proteins containing a trihelix-DNA binding/SANT domain as well as MIKC-type MADS-domain proteins. Many of the upregulated genes encode transcription factors from TALE and CAMTA families. Surprisingly, only few genes encoded WRKY transcription factors which were the family of transcription factor the most expressed in our microarray data.

During development and differentiation, plants need to integrate a wide range of developmental and environmental signals to regulate complex patterns of gene expression. A major level at which gene expression is regulated is the initiation of transcription. Similarly, during biotic and abiotic stresses, a battery of genes is activated by TFs as part of plant defense and stress responses. Accordingly, many TFs involved in stress response were upregulated at the uredinium stage of SR development on soybean. However, most of the TFs found in our experiment were downregulated. Members of the most well-studied subfamilies of MIKC-type MADS-domain proteins function in floral organ identity, in different aspects of flower initiation, or vegetative identity and in ovule and seed formation [53]. SR infection decreases the number of seeds per pod which is one cause for yield losses. Thus, downregulation of genes encoding proteins involved in the control of seed development might be expected.

3.8. Downregulation of Genes Encoding Enzymes Involved in Glycan Biosynthesis

Many genes involved in glycan metabolism were downregulated based according to our deep sequencing results, while no genes encoding enzymes involved in glycan metabolism were differentially expressed based on our previous microarray data. Seven of eight genes encoding enzymes involved in N-glycan biosynthesis were downregulated, while three of eight expressed genes encoding enzymes involved in high-mannose biosynthesis (Figure 4) were downregulated. In the N-glycan biosynthetic pathway, the production of dolichol monophosphate was potentially derived from β-glucosylphosphoryldolichol instead of dolichol diphosphate (PP-Dol), since most isoforms of genes encoding dolichyldiphosphatase (EC 3.6.1.43, arrow) were downregulated. Also, downregulation of most of the genes encoding enzymes involved in the terpenoid backbone biosynthesis pathway may decrease production of PP-Dol (Figure S3). PP-Dol is important to the synthesis of oligosaccharides before their transfer to target proteins.

The biological functions of N-glycan modifications in the Golgi apparatus are well established in humans, insects, and yeasts [5456]. Until recently, plant complex N-glycans have not been associated with essential biological functions in their host plants. Jae et al. [57] recently reported that mutants defective in complex N-glycans show enhanced salt sensitivity, establishing that they are indispensable for certain biological functions; however, there is still no report on the involvement of N-glycans in response to biotic stresses. Our study showed that genes encoding enzymes involved in N-glycan biosynthesis were mostly downregulated as were genes encoding enzymes involved in high-mannose type N-glycan biosynthesis, a separate but related pathway. These results suggest that N-glycans play a role in plant-pathogen interactions, worthy of further investigation.

3.9. Many Genes Encoding Proteins Involved in the Defense Response Are Upregulated

At 10 dai, the majority of the upregulated genes are directly or indirectly related to the defense response. For example, phenylalanine and histidine ammonia-lyases involved in amino acid metabolism are often considered as indirect pathogenesis-related proteins [58]. In our previous microarray results, many genes encoding proteins involved in defense were identified as upregulated, including β-1,3 glucanase, glutathione S-transferase, PR-10, polyphenol oxidase, cysteine protease inhibitor, 12-oxophytodieoate reductase, AOS, phospholipase A2, lipoxygenase 2, and chitinase. Now using deep sequencing, some of these genes were again identified with a similar expression status, such as chitinase, β-1,3 glucanase, glutathione S-transferase, and 12-oxophytodieoate reductase, along with many new ones. Numerous genes encoding enzymes involved in phenylpropanoid and flavonoid biosynthesis were mostly upregulated (Figure S2). Phenylalanine ammonia-lyase (PAL) catalyzes the first reaction of the phenylpropanoid biosynthetic pathway, and PAL is directly involved in plant responses towards biotic and abiotic stimuli [59]. In the phenylpropanoid biosynthetic pathway, all genes encoding PAL (EC 4.3.1.24, light arrow) were upregulated as well as many genes of the pathway, encoding enzymes such as caffeoyl-CoA O- methyltransferases (EC 2.1.1.104, heavy arrow) and peroxidases (EC 1.11.1.7, open arrow), clearly showing that the plant defense response was still active even 10 dai. An increase in transcripts encoding PAL also was reported to occur in susceptible rice seedlings inoculated with spore suspensions of Pyricularia oryzae [60]. In flavonoid biosynthesis, genes encoding dihydroflavonol 4-reductase (EC 1.1.1.219, light arrow) colored in yellow were mostly upregulated as well as genes encoding flavonol synthase (EC 1.14.11.23, open arrow).

PAL is also involved in salicylic acid (SA) biosynthesis. SA is a signaling molecule known to play key roles in various aspects of plant defense signal transduction. Even though PAL is clearly not the key enzyme for SA biosynthesis, it can be important when phenylalanine contributes to SA production in a SID2-independent pathway [6163]. At early stages of an incompatible interaction, there is an increase in SA content which contributes to launching a general defense response against the pathogen [64]. In our experiment, during a compatible interaction at a late stage of infection, a couple of genes encoding enzymes involved in aromatic amino acid production were upregulated, including genes encoding chorismate mutase (EC 5.4.99.5), anthranilate synthase (EC 4.1.3.27), and isochorismate synthase (EC 5.4.4.2) while most of the genes encoding key enzymes of the SA pathway were downregulated. Other genes related to defense were upregulated, such as genes encoding enzymes involved in jasmonic acid and ethylene production but in less abundance (Table S5).

3.10. Many Genes Encoding Proteins Involved in Signal Transduction Are Upregulated

Genes involved in intracellular communication/signal transduction comprised 12% of the most upregulated genes. This group mostly included genes encoding serine/threonine protein kinases, calmodulin, and related proteins from the EF-hand superfamily. Compared to our microarray results where only one gene encoding a serine/threonine protein kinase was downregulated and four genes encoding a calcium-binding protein from the EF-hand superfamily were mixed in expression, some up and some downregulated, our results from deep sequencing were different. Because many metabolic changes occur in infected plants, we expected many signal transduction pathways to be expressed to modulate gene expression. The network of protein serine/threonine kinases in plant cells appears to act as a “central processor unit” that sense external factors and initiate appropriate changes in metabolism, gene expression, and cell growth and division [65]. Cysteine proteinases fill multiple functions in plant physiology and development, including playing a role in signaling pathways and in the response to biotic and abiotic stresses [66, 67]. Thus, it was not surprising to find genes encoding serine/threonine protein kinases and cysteine proteinases among the most upregulated genes.

3.11. Confirmation of Differential Gene Expression by Quantitative PCR

Quantitative PCR was conducted using ten representative genes showing a relatively wide range of fold-changes in our deep mRNA-Seq dataset. The trend of up- or downregulation using both techniques was mostly retained except for genes encoding anthranilate synthase and thioredoxin (Table S6). Differences in levels of expression between mRNA-Seq and quantitative PCR analyses may come from the cDNA PCR enrichment step in the mRNA-Seq procedure specifically for genes that are in high abundance in one sample and in low abundance in the other sample. For example, the gene encoding a predicted NUDIX hydrolase FGF-2 and related proteins was in high abundance at 10 dai while it was in low abundance at time-zero. The PCR enrichment step makes the high abundance transcript appear to be even more abundant while the low abundance transcript still stays low. Thus, we observed a 22 fold-changes by mRNA-Seq compared to an average of 5 fold-changes using qPCR. For the gene encoding anthranilate phosphoribosyltransferase, qPCR results agreed with the downregulation trend observed in mRNA-Seq for two of the three biological replicates. The third replicate showed an upregulation in qPCR, while a downregulation was indicated using mRNA-Seq. Differences between replicates were sometime large, but this is due to differences in infection from plant to plant. Gene expression level of α-tubulin gene from SR [14] in three different biological replicates also showed large differences between replicates, again probably due to differences in infection from plant to plant. Differences at the level of expression between these two methods have been reported previously in other studies. Wu et al. [68] defined the correlation between mRNA-Seq and qPCR assays as modest. Zenoni et al. [69] validated 15 genes by qPCR and found that 12 were in complete agreement with mRNA-Seq data. Thus, our data is similar and indicate that general trends are usually maintained between the two procedures, but some differences do occur.

4. Conclusion

Next generation sequencing platforms, such as Illumina, SOLiD, 454, Helicos, allow deeper analyses of the transcriptome than microarrays. A mRNA-Seq strategy using the Illumina platform provided us with an in-depth view of the molecular events occurring in soybean plants attacked by SR. Our study indicates that susceptible soybean plants at a late stage of infection with SR are significantly impaired in their primary metabolism as shown by a massive downregulation of genes involved in carbohydrate, amino acid, and lipid metabolism. However, some metabolic pathways are upregulated. The combination of upregulated metabolism and up-regulation of basal defense by the plant even at this late stage of infection suggests that the plant is still attempting to ward off pathogen attack so it can survive. Nevertheless, due to its biotrophic life style, SR attempts to manipulate and reprogram its host’s metabolism. Key enzymes involved in primary metabolism shown here to be up- or downregulated need further study to understand their role during fungal infection. The use of mRNA-Seq has yielded data encompassing more genes than were identified from our microarray experiments [24]. This data may be useful in engineering plants to counteract infection by SR through overexpression or silencing genes. An interesting target could be genes encoding MIKC-type MADS-domain proteins. Overexpression of MIKC-type MADS-domain genes may restore seed development in infected plants and avoid yield losses due to the infection process. Taking into account the importance of the right balance of sugar at different stages of fungal growth and the actual massive decrease in expression of genes involved in carbohydrate metabolism at 10 dai, another target could be genes encoding key enzymes of carbohydrate metabolism, such as glycolysis, and determine their effect on fungal sporulation. Also, mRNA-Seq brings us new avenues to explore such as the importance of brassinosteroid biosynthesis in resistance and N-glycan biosynthesis in pathogenesis. Many genes found in this experiment provide important targets to develop a broader resistance to SR by using overexpression and gene silencing tools.

Acknowledgments

The authors thank Alicia Beavers for her excellent technical assistance and Ann Smigocki, Richard Joost, Peggy MacDonald, and John Hammond for their careful critical review of the paper. The authors gratefully acknowledge support from United Soybean Board project number T0258.

Supplementary Materials

The Supplementary Material contains tables describing qPCR primer sequences (Supplementary Table 1), the fifty most up and down-regulated genes at 10 dai compare to time-zero (Supplementary Table 2), the complete deep sequencing gene expression data of soybean Williams 82 before infection (time-zero) with soybean rust and 10 dai (Supplementary Table 3), the diversity of proteins containing transcription factor domains encodes by down-regulated and up-regulated genes at 10 dai compared to time-zero (Supplementary Table 4), the expression level of genes involved in salicylic acid, jasmonic acid and ethylene biosynthesis at 10 dai compared to time-zero (Supplementary Table 5), the comparison of gene expression using qPCR and deep sequencing (Supplementary Table 6) and three figures representing the distribution of genes expressed at 10 dai and time-zero (Supplementary Figure 1), and diverse metabolic pathways where genes were differentially regulated at time-zero and 10 dai (Supplementary Figure 2; Steroid biosynthesis, Terpenoid backbone biosynthesis and Brassinosteroid biosynthesis and Supplementary Figure 3; Phenylpropanoid biosynthesis and Flavonoid biosynthesis).

  1. Supplementary Materials