Abstract

The module network method, a special type of Bayesian network algorithm, has been proposed to infer transcription regulatory networks from gene expression data. In this method, a module represents a set of genes that have similar expression profiles and are regulated by the same transcription factors. The process of learning module networks consists of two steps: first clustering genes into modules and then inferring the regulation program (transcription factors) of each module. Many algorithms have been designed to infer the regulation program of a given gene module, and these algorithms show very different biases in detecting regulatory relationships. In this work, we explore the possibility of integrating results from different algorithms. The integration methods we select are union, intersection, and weighted rank aggregation. Experiments on a yeast dataset show that the union and weighted rank aggregation methods produce more accurate predictions than those given by individual algorithms, whereas the intersection method does not yield any improvement in the accuracy of predictions. In addition, somewhat surprisingly, the union method, which has a lower computational cost than rank aggregation, achieves results comparable to those given by rank aggregation.

1. Introduction

There is a complex mechanism in cells that controls which genes are expressed. Transcription factors, a special type of protein capable of regulating the expression of other genes by binding to their upstream regions, play a crucial role in this mechanism. Transcriptional regulatory relationships between transcription factors and their targets can be represented by a network, called a transcription regulatory network, in which each vertex denotes a gene and each edge denotes a regulatory relationship.

Identifying transcription regulatory networks is critical, because it facilitates understanding biological processes in cells. Gene expression data, which measure mRNA levels of genes, are widely used for inferring transcription regulatory networks [1]. Bayesian networks, a type of probabilistic graphical model [2], have been proposed to infer transcription regulatory networks from gene expression data [3, 4]. Despite their success in learning regulatory networks, models inferred by Bayesian networks tend to overfit the data because, in a gene expression dataset, the number of variables is typically large compared to the number of samples. To cope with this problem, Segal et al. [5] designed the module network method, a special type of Bayesian network algorithm. In this method, each module represents a set of variables that share (1) a single variable or a set of variables as their parents and (2) local distributions. Compared to standard Bayesian network algorithms, this design significantly reduces the number of parameters to be learned and consequently leads to more accurate inferences. The module network method has yielded promising results in learning regulatory networks [5–7].

Given a gene expression dataset and a list of candidate transcription factors, the process of learning module networks consists of two tasks: clustering genes into modules and inferring the regulation program (transcription factors) of each module. Segal et al. [5] designed an expectation-maximization-based [8] learning algorithm that alternates between these two tasks. Later, Joshi et al. [9] separated the two tasks, grouping genes into modules before learning the regulation program of each module. Experimental results showed that this separation improves the performance of the module network method in inferring regulatory networks.

Many techniques have been applied to infer the regulation program of a given gene module (the second task in learning module networks), including logistic regression [9], moderated t-statistics [10] from LIMMA [11], Gibbs sampling [12], and linear regression [13]. A common characteristic of these methods is that they calculate a confidence (i.e., a regulatory score) for the assignment of a transcription factor to a gene module, which is referred to as a regulator-module interaction. Consequently, their results can be sorted into an ordered list of regulator-module interactions according to their regulatory scores: the higher the ranking of a regulator-module interaction in the ordered list given by a method, the more confidence that method assigns to the interaction. In addition, since these methods resort to distinct techniques, they show very different biases in detecting regulatory relationships. For example, in the nitrogen utilization module in yeast, LeMoNe [9] favors regulatory relationships where transcription factors and genes are globally coexpressed, while the LIMMA-based method [10] favors regulatory relationships where transcription factors and genes are locally coexpressed. This suggests that integrating results from different regulation program learning algorithms can be a promising direction for better inferring regulatory networks [14].

In this work, we extend our previous work [10] by integrating its results with those given by two other learning algorithms [9, 13]. To the best of our knowledge, this is the first such attempt. The integration methods we select are union, intersection, and weighted rank aggregation [15]. Experimental results indicate that the union and weighted rank aggregation methods produce more accurate predictions than those given by individual algorithms, whereas the intersection method does not yield any improvement in the accuracy of predictions.

The rest of this paper is organized as follows: Section 2 describes the dataset, integration methods, and regulation program learning algorithms studied in this work. Section 3 presents experimental results. Section 4 summarizes the main results and discusses future work.

2. Materials and Methods

2.1. Data Set and Reference Database

The yeast stress dataset has been used as a benchmark to validate the performance of module network learning algorithms [5, 9]. This dataset consists of 173 arrays and measures the budding yeast's response to a panel of diverse environmental stresses [16], including temperature shocks, amino acid starvation, and nitrogen source depletion. In previous work [5, 17], 2355 differentially expressed genes in the dataset were selected and clustered into 69 gene modules. Genes in the modules show functional enrichment. For example, one module captures nitrogen utilization: it consists of 47 genes mostly involved in two pathways, the methionine pathway (regulated by MET28 and MET32) and the nitrogen catabolite repression (NCR) system (regulated by GLN3, GZF3, DAL80, and GAT1). Both pathways relate to the process by which the budding yeast uses the best available nitrogen source in its environment [18, 19].

In this work, we apply three algorithms [9, 10, 13] to infer regulators of these modules using a list of 321 transcription factors prepared by Segal et al. [5] as candidate transcription factors. Then, we integrate the results of these algorithms by methods described in Section 2.2. The regulatory relationships recorded in YEASTRACT [20] (released on April 27, 2009) are used as the reference database to validate results given by individual algorithms and our integration methods.

2.2. Integration Methods

We apply the union, intersection, and weighted rank aggregation methods to integrate the results from different regulation program learning algorithms. The union and intersection methods are straightforward. The former determines the ranking of a regulator-module interaction as the highest (best) ranking it receives from any of the candidate learning algorithms, whereas the latter uses the lowest (worst) ranking. For example, given a regulator-module interaction that is the 1st, 3rd, and 5th item in the rankings produced by three individual learning algorithms, respectively, the union method assigns it rank 1, while the intersection method assigns it rank 5. After the ranking of each interaction is determined, each method produces an ordered list of regulator-module interactions by sorting the interactions by their rankings.
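
As a minimal illustration, the following sketch implements the two rules in Python, assuming each algorithm's result is available as a mapping from regulator-module interactions to ranks (1 = most confident); the interaction names and ranks are hypothetical.

```python
def combine_ranks(rankings, rule):
    """Assign each interaction its best (union) or worst (intersection) rank,
    then return the interactions sorted by that combined rank."""
    interactions = set().union(*rankings)            # keys of every ranking dict
    pick = min if rule == "union" else max
    combined = {a: pick(r.get(a, float("inf")) for r in rankings)
                for a in interactions}
    return sorted(interactions, key=lambda a: combined[a])

# Hypothetical ranks (1 = most confident) from three algorithms for three
# regulator-module interactions.
alg1 = {("GLN3", 3): 1, ("MET28", 7): 2, ("GAT1", 3): 4}
alg2 = {("GLN3", 3): 3, ("GAT1", 3): 2, ("MET28", 7): 6}
alg3 = {("GLN3", 3): 5, ("MET28", 7): 1, ("GAT1", 3): 8}

print(combine_ranks([alg1, alg2, alg3], "union"))         # best ranks: 1, 1, 2
print(combine_ranks([alg1, alg2, alg3], "intersection"))  # worst ranks: 5, 6, 8
```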

In comparison, the weighted rank aggregation method [15] is much more computationally intensive than the union and intersection methods. Given a set of learning algorithms M, this integration method searches for an ordered list \delta^* that is simultaneously as close as possible to the list produced by each algorithm in M. Let L_m = (A_{m1}, A_{m2}, \ldots, A_{mk}) represent the ordered list of k regulator-module interactions produced by algorithm m, let r_m(A) denote the rank of interaction A under m, and let w_m(i) (i = 1, 2, \ldots, k) denote the P value (weight) that algorithm m assigns to the interaction ranked at the ith position in its list. The aggregation can then be represented by the following minimization problem:

    \delta^* = \arg\min_{\delta} \Phi(\delta),   (1)

where

    \Phi(\delta) = \sum_{m \in M} d(\delta, L_m)   (2)

represents the sum of the distances between an ordered list \delta and the lists from all algorithms. The distance between \delta and L_m is measured by the weighted Spearman's footrule distance:

    d(\delta, L_m) = \sum_{A \in L_m \cup \delta} \bigl| w_m(r_\delta(A)) - w_m(r_m(A)) \bigr| \times \bigl| r_\delta(A) - r_m(A) \bigr|.   (3)

To determine \delta^*, we apply the cross-entropy Monte Carlo algorithm [21].
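
Below is a minimal Python sketch of the weighted footrule distance (3) and the objective (2), under the assumption that each algorithm's result is given as its top-k ordered list of interactions together with the P value assigned to each position. Interactions outside a top-k list receive weight one, following the truncation described in Section 3.2; placing them at rank k + 1 is our assumption. The search for \delta^* with the cross-entropy Monte Carlo algorithm [21] is not shown.

```python
def weighted_footrule(delta, alg_list, alg_weights, k):
    """d(delta, L_m) of equation (3) for one algorithm's top-k list `alg_list`
    with positional weights `alg_weights` (alg_weights[i] is the P value of
    the interaction at position i, 0-based)."""
    def rank_in(order, item):
        # Interactions absent from a list are placed just below the cutoff k.
        return order.index(item) + 1 if item in order else k + 1

    def weight_at(rank):
        # w_m evaluated at a rank; positions below the cutoff get weight one.
        return alg_weights[rank - 1] if rank <= len(alg_weights) else 1.0

    items = set(alg_list) | set(delta)
    return sum(abs(weight_at(rank_in(delta, a)) - weight_at(rank_in(alg_list, a)))
               * abs(rank_in(delta, a) - rank_in(alg_list, a))
               for a in items)

def objective(delta, all_lists, all_weights, k):
    """Phi(delta) of equation (2): total distance to every algorithm's list."""
    return sum(weighted_footrule(delta, lst, w, k)
               for lst, w in zip(all_lists, all_weights))
```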

2.3. Regulation Program Learning Algorithms

We select LeMoNe [9], Inferelator [13], and the LIMMA-based method [10] as candidate regulation program learning algorithms. In this section, we describe how these algorithms are applied to the yeast stress dataset. In addition, in order to apply the weighted rank aggregation to integrate their results, we also define, for each algorithm, how to calculate the P value for the assignment of a regulator to a module.

2.3.1. LeMoNe

For each gene module in the yeast stress dataset, LeMoNe [9] sampled 10 regression trees and then calculated regulatory scores for assigning transcription factors to this module based on these trees. Regulatory scores of all regulator-module interactions were downloaded from the supplementary website of [9].

We calculate the LeMoNe-based P value for the assignment of a regulator r to a module as follows. First, given a regression tree T of this module, we define the P value of the split with r and a splitting value z at an internal node t in T, denoted Pvalue^{(t)}(r, z), as the probability of observing a split with a higher average prediction probability than this split at node t. The average prediction probability of a split is defined as in equation (4) of [9]. Then, the P value for assigning r to t is defined as

    \mathrm{Pvalue}^{(t)}(r) = \frac{w_t}{|Z|} \sum_{z \in Z} \mathrm{Pvalue}^{(t)}(r, z),   (4)

where w_t is the number of experimental conditions in t divided by the total number of conditions in the data, and Z represents the set of possible splitting values for r in t. Furthermore, given a set of regression trees F, the LeMoNe-based P value for assigning r to this module is calculated as

    \mathrm{Pvalue}(r) = \frac{1}{|F|} \sum_{T \in F} \sum_{t \in T} \mathrm{Pvalue}^{(t)}(r).   (5)
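
The following is a minimal sketch of equations (4) and (5), assuming the per-split P values Pvalue^{(t)}(r, z) have already been computed; the tree representation (a list of internal nodes with fields "w_t" and "split_pvalues") is a hypothetical data structure of our own, not LeMoNe's.

```python
def node_pvalue(node, regulator):
    """Equation (4): the condition-weighted average of the per-split P values
    Pvalue_t(r, z) over the possible splitting values z at one internal node."""
    split_pvals = node["split_pvalues"][regulator]     # {z: Pvalue_t(r, z)}
    return node["w_t"] / len(split_pvals) * sum(split_pvals.values())

def lemone_pvalue(trees, regulator):
    """Equation (5): average, over the sampled trees, of the summed node-level
    P values for this regulator. Each tree is a list of its internal nodes."""
    return sum(node_pvalue(node, regulator)
               for tree in trees for node in tree) / len(trees)
```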

2.3.2. Inferelator

Inferelator [13] uses linear regression and variable selection to identify transcription factors of gene modules. For each gene module in the yeast stress dataset, we fit a linear model to the mean expression of the module's genes in each condition, using the 321 candidate transcription factors as predictor variables. The regulatory score for assigning a regulator to the module is the absolute value of the regulator's regression coefficient in the fitted model.
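
As a rough sketch of this per-module regression (not the original Inferelator implementation), the following uses scikit-learn's LassoCV as a stand-in for Inferelator's variable-selection regression; the expression matrix and the gene and transcription factor name lists are assumed inputs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

def module_regulatory_scores(expr: pd.DataFrame, module_genes, tf_names):
    """Fit the module-mean profile on the candidate TFs and return the absolute
    coefficient of each TF as its regulatory score.
    `expr` is a genes-by-conditions expression matrix."""
    y = expr.loc[module_genes].mean(axis=0).values    # module mean per condition
    X = expr.loc[tf_names].T.values                   # conditions x TFs
    model = LassoCV(cv=5).fit(X, y)
    return pd.Series(np.abs(model.coef_), index=tf_names).sort_values(ascending=False)
```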

The Inferelator-based P value for the assignment of a regulator to a module is defined as follows. First, we permute the values of the expression matrix along the row (gene) direction. Second, we apply Inferelator to the permuted dataset using the original gene modules. Third, we fit the distribution of nonzero coefficients obtained from the permuted dataset with the Weibull distribution

    \mathrm{pdf}(x) = \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k-1} e^{-(x/\lambda)^{k}},   (6)

with k = 0.889 and \lambda = 0.015 (Figure 1). Last, we define the Inferelator-based P value for a regulator-module interaction with a regulatory score S as the probability of observing a value greater than S under the Weibull distribution (6).
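
Because scipy's weibull_min with shape c = k and scale = lambda has exactly the density in equation (6), the P value is its upper-tail (survival) probability at the observed regulatory score, as in this small sketch:

```python
from scipy.stats import weibull_min

K, LAM = 0.889, 0.015   # shape and scale reported in the text

def inferelator_pvalue(score):
    """P(X > score) under the Weibull fit to the permutation-derived coefficients."""
    return weibull_min.sf(score, c=K, scale=LAM)

print(inferelator_pvalue(0.05))   # upper-tail probability, about 0.054
```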

2.3.3. LIMMA-Based Method

In our previous work [10], moderated t-statistics proposed in LIMMA [11] were applied to infer transcription factors of gene modules. For each gene module in the yeast dataset, ten condition clusterings were sampled by a two-way clustering algorithm [17]. Then the regulatory score for assigning a transcription factor to this module was calculated by summing the transcription factor's standardized moderated t-statistics based on the sampled condition clusterings.

We next describe how to define the method's P value for the assignment of a transcription factor to a module. First, we randomly generate ten condition clusterings, each of which consists of two clusters. We then calculate the regulatory score for each candidate transcription factor based on these randomly generated clusterings and record the regulatory score of a randomly selected transcription factor. This process is repeated to obtain 100,000 randomly generated regulatory scores. Last, the probability density function of these randomly generated scores is approximated by a stretched exponential [22]:

    \mathrm{pdf}(x) =
    \begin{cases}
      h_{\max} \exp\!\left[ -b_r \, (x - x_{\max})^{c_r} \right], & \text{for } x \ge x_{\max}, \\
      h_{\max} \exp\!\left[ -b_l \, (x_{\max} - x)^{c_l} \right], & \text{for } x < x_{\max},
    \end{cases}   (7)

with h_{\max} = 0.127, b_r = 0.024, b_l = 0.083, c_r = 2.45, c_l = 1.70, and x_{\max} = -0.050. As shown in Figure 2, the fit closely matches the empirical distribution of the randomly generated regulatory scores, so the P value for a regulator-module interaction with a regulatory score S can be defined as the probability of observing a value greater than S under the fitted distribution.
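
A sketch of this P value under the fitted curve (7) is shown below; normalizing the upper-tail mass by the total mass of the fit is our assumption, since the text only specifies the tail probability.

```python
import numpy as np
from scipy.integrate import quad

H, BR, BL, CR, CL, XMAX = 0.127, 0.024, 0.083, 2.45, 1.70, -0.050

def fit_pdf(x):
    """Stretched-exponential approximation of equation (7)."""
    if x >= XMAX:
        return H * np.exp(-BR * (x - XMAX) ** CR)
    return H * np.exp(-BL * (XMAX - x) ** CL)

def limma_pvalue(score):
    """P(X > score): upper-tail mass of the fit, normalized by its total mass."""
    upper, _ = quad(fit_pdf, max(score, XMAX), np.inf)
    if score < XMAX:
        upper += quad(fit_pdf, score, XMAX)[0]
    total = quad(fit_pdf, -np.inf, XMAX)[0] + quad(fit_pdf, XMAX, np.inf)[0]
    return upper / total

print(limma_pvalue(5.0))   # P value for a regulatory score of 5
```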

Note that, for each regulation program learning algorithm, the assignment of a regulator to a module is associated with a P value, and these P values are required by the weighted rank aggregation method to integrate results from the different learning algorithms. In addition, each assignment is associated with a second P value based on the records in the reference database, YEASTRACT. This P value is calculated from the hypergeometric distribution and is used to evaluate the performance of the individual learning algorithms and the integration methods.

3. Results and Discussion

3.1. Results of Individual Learning Algorithms

We applied each regulation program learning algorithm described in Section 2.3 to calculate the regulatory score for assigning each candidate regulator to each module. For each algorithm, we then sorted the regulatory scores between the 321 candidate transcription factors and the 69 modules in descending order, which led to an ordered list of 22,149 regulator-module interactions per method.

In addition, for each regulator-module interaction, we used the hypergeometric distribution to calculate the P value of this interaction, using the regulatory relationships in YEASTRACT as the reference database. This P value is based on the number of genes in the dataset regulated by the regulator, the number of genes in the module regulated by the regulator, and the number of genes in the module, and it is used to determine whether the regulator-module interaction is a true positive.
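
Assuming the standard one-sided hypergeometric enrichment test implied by these quantities (together with the total number of genes in the dataset as the background), such a P value can be computed as in the following sketch; the example numbers are hypothetical.

```python
from scipy.stats import hypergeom

def enrichment_pvalue(n_genes, n_targets, module_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric: drawing `module_size` genes from
    `n_genes`, of which `n_targets` are known targets of the regulator."""
    return hypergeom.sf(overlap - 1, n_genes, n_targets, module_size)

# Hypothetical example: 2355 genes in the dataset, a regulator with 120 known
# YEASTRACT targets, and a 47-gene module containing 10 of those targets.
print(enrichment_pvalue(2355, 120, 47, 10))
```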

Moreover, for a given ordered list of regulator-module interactions, we define the precision of the top i items in this list as

    P(i) = \frac{T(i)}{i},   (8)

where T(i) denotes the number of interactions with P values less than 0.05 among the top i items (i.e., the number of true positives among these i interactions).
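
A small sketch of this precision curve, given the YEASTRACT-based P values of the interactions in an ordered list (the values shown are hypothetical):

```python
import numpy as np

def precision_curve(ordered_pvalues, alpha=0.05):
    """P(i) = T(i) / i for i = 1..len(ordered_pvalues), where T(i) counts the
    interactions among the top i with a YEASTRACT-based P value below alpha."""
    hits = np.cumsum(np.asarray(ordered_pvalues) < alpha)     # T(i)
    return hits / np.arange(1, len(ordered_pvalues) + 1)      # T(i) / i

print(precision_curve([0.001, 0.2, 0.03, 0.5]))   # [1.  0.5  0.667  0.5]
```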

In Figure 3, we show the precision of the top i regulator-module interactions (i = 1, 2, ..., 100) in the ordered lists obtained by Inferelator, LeMoNe, and the LIMMA-based method. When fewer than 20 interactions are selected, the LIMMA-based method outperforms the other two methods; most of its true positives belong to the nitrogen utilization module. However, Inferelator and LeMoNe outperform the LIMMA-based method when the number of selected interactions is between 20 and 50. When more than 50 interactions are selected, the three methods show similar performance on the yeast dataset.

3.2. Results for the Weighted Rank Aggregation

The weighted rank aggregation method searches for a synthesized list that is simultaneously as close as possible to the ordered lists from LeMoNe, Inferelator, and the LIMMA-based method. However, it is not feasible to apply this integration method directly to a list with 22,149 interactions because of the extensive computational workload. Hence, we resort to a tradeoff and integrate only the top k (k <= 22,149) interactions in the ordered lists given by these algorithms. That is, for a given ordered list and k, interactions ranked lower than k (i.e., at positions k+1, k+2, ..., 22,149) are all assigned the same weight (P value) of one. The larger k is, the closer the list produced by the rank aggregation is to the lists given by the three candidate algorithms, but the more computation time the rank aggregation requires. For example, it takes 12 hours for k = 75 and 48 hours for k = 100 on an HP rackmount server with AMD Opteron processors (x86, 64-bit, dual core) and 16 GB of memory.

In order to select a proper value for k on the yeast dataset, we applied the rank aggregation ten times each for k = 25, 50, 75, and 100. For each k, this yielded 10 ordered lists, and we calculated the average precision of the top i (i = 1, 2, ..., k) interactions over these ten lists. As shown in Figure 4, the precisions obtained by the rank aggregation method improve as k increases from 25 to 50 and then to 75, but the precisions at k = 75 and k = 100 are about the same. This indicates that once k reaches 75, considering more interactions from the ordered lists of the candidate algorithms no longer improves the performance of the rank aggregation method. Hence, k is set to 100 in our tests on the yeast dataset.

3.3. Comparison of Integration Methods and Individual Algorithms

In this section, we compare the performance of the integration methods with that of the individual algorithms. To make the comparison clear, for a given i (i = 1, 2, ..., 100), we define the baseline precision as the maximum of the precisions of the top i interactions given by the individual algorithms. That is, given a set of individual learning algorithms M, it is

    P^*(i) = \max_{m \in M} p_m(i),   (9)

where p_m(i) denotes the precision (8) of the top i interactions in the ordered list given by algorithm m. Note that the baseline precision is an optimistic upper bound that, in general, cannot be achieved by the individual algorithms, since only one of them can be used at a time. Hence, even if the precision obtained by an integration method is only comparable to the baseline precision, it still indicates that the integration method yields a better overall performance than the individual algorithms.
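
Given the precision curves of the individual algorithms (computed, e.g., as in the sketch after equation (8)), the baseline precision of equation (9) is simply their elementwise maximum; the example curves below are hypothetical.

```python
import numpy as np

def baseline_precision(precision_curves):
    """P*(i) of equation (9): elementwise maximum over the algorithms' curves.
    `precision_curves` is a list of equal-length precision arrays."""
    return np.max(np.vstack(precision_curves), axis=0)

# Hypothetical four-point precision curves from three algorithms
print(baseline_precision([[1.0, 0.5, 0.67, 0.5],
                          [0.0, 0.5, 0.33, 0.5],
                          [1.0, 1.0, 0.67, 0.75]]))   # [1.  1.  0.67  0.75]
```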

We compare the precision of the top 100 predictions from the three integration methods with the baseline precisions (Figure 5). The union and rank aggregation methods generate results that are better than or similar to the baseline precision. In addition, somewhat surprisingly, the union method, which has a lower computational cost than rank aggregation, achieves results comparable to those given by rank aggregation. The first twenty interactions from the union and rank aggregation methods are shown in Tables 1 and 2, respectively.

On the other hand, we observe that the baseline precisions are generally better than the precisions given by the intersection method. The intersection method sorts interactions by their lowest rankings across all candidate algorithms, so it tends to assign high ranks to interactions that receive only moderate confidence from every algorithm. We speculate that this has hurt its performance. For example, as shown in Table 3, its first 20 interactions include several that are not highly ranked by any algorithm, such as the twelfth interaction (RDS2 to module 16), ranked 219th, 125th, and 273rd by the LIMMA-based method, LeMoNe, and Inferelator, respectively, and the eighteenth interaction (DAL81 to module 58), ranked 199th, 310th, and 190th by the same three methods.

We also compare the areas under the precision curves for the top 100 predictions given by the integration methods and the individual learning algorithms (Table 4). The union and weighted rank aggregation methods achieve better results than the individual learning algorithms, whereas the intersection method yields only results comparable to those of the individual learning algorithms. These results indicate that caution is needed when applying the intersection method to integrate results from algorithms of different natures.

4. Conclusions

A metalearner approach was applied to infer transcription factors of coexpressed gene modules in a yeast stress dataset, with the regulatory relationships recorded in YEASTRACT as the gold standard. We integrated the predictions of three existing inference techniques [9, 10, 13] using three different methods: union, intersection, and weighted rank aggregation. Experimental results show that integrated predictions based on union or rank aggregation have higher precision than any of the individual methods. The rationale for this work is that different algorithms produce results that are not identical and are influenced by the data in clearly different ways. The experiments confirm our expectation that integrating the output of several algorithms yields higher quality predictions. To the best of our knowledge, this is the first such attempt. Consequently, this work may point to a promising direction for module network learning.

An interesting extension of this work is to investigate whether integrating results from more algorithms can lead to even better performance. In particular, we expect that when more algorithms are combined, a more pronounced difference between the union and weighted rank aggregation methods may emerge. The experiments in this work were conducted on a yeast dataset, and the results were validated against the regulatory relationships recorded in YEASTRACT, which is not a complete reference database of the yeast regulatory network. Hence, another direction for future work is to perform experiments on expression data from other species (e.g., E. coli [23]) to verify whether the results are consistent with those we obtained on the yeast dataset. In addition, we are interested in performing experiments on synthetic datasets (e.g., DREAM [24]), where complete reference networks are available.