Research Article  Open Access
Natsu Nakajima, Tatsuya Akutsu, "Exact and Heuristic Methods for Network Completion for TimeVarying Genetic Networks", BioMed Research International, vol. 2014, Article ID 684014, 13 pages, 2014. https://doi.org/10.1155/2014/684014
Exact and Heuristic Methods for Network Completion for TimeVarying Genetic Networks
Abstract
Robustness in biological networks can be regarded as an important feature of living systems. A system maintains its functions against internal and external perturbations, leading to topological changes in the network with varying delays. To understand the flexibility of biological networks, we propose a novel approach to analyze timedependent networks, based on the framework of network completion, which aims to make the minimum amount of modifications to a given network so that the resulting network is most consistent with the observed data. We have developed a novel network completion method for timevarying networks by extending our previous method for the completion of stationary networks. In particular, we introduce a double dynamic programming technique to identify change time points and required modifications. Although this extended method allows us to guarantee the optimality of the solution, this method has relatively low computational efficiency. In order to resolve this difficulty, we developed a heuristic method for speeding up the calculation of minimum least squares errors. We demonstrate the effectiveness of our proposed methods through computational experiments using synthetic data and real microarray gene expression data. The results indicate that our methods exhibit good performance in terms of completing and inferring gene association networks with timevarying structures.
1. Introduction
Computational analysis of gene regulatory networks is an important topic in systems biology. A gene regulatory network is a collection of genes and their correlations and causal interactions. It is often represented as a directed graph in which the nodes correspond to genes and the edges correspond to regulatory relationships between two genes. Gene regulatory networks play important roles in cells. For example, gene regulatory networks maintain organisms through protein production, response to the external environment, and control of cell division processes. Therefore, deciphering gene regulatory network structures is important for understanding cellular systems, which might also be useful for the prediction of adverse effects of new drugs and the detection of target genes for the development of new drugs. In order to infer gene regulatory networks, various kinds of data have been used, such as gene expression profiles (particularly mRNA expression profiles), CHromatin ImmunoPrecipitation (ChIP)chip data for transcription binding information, DNAprotein interaction data, and proteinprotein interaction data [1–3]. However, many existing studies have focused on the use of gene expression profiles, because expression data from a large number of genes can be simultaneously observed due to developments in DNA microarray technology [1–3]. Various mathematical models and computational methods have been applied and/or developed to infer gene regulatory networks from gene expression profiles, which include Boolean networks [4, 5], Bayesian networks [6, 7], dynamic Bayesian networks [8], differential equations [9, 10], and graphical Gaussian models [11]. In Boolean networks, the state of each gene is simplified into 0 or 1 and the gene regulation rules are given as Boolean functions, where 0 and 1 mean that a gene is active (in high expression) and inactive (in low expression), respectively. In the most widely used Boolean network model, it is assumed that the states of genes change synchronously according to discrete time steps. In Bayesian networks, the states of genes are usually classified into discrete values and the gene regulation rules are given in the form of conditional probabilities. Although standard Bayesian networks can only handle static data and acyclic networks, dynamic Bayesian networks can handle time series data and cyclic networks. In differential equation models, the dynamics of gene expression are represented by a set of linear or nonlinear equations (one equation per gene). In graphical Gaussian models, partial correlations are used as a measure of independence of any two genes, by which direct interactions are distinguished from indirect interactions. For details of these models and methods, see review/comparison papers [1–3].
These network models assume that the topology of the network does not change through time, whereas the real gene regulatory network in the cell might dynamically change its structure depending on time, the effects of certain shocks, and so forth. Therefore, many reverse engineering tools have recently been proposed, which can reconstruct timevarying biological networks based on timeseries gene expression data. Yoshida et al. [12] developed a dynamic linear model with Markov switching that represents change points in regimes that evolve according to a firstorder Markov process. Fujita et al. [13] proposed a method based on the dynamic autoregressive model. This model extends the vector autoregression (VAR) model, which can be applied to the inference of nonlinear timedependent biological correlations such as dynamic gene regulatory networks. Robinson and Hartemink [14] proposed a model called a nonstationary dynamic Bayesian network, based on dynamic Bayesian networks, which allows inference from data generated by nonstationary processes in a timedependent manner. Lèbre et al. [15] also introduced the autoregressive timevarying (ARTIVA) algorithm for the analysis of timevarying network topologies from time course data, which is generated from different processes. This model adopts a combination of reversible jump Markov chain Monte Carlo (RJMCMC) and dynamic Bayesian networks (DBN), in which RJMCMC is used for the identification of change time points and the resulting networks, and DBN is used to represent causal interactions among genes. Thorne and Stumpf [8] presented a method to model the regulatory network structure between distinct segments with a set of hidden states by applying the hierarchical Dirichlet process hidden Markov model [16], including a potentially infinite number of states and a Bayesian network model for estimating relationships between genes. Rassol and Bouaynaya [17] presented a new method based on constrained and smoothed Kalman filtering, which is capable of estimating timevarying networks from timeseries data, including unobserved and noisy measurements. The dynamics of genetic modules are represented as a linearstate space equation and the observability of linear timevarying systems is defined by imposing sparse constraints in Kalman filters. Ahmed et al. [18] proposed an algorithm called Tesla with machine learning, which can be cast in the form of a convex optimization problem. The basic assumption in this method is that networks at close time points do not have significant topological differences but have common edges with high probability; in contrast, networks at distant time points are markedly different. The regulatory networks are represented by Markov random fields at arbitrary time intervals.
As mentioned above, there have been many studies and attempts to analyze both timeindependent and timedependent networks from timeseries expression data; however, gene regulatory systems in living organisms are so complicated that any mathematical model has limitations and there is not yet a standard or established method for inference, even for timeindependent networks. One of the possible reasons is that there exists an insufficient number of highquality timeseries datasets to reconstruct the dynamic behavior of the network. In other words, it is difficult to reveal a correct or nearly correct network based on a small amount of data that includes some noise. Hence, in our recent study, we proposed a new approach for the analysis of timeindependent networks, called network completion [19, 20], in which the minimum amount of modifications are made to a given network so that the resulting network is most consistent with the observed data. Similar concepts have been independently proposed [21–24]. In addition, network completion can be applied to inference of networks by starting with the null network.
In this paper, we present two novel methods for the completion and inference of timevarying networks using dynamic programming and least squares fitting (DPLSQ): DPLSQTV (DPLSQTV was presented in a preliminary version of this paper [25]; however, in this paper, more detailed computational experiments are performed and DPLSQHS is newly introduced) and DPLSQHS, where TV and HS stand for time varying and heuristics. DPLSQTV is an extension of DPLSQ [20] such that it can identify the time points at which the structure of the gene regulatory network changes. Since the additions and deletions of edges are basic modifications in network completion, we need to extend DPLSQ so that these operations can be performed at several time points. In DPLSQTV, these edges and time points are identified by a novel double dynamic programming method in which the inner loop is used to identify static network structures and the outer loop is used to determine change points. It is to be noted that a single dynamic programming (DP) method was used in our previous work on the completion and inference of timeindependent networks [20], whereas a double DP method is employed here in order to cope with timevarying networks. Our proposed methods also allow us to find an optimal solution in polynomial time if the maximum indegree (i.e., the maximum number of input genes to a gene) is bounded by a constant. Although DPLSQTV is guaranteed to find an optimal solution in polynomial time, the degree of the polynomial is not low, which prevents the method from being applied to the completion of large networks. Therefore, we further propose a heuristic method, called DPLSQHS, to speed up the calculation of the minimum least squares error by applying restriction constraints that limit the number of combinations of incoming nodes.
We evaluate the efficiency of our methods through computational experiments using synthetic data and microarray gene expression data from the life cycle of D. melanogaster and the cell cycle of S. cerevisiae. We also demonstrate the effectiveness of the proposed methods by comparing our results with those of ARTIVA [15].
2. Method
In this section, we present DPLSQTV, a DPbased method for the completion of a timevarying network. We assume that there exist time points , which are divided into intervals: , , where indicates the number of change points. A different network is associated with each interval. We assume that the set of genes does not change; therefore, only the edge set changes according to the time interval. Let be the set of genes. Let be the initial set of directed edges (i.e., initial set of gene regulation relationships), and let be the sets of directed edges (i.e., the output), where denotes the edge set for the th interval.
Then, the problem is defined as follows: given an initial network consisting of genes, time series datasets, each of which consists of time points for genes and the positive integers , , and , infer change points (i.e., ) and complete the initial network by adding edges and deleting edges in total such that the total leastsquares error is minimized. This results in the set of edges at the corresponding time intervals (see Figure 1). It is to be noted that if we start with an empty set of edges (i.e., ), the problem corresponds to the inference of a timevarying network.
2.1. Model of Differential Equation and Estimation of Parameters
We assume that the dynamics of each node are determined by the following differential equation: where corresponds to the expression value of node , denotes random noise, and are incoming nodes to . The second and third terms of the righthand side of the equation represent the linear and nonlinear effects to node , respectively (see Figure 2), where a positive value for or corresponds to an activation effect, and a negative value for or corresponds to an inhibition effect. This model is an extension of the linear differential equation model [3]. It is also a variant of the recurrent neural network model [27], although the sigmoid function is replaced here by an identify function and nonlinear terms representing cooperating regulations are added instead.
In practice, we replace the derivative of (1) by the difference and ignore the noise term as follows: where denotes the unit time. This kind of discretization is also employed for linear and recurrent neural network models [3, 27].
In our previous method DPLSQ [20], we assume that time series data s, which correspond to s in (2), are given for , where we distinguish an observed expression value from an expression value in the mathematical model equation (2). Then, the parameters s and s are estimated from these time series data by minimizing the following objective function (i.e., the sum of the least squares errors) for each node : It should be noted that is the observed expression value of gene at time , and are tentative incoming nodes to node . Incoming nodes to each node are determined so that the sum of these values for all nodes is minimized under the constraint that the total number of edges is equal to the specified number. In order to minimize the sum of least squares errors for all genes along with determining the incoming nodes and corresponding parameters, DP is applied. Readers are referred to [20] for the details of DPLSQ.
2.2. Completion by Addition of Edges
In this subsection, we present our proposed method for network completion of timevarying networks by the addition of edges and extend this to a general case (i.e., network completion by the addition and deletion of edges) in the following subsection. For simplicity, we assume , where we can extend the method to the case of by changing the definition of only.
We assume that the set of nodes (i.e., the set of genes) and the set of initial edges are given. Let the current set of incoming nodes to be . We define the least squares error for during the time period between and as where denotes the observed expression value of gene at time . The parameters (i.e., , , ) needed to attain this minimum value can be computed by a standard least squares fitting method.
Because network completion is considered to involve the addition of edges, let be the set of initial incoming nodes to . Let denote the minimum least squares error when adding edges to the th node during the time from to , which is formally defined as where each must be selected from . In order to avoid combinatorial explosion, we constrain the maximum to be a small constant, , and let , for or .
Then, the problem is stated as where and .
Here, we define as
The entries of can be computed by the following DP algorithm:
It is to be noted that is determined uniquely regardless of the ordering of nodes in the network. The correctness of this DP algorithm can be seen as follows:
Next, we define as where . can be computed by the following DP algorithm:
The introduction of and the corresponding DP procedure are the methodologically novel points of this work, compared with our previous work [20].
The correctness of this DP algorithm can be seen as follows:
2.3. Completion by Addition and Deletion of Edges
The above DP procedure can be modified for the deletion of edges and for the addition and deletion of edges as in DPLSQ [20]. Since the former case is a subcase of the latter one, we describe only the latter one (addition and deletion of edges) here.
Let denote the minimum least squares error for the time period between and when adding edges to and deleting edges from , where the added and deleted edges must be disjointed. We constrain the maximum and to the small constants and . We let if , , , or hold. Then, the problem is stated as Here, we define as
As in the previous subsection, can be computed by
Next, we define as
can be computed by the following DP algorithm:
2.4. Time Complexity Analysis
In this subsection, we analyze the time complexity of DPLSQTV. Since completion by the addition of edges and completion by the deletion of edges are special cases of completion by the addition and deletion of edges, we focus on completion by the addition and deletion of edges.
First, we analyze the time complexity required per least squares fitting. It is known that least squares fitting for a linear system can be done in time where is the number of data points and is the number of parameters. In our proposed method, we assume that the maximum indegree is bounded by a constant, and the numbers of addition and deletion edges in a given network are bounded by the constants and , respectively. In this case, the time complexity for least squares fitting can be estimated as .
Next, we analyze the time complexity required for computing . The total time required to compute is [20], where we assume that and are . Therefore, the time complexity for s is , because and are .
Next, we analyze the time complexity required for computing s. In this computation, we note that the size of table is . Furthermore, in order to compute the minimum value for each entry in the DP procedure, we need to examine combinations, which is . Hence, the time complexity for s is .
Finally, we analyze the time complexity required for computing s. We note that the size of table is , where we assume that is a constant. Since the number of combinations for computing the minimum value using DP is per entry, the computation time required for computing s is . Hence, the total time complexity is
It is to be noted that if we use time series datasets, each of which consists of points, the time complexity becomes . Although this complexity is not small, it is allowable in practice if and and are not too large. Indeed, as shown in Section 4.2, DPLSQTV works for the completion and inference of timevarying networks with a few tens of genes if .
3. Heuristic Method
Although our previous algorithm, DPLSQTV, is guaranteed to find an optimal solution in polynomial time, the degree of the polynomial is not low, preventing the method from being applied to the completion of largescale networks. Therefore, we propose a heuristic algorithm, DPLSQHS, to significantly improve the computational efficiency by relaxing the optimality condition. The reason why DPLSQTV requires a large amount of CPU time is that the least squares errors are calculated for each node by considering all possible combinations of incoming nodes and taking the minimum value of these. In order to significantly improve the computational efficiency, we introduce an upper limit on the number of combinations of incoming nodes. Although DPLSQHS does not guarantee an optimal solution, it allows us to speed up the calculation of the minimum least squares in the case of adding edges. A schematic illustration of least squares computation is given in Section 3.1. The DPLSQHS algorithm is described in Section 3.2, and we analyze the time complexity of DPLSQHS in Section 3.3.
3.1. Schematic Illustrations of DPLSQHS
Although DPLSQHS can be applied to the addition and deletion of edges, we consider only additions of edges as modification operations in this subsection. We have developed DPLSQHS, which contributes to reducing the time complexity, by imposing restrictions on the number of combinations of incoming nodes to each node. In Figure 3, the diagram indicates that, for each node , we maintain combinations of incoming nodes with lowest errors at the th step. Let denote the set of combinations computed at the th step. At the th step, for each combination where , we calculate the least squares error for each such that is the th incoming node to . The calculated least squares errors are sorted in descending order, the top values are selected, and the corresponding combinations are stored in .
3.2. Algorithm
The following is the description of the algorithm to compute in DPLSQHS, where does not necessarily mean the minimum value and the meaning of “step” is different from that in Section 3.1.
Step 1. For each period , repeat Steps 2–6.
Step 2. Let for all .
Step 4. Repeat Steps 5–7 for node from to .
Step 5. For each combination and each node such that ( if ), calculate the least squares error for the edge set .
Step 6. Sort the obtained least squares errors in descending order and select the top combinations, which are stored in .
Step 7. Let be the minimum least squares error among these top combinations.
The other parts of the algorithm are the same as in DPLSQTV.
3.3. Time Complexity Analysis
In this subsection, we analyze the time complexity of DPLSQHS. Since DPLSQHS can be applied to additions and deletions of edges, we consider the time complexity of completion for adding and deleting edges.
In our proposed method, we assume that the numbers of adding and deleting edges in a given network are, respectively, bounded by constants and . In this case, the time complexity for least squares fitting can be estimated as .
As for the time complexity of computing , we assume that the addition of edges is operated only in the case of adding edges to the nodes with respect to the top of the sorted list. Therefore, the number of combinations of addition of edges, which is bounded by a constant , is . It is well known that the sorting of data can be done in time. Based on such an assumption, the total time required for the computation of is [20], since the factor can be regarded as a constant. Therefore, the time complexity for is , because and are .
Furthermore, for the time complexity required for computing s and s, the calculation process is the same as that in DPLSQTV. Therefore, the computation time for both s and s are as described in Section 2.4. Hence, the total time complexity of DPLSQHS is
If we use time series datasets, each of which consists of points, the time complexity becomes . DPLSQHS requires less time complexity than DPLSQTV, because is much smaller than . Indeed, as shown in Section 4.2, DPLSQHS is much faster than DPLSQTV in practice.
4. Results
We performed computational experiments using both artificial data and real data. All experiments were performed on a PC with an Intel Core(TM)2 Quad CPU (3.0 GHz). We employed the liblsq library (http://www2.nict.go.jp/aeri/sts/stmg/K5/VSSP/install_lsq.html) for the least squares fitting method.
4.1. Completion Using Artificial Data
In order to evaluate the potential effectiveness of DPLSQTV and DPLSQHS, we begin with network completion for timevarying networks using artificial data. We demonstrate that our proposed methods can determine change time points quite accurately when the network structure changes. We employed the structure of the real biological network WNT5A (Figure 4) [26] as the initial network and those of three different networks , , and generated by randomly adding and deleting edges from the initial network. In this method, for each node with input nodes, we considered the following model: where s and s are constants selected uniformly at random from and , respectively. The reason why the domain of s is smaller than that for s is that nonlinear terms are not considered as strong as linear terms. It should also be noted that is a stochastic term, where is a constant (we used ) and is random noise taken uniformly at random from . For the artificial generation of the observed data , we used where is a constant denoting the level of observation errors and is random noise taken uniformly at random from .
As for the time series data, we generated an original dataset with 30 time points including two change points , , which is generated by merging three datasets for , , and . Since the use of time series data beginning from only one set of initial values easily resulted in numerical calculation errors, we generated additional time series data beginning from 200 sets of initial values that were obtained by slightly perturbing the original data. Under the above model, we conducted computational experiments by DPLSQTV in which the initial network was modified by randomly adding edges and deleting edges per node, resulting in , , and ; additionally, we also conducted DPLSQHS experiments in which the initial network was modified by randomly adding edges per node, using the default values of . We evaluated the performance of this method by measuring the accuracy of modified edges, the time point errors for time intervals, and the computational time for completion (CPU time). Furthermore, in order to examine how CPU time changes as the size of the network grows, we generated networks with 20 genes, 30 genes, and 40 genes by making 2, 3, and 4 copies of the original networks. We took the average time point errors, accuracies, and CPU time over 10 random modifications with several s. In addition, we performed computational experiments on DPLSQTV and DPLSQHS using 60 genes, where additional time series data beginning from 100 sets (in place of 200 sets) of initial values were used, and , , and were obtained by addition and deletion of edges. However, DPLSQTV took too long time (more than 1000 sec. per execution) and thus the result could not be included in Table 1.
(a) Using DPLSQTV  
 
(b) Using DPLSQHS  

The accuracy is defined as follows: where and are, respectively, the sets of edges in the original network and the completed network in each time interval. This value is 1 if all the added and deleted edges are correct and 0 if none of the added and deleted edges are correct. If we regard a correctly (resp., incorrectly) added or deleted edge as a true (resp., false) positive, corresponds to the number of false positives and corresponds to the number of true positives. The time point error is the average difference between the original and estimated values for change time points and is defined as where are the estimated change points. As for the computation time, we show the average CPU time.
The results of the two methods are shown in Table 1. It can be seen from this table that the change time point errors are quite small regardless of the size of the network with a low level of observation errors. In addition, it is also seen that the time point errors with DPLSQTV are close to those with DPLSQHS with the exception of high levels of observation errors. We observe that CPU time using DPLSQTV increases rapidly as the size of the network grows. On the other hand, CPU time by DPLSQHS increases gradually as the size of the network grows. It is also observed that the DPLSQHS algorithm is about 4 times faster than the DPLSQTV algorithm in case of 40 genes, while maintaining good accuracy. Hence, these results suggest that DPLSQTV and DPLSQHS can correctly identify the change time points if the error levels are not very large and that it can complete the initial network by modifying the edges with relatively good accuracy if the observation error is small.
It is also observed that DPLSQHS worked reasonably fast even for , although DPLSQTV took more than 1000 seconds per execution and thus the result could not be included in Table 1. However, the accuracy on DPLSQHS became around 0.4 even if the observation error level was low (i.e., ). Therefore, the applicability of DPLSQHS is also limited in terms of the accuracy, although it may still be useful for networks with if the purpose is to identify change time points.
Since DPLSQHS is a heuristic method, the results may be greatly influenced by data. Therefore, we evaluated the stability of DPLSQHS by comparing the variance of the accuracy with that for DPLSQTV, where . The variances for DPLSQTV were 0.00602 and 0.00446 for and , respectively. The variances for DPLSQHS were 0.01188 and 0.00732 for and , respectively. This result suggests that DPLSQHS is less stable than DPLSQTV. However, the variances of DPLSQHS were less than twice those of DPLSQTV. Therefore, this result also suggests that DPLSQHS has some stability.
In order to examine the effect of the number of change points and the maximum number of added and deleted edges per nodes and on the least squares error, we performed computational experiments with varying these parameters (one experiment per parameter). Then, the resulting least squares errors (i.e., s) for DPLSQTV are 5.495, 7.016, 7.875, 3.886, and 3.799 for , , , , and , respectively. It is seen that use of larger , resulted in smaller least squares errors. It is reasonable that more parameters resulted in better least squares fitting. However, use of larger did not result in smaller least squares errors. It may be because addition of unnecessary change points increases the error if an enough number of edges are not added. It is to be noted that although the least squares errors are reduced, use of larger , is not always appropriate because it needs much longer CPU time and may cause overfitting.
We also compared our results with those obtained by the ARTIVA algorithm [15]. It is to be noted that most of the other tools for the inference of timevarying networks are unavailable. This model is based on a combination of DBN and RJMCMC sampling, where RJMCMC is used for approximating the posterior distribution and DBN is used for inferring simultaneously the change points and resulting network structures. We applied ARTIVA to the synthetic datasets that were generated in the same way as for our proposed methods. We used the default parameter settings for ARTIVA and evaluated the results by inferring the change points. As the result of the comparative experiment, there are two change time points in the synthetic datasets, but ARTIVA can only infer one change point regardless of the observation error level, as shown in Figure 5, where ARTIVA does not uniquely determine change points but output probabilities of change points.
(a)
(b)
(c)
(d)
4.2. Inference Using Real Data
We examined two types of proposed methods for the inference of change time points using gene expression microarray data and also compared our results with those obtained using the ARTIVA algorithm. We applied our methods to two real gene expression datasets, measured during the life cycle of D. melanogaster and the cell cycle of S. cerevisiae.
The first microarray dataset is the gene expression time series collected by Spellman et al. [28]. We employed part of the cell cycle network of S. cerevisiae extracted from the KEGG database [29] shown in Figure 6. As for time series data, we combined and employed four sets of time series data (alpha, cdc15, cdc28, and elu) in [28] that were obtained in four different experiments. We adopted the datasets of 10 genes with 71 time points including three change time points. Since there were several expression values that were far from the average in the cdc15 dataset, these values were discarded. As a result, the alpha, cdc15, cdc28, and elu datasets consist of 18, 23, 17, and 13 time points of gene expression data, respectively.
The second microarray dataset is the gene expression time series from experiments by Arbeitman et al. [30]. This data set includes the expression levels of 4028 genes with 67 time points spanning four distinct stages: embryonic (31 time points), larval (10 time points), pupal (18 time points), and adulthood (8 time points) in the D. melanogaster life cycle. We used the expression datasets of 30 genes selected from this microarray data with 67 time points, which include three change time points.
In this computational analysis, with regard to applying the two different types of microarray datasets, we generated 200 datasets that were obtained by slightly perturbing the original data in order to avoid numerical calculation errors. Since the correct timevarying networks are not known, we only evaluated the time point errors and the average CPU time, where and were used with the S. cerevisiae dataset and and were used with the D. melanogaster dataset.
The results are shown in Tables 2 and 3. s are the values of the change point in the original data and s are the estimated values. In the experimental analysis with S. cerevisiae data, as for the change time points, there seems to be almost no difference between the results of DPLSQTV and DPLSQHS, which can correctly identify the time points where the network topology changes. It is also observed from Table 2 that the CPU time required for DPLSQHS is about 15 times faster than that needed for DPLSQTV. In the experiments using data from D. melanogaster, it is seen from Table 3 that both methods can determine exactly the same three change points. At first glance, readers may think that the errors are large at all change point positions. However, both methods could precisely identify two time points when topology of the network changes, excluding the case of . From the point of view of computational time, DPLSQHS performs significantly better than DPLSQTV; DPLSQHS runs about 46 times faster than DPLSQTV. Therefore, DPLSQHS allows us to significantly decrease the computational time. These results suggest that, in many cases, we can expect DPLSQHS to find a nearoptimal solution, at least for change time points, while also speeding up the calculation.


Furthermore, for the ARTIVA analysis, we employed both the abovementioned S. cerevisiae and D. melanogaster microarray datasets, which consist of 71 measurements of 10 genes and 67 measurements of 30 genes, respectively, and tried to identify the change time points. Computational experiments on ARTIVA were performed under the same computational environment as that used in our methods.
The results from the yeast microarray data are shown in Table 2. There are three change time points, as described in this table. It is seen from this table that two of them, 24 and 60, can be determined precisely by ARTIVA, but the third is not. In contrast, our proposed methods demonstrate good performance for inferring the change points at which the network topology changes. Lèbre et al. [15] demonstrated the number of identified change points with D. melanogaster data using the ARTIVA algorithm. According to this validation, it has been observed that the time intervals 1819, 31–33, 41–43, and 59–61 contain more than 40% of all change points. In order to compare with the ARTIVA results, we attempted to identify four change points using our proposed methods. The results of the comparative experiment using D. melanogaster microarray data are shown in Table 4. s are three change time points in original data. Although DPLSQHS identified change time points similar to those identified by ARTIVA, the results of ARTIVA appear to be slightly better. This suggests that the ARTIVA algorithm shows slightly better performance with respect to the inference of change points than our proposed methods. However, ARTIVA does not determine change time positions but determines time intervals at which the network topology might change. Therefore, DPLSQHS is more suited for identifying change time positions at alltime points. (Since the comparative experiment by DPLSQTV did not finish within 3 weeks, the results of DPLSQTV are not given in Table 4.)

5. Conclusion
In this paper, we have proposed two novel network completion methods for timevarying networks by extending our previous method, DPLSQ [20]. In order to identify the change time points and sets of modified edges in network completion, we developed two different types of double DP algorithms. The first algorithm, DPLSQTV, is intended to complete and precisely infer timevarying networks. Although DPLSQTV allows us to guarantee the optimality of its solution, it requires a large amount of computational time as the size of the network grows.
To improve the computational efficiency of DPLSQTV, we developed an effective heuristic method, DPLSQHS, by speeding up the calculation of the minimum least squares error by posing restrictions to the number of combinations of incoming nodes. We showed that each of these two methods works in polynomial time if the maximum indegree is bounded by a constant.
The results of computational experiments reveal that the two proposed methods can identify change time points rather accurately and can infer edges to be deleted and added with good accuracy. DPLSQTV provided a wide range of applications, not only in network completion but also in network inference, with good accuracy. Additionally, DPLSQHS allowed us to identify change time points rather precisely, while reducing the computational time for both synthetic data and microarray data. This result suggests that, in many cases, DPLSQHS can be expected to find nearoptimal solutions, while speeding up the calculation.
Although DPLSQHS is much faster than DPLSQTV, it has a drawback: the accuracy and time point error were worse than those by DPLSQTV, especially, when the observation error level was large. Therefore, we need to improve the accuracy of DPLSQHS without significantly undermining its efficiency. In our experiments, we specified the number of change time points and the number of edges to be added and deleted. In real use, we may examine several values and select the best one (e.g., the values with the minimum least squares errors). However, as discussed in Section 4.1, it may lead to overfitting. In order to avoid overfitting, use of AIC (Akaike's Information Criterion) or other information criteria is useful as demonstrated in [27] for network inference. However, since network completion is more complex than network inference, the method in [27] cannot be directly applied. Therefore, incorporation of an appropriate information criterion into network completion is important future work. Another issue to be tackled is to take into account the relationship between and . Although and are inferred independently from the original network by the proposed method, there should be some strong relationship between them. Therefore, such an extension is also important future work.
Conflict of Interests
The authors declare that they have no conflict of interests.
Acknowledgments
The authors would like to thank Professor Hideo Matsuda in Osaka University and Takanori Hasegawa in Kyoto University for helpful discussions. This work was partially supported by JSPS, Japan, (GrantsinAid 22240009 and 22650045).
References
 K.H. Cho, S.M. Choo, S. H. Jung, J.R. Kim, H.S. Choi, and J. Kim, “Reverse engineering of gene regulatory networks,” IET Systems Biology, vol. 1, no. 3, pp. 149–163, 2007. View at: Publisher Site  Google Scholar
 H. Hache, H. Lehrach, and R. Herwig, “Reverse engineering of gene regulatory networks: a comparative study,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009. View at: Publisher Site  Google Scholar
 M. Hecker, S. Lambecka, S. Toepferb, E. van Somerenc, and R. Guthkea, “Gene regulatory network inference: data integration in dynamic models: a review,” BioSystems, vol. 96, pp. 86–103, 2009. View at: Google Scholar
 S. Liang, S. Fuhrman, and R. Somogyi, “Reveal, a general reverse engineering algorithm for inference of genetic network architectures,” Proceedings of the Pacific Symposium on Biocomputing, vol. 3, pp. 18–29, 1998. View at: Google Scholar
 T. Akutsu, S. Miyano, and S. Kuhara, “Inferring qualitative relations in genetic networks and metabolic pathways,” Bioinformatics, vol. 16, no. 8, pp. 727–734, 2000. View at: Google Scholar
 N. Friedman, M. Linial, I. Nachman, and D. Pe'er, “Using Bayesian networks to analyze expression data,” Journal of Computational Biology, vol. 7, no. 34, pp. 601–620, 2000. View at: Publisher Site  Google Scholar
 S. Imoto, S. Kim, T. Goto et al., “Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network,” Journal of Bioinformatics and Computational Biology, vol. 1, no. 2, pp. 231–252, 2003. View at: Google Scholar
 T. Thorne and M. P. H. Stumpf, “Inference of temporally varying Bayesian networks,” Bioinformatics, vol. 28, pp. 3298–3305, 2012. View at: Google Scholar
 P. D'Haeseleer, S. Liang, and R. Somogyi, “Genetic network inference: from coexpression clustering to reverse engineering,” Bioinformatics, vol. 16, no. 8, pp. 707–726, 2000. View at: Google Scholar
 Y. Wang, T. Joshi, X. Zhang, D. Xu, and L. Chen, “Inferring gene regulatory networks from multiple microarray datasets,” Bioinformatics, vol. 22, no. 19, pp. 2413–2420, 2006. View at: Publisher Site  Google Scholar
 H. Toh and K. Horimoto, “Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling,” Bioinformatics, vol. 18, no. 2, pp. 287–297, 2002. View at: Google Scholar
 R. Yoshida, S. Imoto, and T. Higuchi, “Estimating timedependent gene networks from time series microarray data by dynamic linear models with Markov switching,” in Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference (CSB '05), pp. 289–298, August 2005. View at: Publisher Site  Google Scholar
 A. Fujita, J. R. Sato, H. M. GarayMalpartida, P. A. Morettin, M. C. Sogayar, and C. E. Ferreira, “Timevarying modeling of gene expression regulatory networks using the wavelet dynamic vector autoregressive method,” Bioinformatics, vol. 23, no. 13, pp. 1623–1630, 2007. View at: Publisher Site  Google Scholar
 J. W. Robinson and A. J. Hartemink, “Nonstationary dynamic Bayesian networks,” in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1369–1376, December 2008. View at: Google Scholar
 S. Lèbre, J. Becq, F. Devaux, M. P. H. Stumpf, and G. Lelandais, “Statistical inference of the timevarying structure of generegulation networks,” BMC Systems Biology, vol. 4, p. 130, 2010. View at: Publisher Site  Google Scholar
 Y. W. Teh and M. I. Jordan, “Hierarchical Bayesian nonparametric models with applications,” in Bayesian Nonparametrics, pp. 158–207, Cambridge University Press, Cambridge, UK, 2010. View at: Google Scholar
 G. Rassol and N. Bouaynaya, “Inference of timevarying gene networks using constrained and smoothed Kalman filtering,” in Proceedings of the International Workshop on Genomic Signal Processing and Statistics, pp. 172–175, 2012. View at: Google Scholar
 A. Ahmed, L. Song, and E. P. Xing, “Timevarying networks: recovering temporally rewiring genetic networks during the lofe cycle of drosophila,” SCS Technical Report Collection CMUML08118, 2008. View at: Google Scholar
 T. Akutsu, T. Tamura, and K. Horimoto, “Completing networks using observed data,” in Proceedings of the 20th International Conference on Algorithmic Learning Theory, pp. 126–140, 2009. View at: Google Scholar
 N. Nakajima, T. Tamura, Y. Yamanishi, K. Hiromoto, and T. Akutsu, “Network completion using dynamic programming and leastsquares fitting,” The Scientific World Journal, vol. 2012, Article ID 957620, 8 pages, 2012. View at: Publisher Site  Google Scholar
 A. Clauset, C. Moore, and M. E. J. Newman, “Hierarchical structure and the prediction of missing links in networks,” Nature, vol. 453, no. 7191, pp. 98–101, 2008. View at: Publisher Site  Google Scholar
 R. Guimerà and M. SalesPardo, “Missing and spurious interactions and the reconstruction of complex networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009. View at: Publisher Site  Google Scholar
 M. Kim and J. Leskovec, “The network completion problem: inferring missing nodes and edges in networks,” in Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 47–58, 2011. View at: Google Scholar
 S. Hanneke and E. P. Xing, “Network completion and survey sampling,” Journal of Machine Learning Research, vol. 5, pp. 209–215, 2009. View at: Google Scholar
 N. Nakajima and T. Akutsu, “Network completion for timevarying genetic networks,” in Proceedings of the 7th International Conference on Complex, Intelligent, and Software Intensive Systems, vol. 2013, pp. 553–558, 2013. View at: Google Scholar
 S. Kim, H. Li, E. R. Dougherty et al., “Can Markov chain models mimic biological regulation?” Journal of Biological Systems, vol. 10, no. 4, pp. 337–357, 2002. View at: Publisher Site  Google Scholar
 N. Noman, L. Palafox, and H. Iba, “On model selection criteria in reverse engineering gene networks using RNN model,” in Proceedings of International Conference on Convergence and Hybrid Information Technology, pp. 155–164, 2012. View at: Google Scholar
 P. T. Spellman, G. Sherlock, M. Q. Zhang et al., “Comprehensive identification of cell cycleregulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998. View at: Google Scholar
 M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa, “KEGG for representation and analysis of molecular networks involving diseases and drugs,” Nucleic Acids Research, vol. 38, no. 1, Article ID gkp896, pp. D355–D360, 2009. View at: Publisher Site  Google Scholar
 M. N. Arbeitman, E. E. M. Furlong, F. Imam et al., “Gene expression during the life cycle of Drosophila melanogaster,” Science, vol. 297, no. 5590, pp. 2270–2275, 2002. View at: Publisher Site  Google Scholar
Copyright
Copyright © 2014 Natsu Nakajima and Tatsuya Akutsu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.