Computational and Mathematical Methods in Medicine

Volume 2017 (2017), Article ID 3020326, 9 pages

https://doi.org/10.1155/2017/3020326

## Inference of Biochemical S-Systems via Mixed-Variable Multiobjective Evolutionary Optimization

^{1}School of Science, Wuhan University of Technology, Wuhan, Hubei 430070, China^{2}School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China

Correspondence should be addressed to Yu Chen

Received 30 November 2016; Accepted 27 April 2017; Published 21 May 2017

Academic Editor: Giancarlo Ferrigno

Copyright © 2017 Yu Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Inference of the biochemical systems (BSs) via experimental data is important for understanding how biochemical components in vivo interact with each other. However, it is not a trivial task because BSs usually function with complex and nonlinear dynamics. As a popular ordinary equation (ODE) model, the S-System describes the dynamical properties of BSs by incorporating the power rule of biochemical reactions but behaves as a challenge because it has a lot of parameters to be confirmed. This work is dedicated to proposing a general method for inference of S-Systems by experimental data, using a biobjective optimization (BOO) model and a specially mixed-variable multiobjective evolutionary algorithm (mv-MOEA). Regarding that BSs are sparse in common sense, we introduce binary variables indicating network connections to eliminate the difficulty of threshold presetting and take data fitting error and the -norm as two objectives to be minimized in the BOO model. Then, a selection procedure that automatically runs tradeoff between two objectives is employed to choose final inference results from the obtained nondominated solutions of the mv-MOEA. Inference results of the investigated networks demonstrate that our method can identify their dynamical properties well, although the automatic selection procedure sometimes ignores some weak connections in BSs.

#### 1. Introduction

Biochemical systems (BSs) consist of many components, which interact with each other in a complex way to act as integrated dynamic systems. Inference of biochemical systems is dedicated to identify how these components interact with each other and helpful to investigate the dynamical properties of these complex systems. In the past decades, the (probabilistic) Boolean networks [1, 2], the (dynamic) Bayesian networks [3, 4], and some other probabilistic or statistical methods [5, 6] have been proposed to confirm connections between components in BSs. Although the probability- or statistics-based models can properly address the negative effects of data noise, they cannot precisely incorporate the dynamical properties of BSs. Because ordinary differential equation (ODE) models that produce directed signed graphs are not only suited for steady-state and time-series profiles but also able to work entirely in classical category [7], they are widely utilized to model various kinds of BSs [8, 9].

In biochemical system theory (BST), the S-System model incorporating the pow-law formalism is considered as an effective and consistent mathematical model to represent and analyze the biological systems [10]. Its mathematical formalism is a nonlinear ODE system:where represents the concentration of reaction component and is the total number of components in the investigated network. In the S-System, there are totally parameters, including the positive rate constants and the kinetic constants . Although the number of parameters to be determined is relatively large for inference of S-System, it is also employed to reconstruct large-scale GRNs [11–13], attributed to its powerful approximation of dynamics of biochemical reactions.

Inference of S-Systems is available when there is a time-course experimental data set of all components, implemented by minimizing the differences between experimental data and numerical results. To address the ill-posedness of this reverse problem, minimization of the differences is usually normalized and penalized as [14, 15]where is the numerical results of at time and is the penalization parameter that is problem-dependent. To make the objective function continuous, is commonly taken as

Nonlinear model (2) of S-Systems has a complicated landscape, which results in the preference to solve it by evolutionary algorithms (EAs) [16, 17]. When evaluating the candidate network parameters in the population of EAs, the ODE system should be solved via some numerical method such as the Runge-Kutta method, which could lead to a computationally-heavy evaluation process. Thus, Tsai and Wang [18] used an allocation method to decouple the ODE system. However, this method introduces an allocation parameter for evaluation of candidate solutions, which is hard to debug for its dependence on the investigated problems and available data sets. Liu et al. [19] developed a separable parameter estimation method (SPEM) to decouple the S-System. But, in their method, the rate constants are numerically determined by the least square method, which could be computationally difficult because it has to compute inverse of a matrix that is the product of a matrix and its transpose (the computational difficulty concerns not only the time complexity but also the stability of algorithm for computation of inverse matrices).

Since inference of BSs simultaneously addresses several issues, multiobjective optimization models could be an available alternative for this problem. Liu and Wang [20] proposed a three-objective optimization model simultaneously minimizing the concentration error, slope error, and interaction error and then transformed it to a single-objective optimization problem by converting two objectives into constraints. However, transformation of the multiobjective model to the single-objective model greatly depends on a prior information on network connections of the investigated network. Koduru et al. [21] and Cai et al. [22] simultaneously minimized data error for several different data sets, but they did not try to minimize the network connections/parameter norms to get sparse networks, which makes it more difficult to set a threshold for pruning net connections. To address the defect of model (2) that has to be regulated to ensure that its global optimal solutions do lie around the true network parameters, Spieth et al. [23] took the data error and connection number as two minimization objectives and solve it using a multiobjective evolutionary algorithm. However, they did not work on how to choose an appropriate Pareto solution as the final inference result.

This work is dedicated to address the aforementioned shortcomings in existing works. To eliminate the difficulties of debugging regularization parameters in regularized methods, we construct a biobjective optimization (BOO) model that tries to simulate the dynamical properties of S-Systems by minimizing the error between computed derivatives and estimated slopes, simultaneously driving the network topology as sparse as possible. Meanwhile, fitting of derivatives also makes it possible for decoupling S-Systems without incorporation of extra parameters. For solution of the proposed BOO model, we propose a mix-variable multiobjective evolutionary algorithm (mv-MOEA) in which a candidate network configuration is represented with combination of binary variables indicating network connections and real variables of parameter values. Then, an automatic selection procedure (ASP) is employed to take the final inference results as one from the obtained nondominated solutions of BOO. Because the ASP runs tradeoff between fitting errors and network connections by locating the knee regions on the curves of normalized objective values, it can obtain a preferred sparse network configuration with the absence of a prior information on network connections.

The rest of this paper is organized as follows. Section 2 introduces the inference method proposed in this work. Then, effectiveness of our method is validated by benchmark S-Systems in Section 3. Finally, Section 4 draws the conclusions and presents the future work.

#### 2. Method

The inference method based on multiobjective evolutionary optimization (IM-MOEO) consists of three parts: the biobjective optimization (BOO) model, the mixed-variable multiobjective evolutionary algorithm (mv-MOEA), and the automatic selection procedure (ASP), which are, respectively, presented in the following.

##### 2.1. The Biobjective Optimization Model

###### 2.1.1. Decoupling the S-System

To decrease number of parameters to be confirmed, the S-System is firstly decoupled before we try to infer it by experimental data. At the same time, decoupling the S-System also reduces the time complexity of evaluation process. Dynamical properties of the autonomous S-System can be fitted by approximating the derivative values of component concentration at each time point: for the th equation in (1) we try to minimize the difference between two sides of each equation aswhere represents the vector of at all time points, approximated by the five-point numerical formula. , the slope vector corresponding to parameter vector , is computed via the right part of the th equation for all time points [19].

###### 2.1.2. Representation of an S-System

By decoupling the S-System (1), we can infer equations one by one. The th equation is characterized as network connections and parameter values and represented by , where and are binary and real vectors, respectively. Then, the kinetic constants can be confirmed by . Moreover, we can also get values of rate constants by and .

###### 2.1.3. The Biobjective Optimization Model

Once the S-System is decoupled, we can infer ODEs one by one. The biobjective optimization model for inference of the th ODE iswhere is the parameters to be confirmed, is the norm of . Minimization of is dedicated to obtain a sparse network topology. Thanks to introduction of binary variables, the -norm can be computed as So, no threshold is needed to prune the network connections.

##### 2.2. The Mixed-Variable Multiobjective Evolutionary Algorithm

Because there are at most connections for each component in (1), the BOO model (6) has at most Pareto solutions (for an -order S-System, values of are restricted in ). So, no diversity keeping strategy is necessary to obtain a set of uniformly distributed efficient vectors if the population size is set greater than . Meanwhile, the BOO model includes mixed variables and mixed-objectives, which make it difficult to solve it. Thus, we propose a mixed-variable multiobjective evolutionary algorithm (mv-MOEA) for this model. The mv-MOEA employs a respective evolution strategy for discrete and binary variables, which is beneficial to both the global exploration in the mix-variable search region and the local exploitation in the real variable subregion.

When inferring an equation of an S-System of components, mv-MOEA employs a population* Pop* with assistance of* OldPop* to accelerate convergence of binary search [24]. Both* Pop* and* OldPop* are of size* PopSize* and separated into the discrete section and real section as Considering that a solution is evaluated by combining the binary and real variables, we also use an archive to save promising network topologies, as well as a of real vectors for the th solution in to promote the search in feasible regions of real variables. The framework of mv-MOEA is described as follows.

*Step 1 (initialization). *Randomly generate , , and of* PopSize* individuals and evaluate them. , initialize to be a set of* PopSize* randomly generated -dimensional real vectors. By combining with the th binary individual in , evaluate all vectors in via (6). Randomly select an individual in to be .

*Step 2 (recombination). *Generate* PopSize* offspring and combine them with to construct the intermediate population .

*Step 3 (sorting). *Sort via their fitting errors (the fitting error refers to the first objective of model (6)) and denote the worst one as . Compute dominance ranks of individuals in via the Pareto dominance relation. For individuals of the same rank, sort them in ascending order via their -norm (the second objective of (6)).

*Step 4 (updating). *Set and update by . Replace with* PopSize* best individuals in and update by the rest of . is randomly selected from or .

*Step 5. *If the stopping criterion is not satisfied, go to Step 2; otherwise, output the nondominated solutions and the iteration process ceases.

###### 2.2.1. Recombination

For , is generated by three randomly selected parents , , and :(i)With a probability , and are randomly selected from , and is randomly selected from .(ii)Otherwise, , , and are randomly selected from .Then, the binary and real part are, respectively, generated as follows. (i)The bit-string is generated by the bit-strings of , , and according to the binary recombination strategy proposed in [24].(ii)The real vector is generated by the DE/rand/1 mutation and the binary crossover strategies [25]. With a probability , three parents are the real parts of , , and ; otherwise, is generated by the real part of and two real vectors randomly selected from .Finally, we combine and to obtain the candidate solutions .

###### 2.2.2. Updating

If the fitting error of is smaller than that corresponding to in , replace with . is selected from the population with probability ; otherwise, it is selected from . Since the archive is here adopted to guide the convergence, it is updated with respect to the hamming distances. An offspring is employed to update as follows.(i)If the hamming distance between and is greater than zero for any , update the archive member by if has a better fitting error. Here, is the archive member with the worst fitting error when the hamming distance between any two archive members is greater than zero; otherwise, it is the worst one of archive members that have repeated bit-string;(ii)otherwise, compare with archive members with repeated bit-strings and replace by it one with the worst fitting error.

In this work, , , and are set to be 0.8, 0.7, and 0.8, respectively.

##### 2.3. Evaluation of the Obtained Nondominated Solutions

The mx-MOEA generates a set of nondominated solutions of the BOO model (6) for each equation in the S-System, where each nondominated solution represents a configuration of this equation. Because two objectives of the nondominated solutions conflict with each other, all of them could be a candidate configuration of the investigated equation. Thus, selecting one from nondominated solutions needs to address a tradeoff between network connections and fitting errors. Considering that two objectives could be of different orders of magnitude, we first normalize two objectives by their maximum values. Then, two scores and can be obtained for the th solution of equation . Now, a tradeoff strategy could be employed to select one solution as the final inference result of equation .

Assume that nondominated solutions are obtained for the th equation. A popular aggregation method is to compute the linear aggregation sum (LAS): for a given weight and select one solution with minimum sum as the inference result of equation . However, the LAS method, which is dedicated to compare the difference of sum between all configurations, focuses on the most acute decrease of sum as connection number increases. As a result, a “too sparse” network is obtained.

To address the merit of LAS method, we would like to take the aggregation product (AP)as the criterion of result confirmation. For a given weight , we select the th solution withas the inference result of equation . Combining inference results of all equations, we then get the overall inference result of the investigated S-System and evaluate its quality by numbers of the true positives (TPs), false positives (FPs), true negatives (TNs), and false positives (FPs). With change of parameter , the Receiver Operator Characteristic (ROC) curve is taken as the illustration of quality for the nondominated solutions of (6) obtained by the mv-MOEA.

##### 2.4. The Automatic Procedure for Selecting the Final Inference Results from the Obtained Nondominated Solutions

Although we can get an inference result for a given , we do not know which value of corresponds to the “right” tradeoff between two objectives of (6). In this work, we would like to propose an automatic selection procedure (ASP) for confirmation of the final inference result. When an inference result of the investigated S-System is obtained for a given weight , we can simultaneously get a sum vector as where and are the respective normalized objective values of the inference result for equation . When sampling in , we also get a collection of sum vectors constituting a normalized Pareto front in the objective space.

Since there are only two objectives to be considered, improvement of one objective will lead to deterioration of another. If small improvement of one objective results in a severe deterioration of another, the solutions constitute the so-called knee regions. According to the method proposed in [26], Li et al. suggest to locate the knee region to select one result from the Pareto front, where the angle-based method [27] is employed to seek the knee points on the combined Pareto front. In this method, two adjacent points are incorporated to compute the tradeoff angle of a point, and one with maximum tradeoff angle is selected as the preferred sum vector. Recall the value of corresponding to this sum vector; we can get the final inference results of all equations via (10) and (11) and combining them to get the overall inference result of the investigated S-System.

#### 3. Results and Discussions

In this section, performances of the IM-MOEO are validated by two investigated S-Systems. To demonstrate the inference precision and the robustness of our method, we first infer an artificial network via noise-free and noisy simulated data. Then, the Ethanol Production System by Yeast is investigated to show how our method obtains a sparse network by focusing on simulation of its dynamical property. When generating simulated data for two problems, we sample 15 time points uniformly in the same time intervals as those investigated in the literatures for comparison.

##### 3.1. Inference of an Artificial System by Noise-Free and Noisy Data

The investigated artificial network S1 is an artificial S-System illustrated in Table 1. Some previous researches [15, 20, 28–32] have been performed on this network to demonstrate efficiencies of S-System inference methods. Thus, we also reconstruct it to validate competitiveness of our method.