Abstract

Synchronous probabilistic Boolean networks (PBNs) and generalized asynchronous PBNs have received significant attention over the past decade as tools for modeling complex genetic regulatory networks. From a biological perspective, interactions among genes, such as transcription, translation, and degradation, may require a few milliseconds or even up to a few seconds. Such time delays are best characterized by generalized asynchronous PBNs. This paper studies an optimal control problem in a generalized asynchronous PBN by employing the theory of average value-at-risk (AVaR) for finite horizon semi-Markov decision processes. Specifically, we first formulate a control model for a generalized asynchronous PBN as an AVaR model for finite horizon semi-Markov decision processes and then solve the optimal control problem of minimizing the average value-at-risk criterion over a finite horizon. To illustrate the validity of our approach, a numerical example is also presented.

1. Introduction

Modeling genetic regulatory networks is a core issue in systems biology. Numerous models for describing and understanding genetic regulatory networks have been proposed, including Boolean networks, Bayesian networks, Petri nets, differential equations, computational frameworks based on automata and language theory, and probabilistic Boolean networks (PBNs) [18]. Among these models, Boolean networks and PBNs have gained popularity in modeling real genetic regulatory networks [1, 9–11]. They are also very useful for inferring genetic regulatory networks because they can monitor the dynamic behavior of complicated systems based on large amounts of gene expression data. Because a Boolean network is essentially a deterministic model, a stochastic model is more suitable for coping with the measurement noise involved in inferring a gene regulatory network; PBNs were introduced to address this shortcoming. According to the updating time for each gene in the network, the works on PBNs can be roughly classified into two cases: synchronous PBNs and asynchronous PBNs. Synchronism means that all genes or nodes update their activities simultaneously. The time parameter of the state process is discrete, and the time between two successive transitions is equal and fixed. However, synchronous PBNs cannot characterize many real situations. For example, interactions among genes, such as transcription, translation, and degradation, may require a few milliseconds or even up to a few seconds [12]. Motivated by such practical considerations, generalized asynchronous PBNs have emerged. In a generalized asynchronous PBN, all the genes update synchronously, and the updating period of each gene is randomly selected based on a given distribution. We next briefly describe some existing works on the two cases.

Synchronous PBNs and their corresponding intervention methods have been studied in the literature from different treatment viewpoints. For example, because the dynamic behavior of synchronous PBNs can be characterized by the theories of Markov chains and Markov decision processes, studies of synchronous PBNs have mainly focused on manipulating external (control) variables that alter the transition probabilities of the network and can be used to help the network avoid undesirable states, such as those associated with a disease [9–11]. Specifically, based on the introduction of control inputs, Datta et al. [10] formulate the corresponding control problem as a cost minimization problem. Optimal policies are obtained in [11] by using the theory of average expected cost and expected discounted cost criteria in infinite horizon Markov decision processes. Chen and Chen [13] study the control of gene regulatory networks using a Markovian approach, and in the study of Ng et al. [14], the corresponding control model for a gene intervention problem, which aims to drive undesirable states to desirable ones, is approximated and formulated as a minimization problem with integer and continuous variables. To reduce the detrimental side effects of cancer treatment and patients' suffering, Faryabi et al. [15] restrict the expected number of treatments beneath some bound by using the approach of constrained Markov decision processes; Ching et al. [16] formulate the control problem as a new minimization problem, with hard constraints, of a distance related to system behavior, in which they add an upper bound on the number of control inputs and compute solutions using an approximation method similar to that in [17]; Kobayashi and Hiraishi propose an integer programming-based approach for context-sensitive PBNs in [18] and also reduce the optimal control of PBNs to a polynomial optimization problem in [19]. Ivanov and Dougherty consider two kinds of mappings, reduction and projection, and their effect on the original probability structure of a given PBN in [20]. The references reviewed above share one common feature: the assumption of synchronous time and fixed observation epochs.

Based on semi-Markov processes, asynchronism in PBNs has been introduced by updating each node according to its own period. Faryabi et al. [21] obtain an optimal control policy for semi-Markov asynchronous regulatory networks that minimizes the time the system spends in undesirable states, based on the theory of semi-Markov decision processes. Liu et al. [22] solve an optimal control problem by choosing optimal constituent Boolean networks in an asynchronous PBN; Liu [23] also describes a control model in an asynchronous PBN as a first passage model in controlled semi-Markov processes.

This paper studies a different optimality problem in the framework of generalized asynchronous PBNs. The main motivations are as follows. First, a fixed and constant sojourn time is assumed in a synchronous PBN, while sojourn times are allowed to follow an arbitrary probability distribution in a generalized asynchronous PBN. Hence, generalized asynchronous PBNs are better able to describe the dynamic behavior of a gene regulatory network, because the updating of each gene is allowed to occur at random time points when certain critical events take place, such as a great change in the process of gene expression. Second, random fluctuations in the abundance of molecules in the living cell may lead to noise, thereby affecting its growth and well-being [24–26]. Recent studies suggest that the stochastic noise in gene expression should be measured by the normalized variance. Due to the limitations of the variance measure, it has been replaced by more sophisticated risk measures such as value-at-risk and AVaR [27–29]. Hence, it is meaningful to control and reduce harmful noise in cellular processes via an AVaR approach. Third, although regulatory intervention has been treated for synchronous PBNs, with the development of intervention policies based on associated Markov decision processes in [10, 11, 13], therapeutic intervention policies within the context of asynchronous PBNs are few. Hence, the AVaR problem of semi-Markov decision processes in the framework of asynchronous PBNs is unsolved and worth studying.

In this paper, the generalized asynchronous PBN is described as an AVaR model in finite horizon semi-Markov decision processes, with each state variable denoting a gene activity profile (GAP) at some time. The optimality problem is solved by minimizing the AVaR related to system behavior during a finite horizon. In terms of the theory of the AVaR model in finite horizon semi-Markov decision processes [28], the state space is the set of all possible GAPs, the available action space consists of all possible control inputs at a given GAP, and the objective function of the network to be optimized is defined as the AVaR, which represents the expected loss upon being within some percentage of the worst-case loss scenario when using a policy starting from an initial state. The main ideas of the solution to the optimality problem, formalized in [28], are as follows. First, using a representation of AVaR, we convert the problem of minimizing the AVaR of the finite horizon cost for a semi-Markov decision process into a bilevel optimization problem, in which the outer optimization problem is an ordinary problem of minimizing a function of a single variable and the inner optimization problem is to minimize the expected positive deviation of the finite horizon cost. According to the results presented in [28], the outer optimization problem can be settled by solving the inner optimization problem. Second, we propose a dynamic programming-based algorithm to solve the inner optimization problem. Third, based on the solution of the inner optimization problem, the existence and computation of an AVaR optimal policy are established by solving the outer optimality problem. Therefore, our optimality problem is actually described as an AVaR model in finite horizon semi-Markov decision processes, and the well-developed optimality techniques for semi-Markov decision processes [28, 30] can be used to solve it efficiently. An advantage of the method used in this paper is that it can help biologists further develop better therapy strategies in synchronous and asynchronous PBNs [31–33].

The structure of this paper is as follows. Section 2 gives a brief introduction to generalized asynchronous PBNs. Section 3 formulates a finite horizon model in semi-Markov decision processes for a generalized asynchronous PBN, and an algorithm for solving the optimization problem is provided in Section 4. A numerical example is shown in Section 5. Finally, a brief summary and comments are given in Section 6.

2. Boolean Networks, Synchronous Probabilistic Boolean Networks, and Generalized Asynchronous Probabilistic Boolean Networks

In this section, we give a brief review of Boolean networks, synchronous probabilistic Boolean networks, and generalized asynchronous probabilistic Boolean networks. For a detailed and complete exposition, the reader is referred to the literature [4, 31].

A Boolean network $G(V, F)$ consists of a set of nodes (genes) $V = \{x_1, x_2, \ldots, x_n\}$ and a collection of Boolean functions $F = \{f_1, f_2, \ldots, f_n\}$, where $f_i : \{0, 1\}^n \to \{0, 1\}$, $i = 1, 2, \ldots, n$. $F$ contains one Boolean function for each gene in $V$. The function $f_i$ is the predictor function of gene $x_i$. Each gene can take on one of two binary values, 0 or 1, corresponding to the case that a particular gene is not expressed or expressed, respectively. We denote by $x_i(t)$ the state of gene $x_i$ at time step $t$. Thus, the overall state of all the genes in the network at time step $t$ is given by the vector $\mathbf{x}(t) = (x_1(t), x_2(t), \ldots, x_n(t))$, also known as the gene activity profile (GAP). The rules of the regulatory interactions among the genes are determined by $F$.

If the genes and the corresponding Boolean functions are given, then a Boolean network is defined. It is not difficult to see that a Boolean network is essentially a deterministic model. To overcome the deterministic rigidity of a Boolean network, uncertainty in the intergene relations is imposed on the dynamic process, which gives rise to synchronous PBNs [1–4]. Specifically, in a synchronous PBN each gene $x_i$ has several candidate Boolean functions (predictor functions) $f_1^{(i)}, f_2^{(i)}, \ldots, f_{l(i)}^{(i)}$ from which the next state of the gene is determined, while it has only one Boolean function in a Boolean network. The probability of choosing the Boolean function $f_j^{(i)}$ is $c_j^{(i)}$, where $c_j^{(i)}$ satisfies

$$c_j^{(i)} \ge 0, \qquad \sum_{j=1}^{l(i)} c_j^{(i)} = 1, \qquad i = 1, 2, \ldots, n.$$

We denote by $\mathbf{f}_k$ the $k$th possible realization, which is of the form $\mathbf{f}_k = (f_{k_1}^{(1)}, f_{k_2}^{(2)}, \ldots, f_{k_n}^{(n)})$ with $f_{k_i}^{(i)} \in \{f_1^{(i)}, \ldots, f_{l(i)}^{(i)}\}$. Each vector-valued function $\mathbf{f}_k$ determines a constituent Boolean network of the synchronous PBN. There are at most $N = \prod_{i=1}^{n} l(i)$ constituent Boolean networks. The dynamics of a synchronous PBN can be represented via a Markov chain. Let $\mathbf{a}$ and $\mathbf{b}$ be two column vectors in $\{0, 1\}^n$. Then the transition probability satisfies

$$P\big(\mathbf{x}(t+1) = \mathbf{b} \mid \mathbf{x}(t) = \mathbf{a}\big) = \sum_{k \,:\, \mathbf{f}_k(\mathbf{a}) = \mathbf{b}} P(\mathbf{f} = \mathbf{f}_k),$$

where $P(\mathbf{f} = \mathbf{f}_k)$ is the probability of selecting the constituent Boolean network $\mathbf{f}_k$.

Letting $\mathbf{a}$ and $\mathbf{b}$ range over all the possible values in $\{0, 1\}^n$, one obtains the $2^n \times 2^n$ transition probability matrix.
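To illustrate this construction, the following is a minimal Python sketch that builds the transition probability matrix of a small synchronous PBN; the two-gene network, its predictor functions, and the selection probabilities are hypothetical and chosen purely for illustration.

```python
import itertools
import numpy as np

# Hypothetical two-gene synchronous PBN (predictors and probabilities
# are invented for illustration only).
predictors = [
    [(lambda x: x[0] | x[1]), (lambda x: x[1])],   # candidates for gene 1
    [(lambda x: x[0] & x[1])],                     # candidate for gene 2
]
probs = [[0.6, 0.4], [1.0]]  # selection probabilities c_j^(i), summing to 1 per gene

n = len(predictors)
states = list(itertools.product([0, 1], repeat=n))  # all 2^n GAPs
P = np.zeros((2 ** n, 2 ** n))

# Enumerate every constituent Boolean network f_k and accumulate its
# selection probability into the transitions it induces.
for choice in itertools.product(*[range(len(p)) for p in predictors]):
    p_k = np.prod([probs[i][j] for i, j in enumerate(choice)])
    for a_idx, a in enumerate(states):
        b = tuple(predictors[i][j](a) for i, j in enumerate(choice))
        P[a_idx, states.index(b)] += p_k

assert np.allclose(P.sum(axis=1), 1.0)  # each row of P is a probability vector
```

Each row of `P` corresponds to a GAP $\mathbf{a}$ and collects the probabilities of all constituent networks mapping $\mathbf{a}$ to $\mathbf{b}$, matching the transition formula above.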

In a synchronous PBN, all the genes (nodes) update simultaneously once one of the constituent Boolean networks is selected. The time parameter of the state process is discrete, and the time between two successive transitions is equal and fixed. However, such a model is less realistic [26]. From a biological viewpoint, interactions between genes that drive transcription, translation, and degradation may require a few milliseconds or even up to a few seconds [12]. Due to this limitation on the time parameter of the state transitions, generalized asynchronous PBNs, which are an extension of synchronous PBNs, have gained popularity in modeling genetic regulatory networks [21]. In a generalized asynchronous PBN, all the genes update synchronously, and the updating period of each gene is randomly selected based on a given distribution.

To be consistent with the aforementioned models, we use the same notation to define generalized asynchronous PBNs. Similar to a synchronous PBN, a generalized asynchronous PBN also contains a set of nodes (genes) $V = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ represents the expression value of a gene, and a collection of vector-valued functions $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_N$. If one of the possible realizations $\mathbf{f}_k$ of the generalized asynchronous PBN is chosen, the values of all the genes are simultaneously updated according to the predictors determined by $\mathbf{f}_k$. The gene activity profile $\mathbf{x}(t) = (x_1(t), x_2(t), \ldots, x_n(t))$ is an $n$-tuple giving the values of the genes at time $t$, where $x_i(t) \in \{0, 1\}$. Different from synchronous PBNs, the time parameter of the state process is continuous, the time length between two transitions is random, and this random time may follow any distribution. These properties allow a generalized asynchronous PBN to better characterize real gene regulatory networks. Consider two consecutive epoch times $\tau_m$ and $\tau_{m+1}$. When the generalized asynchronous PBN occupies the GAP $\mathbf{a}$, a constituent Boolean network is chosen according to some regulatory rule. As a consequence of this choice, the network jumps to a new GAP $\mathbf{b}$ after a sojourn time, whose distribution can be characterized by a semi-Markov kernel $Q(t, \mathbf{b} \mid \mathbf{a})$. The kernel satisfies $Q(t, \mathbf{b} \mid \mathbf{a}) = P(\mathbf{x}_{m+1} = \mathbf{b}, \tau_{m+1} - \tau_m \le t \mid \mathbf{x}_m = \mathbf{a})$, where the sojourn time $\tau_{m+1} - \tau_m$ is a random variable whose distribution function is denoted by $F(t \mid \mathbf{a})$.
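The dynamics just described can be sketched in a few lines of Python; here the constituent networks, their selection probabilities, and the exponential sojourn distribution are all hypothetical choices made only for illustration (the model allows arbitrary sojourn distributions).

```python
import random

def simulate_gapbn(state, constituent_bns, bn_probs, rate, horizon):
    """Simulate one trajectory of a generalized asynchronous PBN.

    At each decision epoch, the network stays in the current GAP for a
    random sojourn time (exponential here, purely for illustration),
    then a constituent Boolean network is selected and all genes are
    updated synchronously.
    """
    t, trajectory = 0.0, [(0.0, state)]
    while True:
        t += random.expovariate(rate)        # random sojourn in the current GAP
        if t >= horizon:
            return trajectory
        bn = random.choices(constituent_bns, weights=bn_probs)[0]
        state = tuple(f(state) for f in bn)  # synchronous update of all genes
        trajectory.append((t, state))
```

In a full model the sojourn distribution could depend on the current GAP, matching the kernel $Q(t, \mathbf{b} \mid \mathbf{a})$ above.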

3. A Semi-Markov Decision Process Model for Generalized Asynchronous Probabilistic Boolean Networks

In this section, we formulate a generalized asynchronous PBN as an AVaR model for finite horizon semi-Markov decision processes, with the notation specified as follows.

First, we specify the state of the generalized asynchronous PBN, which is the information available to the controller. The state of the generalized asynchronous PBN at any time is the gene activity profile, which is a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, where $x_i$ represents the expression value of gene $i$ and $x_i \in \{0, 1\}$, $i = 1, 2, \ldots, n$. To facilitate our discussion, we use the decimal representation of the GAP and define $s := 1 + \sum_{i=1}^{n} 2^{n-i} x_i$ to be the state of the network. Therefore, the corresponding state space is composed of all possible GAPs, that is, $S = \{1, 2, \ldots, 2^n\}$.
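As a small illustration, the following helper computes this decimal index (a sketch, assuming the encoding just given):

```python
def gap_to_state(gap):
    """Decimal state index of a GAP (x1, ..., xn): s = 1 + sum_i 2^(n-i) * xi."""
    n = len(gap)
    return 1 + sum(x << (n - 1 - i) for i, x in enumerate(gap))

assert gap_to_state((0, 0, 0)) == 1 and gap_to_state((1, 1, 1)) == 8
```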

Second, a vector consisting of control inputs can be regarded as an action available to the controller at state $s$. The use of such control inputs makes sense from a treatment viewpoint. For instance, in the case of diseases like cancer, auxiliary treatment inputs, such as radiation and chemotherapy, may be applied to move the state vector away from one associated with the proliferation of cancer cells. To be consistent with the decimal representation above, we take $a$ (the decimal representation of a control input) as an action. We denote by $A(s)$ the set of actions available to the controller when the generalized asynchronous PBN is in state $s$. In other words, $A(s)$ represents the set of all possible treatments, which is assumed to be a finite set in this paper.

Third, the generalized asynchronous PBN occupies a state $i \in S$ at some decision epoch, and the controller chooses a vector of control inputs (an action) $a \in A(i)$ according to some decision rule. As a consequence of this action choice, the following occurs: the generalized asynchronous PBN jumps to a new state $j \in S$ after a sojourn time in $i$ whose distribution is subject to the semi-Markov kernel. To characterize this event, we introduce the semi-Markov decision kernel as follows. For every $i, j \in S$, $t \ge 0$, and a given action $a \in A(i)$, based on [10, 28], the semi-Markov decision kernel is of the form

$$Q(t, j \mid i, a) = F(t \mid i, a)\, p(j \mid i, a),$$

where $F(t \mid i, a)$ and $p(j \mid i, a)$ denote the distribution function of the sojourn time and the transition probability, respectively. In fact, from the definition of the generalized asynchronous PBN and the properties of semi-Markov processes, the generalized asynchronous PBN stays in a given state $i$ for a random length of time [28], whose distribution function is denoted by $F(t \mid i, a)$. In the previous section, we showed that the dynamic behavior of a generalized asynchronous PBN can be represented as a semi-Markov process in which the state transitions are completely specified by all of the possible Boolean functions.

Fourth, the cost function is assumed to be $c(i, a)$, which depends on the current state $i$ and the chosen action $a$.

Fifth, policies are needed to specify decision rules at every jump time. A policy is a rule through which the controller chooses actions. The policies in our optimization problem are based on the decision epochs as well as on the states and actions available to the controller. Thus, a policy $\pi$ can be expressed as a function acting on $S \times [0, T]$ with the property that $\pi(i, t)$ belongs to $A(i)$ for all $i \in S$ and $t \in [0, T]$. In other words, $\pi(i, t)$ is the action chosen at the current state $i$ and decision epoch $t$ by the decision-maker using policy $\pi$. Let $\Pi$ be the set of all policies.

We also introduce some further notation. For each $s \in S$ and $\pi \in \Pi$, by the well-known Tulcea theorem [28, 30, 34], there exists a stochastic process $\{(\tau_m, x_m, a_m), m \ge 0\}$, in which $\tau_m$, $x_m$, and $a_m$ denote the $m$th decision epoch and the state and the action (control input) chosen at the $m$th decision epoch, respectively.

Under suitable conditions, we define an underlying continuous-time semi-Markov decision process $\{(x(t), a(t)), t \ge 0\}$ corresponding to the discrete-time process $\{(\tau_m, x_m, a_m)\}$ by

$$x(t) := \sum_{m \ge 0} \mathbb{1}_{\{\tau_m \le t < \tau_{m+1}\}}\, x_m, \qquad a(t) := \sum_{m \ge 0} \mathbb{1}_{\{\tau_m \le t < \tau_{m+1}\}}\, a_m.$$

Finally, in order to complete our specified model, we need to introduce an optimality criterion. Let $C_T := \int_0^T c(x(t), a(t))\, dt$ denote the finite horizon total cost. Now, for each $s \in S$ and level $\lambda \in (0, 1)$, we define the value-at-risk (VaR) of the finite horizon total cost at level $\lambda$ under a policy $\pi$ by

$$\mathrm{VaR}_{\lambda}^{\pi}(s) := \inf\left\{\eta \in \mathbb{R} : P_s^{\pi}(C_T \le \eta) \ge \lambda\right\},$$

and the AVaR of the finite horizon total cost at level $\lambda$ under a policy $\pi$ by

$$\mathrm{AVaR}_{\lambda}^{\pi}(s) := \frac{1}{1 - \lambda} \int_{\lambda}^{1} \mathrm{VaR}_{\gamma}^{\pi}(s)\, d\gamma.$$

The VaR $\mathrm{VaR}_{\lambda}^{\pi}(s)$ can be interpreted as the maximum possible loss for the generalized asynchronous PBN with probability at least $\lambda$ over the time horizon $T$. Hence, the AVaR $\mathrm{AVaR}_{\lambda}^{\pi}(s)$ is the expected cost given that the cost exceeds the VaR.
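As a toy illustration (not drawn from the paper's model): if the total cost were uniformly distributed on $[0, 100]$, then

$$\mathrm{VaR}_{0.95} = 95, \qquad \mathrm{AVaR}_{0.95} = E[C_T \mid C_T \ge 95] = \frac{95 + 100}{2} = 97.5,$$

so the AVaR averages over the worst 5% of outcomes rather than reporting only their boundary.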

In this paper, we are interested in minimizing $\mathrm{AVaR}_{\lambda}^{\pi}(s)$ over $\pi \in \Pi$ and aim to find a policy $\pi^{*}$ such that

$$\mathrm{AVaR}_{\lambda}^{\pi^{*}}(s) = \inf_{\pi \in \Pi} \mathrm{AVaR}_{\lambda}^{\pi}(s) =: \mathrm{AVaR}_{\lambda}^{*}(s), \quad s \in S, \qquad (8)$$

where $\mathrm{AVaR}_{\lambda}^{*}(s)$ is the minimum AVaR. Such a policy $\pi^{*}$, when it exists, is called AVaR optimal.

To that end, we have specified the AVaR model for the above generalized asynchronous PBN as the collection

$$\left\{S,\; \big(A(s),\, s \in S\big),\; Q(t, j \mid i, a),\; c(i, a),\; \mathrm{AVaR}_{\lambda}\right\},$$

where the state space $S$, the available action set $A(s)$ at state $s$, the semi-Markov kernel $Q(t, j \mid i, a)$ with $i, j \in S$, $t \ge 0$, and $a \in A(i)$, and the AVaR criterion have been defined as above.

To solve our optimization problem (8), we need the following alternative representation for AVaR, as shown in the literature [27–29].

Let $(x)^{+} := \max\{x, 0\}$ and fix $s \in S$. Then, for every $\lambda \in (0, 1)$, it holds that

$$\mathrm{AVaR}_{\lambda}^{\pi}(s) = \min_{\eta \in \mathbb{R}} \left\{ \eta + \frac{1}{1 - \lambda}\, E_s^{\pi}\left[\left(C_T - \eta\right)^{+}\right] \right\}, \qquad (10)$$

where the minimum is attained at $\eta = \mathrm{VaR}_{\lambda}^{\pi}(s)$. This equality implies that AVaR equals VaR plus the expected losses exceeding VaR divided by the probability $1 - \lambda$ of these losses occurring.
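As a quick illustration of (10), the following Python sketch estimates VaR and AVaR from Monte Carlo samples of the total cost; the gamma-distributed samples are hypothetical and stand in for simulated costs $C_T$ of the network.

```python
import numpy as np

def empirical_var_avar(costs, lam):
    """Estimate VaR and AVaR at level lam from sampled total costs.

    Uses the representation AVaR = VaR + E[(C - VaR)^+] / (1 - lam),
    with the minimizing eta taken as the empirical VaR.
    """
    var = np.quantile(costs, lam)
    avar = var + np.mean(np.maximum(costs - var, 0.0)) / (1.0 - lam)
    return var, avar

# Hypothetical cost samples for illustration only.
rng = np.random.default_rng(0)
costs = rng.gamma(shape=2.0, scale=30.0, size=100_000)
print(empirical_var_avar(costs, 0.95))  # AVaR >= VaR always holds
```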

To solve (8), we provide a way of minimizing AVaR. In fact, by (10), we have

$$\inf_{\pi \in \Pi} \mathrm{AVaR}_{\lambda}^{\pi}(s) = \min_{\eta \in \mathbb{R}} \left\{ \eta + \frac{1}{1 - \lambda} \inf_{\pi \in \Pi} E_s^{\pi}\left[\left(C_T - \eta\right)^{+}\right] \right\}.$$

Based on the theory of [27, 28], to solve the outer optimality problem (i.e., our original optimization problem (8)), we shall first study the inner optimization problem. To this end, we define the expected positive deviation of the finite horizon cost from a level $\eta$ under a policy $\pi$ by

$$W^{\pi}(s, \eta) := E_s^{\pi}\left[\left(C_T - \eta\right)^{+}\right]. \qquad (12)$$

Here, $\eta$ can be interpreted as the acceptable cost/loss that the controller expects to spend/lose, and $W^{\pi}(s, \eta)$ measures the expected amount by which the total cost/loss exceeds this expectation over the horizon $T$. Note that the criterion in (12) depends on the cost level $\eta$. For technical convenience, it is natural to introduce a class of more general policies that also track the cost level; we denote this class by $\Pi'$ and use it in lieu of $\Pi$. Our goal now is, for every fixed $\eta$, to minimize $W^{\pi}(s, \eta)$ over $\Pi'$ and to seek an optimal policy $\pi_{\eta}^{*}$ (depending on $\eta$) satisfying

$$W^{\pi_{\eta}^{*}}(s, \eta) = \inf_{\pi \in \Pi'} W^{\pi}(s, \eta) =: W^{*}(s, \eta), \qquad (13)$$

where $W^{*}(s, \eta)$ is the value function for the expected positive deviation criterion.

The main goal of this paper is to solve the stochastic optimal control problem above with the help of the theory of semi-Markov decision processes [28, 30]. As described above, using a representation of AVaR in (10), we have converted the problem of minimizing the AVaR of the finite horizon cost for a semi-Markov decision process into a bilevel optimization problem in which the outer optimization problem is an ordinary problem of minimizing a function of a single variable and the inner optimization problem is to minimize the expected positive deviation of the finite horizon cost for semi-Markov decision processes.

4. Solutions to the Optimization Problem

Based on the AVaR model in finite horizon semi-Markov decision processes for the generalized asynchronous PBN described in Section 3, in this section we solve the optimality problem and propose a solution method. The main ideas of the solution process are as follows. The algorithm comes from the theory of the AVaR problem for finite horizon semi-Markov decision processes. In order to obtain solutions for the original AVaR criterion (8) in the framework of generalized asynchronous PBNs, we proceed in three steps.

In the first step, we formulate our original problem (8) as a bilevel optimization problem. This step is executed via the representation (10).

In the second step, we solve the inner optimization problem for the intermediate criterion (12) in finite horizon semi-Markov decision processes. This treatment is the key to analyzing the inner optimization problem.

In the third step, we derive an AVaR optimal policy for the original problem (8) from the solution of the inner optimization problem (12).

From (3.8) in [28], for each $s \in S$ and $\eta \in \mathbb{R}$, we can rewrite the intermediate criterion (12) for the inner optimization problem as

$$W^{\pi}(s, \eta) = E_s^{\pi}\left[\left(\sum_{m \ge 0} c(x_m, a_m)\left(T \wedge \tau_{m+1} - T \wedge \tau_m\right) - \eta\right)^{+}\right], \qquad (14)$$

where $\tau_{m+1} - \tau_m$ denotes the sojourn time between two successive decision epochs.

According to (14), we define the corresponding finite-stage approximations of the value function for every $s \in S$ and $\eta \in \mathbb{R}$, exactly as in [28]. These approximations are closely related to $W^{*}$; hence, we shall calculate them so as to compute $W^{*}$. Moreover, from Lemmas 3.3 and 3.4 in [28], we can obtain various properties of these approximations and of $W^{*}$.

To analyze the optimality problem, we introduce two auxiliary operators, defined for measurable functions of $(s, \eta)$ exactly as in [28].

Next, we are ready to establish the computation and existence of optimal policies for problem (8). The theorem below develops a value iteration algorithm for calculating the value function and also derives the optimality equation. In addition, with the help of the optimality equation, we show the existence of optimal policies for problem (13).

Theorem 1. Under suitable conditions, the following assertions are true.

(a) For each $n \ge 0$, let $W_{n+1}$ be obtained from $W_n$ by the value iteration step of [28]. Then, $W_n$ increases in $n$, and $\lim_{n \to \infty} W_n = W^{*}$.

(b) $W^{*}$ is the unique solution in a suitable function space to the optimality equation.

(c) There exists a policy $\pi^{*}$ attaining the minimum in the optimality equation, and such a policy is optimal for the optimality problem (13).

Proof. The proof of this theorem is similar to those of Theorems 3.1 and 3.2 in [28]. For a detailed discussion, consult [28] and the references therein.

Remark 1. We have provided a value iteration algorithm for computing the value function (see Theorem 1(a)), shown that the value function is the unique solution in a suitable function space to the optimality equation (see Theorem 1(b)), and established the existence of optimal policies (see Theorem 1(c)). This result plays a key role in the derivation of the value iteration algorithm and in the existence of optimal policies.

We now return to the original problem (8). Fix $s \in S$ and consider the problem

$$\min_{\eta \in \mathbb{R}} \left\{ \eta + \frac{1}{1 - \lambda}\, W^{*}(s, \eta) \right\}. \qquad (18)$$

The following theorem relates optimal policies of the inner optimization problem to AVaR optimal policies of the original problem.

Theorem 2 (see Theorem 3.4 in [28]). Under suitable conditions, there exists a solution $\eta^{*}$ (depending on $s$) of problem (18), and the policy $\pi_{\eta^{*}}^{*}$ is AVaR optimal for the original problem, where $\pi_{\eta^{*}}^{*}$ is an optimal policy for the optimality problem (13) with $\eta = \eta^{*}$.

Remark 2. Based on the solution of the inner optimization problem, the existence and computation of an AVaR optimal policy in a generalized asynchronous PBN are established in Theorem 2.

Next, we propose a value iteration algorithm for computing the optimal value functions based on Theorems 1 and 2 above. Moreover, optimal policies for the optimization problem are constructed in the course of the algorithm. Following [28], the details and the derivation are as follows.

Step I. Given an $\epsilon > 0$, set $n = 0$ and initialize $W_0(s, \eta) := \max\{-\eta, 0\}$ for each $s \in S$ and $\eta \in \mathbb{R}$.

Step II. For each $s \in S$ and $\eta \in \mathbb{R}$, compute $W_{n+1}(s, \eta)$ from $W_n$ by the value iteration step in Theorem 1(a).

Step III. If $\sup_{s, \eta} |W_{n+1}(s, \eta) - W_n(s, \eta)| < \epsilon$, go to Step IV. Otherwise, increment $n$ by 1 and return to Step II.

Step IV. For each $s \in S$, find the minimum point $\eta^{*}$ of the function $\eta \mapsto \eta + \frac{1}{1 - \lambda} W_{n+1}(s, \eta)$, and stop.
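To make the four steps concrete, the following Python sketch implements a simplified, discrete-time variant of the procedure: the semi-Markov dynamics are collapsed to a fixed step `dt` (an approximation, not the exact SMDP operator of [28]), the cost level $\eta$ is restricted to a finite grid `eta_grid`, and the inputs `P` (per-action transition matrices) and `cost` (cost rates) are hypothetical placeholders.

```python
import numpy as np

def avar_bilevel_dp(P, cost, s0, lam, n_steps, dt, eta_grid):
    """Minimize eta + E[(C_T - eta)^+]/(1 - lam) by backward dynamic programming.

    P[a] is the transition matrix under action a and cost[i, a] the cost
    rate in state i; both are hypothetical inputs. W[i, m] approximates
    the inner value function W*(i, eta_grid[m]).
    """
    n_states, n_actions = cost.shape
    # Terminal condition: with no cost left to incur, (0 - eta)^+ = max(-eta, 0).
    W = np.tile(np.maximum(-eta_grid, 0.0), (n_states, 1))
    for _ in range(n_steps):
        W_new = np.full_like(W, np.inf)
        for a in range(n_actions):
            for i in range(n_states):
                # Continuation value E_j[ W(j, eta - c(i,a)*dt) ]; the shift of
                # the cost level is handled by interpolation on the grid (values
                # outside the grid are clamped, so eta_grid should be wide).
                shifted = np.array([np.interp(eta_grid - cost[i, a] * dt,
                                              eta_grid, W[j])
                                    for j in range(n_states)])
                W_new[i] = np.minimum(W_new[i], P[a][i] @ shifted)
        W = W_new
    # Outer problem (Step IV): minimize eta + W*(s0, eta)/(1 - lam) on the grid.
    objective = eta_grid + W[s0] / (1.0 - lam)
    m = int(np.argmin(objective))
    return eta_grid[m], objective[m]  # (optimal cost level, minimal AVaR)
```

A finer `eta_grid` and a smaller `dt` tighten the approximation at the price of more computation; since the horizon here is fixed at `n_steps * dt`, the backward recursion runs a fixed number of steps rather than using the stopping test of Step III.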

5. Numerical Implementation

In this section, a numerical example of a generalized asynchronous PBN is presented to illustrate our results. From the results in [28], the outer optimization problem can be settled by solving the inner optimization problem. We first solve the inner optimization problem using the dynamic programming-based algorithm of Section 4. Then, based on the solution of the inner optimization problem, the existence and computation of an AVaR optimal policy are established by solving the outer optimization problem.

Consider a generalized asynchronous PBN with three genes, $x_1$, $x_2$, and $x_3$. There are two Boolean functions $f_1^{(i)}$ and $f_2^{(i)}$ associated with each gene $x_i$. These Boolean functions are provided by the truth table in Table 1. From this table, we have eight possible Boolean networks.

Suppose that $u$ is a control input, taking the value 0 or 1. For the new generalized asynchronous PBN, to be consistent with the notation introduced in Section 2, the variables $x_1$, $x_2$, and $x_3$ are renamed as follows: gene $x_1$ now becomes $x_2$, while genes $x_2$ and $x_3$ become $x_3$ and $x_4$, respectively (the control input $u$ becomes $x_1$). The information on the Boolean functions can be found in Table 2, which shows that there are four possible Boolean networks.

From Table 2, we know that the network has three genes plus one control input in this example. Therefore, the state space can be taken as $S = \{1, 2, \ldots, 8\}$, and the set of admissible actions is given by $A(s) = \{1, 2\}$ for every $s \in S$.

The semi-Markov kernel is of the form $Q(t, j \mid i, a) = F(t \mid i, a)\, p(j \mid i, a)$ for every $i, j \in S$, $t \ge 0$, and $a \in A(i)$. Suppose that the corresponding distribution functions of the sojourn times are given by

As described in Section 2, we can derive the transition probabilities based on the truth table in Table 2 and the probabilistic parameters involved.

Moreover, assume that the cost rates are given by

Note that the distribution functions and the cost rates are completely arbitrary. In a real-world example, this information would be obtained from biologists.

For this generalized asynchronous PBN, we want to minimize the AVaR over a time horizon $T$ with respect to a given confidence level $\lambda$.

In addition, we set $T = 15$ and $\lambda = 0.95$, and we discretize the time interval $[0, 15]$ and the cost level interval $[0, 150]$. Then, we implement the algorithm stated in Section 4 and obtain data on the value functions (see Figures 1–4). To be specific, we analyze the data shown in Figures 1–4 below as examples.

Comparing the data under each admissible action $a \in A(s)$ for every $s \in S$, one may obtain an optimal policy for the inner optimization problem. For example, in light of the figures, we can define such a policy by

According to the definition of a policy, the policy so defined is optimal for the inner optimization problem.

Moreover, we find the minimum point of the outer objective function for each initial state. Figure 5 gives the graphs of this function for two representative states. From Figure 5, the corresponding minimum points are easy to read off, that is,

For the other states $s \in S$, the minimum points can be calculated similarly. Then, the resulting policy is AVaR optimal. For example,

Moreover, we have also computed the AVaR and VaR at level 0.95, as shown in Tables 3 and 4.

When the AVaR optimal policy is applied, we obtain the minimum AVaR at level 0.95, which represents the minimal expected value of the worst 5% of costs of the generalized asynchronous PBN over $[0, 15]$. For example, if the network starts from state 1, the minimum mean value of the worst 5% of costs over $[0, 15]$ is 420.9. On the other hand, Tables 3 and 4 indicate that the computed VaR is less than the computed AVaR in all scenarios considered. When the optimal policy is applied, for example, starting from state 1, the VaR of the costs of the generalized asynchronous PBN over $[0, 15]$ at level 0.95 is 117.4, which represents the maximum cost over the horizon $[0, 15]$ with a confidence level of 95%. Note that, in minimizing AVaR for this example, we have simultaneously improved VaR. However, it is not clear whether the obtained VaR is the minimum VaR.

6. Conclusion and Discussion

In this paper, we have proposed a systematic approach to designing optimal intervention policies for generalized asynchronous PBNs. Such an approach can characterize the time delay among gene interactions. Our analytic results can be used, automatically or by medical professionals, to develop better strategies for improving the quality of medical treatment.

It is conceivable that the results of this paper may be integrated into identifying potential drugs and treatment methods to achieve the best curative effect in the treatment of cancer in the future. Although the mechanisms underlying such applications are not yet clear and remain to be developed, the proposed method is a useful first attempt toward a new and deeper understanding of biological systems.

Improvements on the proposed method include further investigation in three main areas: computational complexity, the classification of states, and the distribution functions of the sojourn times. When the number of states and time points becomes large, the computational complexity of the algorithm increases; when the number of nodes and the time horizon are fixed, the computational complexity of the algorithm is analyzed in [28], and for more discussion one may consult [28] and the references therein. As the lack of data has prohibited the inference of realistic alternative asynchronous models, we have assumed that the classification of states, the distribution functions of the sojourn times, and the cost function are all known. Hence, improving the computational efficiency of the algorithm for obtaining the optimal policy for the formulated control model, and theoretically deriving a reasonable classification of states and the distribution functions of the sojourn times, are left as future work.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Natural Science Foundation of Guangdong Province (Grant no. 2014A030313438), Zhujiang New Star (Grant no. 201506010056), Guangdong Province Outstanding Young Teacher Training Plan (Grant no. YQ2015050), and National Natural Science Foundation of China (Grant no. 11771102).