Convergence of a Belief Propagation Algorithm for Biological Networks
Constructing network models of biological systems is important for effective understanding and control of the biological systems. For the construction of biological networks, a stochastic approach for link weights has been recently developed by using experimental data and belief propagation on a factor graph. The link weights were variable nodes of the factor graph and determined from their marginal probability mass functions which were approximated by using an iterative scheme. However, there is no convergence analysis of the iterative scheme. In this paper, at first, we present a detailed explanation of the complicated multistep process step by step with a network of small size and artificial experimental data, and then we show a sufficient condition for the convergence of the iterative scheme. Numerical examples are given to illustrate the whole process and to verify our result.
Systems of mathematical equations have been used for modeling interactions of genes or proteins in biological systems. The modeling consists of two parts: one is to construct a network of nodes and links, where nodes usually represent genes or proteins and links denote interactions of nodes. The second is to build a set of mathematical formulas for governing the network dynamics, which might be expressed as a system of differential equations or Boolean logic functions [1, 2]. Modeling of a biological network has been focused on the construction of a directed network for a given biological system and its mathematical formulas, with which biological phenomena and mechanisms have been explained and controlled  and, furthermore, modeling of a biological network is a useful tool for understanding and explaining the distribution of the biological communities too . However, there are few researches on how to determine an optimal set of nodes and links which are key components of networks for biological phenomena.
There could be multiple networks for explaining a given biological system. So, it would be meaningful to use a probabilistic modeling approach which could yield multiple networks for the biological system. One approach was recently developed in [5, 6] for the construction of network models of a melanoma cell line (SK-MEL-133), the most serious skin cancer, in which each link weight was considered as a discrete random variable and its marginal probability mass function (PMF) was constructed by using a factor graph and a belief propagation (BP) algorithm. The marginal PMFs were used to choose multiple weights of high-probability values and then to construct multiple networks for the biological system. The initial step of the approach was to determine a prior knowledge network (PKN) for melanoma and its mathematical formula structure. Nodes were chosen for representing some drugs (drug nodes) for melanoma or proteins (measured nodes) which take important roles in melanoma. The drug nodes had outgoing links to each measured node without incoming links. Each measured node had an outgoing link to each other measured node and an incoming link from each other measured node. Each link weight in the mathematical formulas was not determined in the initial step. Next, each protein concentration was measured at steady state before and after treatment of the drugs, which produced experimental data for determining the link weights. Then, the joint PMF of link weights was defined as the Boltzmann/Gibbs form  for the factorization of the joint PMF. The form contained a cost function of the experimental data and the simulated values of measured nodes at steady state, where the simulated values were not constants for undetermined link weights. So, the factorization of the joint PMF directly led to the factor graph which is a bipartite graph consisting of variable nodes (link weights in the network) and factor nodes (factors in the factorization). Finally, the marginal PMFs were approximated with an iterative scheme constructed from belief (message) propagation on the factor graph.
In general, computation time could be high to directly compute marginal distributions from a joint probability distribution. So, factor graphs and BP have been applied to infer marginals in diverse problems [8–12]. It is known that a marginal obtained from applying BP becomes the true marginal for an acyclic factor graph and BP could be successfully applied to a cyclic graph [13–18]. There have been many researches on conditions for the convergence of beliefs, which can be applied for computing marginals from a Gaussian joint probability density function (PDF) [19, 20]. However, the convergence conditions cannot be applied to the convergence of beliefs (messages) in [5, 6], since the joint random variable is discrete. And the authors [5, 6] did not analyze the convergence. As far as we know, there is no framework for the inference of marginal PMFs based on both BP and perturbation data except the approach in [5, 6]. For the use of the stochastic approach  in the construction of networks based on BP as well as perturbation data, a sufficient condition for the convergence of messages is necessary. Under such a sufficient condition marginal PMFs can be identified and, as a result, multiple network models for the given biological system can be constructed.
In this paper, with our simple network, we present a detailed explanation of the long and intricate process of constructing a system of equations with messages from variable nodes to factor nodes, and vice versa for the approximation of marginal PMFs. Since approximate marginal PMFs are defined with the solution of the system of equations, we identify recursive relations for the solution of the system of equations and construct iterative schemes for solving the recursive relations. Finally we show a sufficient condition for the convergence of the schemes by using a Banach fixed-point theorem (see, e.g., [21, 22]). Numerical examples are given to illustrate the process in Section 4 and to verify our results.
For a clear explanation of the convergence of the iterative schemes mentioned in the Introduction section, we make a simple network of three nodes and four links (see Figure 1(a)), which is assumed to be a PKN. Since the simple network has key components necessary for determining approximate marginal PMFs, the simple network can be extended to any network, including the biological network in . So, our convergence analysis in the Results section can be applied for any network. The network has two measured nodes and one drug node . We consider two treatments as the and perturbations. Symbols in Figure 1(b) denote , where and are the concentrations of at steady state before and after the perturbation, respectively. Since each drug node has an outgoing link to each measured node and no incoming link to any node, drug node has two outgoing links to and with weights and , respectively. Each measured node has an outgoing link to each other measured node and incoming links from each other nodes, and so, measured node has one outgoing link to with weight and two incoming links from and with weights and , respectively. Similarly, measured node has one outgoing link to with weight and two incoming links from and with weights and , respectively.
As in , the dynamics of the given situation is modeled with the mathematical formulas as in Figure 1(c). Simulated values at steady state under the perturbation are assumed to be the formwhere . From now on, weights are considered as discrete random variables with three values (-1, 0 and 1) and their PMFs are denoted as follows: for which are approximately calculated in the Results section by using a factor graph and BP based on the factor graph. In , discretization over the 3 weight values led to a high rate of convergence and did not capture appropriate uncertainty in a probability distribution and then 11 weight values were used. However, our goals in this paper are to find a convergence condition for messages and to explain the complicated multistep process in  step by step by using a prior knowledge network of small size. So, the three weight values are enough for our goals. For the extension of our results to networks of large size and real perturbation data, some notations and mathematical expressions used in our paper might be rather cumbersome in later sections.
In this section, we present a system of equations for each marginal PMF, iterative schemes for the solutions and a sufficient condition for the convergence of the iterative schemes.
3.1. System of Equations for Marginal PMFs
To define a joint PMF of link weights based on the experimental data we consider the sum of squared errors between simulated values and experimental values as well as the penaltywhich reflects the property that the total number of links tends not to be large. Note that due to no incoming link to drug node . Then the cost function is defined aswhere the error and penalty are weighted by positive constants and , respectively. Using (1), the cost is written asPosteriori probability (joint PMF) is defined aswhere is the constant ensuring that the sum of the probabilities over all model configurations is equal to one. Let , where and are the constants ensuring that the sums of probabilities and over all model configurations are equal to one, respectively. Substituting the cost (6) into gives which implies that and can be independently determined by using a similar approach. So, we consider to show the approximation of marginal PMF :It is not efficient to calculate the exact marginal with the enumeration in the cases where the numbers of nodes in a network or experimental perturbations become large. Therefore, the process to approximate the marginal is presented step by step by using a factor graph and BP on the factor graph.
Step 1 (introduction of a factor graph and BP on the factor graph). Using the factorization in (9), factor nodes are defined asand then marginal in (9) can be written as follows:which yields the factor graph of two variable nodes and two factor nodes as in Figure 2. Following BP on the factor graph, message from variable node to factor node is defined as where is the normalization constant ensuring that the sum of the probabilities over all model configurations is equal to one and is the message from to defined asUsing message , marginal PMF in (11) can be approximated aswhere is the normalization constant ensuring that the sum of the probabilities over all model configurations is equal to one.
Step 2 (approximation of the summation (13)). The process of the approximation used in  can be divided into two parts: the first is to change multiple summations into a single summation by introducing a new random variable. The next is to change the single summation into an integral.
Step 2-1. Since in (13) is a function of , all random variables in , which are in (13), can be divided into two type of random variables: one is and the other is , which becomes the linear combination of random variables in the cases where the number of nodes becomes large. Using the definitions of and in (10), we can write in (13) aswhich is a function of random variables and . Then (13) becomesLettingfor some positive integer , message in (16) becomesNote thatwhich implies that the following can be a PMF of So, message in (18) can be written as a single summationStep 2-2. Note that is a sum of random variables , which are assumed to be independent. Even though is not identically distributed, the authors  invoked the central limit theorem to approximate the PMF of as a Gaussian with reference to , where there was no explicit justification for the application of this theorem. Since sums of independent random variables converge in distribution to the standard normal as long as some condition (e.g., the Lindeberg Condition ) is satisfied, we think that such a condition might be implicitly assumed in . So, the approximate PMF of becomes where average and variance of are defined asSymbols and denote the averages of and , respectively:and then the sum over configurations in (19) is approximated with a Gaussian integration:
Step 3 (approximation of the improper integral (26)). For the calculation of the improper integral (26), error is linearized with respect to the maximization of the fitness in (15). Note that the equalitycan be written asunder the assumption thatThen error in (15) is approximated bywhich yields Hence we have where the last equality is obtained both by the identity and the property of a PDF (the integral of a PDF over its domain is equal to 1). Therefore adopting the final approximation of in , approximate marginal PMF in (14) can be obtained by solving the system of equationswhere , , , and are in (23)–(25).
Remark 1. Since the approximation (34) was used in  and our goal in this paper is to find a convergence condition for the algorithm in , we also use (34) instead ofEven when using (36) instead of (34), the convergence analysis in the later subsections could be applied (we do not show it in this paper).
Remark 2. Similarly approximate marginal PMF can be obtained by replacing subscripts 1 and , related with node , in , , , and in (34), (35), and (23)–(25) with 2 and , related with node : for , :where is the normalization constant ensuring that the sum of the probabilities over all model configurations is equal to one and
3.2. Iterative Schemes for Solving the System of Equations
In this subsection, we construct sequences and by using (34) and (35). In the next subsection we present a sufficient condition for the convergence of the two sequences, which implies that the two limits satisfy (34) and (35). So, the limit of becomes value , leading to the construction of approximate marginal PMF in (14). We assume thatand initial terms of are defined aswhere is the normalization constant ensuring that the sum of the probabilities over all model configurations is equal to one. The iterations and are defined similarly to (34) and (35) and so we need to define and by using (23) and (24): and So, the iteration is defined aswhere is the normalization constant ensuring that the sum of the probabilities over all model configurations is equal to one. Similarly the iteration is defined asfor , and . Here is the normalization constant ensuring that the sum of the probabilities over all model configurations is equal to one andTherefore the desired iterative schemes consist of (42), (43), (47), (48), and (49) under the assumption (29).
Remark 3. Since defined in (48) is a PMF due to (47) and (49), we havewhich gives the positivity of the following term in (49)So, we obtain a lower bound of Therefore we obtain an upper boundwhich is used in the proof of Lemma 8.
Remark 4. Note that sequence in the recursive relation (47) contains no . Then, in the next subsection, we present a sufficient condition for the convergence of sequence without using and, as a result, the limit of is used to define the message .
Remark 5. Similarly, to define messages , we use the following sequences:where is the normalization constant ensuring that the sum of the probabilities over all model configurations is equal to one and
3.3. A Sufficient Condition for the Convergence of the Iterative Schemes
Letandwhere the subscript 1 of and represents node . The iteration in (47) is written asIt follows from (47), (57), and (58) that are functions of two independent variables: for We use Banach fixed-point theorem  for the convergence of the sequence (59) to prove Theorem 9, which is our main result.
Theorem 6. Let be a closed subset of for a positive integer . If a function satisfies that for a constant and all in then there exists a unique fixed point such that , which is the limit of sequence for any .
For the application of Theorem 6 to the proof of our main result we need the following lemmas.
Lemma 7. Assume that experimental data are contained in the codomain of . Assume that positive constants and satisfy inequalities for and defined in (53). Then defined in (49) and (57)–(60) becomes a function from domain to codomain .
Lemma 8. Assume that experimental data are contained in the codomain of . Assume that positive constants and satisfy inequalities for and defined in (53). Then for defined in (49) and (57)–(60) and where