Abstract

Compared with approximate search methods, Bayesian network structure learning based on a dynamic programming strategy can find the optimal graph structure. The traditional dynamic programming method for Bayesian network structure learning is based on a depth-first strategy, which is inefficient. We proposed two methods to solve this problem. First, dependency constraints were used to prune redundant score calculations. The constraints were obtained by conditional independence tests on the observed data sets. However, it was difficult to guarantee the accuracy of the constraints, which may have led to a decrease in the accuracy of the method. Second, we proposed a breadth-first-based strategy, which enhanced efficiency greatly while also ensuring global optimality. Experimental results showed that, on the standard network data sets, compared with the dynamic programming based on depth-first search (DFSDP) algorithm, dynamic programming based on constraints (CBDP) could reduce the average running time by 57.10% and dynamic programming based on breadth-first search (BFSDP) could reduce the average running time by 50.02%. On the UCI data sets, compared with DFSDP, CBDP reduced the average running time by 40.71%, and BFSDP reduced the average running time by 81.78%.

1. Introduction

As a graphical modeling tool, Bayesian networks (BNs) [1] provide a method for expressing the causal relationships between variables, which can be used to obtain knowledge concealed in the data. A Bayesian network is a directed acyclic graph (DAG), in which the nodes correspond to the variables in the domain and the edges correspond to direct probabilistic dependencies. A key feature of Bayesian network research is structure learning, which aims to construct network structures automatically using the observed data sets and prior knowledge. Formally, the structure of the network represents a set of conditional independence assertions: each variable is conditionally independent of its non-descendants given its parents [2].

Learning a BN from observational data is an important problem that has been studied extensively over the past decade [3]. It can be used to automatically construct decision support systems and is used for inferring possible causal relations under certain conditions [4]; the edges in the BN graph have causal semantics. BN has been used widely in reliability analysis [5, 6], medical diagnosis [7, 8], gene analysis [9], fault diagnosis [10], language recognition [11], and index sensitivity analysis problems [12].

Structure learning methods for BNs can be categorized into two main types according to learning accuracy: (1) approximate methods and (2) exact methods. Approximate methods easily become trapped in local optima, whereas exact methods can find the optimal graph structure in the whole solution space. However, exact methods may be limited by the network size and are suitable for occasions with high accuracy requirements. In this study, we focused mainly on exact methods. The main contributions of this paper are as follows:
(1) We proposed a dynamic programming algorithm based on dependency constraints. Prior constraints were used to guide the DFSDP algorithm when calculating the node family scores and finding the optimal parent sets, which reduced the number of family score calculations, improved the operating efficiency of the algorithm, and effectively reduced the time and space costs.
(2) We proposed a dynamic programming strategy based on breadth-first search (BFSDP). This method avoided the backtracking operations performed in the iterative process of the DFSDP algorithm and thus improved the efficiency of the algorithm.

This paper is organized as follows. In Section 2, we discuss the existing literature on structure learning according to two types of search methods. The basic knowledge of the BN structure learning and dynamic programming strategy is introduced in Section 3. The structure learning method based on the depth-first strategy is introduced in Section 4.1, and the two proposed algorithms are introduced in Section 4.2. The performance of the proposed methods is shown in Section 5. Finally, we conclude and outline our future work in Section 6.

2. Related Work

In this section, we review the algorithms for learning BN structures according to the different kinds of methods that have been proposed to date.

(1) Conditional independence (CI) test methods—such as the SGS method [13]; the PC algorithm [14]; and the drafting, thickening, and thinning three-step method [15]—are representative methods. This kind of method treats the BN as a graph structure that encodes the independence relationships between variables. It is difficult to decide whether two variables are independent or conditionally independent, and the time needed to execute CI tests grows exponentially with the number of variables.

(2) Scoring and searching methods include the following two steps: model selection and model optimization. Model selection requires choosing a criterion, named the scoring function. Currently, the following are some of the frequently used scoring functions: the Bayesian information criterion (BIC) score [16], the minimum description length (MDL) score [17], the Bayesian Dirichlet (BD) score [18], the implicit score (IS) [18], and the mutual information test (MIT) score [19]. Model optimization aims to find the network structure that obtains the highest score according to the selected criterion. Usually, heuristic search algorithms are used, such as the bee colony algorithm [20], the genetic algorithm [21], the fish swarm algorithm [22], and particle swarm optimization [23, 24]. Methods based on scoring and searching are intended to balance accuracy, robustness, and sparsity. Nevertheless, these methods easily become trapped in local optima, which is considered their congenital weakness.

(3) Mixed search methods use CI tests to reduce the graph search space and then obtain the optimal network structure through scoring and searching. Tsamardinos et al. [25] proposed the max-min hill-climbing (MMHC) algorithm, which is a typical mixed method. It builds the network skeleton by finding parent and children sets for each node using the MMPC algorithm [26] and then finds the optimal structure using a hill-climbing algorithm based on this skeleton.

(4) Among the few articles on optimal search, Ott and Miyano [27] proposed the first exact algorithm. While investigating the problem of exact model averaging, Koivisto and Sood [28] proposed another algorithm that also learned optimal graphs in a similar way. Singh and Moore [29] proposed a recursive implementation that is less efficient in terms of calculation but has the advantage that potential branch-pruning rules can be applied. Silander and Myllymaki [30] provided a practically efficient implementation of the search and empirically demonstrated that optimal graphs could be learned up to n = 29. Yuan et al. [31] proposed an A* search algorithm for learning optimal BNs. Malone et al. [32] proposed a memory-efficient implementation of the dynamic programming algorithm, which leveraged the layered structure of the dynamic programming graphs representing the recursive decomposition of the problem to reduce the memory requirements of the algorithm from $O(2^n)$ to $O(C(n, n/2))$, where $C(n, n/2)$ is the binomial coefficient. This kind of method can obtain the optimal network structure but has the following two limitations: (1) the scale of the learned network is limited to about 30 nodes; (2) the efficiency of the algorithms is insufficient, as most of them are based on a depth-first search strategy.

In this study, we focused on how to enhance the efficiency of the exact structure learning algorithm.

3. Theoretical Basis of Bayesian Network

In this section, we briefly introduce the basics of BN and the concepts that are used to learn the structure of these networks.

3.1. Bayesian Network

Definition 1. (Bayesian network). A Bayesian network B(G, θ) is composed of a network structure G(V, E) and a set of network parameters θ = {θ1, θ2, …, θn}, where the parameters θ are the conditional probability distributions obtained by decomposing the joint distribution of the network nodes according to the structure G.

Definition 2. (structure of Bayesian network). The structure of a BN is a DAG whose nodes are variables X1, X2, …, Xn ∈ V, and for any Xi ∈ V, given the set of all its parent nodes $Pa(X_i)$, Xi is independent of all of its non-descendant nodes.
From this definition, we can see that the structure of a BN is a DAG satisfying the Markov condition, from which we can directly obtain a series of local independence relations on the node set V. According to the topological order and the Markov condition, the joint distribution of the Bayesian network nodes can be decomposed as follows:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)), \quad (1)$$

where $P(X_i \mid Pa(X_i))$ denotes the conditional probability that quantifies the parent-child relationship in the BN structure. Let $r_i$ be the number of states of node $X_i$; then $q_i = \prod_{X_l \in Pa(X_i)} r_l$ denotes the number of state combinations of the parent node set $Pa(X_i)$, and $P(X_i \mid Pa(X_i))$ forms a matrix of size $r_i \times q_i$, which is called the conditional probability table (CPT) and is recorded as θi. The set of the CPTs of all nodes θ = {θ1, θ2, …, θn} is called the parameters of the BN. For brevity, we use the symbol θijk to represent the conditional probability $P(X_i = k \mid Pa(X_i) = j)$, where k ∈ {1, 2, …, ri} is the state of Xi and j ∈ {1, 2, …, qi} denotes the combined state of the parent node set $Pa(X_i)$.
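For example, consider a hypothetical three-node structure with edges X1 → X2, X1 → X3, and X2 → X3 (a toy network of our own for illustration); applying (1) gives

$$P(X_1, X_2, X_3) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2).$$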

Definition 3. (topological order). Let G(V, E) be a BN structure and $\sigma = \langle X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(n)} \rangle$ be a complete permutation of the node set V. If $X_i$ precedes $X_j$ in $\sigma$ for any edge (parent-child relationship) $(X_i, X_j) \in E$, we call $\sigma$ a topological ordering (TO) of the Bayesian network.

Definition 4. (valid path). Let G(V, E) be a BN structure and $\rho = \langle X_i, \ldots, X_j \rangle$ be a path in G. Given the condition set Z ⊆ V, the path is valid when the following conditions are satisfied: (i) for any V-structure $X_{t-1} \rightarrow X_t \leftarrow X_{t+1}$ on the path, $X_t$ or a child of $X_t$ belongs to Z; (ii) the other nodes on the path (including $X_i$ and $X_j$) do not belong to Z.

Definition 5. (d-separation). Let X, Y, and Z be three disjoint subsets of the node set V of a Bayesian network B(G, θ). If there is no valid path between any x ∈ X and y ∈ Y given Z, then X and Y are said to be d-separated by Z, denoted as DsepG(X, Y|Z).

Theorem 1. (global Markov independence). Let X, Y, and Z be three disjoint subsets of the node set V of a Bayesian network B(G, θ). If X and Y are d-separated by Z, then X and Y are conditionally independent given Z (i.e., $X \perp Y \mid Z$).

Corollary 1. In a BN, given the Markov boundary of any node X, that is, the set of its parents, its children, and its spouses (the other parents of its children), X is independent of all of the other nodes in the network.

From Theorem 1, it can be seen that the independence relation contained in the BN structure is a subset of the independence relation contained in the joint distribution which can be decomposed according to the BN structure.

3.2. Scoring Function

The scoring function is used to measure the fitting degree of the BN structure and data. The following is an introduction to commonly used scoring functions.

The CH score is the first Bayesian score function. For a BN B and data set D, the CH score is as follows:

$$P(B, D) = P(B) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!, \quad (2)$$

where $N_{ijk}$ denotes the frequency of the family state combination corresponding to the network parameter θijk in the data set D, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$; P(B) is the prior distribution of B; and the rest of (2) is the likelihood function.

The BD score, which introduces a reliable theoretical basis into the CH score, is more commonly used:

$$P(B, D) = P(B) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}, \quad (3)$$

where $\Gamma(\cdot)$ is a gamma function over the real field and satisfies the property $\Gamma(x+1) = x\Gamma(x)$, and $\alpha_{ijk}$ is the prior knowledge of the family state combination corresponding to the network parameter θijk. The larger the value of $\alpha_{ijk}$, the more likely the family state combination, and $\alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk}$.

When all $\alpha_{ijk} = 1$, the BD score degenerates to the CH score. If the likelihood equivalence constraint is imposed, the likelihood-equivalent BD score (BDe) is obtained. If we further assume that the prior probability of all family state combinations is the same (i.e., $\alpha_{ijk} = \alpha / (r_i q_i)$), we get the uniform-distribution likelihood-equivalent BD score (BDeu), but its structure learning result is very sensitive to the equivalent sample size $\alpha$.

The BIC score is another common scoring function, which can be regarded as the maximum likelihood function with a penalty term; that is,

$$BIC(B \mid D) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}} - \frac{\log N}{2} \sum_{i=1}^{n} q_i (r_i - 1), \quad (4)$$

where N denotes the sample size. The first term of the BIC score is the kernel of the maximum likelihood function, and the second term is the model complexity penalty.
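To make (4) concrete, the following Python sketch (our own illustrative code, not part of the original algorithms; the function and variable names are ours) computes the decomposed BIC family score of one node for a candidate parent set from a discrete data matrix:

```python
import numpy as np
from collections import Counter

def bic_family_score(data, i, parents):
    """BIC family score of node i for a candidate parent set (sketch of (4)).

    data: (N, n) integer array; column j holds the observed states of X_j,
    coded as 0, 1, ..., r_j - 1. `parents` is a list of column indices.
    """
    N = data.shape[0]
    r_i = int(data[:, i].max()) + 1                     # number of states of X_i
    q_i = int(np.prod([data[:, p].max() + 1 for p in parents])) if parents else 1

    pa_cols = data[:, list(parents)]
    joint = Counter(zip(map(tuple, pa_cols), data[:, i].tolist()))  # N_ijk counts
    marg = Counter(map(tuple, pa_cols))                             # N_ij counts

    log_lik = sum(n_ijk * np.log(n_ijk / marg[j])       # first term of (4)
                  for (j, _), n_ijk in joint.items())
    penalty = 0.5 * np.log(N) * q_i * (r_i - 1)         # complexity penalty
    return log_lik - penalty
```

Because the score is decomposable, summing such family scores over all nodes with their chosen parent sets gives the score of the whole network.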

3.3. Exact Search Based on Dynamic Programming

Exact search methods find the optimal solution of the problem in the global space. Thus, they can obtain the optimal network structure, which belongs to the equivalence class of the true network model. As this study mainly researched exact search methods based on dynamic programming, we introduce the basic theory of dynamic programming in detail.

Dynamic programming methods traverse all of the node orderings and obtain the global optimal solution. When learning a BN structure by dynamic programming, every Bayesian network structure has at least one leaf node. In addition, the score criteria used are decomposable. Supposing that the set of variables contained in the problem domain is V and the optimal BN structure has a leaf node X, the state transition equation of dynamic programming is as follows:

$$Score(V) = \max_{X \in V} \left[ Score(V \setminus \{X\}) + BestScore(X, V \setminus \{X\}) \right], \quad (5)$$

$$BestScore(X, V \setminus \{X\}) = \max_{Pa(X) \subseteq V \setminus \{X\}} score(X, Pa(X)). \quad (6)$$

Formula (5) together with (6) links a structure to its substructures. The optimal structure over the remaining node set is recursively constructed through this process, until only one node is left. All of the subsets of V construct a Hasse diagram, which shows the whole process of the dynamic programming. Because the Hasse diagram contains the node ordering information of the network, the diagram was named an order graph in the literature [28]. Another similar graph is named the parent graph, which contains the candidate parent sets for each node. Figure 1 shows the presentation of the order graph and the parent graph.

Figure 1(a) is the order graph of four nodes. The order graph starts from the full set V, and each layer in the graph represents a state of the dynamic programming. The transition from one state to another is a programming element, and one variable is excluded in each programming element until all of the variables are eliminated. Each node ordering of the BN corresponds to the reverse direction of a path in the graph. Figure 1(b) is the parent graph of node X1, and each node U in the graph is a candidate parent set of node X1, which stores the corresponding optimal parent set and family score, that is, $PS_{best}(X_1, U)$ and $BestScore(X_1, U)$.
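As an illustration of recursion (5)-(6), the following Python sketch (our own minimal implementation, assuming a precomputed family-score function such as the bic_family_score sketch above) evaluates the optimal score by memoizing over subsets:

```python
from functools import lru_cache
from itertools import chain, combinations

def optimal_score(variables, family_score):
    """Evaluate recursion (5)-(6) over all subsets of `variables`.

    family_score(x, parents): any decomposable local score, e.g. BIC.
    Returns the optimal total score of a network over `variables`.
    """
    def subsets(s):
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    @lru_cache(maxsize=None)
    def best_score(x, candidates):          # formula (6)
        return max(family_score(x, pa) for pa in subsets(candidates))

    @lru_cache(maxsize=None)
    def score(subset):                      # formula (5)
        if not subset:
            return 0.0
        return max(score(tuple(v for v in subset if v != x))
                   + best_score(x, tuple(v for v in subset if v != x))
                   for x in subset)

    return score(tuple(sorted(variables)))
```

The memoized `score` table corresponds to the order graph, and each `best_score` table corresponds to one parent graph.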

4. Dynamic Programming Algorithm Based on Depth-First Search Strategy

4.1. Search Strategy

The DFSDP strategy started from the bottom of the order graph and searched the paths that connect the full node set and the empty node set from bottom to top. The transition from a state S to other states corresponded to the selection of a leaf node for the node set S in the network. Thus, a path from {X1, X2, …, Xn} to {} corresponded to a removal sequence of leaf nodes, which was a reversed node ordering of the network. Once the value of score(S) was obtained on a certain path, it was stored. In this way, duplicate computation was avoided, as subsequent paths may also have arrived at the node set S.

The core idea of the depth-first search is as follows: consider that all of the vertices in the graph are unvisited in the initial state. We start from a certain vertex v and visit it first. The graph is then searched from each of its unvisited adjacent points in turn, until all of the vertices in the graph that have paths to v have been visited. If other vertices remain unvisited, we select another unvisited vertex as the starting point and repeat this process, until all of the vertices in the graph have been visited. Based on the depth-first search strategy, the dotted lines in Figure 2 show an example of this search strategy; the red lines in Figure 2 represent the algorithm's forward search, and the green lines represent the backtracking operation. The corresponding node access order is illustrated in Figure 2.
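For reference, a minimal iterative depth-first traversal in Python (an illustrative sketch of the generic strategy described above, not the DFSDP algorithm itself) can be written as follows:

```python
def depth_first_order(graph, start):
    """Return the vertex visit order of a depth-first search.

    graph: dict mapping each vertex to a list of adjacent vertices.
    """
    visited, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()                    # take the most recently found vertex
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # push neighbors in reverse so the first-listed neighbor is explored first
        for w in reversed(graph.get(v, [])):
            if w not in visited:
                stack.append(w)
    return order

g = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(depth_first_order(g, 1))  # [1, 2, 4, 3]
```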

Figure 3, from left to right, shows the search strategy for the order graph of four nodes when taking X1 as the leaf node under the depth-first search principle. The red lines in Figure 3 represent the algorithm's forward search, and the green lines represent the backtracking operation. For other nodes as leaf nodes, or for other parent graphs, the search order is similar. The resulting node access order is illustrated in Figure 3.

4.2. Problems and Solutions

The DFSDP algorithm can find the globally optimal network structure. However, because the algorithm adopts the depth-first strategy, the backtracking operation is executed repeatedly during the iteration process, which makes the algorithm inefficient. There are two ways to solve this problem: (1) restricting the size of the candidate parent set of each node by constraints; (2) using the breadth-first search strategy instead of the depth-first search strategy. These two improvements are described in the following sections.

4.2.1. Research on Dynamic Programming Algorithms Based on Dependency Constraints

(1) Dependency constraints. Computing the node family scores and finding the optimal parent sets under the guidance of prior constraints can reduce the temporal and spatial costs effectively. We named this method CBDP (dynamic programming based on constraints).

Theorem 2. Given a sample data set D on a set of variables X = {X1, X2, …, Xn}, if the conditional independence $X_i \perp X_j \mid X_k$ holds, the statistic $\chi^2_{ij|k}$ approximately obeys the $\chi^2$ distribution with degrees of freedom $(r_i - 1)(r_j - 1) r_k$.

In Theorem 2,

$$\chi^2_{ij|k} = \sum_{a,b,c} \frac{(m_{abc} - m_{ac} m_{bc} / m_c)^2}{m_{ac} m_{bc} / m_c},$$

where $m_{abc}$ denotes the number of samples in D with $X_i = a$, $X_j = b$, and $X_k = c$ ($m_{ac}$, $m_{bc}$, and $m_c$ are the corresponding marginal counts, and $m = |D|$ is the total number of samples), and $r_i$ denotes the number of values of variable $X_i$.

Definition 6. (dependency coefficient). Given a sample data set D on a set of variables X = {X1, X2, …, Xn},

$$\lambda_{ij} = \min_{X_k \in X \setminus \{X_i, X_j\}} \left\{ \chi^2_{ij|k} - \chi^2_{\alpha}\left[(r_i - 1)(r_j - 1) r_k\right] \right\}$$

is defined as the dependency coefficient between variables $X_i$ and $X_j$. $\chi^2_{ij|k}$ denotes the statistic of variables $X_i$ and $X_j$ under the condition of a given variable $X_k$. $\chi^2_{\alpha}[(r_i - 1)(r_j - 1) r_k]$ denotes the value of the $\chi^2$ distribution with significance level $\alpha$ and degrees of freedom $(r_i - 1)(r_j - 1) r_k$.
According to the hypothesis test, if $\lambda_{ij} > 0$, then $X_i$ and $X_j$ are interdependent whether or not additional nodes are added. If $\lambda_{ij} \le 0$, then $X_i$ and $X_j$ may be independent given some variable $X_k \in X$.

Lemma 1. Given a sample data set D on a set of variables X = {X1, X2, …, Xn}, variables $X_i$ and $X_j$ are locally conditionally independent at the significance level $\alpha$ if and only if there exists a variable $X_k \in X$ for which $\chi^2_{ij|k} \le \chi^2_{\alpha}[(r_i - 1)(r_j - 1) r_k]$ holds.

Lemma 1 can be obtained directly from Definition 6 and the hypothesis test. According to Lemma 1, variables $X_i$ and $X_j$ are locally conditionally independent at the significance level $\alpha$ if and only if $\lambda_{ij} \le 0$. Variables $X_i$ and $X_j$ are globally conditionally independent at the significance level $\alpha$ if and only if, for every $X_k \in X \setminus \{X_i, X_j\}$, $\chi^2_{ij|k} \le \chi^2_{\alpha}[(r_i - 1)(r_j - 1) r_k]$ holds.

Matrix C = [c_{ij}] is defined as the dependency coefficient matrix of the variable set X, where

$$c_{ij} = \begin{cases} \lambda_{ij}, & \lambda_{ij} > 0, \\ 0, & \lambda_{ij} \le 0. \end{cases}$$

We took the dependency coefficient matrix C as the prior constraints and restricted the size of the candidate parent set of each node in the network. The number of family score calculations could then be reduced effectively, and the efficiency of the algorithm was improved as a result.
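A minimal sketch of how such a matrix could be computed (our own illustrative code; it assumes the Pearson-style conditional chi-square statistic as reconstructed above and uses scipy's chi2 quantile function; all names are ours) is:

```python
import numpy as np
from scipy.stats import chi2

def cond_chi2(data, i, j, k):
    """Chi-square statistic of X_i and X_j given X_k, plus its degrees of freedom."""
    ri, rj, rk = (int(data[:, c].max()) + 1 for c in (i, j, k))
    stat = 0.0
    for c in range(rk):
        sub = data[data[:, k] == c]
        m_c = len(sub)
        if m_c == 0:
            continue
        for a in range(ri):
            m_ac = np.sum(sub[:, i] == a)
            for b in range(rj):
                m_bc = np.sum(sub[:, j] == b)
                expected = m_ac * m_bc / m_c        # m_ac * m_bc / m_c
                if expected > 0:
                    m_abc = np.sum((sub[:, i] == a) & (sub[:, j] == b))
                    stat += (m_abc - expected) ** 2 / expected
    return stat, (ri - 1) * (rj - 1) * rk

def dependency_matrix(data, alpha=0.05):
    """Build the dependency coefficient matrix C of Definition 6 (sketch)."""
    n = data.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            lams = []
            for k in range(n):
                if k in (i, j):
                    continue
                stat, df = cond_chi2(data, i, j, k)
                lams.append(stat - chi2.ppf(1 - alpha, df))
            lam = min(lams)                          # dependency coefficient
            C[i, j] = C[j, i] = max(lam, 0.0)        # zero when locally independent
    return C
```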

(2) Search strategy.

Theorem 3. Assume that C is the dependency coefficient matrix of the network. CPa(X) refers to the set of variables that are interdependent with X according to C, and CNPa(X) refers to the set of variables that are independent of X according to C. U refers to the candidate parent set of X, and PSbest(X) refers to the optimal parent set of X. Then, the following relationships hold: (1) for any $X_i, X_j \in X$, if $c_{ij} \ne 0$, then $X_i \in CPa(X_j)$; if $c_{ij} = 0$, then $X_i \in CNPa(X_j)$. (2) For any $X \in X$, $U \subseteq CPa(X)$ and $PS_{best}(X) \subseteq CPa(X)$.

Theorem 3 can be obtained from the definition of the dependency coefficient. According to Theorem 3, the formula for calculating family scores can be rewritten from

$$BestScore(X, V \setminus \{X\}) = \max_{Pa(X) \subseteq V \setminus \{X\}} score(X, Pa(X)) \quad (7)$$

to

$$BestScore(X, V \setminus \{X\}) = \max_{Pa(X) \subseteq (V \setminus \{X\}) \cap CPa(X)} score(X, Pa(X)). \quad (8)$$

This transformation can reduce the total number of score calculations from $n \cdot 2^{n-1}$ to $\sum_{i=1}^{n} 2^{|CPa(X_i)|}$.
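As a hypothetical illustration, for a network with n = 8 nodes, standard dynamic programming evaluates $8 \times 2^7 = 1024$ family scores; if the constraints left each node with, say, $|CPa(X_i)| = 3$ candidate parents, only

$$\sum_{i=1}^{8} 2^{|CPa(X_i)|} = 8 \times 2^3 = 64$$

scores would remain.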

Supposing that c2,1 > 0, c3,1 > 0, and c4,1 = 0 hold in the dependency coefficient matrix—that is, X2 ∈ CPa(X1), X3 ∈ CPa(X1), and X4 ∈ CNPa(X1), as shown in Figure 4(a)—the search strategy of the parent graph of X1 is shown in Figure 4(b).

In general, the total number of family scores that need to be calculated in standard DP is $n \cdot 2^{n-1}$, that is, $2^{n-1}$ for each node. When the dependency constraints are considered, the total number of family scores that need to be calculated in DP is $\sum_{i=1}^{n} 2^{|CPa(X_i)|}$, that is, $2^{|CPa(X_i)|}$ for each node. Because $2^{|CPa(X)|}$ is generally far less than $2^{n-1}$, the efficiency of the algorithm after adding the dependency constraints was enhanced greatly, as calculating the scores is one of the most time-consuming parts. We next compared the candidate parent sets (CPS) and the number of family scores (NFS) of the Asia network that need to be considered in DP with and without the dependency constraints.

From Table 1, we can also see that the candidate parent sets obtained by the dependency constraints were not exactly the same as those of the standard BN, which means that the dependency constraints mined from the sample data are not always accurate.

4.2.2. Research on Dynamic Programming Algorithms Based on Breadth-First Search Strategy

(1) Search strategy. As mentioned earlier, the CBDP method restricts the size of candidate parent sets of each node by constraints, which then reduces the time for calculating the family score and improves the efficiency of the algorithm. However, the dependency constraint is mined from sample data, whose accuracy cannot be guaranteed, which may reduce the accuracy of the algorithm to a certain extent. As a result, the optimal network structure cannot be obtained. To address this problem, we proposed a breadth-first search strategy to replace the depth-first search strategy to avoid the backtracking operation. In this way, the search efficiency could be improved while guaranteeing the global optimum.

The core idea of the breadth-first search is as follows: we start from a vertex v in the graph and, after visiting v, visit each of the adjacent points of v once. Then, we visit the adjacent points of these points in turn, following the principle that "the adjacent points of the first visited vertex take precedence over those of the second visited vertex," until all of the adjacent vertices of the visited vertices in the graph have been accessed. If other vertices remain unvisited, we select another unvisited vertex as the starting point and repeat this process, until all of the vertices in the graph have been visited. Figure 5 shows an example of this search strategy and the corresponding node access order.
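Analogous to the earlier depth-first sketch, a minimal breadth-first traversal in Python (again our own illustrative code, not the BFSDP algorithm itself) is:

```python
from collections import deque

def breadth_first_order(graph, start):
    """Return the vertex visit order of a breadth-first search.

    graph: dict mapping each vertex to a list of adjacent vertices.
    """
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        v = queue.popleft()                # earliest-found vertex first
        order.append(v)
        for w in graph.get(v, []):
            if w not in visited:           # enqueue each neighbor exactly once
                visited.add(w)
                queue.append(w)
    return order

g = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(breadth_first_order(g, 1))  # [1, 2, 3, 4]
```

Unlike the depth-first version, the queue never revisits a branch, so no backtracking is needed.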

Figure 6, from left to right, shows the search strategy for the order graph of four nodes when taking X1 as the leaf node under the breadth-first search principle. The red lines in Figure 6 represent the algorithm's forward search. For other nodes as leaf nodes, or for other parent graphs, the search order is similar. The resulting node access order is illustrated in Figure 6.

(2) Search method. From the analysis in the preceding section, we found that the breadth-first search strategy was more efficient than the depth-first search strategy because no backtracking operation has to be executed. Therefore, we proposed the BFSDP algorithm. The execution steps of the BFSDP algorithm are shown in Algorithm 1.

(1)Obtain family scores of all nodes and store them.
(2)Obtain the best parent set and the corresponding score of each node and store them.
(3)Obtain the optimal network structure score and the optimal leaf nodes of each node combination and store them.
(4)Construct the network structure.

The algorithm may repeatedly query the family scores and the optimal network structure score of each combination during the execution process. Therefore, we constructed a hash table mapping each node set to a label to improve the query efficiency. Node sets are represented in binary encoding: for a network with n nodes, a binary array b with n digits is set. For a set U in the order graph, if Xi ∈ U, then position i of b is set to 1; otherwise, it is set to 0. We then convert b to a decimal label by the hash function $H(U) = \sum_{X_i \in U} 2^{i-1}$. In score lookup, we use the decimal label instead of the node set. Taking a network of four nodes as an example, the hash table is shown in Table 2.

Step 1: Calculate all of the family scores of each node in the network and store them in the hash table.

Step 2: For each node in the network, obtain the best parent set and the corresponding score according to the breadth-first search strategy in its parent graph and store them in the hash table.

Step 3: Obtain the optimal network structure score and the optimal leaf node of each node combination according to the breadth-first search strategy in the order graph and store them in the hash table.

Step 4: Starting from the full node combination, extract the optimal leaf node and the optimal parent set of the corresponding leaf node to construct part of the network structure. Update the current node set and repeat the process until the node set is empty. The pseudocode of Step 4 of the BFSDP algorithm is given in Algorithm 2.
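For illustration, the hash labeling can be implemented as a simple bitmask (a sketch under the encoding just described; the function names are ours):

```python
def set_to_label(node_set):
    """Hash function H(U): map a set of 1-based node indices to a decimal label."""
    return sum(1 << (i - 1) for i in node_set)

def label_to_set(label, n):
    """Inverse mapping: recover the node set from its decimal label."""
    return {i for i in range(1, n + 1) if label & (1 << (i - 1))}

# Example for a four-node network: U = {X1, X3} -> b = 0101 -> label 5
print(set_to_label({1, 3}))   # 5
print(label_to_set(5, 4))     # {1, 3}
```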

Input: The best parent sets of all nodes (PSbest), the number of nodes (n), and the best leaf node of each combination of nodes (Leaf).
Output: The optimal network structure (G).
Set G to be an n × n zero matrix, order to be an empty sequence of length n, and nodes to be a binary array of length n with all positions set to 1;
For m = n to 1 in steps of −1 do
 X = index of the current node set in the hash table, calculated by $X = \sum_{i=1}^{n} nodes(i) \cdot 2^{i-1}$;
 Y = Leaf(X);
 The m-th position in order is set to be Y;
 The Y-th position in nodes is set to be 0;
 Set raw to be the best parent set of Y, which is PSbest(Y) stored for the current node set;
 G(raw, Y) ← 1;
End for
Return G
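A runnable sketch of Algorithm 2 in Python (our own translation of the pseudocode; the dictionaries ps_best and leaf stand in for the hash tables built in Steps 2 and 3, and their layout is an assumption of ours):

```python
import numpy as np

def reconstruct_network(ps_best, leaf, n):
    """Trace back the optimal structure (Step 4 / Algorithm 2).

    ps_best[(y, label)]: optimal parent set of node y within the node set `label`.
    leaf[label]: optimal leaf node of the node set encoded by `label`.
    Nodes are 1-based; labels follow H(U) = sum of 2^(i-1) over X_i in U.
    """
    G = np.zeros((n + 1, n + 1), dtype=int)   # adjacency matrix (parent -> child)
    order = [0] * (n + 1)
    label = (1 << n) - 1                      # full node set, e.g. 1111 for n = 4
    for m in range(n, 0, -1):
        y = leaf[label]                       # best leaf of the current node set
        order[m] = y
        label &= ~(1 << (y - 1))              # remove y from the node set
        for p in ps_best[(y, label)]:         # parents come from the remaining nodes
            G[p, y] = 1
    return G, order[1:]
```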

5. Experiments

5.1. Experimental Setup

Sample data were generated by sampling from benchmark networks; each network generated 10 sets of sample data with a fixed sample size, and each algorithm was executed once on each set of sample data. Therefore, the experimental results listed in this section are the averages of 10 experiments. The experimental environment was Windows 10, an Intel® Core™ i5-6500 CPU @ 3.20 GHz, and 4.00 GB RAM, using the MATLAB R2014a software platform. The settings of the experimental parameters are given in Table 3.

5.2. Experimental Results

The experimental results of the time-consumption comparison of the three algorithms on the standard network data sets are shown in Table 4 and Figure 7. The corresponding experimental results of the accuracy comparison are shown in Table 5. The experimental results of the time-consumption comparison of the three algorithms on the UCI standard data sets are shown in Table 6. The corresponding experimental results of the accuracy comparison are shown in Table 7. "OT" in Tables 4, 6, and 7 indicates that the execution time of the algorithm exceeded the upper limit, which we set to 3 days in this study. "OM" indicates that the storage space required by the algorithm exceeded the memory of the computer. "Nodes" denotes the number of nodes in the network, "Edges" denotes the number of edges in the network, and "Tns" denotes the total number of states over all of the nodes. The significance level of the CBDP algorithm was set to 0.05 when computing the dependency constraints [33]. The DFSDP and CBDP algorithms both exceeded the time limit on all of the sample data sets when learning the Child network, and all three algorithms exceeded the memory limit on all of the sample data sets when learning the Insurance network. Thus, we did not compare the learning accuracy on these two networks in Table 5.

Taking the execution time of the DFSDP algorithm as the baseline, the ratios of the execution times of the CBDP algorithm and the BFSDP algorithm to this baseline are shown in Figure 7.

According to the time-consumption comparison results, because the DFSDP algorithm adopted the depth-first search strategy, which required repeated backtracking operations, it was the slowest algorithm on all of the data sets. Compared with the DFSDP algorithm, the CBDP algorithm constructed the parent graphs according to the dependency constraints; although it still ran under the depth-first strategy, its efficiency was enhanced significantly. The CBDP algorithm reduced the running time by −3.70% to 84.74% (57.10% on average) compared with the DFSDP algorithm. The BFSDP algorithm adopted the breadth-first search strategy, so no backtracking operation was required, and its efficiency was higher than that of the DFSDP algorithm on all of the data sets. BFSDP reduced the running time by 9.97% to 98.57% (50.02% on average) compared with DFSDP. The average efficiency improvement of the CBDP algorithm was higher than that of the BFSDP algorithm because, in small-scale network learning, the efficiency improvement of the CBDP algorithm was more significant. When the scale of the network was small (fewer than 6 nodes), the efficiency of the CBDP algorithm was higher than that of the BFSDP algorithm; as the scale of the network becomes larger, the BFSDP algorithm becomes more efficient.

When the scale of the network was small, the difference in efficiency between the depth-first and breadth-first search strategies was not obvious, and the effect of the constraints on the efficiency of the algorithm was more significant. As the scale of the network grew, the efficiency improvements made by the search strategy became more significant than those made by the constraints. When the scale of the network was enlarged further (more than 20 nodes), only the BFSDP algorithm could find the optimal network within the time limit, whereas the DFSDP and CBDP algorithms both exceeded the time limit defined in this study. When the number of nodes in the network was greater than 26, the storage space required by all three algorithms exceeded the computer memory.

According to the accuracy comparison results, the DFSDP and BFSDP algorithms always found the optimal network structure on all of the data sets and under different sample sizes. When the scale of the network was small (fewer than 6 nodes) and the sample size was sufficient (more than 1000 samples), the CBDP algorithm also found the optimal network structure. When the scale of the network grew larger, the CBDP algorithm was less accurate than the other two algorithms: the score of the structure obtained by the CBDP algorithm was reduced by 0.91% to 46.77% (17.14% on average). Because the CBDP algorithm pruned the search process using dependency constraints whose accuracy could not be guaranteed, and because the larger the scale of the network, the higher the accuracy of the constraints required (and hence the larger the sample size required), the accuracy of the CBDP algorithm decreased when learning large-scale networks.

The conclusions of the analysis based on the time-consumption and accuracy comparison results on the UCI data sets were similar to those obtained on the standard network data sets. In terms of efficiency, the CBDP algorithm reduced the running time by −2.02% to 81.11% (40.71% on average) compared with the DFSDP algorithm, and the BFSDP algorithm reduced the running time by 54.26% to 99.93% (81.78% on average) compared with the DFSDP algorithm. In terms of accuracy, the score of the structure obtained by the CBDP algorithm was reduced by 7.46% to 101.74% (60.58% on average). Note that when the sample size was small relative to the scale of the network structure, the accuracy of the CBDP algorithm was usually poor; the main reason was that the sample size was insufficient to support the accuracy of the dependency constraints at that network scale. In addition, although some data sets had fewer than 27 nodes, the number of value states of each node was large, which increased the number of family scores that needed to be stored, so the memory still exceeded the limit.

6. Conclusions and Future Work

To address the problem that traditional dynamic programming methods based on the depth-first search strategy are inefficient, we proposed pruning redundant score calculations with dependency constraints. However, because the accuracy of the constraints was difficult to guarantee, the accuracy of this method decreased. We then proposed a breadth-first-based strategy, which enhanced the efficiency significantly while also ensuring global optimality. The experiments comparing the three algorithms verified the validity of the proposed CBDP and BFSDP algorithms.

In this study, priors were integrated into the construction of the parent graphs, and the parent graphs were pruned with priors, which enhanced the search efficiency. The reduction in the space complexity of the algorithm, however, was insufficient: even with the addition of priors, it was still impossible to learn large-scale network structures. Pruning the order graph with priors could effectively reduce the space complexity and would make it possible to learn network structures with more nodes. Future work will focus on extending the learning scale based on prior constraints.

Data Availability

The data used in the experiments are 14 data sets from the UCI database and data sampled from 8 standard networks.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful to Prof. Wang and Dr. Guo for helpful discussions. This work was supported by the National Key Laboratory Fund (CEMEE2020Z0202B) and the Shaanxi Science Foundation (2020JQ-816 and 20JK0680).