Abstract

Compared with approximate search methods, Bayesian network structure learning based on a dynamic programming strategy can find the optimal graph structure. The traditional dynamic programming method for Bayesian network structure learning is based on a depth-first strategy, which is inefficient. We proposed two methods to solve this problem. First, dependency constraints were used to prune redundant score calculations. The constraints were obtained by conditional independence tests on the observed data sets. However, it was difficult to guarantee the accuracy of the constraints, which may have led to a decrease in the accuracy of the method. Second, we proposed a breadth-first-based strategy, which enhanced efficiency greatly while also ensuring global optimality. Experimental results showed that, on the standard network data sets, compared with the dynamic programming based on depth-first search (DFSDP) algorithm, dynamic programming based on constraints (CBDP) could reduce the average running time by 57.10% and dynamic programming based on breadth-first search (BFSDP) could reduce the average running time by 50.02%. On the UCI data sets, compared with DFSDP, CBDP reduced the average running time by 40.71%, and BFSDP reduced the average running time by 81.78%.

1. Introduction

As a graphical modeling tool, Bayesian networks (BNs) [1] provide a method for expressing the causal relationships between variables, which can be used to obtain knowledge concealed in the data. A Bayesian network is a directed acyclic graph (DAG), in which the nodes correspond to the variables in the domain and the edges correspond to direct probabilistic dependencies. A key feature of Bayesian network research is structure learning, which aims to construct network structures automatically using the observed data sets and prior knowledge. Formally, the structure of the network represents a set of conditional independence assertions: each variable is conditionally independent of its non-descendants given its parents [2].

Learning a BN from observational data is an important problem that has been studied extensively over the past decade [3]. It can be used to automatically construct decision support systems and is used for inferring possible causal relations under certain conditions [4]; the edges in the BN graph have causal semantics. BN has been used widely in reliability analysis [5, 6], medical diagnosis [7, 8], gene analysis [9], fault diagnosis [10], language recognition [11], and index sensitivity analysis problems [12].

Structure learning methods for BNs can be categorized into two main types according to learning accuracy: (1) approximate methods and (2) exact methods. Approximate methods easily become trapped in local optima, whereas exact methods can find the optimal graph structure in the whole solution space. However, exact methods may be limited by the network size and are suitable for occasions with high accuracy requirements. In this study, we focused mainly on exact methods. The main contributions of this paper are as follows:
(1) We proposed a dynamic programming algorithm based on dependency constraints. Prior constraints were used to guide the DFSDP algorithm when calculating the node family scores and finding the optimal parent sets, which reduced the number of family score calculations, improved the operating efficiency of the algorithm, and effectively reduced the time and space costs.
(2) We proposed a dynamic programming strategy based on breadth-first search (BFSDP). This method avoided the backtracking operations performed in the iterative process of the DFSDP algorithm and thus improved the efficiency of the algorithm.

This paper is organized as follows. In Section 2, we discuss the existing literature on structure learning according to two types of search methods. The basic knowledge of the BN structure learning and dynamic programming strategy is introduced in Section 3. The structure learning method based on the depth-first strategy is introduced in Section 4.1, and the two proposed algorithms are introduced in Section 4.2. The performance of the proposed methods is shown in Section 5. Finally, we conclude and outline our future work in Section 6.

2. Related Work

In this section, we review the algorithms for learning BN structures according to the different kinds of methods that have been proposed to date.

(1) Conditional independence (CI) test methods—such as the SGS method [13]; the PC algorithm [14]; and the drafting, thickening, and thinning three-step method [15]—are representative methods. This kind of method treats the BN as a graph structure that encodes the independence relationships between variables. It is difficult to decide whether two variables are independent or conditionally independent, and the time needed to execute CI tests grows exponentially with the number of variables.

(2) Scoring and searching methods include the following two steps: model selection and model optimization. Model selection requires choosing a criterion, named the scoring function. Currently, the following are some of the frequently used scoring functions: the Bayesian information criterion (BIC) score [16], the minimum description length (MDL) score [17], the Bayesian Dirichlet (BD) score [18], the implicit score (IS) [18], and the mutual information test (MIT) score [19]. Model optimization aims to find the network structure that obtains the highest score according to the selected criterion. Usually, heuristic search algorithms are used, such as the bee colony algorithm [20], the genetic algorithm [21], the fish swarm algorithm [22], and particle swarm optimization [23, 24]. Methods based on scoring and searching are intended to balance accuracy, robustness, and sparsity. Nevertheless, these methods easily become trapped in local optima, which is considered their congenital weakness.

(3) Mixed search methods use CI tests to reduce the graph search space and then obtain the optimal network structure through scoring and searching. Tsamardinos et al. [25] proposed the max-min hill-climbing (MMHC) algorithm, which is a typical mixed method. It builds the network skeleton by finding parent and children sets for each node using the MMPC algorithm [26] and then finds the optimal structure using a hill-climbing algorithm based on this skeleton.

(4) Among the few articles on optimal search, Ott and Miyano [27] proposed the first exact algorithm. While investigating the problem of exact model averaging, Koivisto and Sood [28] proposed another algorithm that also learned optimal graphs in a similar way. Singh and Moore [29] proposed a recursive implementation that is less efficient in terms of calculation but has the advantage that potential branch-pruning rules can be applied. Silander and Myllymaki [30] provided a practically efficient implementation of the search and empirically demonstrated that optimal graphs could be learned up to n = 29. Yuan et al. [31] proposed an A* search algorithm for learning optimal BNs. Malone et al. [32] proposed a memory-efficient implementation of the dynamic programming algorithm, which leveraged the layered structure of the dynamic programming graphs representing the recursive decomposition of the problem to reduce the memory requirements of the algorithm from $O(2^n)$ to $O(C(n, n/2))$, where $C(n, n/2)$ is the binomial coefficient. This kind of method can obtain the optimal network structure but has the following two limitations: (1) the scale of the learned network is limited to about 30 nodes; (2) the efficiency of the algorithms is insufficient, as most of them are based on a depth-first search strategy.

In this study, we focused on how to enhance the efficiency of the exact structure learning algorithm.

3. Theoretical Basis of Bayesian Network

In this section, we briefly introduce the basics of BN and the concepts that are used to learn the structure of these networks.

3.1. Bayesian Network

Definition 1. (Bayesian network). A Bayesian network B(G, θ) is composed of a network structure G(V, E) and a set of network parameters θ = {θ1, θ2, …, θn}, where the parameters θ are the conditional probability distributions obtained by decomposing the joint distribution of the network nodes according to the structure G.

Definition 2. (structure of Bayesian network). The structure of a BN is a DAG whose nodes are variables X1, X2, …, Xn ∈ V, and for any Xi ∈ V, given the set of all its parent nodes $Pa(X_i)$, Xi is independent of all of its non-descendant nodes.
From this definition, we can see that the structure of a BN is a DAG satisfying the Markov condition, from which we can directly obtain a series of local independence relations on the node set V. According to the topological order and the Markov condition, the joint distribution of the Bayesian network nodes can be decomposed as follows:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)), \quad (1)$$

where $P(X_i \mid Pa(X_i))$ denotes the conditional probability that quantifies the parent-child relationship in the BN structure. Let $r_i$ be the number of states of node $X_i$; then $q_i = \prod_{X_l \in Pa(X_i)} r_l$ denotes the number of state combinations of the parent node set $Pa(X_i)$, and $P(X_i \mid Pa(X_i))$ forms a matrix of size $r_i \times q_i$, which is called the conditional probability table (CPT) and is recorded as θi. The set of the CPTs of all nodes θ = {θ1, θ2, …, θn} is called the parameters of the BN. For brevity, we use the symbol θijk to represent the conditional probability $P(X_i = k \mid Pa(X_i) = j)$, where k ∈ {1, 2, …, ri} is the state of Xi and j ∈ {1, 2, …, qi} denotes the combined state of the parent node set $Pa(X_i)$.
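For example, consider a hypothetical three-node structure with edges X1 → X2, X1 → X3, and X2 → X3 (a toy network of our own for illustration); applying (1) gives

$$P(X_1, X_2, X_3) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2).$$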

Definition 3. (topological order). Let G(V, E) be a BN structure and $\sigma = \langle X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(n)} \rangle$ be a complete permutation of the node set V. If $X_i$ precedes $X_j$ in $\sigma$ for any edge (parent-child relationship) $(X_i, X_j) \in E$, we call $\sigma$ a topological ordering (TO) of the Bayesian network.

Definition 4. (valid path). Let G(V, E) be a BN structure and $\rho = \langle X_i, \ldots, X_j \rangle$ be a path in G. Given the condition set Z ⊆ V, the path is valid when the following conditions are satisfied: (i) for any V-structure $X_{t-1} \rightarrow X_t \leftarrow X_{t+1}$ on the path, $X_t$ or a child of $X_t$ belongs to Z; (ii) the other nodes on the path (including $X_i$ and $X_j$) do not belong to Z.

Definition 5. (d-separation). Let X, Y, and Z be three disjoint subsets of the node set V of a Bayesian network B(G, θ). If there is no valid path between any x ∈ X and y ∈ Y given Z, then X and Y are said to be d-separated by Z, denoted as DsepG(X, Y|Z).

Theorem 1. (global Markov independence). Let X, Y, and Z be three disjoint subsets of the node set V of a Bayesian network B(G, θ). If X and Y are d-separated by Z, then X and Y are conditionally independent given Z (i.e., $X \perp Y \mid Z$).

Corollary 1. In a BN, given the Markov boundary of any node X, that is, the set of its parents, its children, and its spouses (the other parents of its children), X is independent of all of the other nodes in the network.

From Theorem 1, it can be seen that the independence relation contained in the BN structure is a subset of the independence relation contained in the joint distribution which can be decomposed according to the BN structure.

3.2. Scoring Function

The scoring function is used to measure the fitting degree of the BN structure and data. The following is an introduction to commonly used scoring functions.

The CH score is the first Bayesian score function. For a BN B and data set D, the CH score is as follows:

$$P(B, D) = P(B) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!, \quad (2)$$

where $N_{ijk}$ denotes the frequency of the family state combination corresponding to the network parameter θijk in the data set D, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$; P(B) is the prior distribution of B; and the rest of (2) is the likelihood function.

The BD score, which introduces a reliable theoretical basis into the CH score, is more commonly used:

$$P(B, D) = P(B) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}, \quad (3)$$

where $\Gamma(\cdot)$ is a gamma function over the real field and satisfies the property $\Gamma(x+1) = x\Gamma(x)$, and $\alpha_{ijk}$ is the prior knowledge of the family state combination corresponding to the network parameter θijk. The larger the value of $\alpha_{ijk}$, the more likely the family state combination, and $\alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk}$.

When all $\alpha_{ijk} = 1$, the BD score degenerates to the CH score. If the likelihood equivalence constraint is imposed, the likelihood-equivalent BD score (BDe) is obtained. If we further assume that the prior probability of all family state combinations is the same (i.e., $\alpha_{ijk} = \alpha / (r_i q_i)$), we get the uniform-distribution likelihood-equivalent BD score (BDeu), but its structure learning result is very sensitive to the equivalent sample size $\alpha$.

The BIC score is another common scoring function, which can be regarded as the maximum likelihood function with a penalty term; that is,

$$BIC(B \mid D) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}} - \frac{\log N}{2} \sum_{i=1}^{n} q_i (r_i - 1), \quad (4)$$

where N denotes the sample size. The first term of the BIC score is the kernel of the maximum likelihood function, and the second term is the model complexity penalty.
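To make (4) concrete, the following Python sketch (our own illustrative code, not part of the original algorithms; the function and variable names are ours) computes the decomposed BIC family score of one node for a candidate parent set from a discrete data matrix:

```python
import numpy as np
from collections import Counter

def bic_family_score(data, i, parents):
    """BIC family score of node i for a candidate parent set (sketch of (4)).

    data: (N, n) integer array; column j holds the observed states of X_j,
    coded as 0, 1, ..., r_j - 1. `parents` is a list of column indices.
    """
    N = data.shape[0]
    r_i = int(data[:, i].max()) + 1                     # number of states of X_i
    q_i = int(np.prod([data[:, p].max() + 1 for p in parents])) if parents else 1

    pa_cols = data[:, list(parents)]
    joint = Counter(zip(map(tuple, pa_cols), data[:, i].tolist()))  # N_ijk counts
    marg = Counter(map(tuple, pa_cols))                             # N_ij counts

    log_lik = sum(n_ijk * np.log(n_ijk / marg[j])       # first term of (4)
                  for (j, _), n_ijk in joint.items())
    penalty = 0.5 * np.log(N) * q_i * (r_i - 1)         # complexity penalty
    return log_lik - penalty
```

Because the score is decomposable, summing such family scores over all nodes with their chosen parent sets gives the score of the whole network.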

3.3. Exact Search Based on Dynamic Programming

Exact search methods find the optimal solution of the problem in the global space. Thus, they can obtain the optimal network structure, which belongs to the equivalence class of the true network model. As this study mainly researched exact search methods based on dynamic programming, we introduce the basic theory of dynamic programming in detail.

Dynamic programming methods traverse all of the node orderings and obtain the global optimal solution. When learning a BN structure by dynamic programming, every Bayesian network structure has at least one leaf node. In addition, the score criteria used are decomposable. Supposing that the set of variables contained in the problem domain is V and the optimal BN structure has a leaf node X, the state transition equation of dynamic programming is as follows:

$$Score(V) = \max_{X \in V} \left[ Score(V \setminus \{X\}) + BestScore(X, V \setminus \{X\}) \right], \quad (5)$$

$$BestScore(X, V \setminus \{X\}) = \max_{Pa(X) \subseteq V \setminus \{X\}} score(X, Pa(X)). \quad (6)$$

Formula (5) together with (6) links a structure to its substructures. The optimal structure over the remaining node set is recursively constructed through this process, until only one node is left. All of the subsets of V construct a Hasse diagram, which shows the whole process of the dynamic programming. Because the Hasse diagram contains the node ordering information of the network, the diagram was named an order graph in the literature [28]. Another similar graph is named the parent graph, which contains the candidate parent sets for each node. Figure 1 shows the presentation of the order graph and the parent graph.

Figure 1(a) is the order graph of four nodes. The order graph starts from the full set V, and each layer in the graph represents a state of the dynamic programming. The transition from one state to another is a programming element, and one variable is excluded in each programming element until all of the variables are eliminated. Each node ordering of the BN corresponds to the reverse direction of a path in the graph. Figure 1(b) is the parent graph of node X1, and each node U in the graph is a candidate parent set of node X1, which stores the corresponding optimal parent set and family score, that is, $PS_{best}(X_1, U)$ and $BestScore(X_1, U)$.
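As an illustration of recursion (5)-(6), the following Python sketch (our own minimal implementation, assuming a precomputed family-score function such as the bic_family_score sketch above) evaluates the optimal score by memoizing over subsets:

```python
from functools import lru_cache
from itertools import chain, combinations

def optimal_score(variables, family_score):
    """Evaluate recursion (5)-(6) over all subsets of `variables`.

    family_score(x, parents): any decomposable local score, e.g. BIC.
    Returns the optimal total score of a network over `variables`.
    """
    def subsets(s):
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    @lru_cache(maxsize=None)
    def best_score(x, candidates):          # formula (6)
        return max(family_score(x, pa) for pa in subsets(candidates))

    @lru_cache(maxsize=None)
    def score(subset):                      # formula (5)
        if not subset:
            return 0.0
        return max(score(tuple(v for v in subset if v != x))
                   + best_score(x, tuple(v for v in subset if v != x))
                   for x in subset)

    return score(tuple(sorted(variables)))
```

The memoized `score` table corresponds to the order graph, and each `best_score` table corresponds to one parent graph.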

4. Dynamic Programming Algorithm Based on Depth-First Search Strategy

4.1. Search Strategy

The DFSDP strategy started from the bottom of the order graph and searched the paths that connect the full node set and the empty node set from bottom to top. The transition from a state S to other states corresponded to the selection of a leaf node for the node set S in the network. Thus, a path from {X1, X2, …, Xn} to {} corresponded to a removal sequence of leaf nodes, which was a reversed node ordering of the network. Once the value of score(S) was obtained on a certain path, it was stored. In this way, duplicate computation was avoided, as subsequent paths may also have arrived at the node set S.

The core idea of the depth-first search is as follows: consider that all of the vertices in the graph are unvisited in the initial state. We start from a certain vertex v and visit it first. The graph is then searched from each of its unvisited adjacent points in turn, until all of the vertices in the graph that have paths to v have been visited. If other vertices remain unvisited, we select another unvisited vertex as the starting point and repeat this process, until all of the vertices in the graph have been visited. Based on the depth-first search strategy, the dotted lines in Figure 2 show an example of this search strategy; the red lines in Figure 2 represent the algorithm's forward search, and the green lines represent the backtracking operation. The corresponding node access order is illustrated in Figure 2.
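For reference, a minimal iterative depth-first traversal in Python (an illustrative sketch of the generic strategy described above, not the DFSDP algorithm itself) can be written as follows:

```python
def depth_first_order(graph, start):
    """Return the vertex visit order of a depth-first search.

    graph: dict mapping each vertex to a list of adjacent vertices.
    """
    visited, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()                    # take the most recently found vertex
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # push neighbors in reverse so the first-listed neighbor is explored first
        for w in reversed(graph.get(v, [])):
            if w not in visited:
                stack.append(w)
    return order

g = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(depth_first_order(g, 1))  # [1, 2, 4, 3]
```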

Figure 3, from left to right, shows the search strategy for the order graph of four nodes when taking X1 as the leaf node under the depth-first search principle. The red lines in Figure 3 represent the algorithm's forward search, and the green lines represent the backtracking operation. For other nodes as leaf nodes, or for other parent graphs, the search order is similar. The resulting node access order is illustrated in Figure 3.

4.2. Problems and Solutions

The DFSDP algorithm can find the globally optimal network structure. However, because the algorithm adopts the depth-first strategy, the backtracking operation is executed repeatedly during the iteration process, which makes the algorithm inefficient. There are two ways to solve this problem: (1) restricting the size of the candidate parent set of each node by constraints; (2) using the breadth-first search strategy instead of the depth-first search strategy. These two improvements are described in the following sections.

4.2.1. Research on Dynamic Programming Algorithms Based on Dependency Constraints

(1) Dependency constraints. Computing the node family scores and finding the optimal parent sets under the guidance of prior constraints can reduce the temporal and spatial costs effectively. We named this method CBDP (dynamic programming based on constraints).

Theorem 2. Given a sample data set D on a set of variables X = {X1, X2, …, Xn}, if the conditional independence $X_i \perp X_j \mid X_k$ holds, the statistic $\chi^2_{ij|k}$ approximately obeys the $\chi^2$ distribution with degrees of freedom $(r_i - 1)(r_j - 1) r_k$.

In Theorem 2,

$$\chi^2_{ij|k} = \sum_{a,b,c} \frac{(m_{abc} - m_{ac} m_{bc} / m_c)^2}{m_{ac} m_{bc} / m_c},$$

where $m_{abc}$ denotes the number of samples in D with $X_i = a$, $X_j = b$, and $X_k = c$ ($m_{ac}$, $m_{bc}$, and $m_c$ are the corresponding marginal counts, and $m = |D|$ is the total number of samples), and $r_i$ denotes the number of values of variable $X_i$.

Definition 6. (dependency coefficient). Given a sample data set D on a set of variables X = {X1, X2, …, Xn},

$$\lambda_{ij} = \min_{X_k \in X \setminus \{X_i, X_j\}} \left\{ \chi^2_{ij|k} - \chi^2_{\alpha}\left[(r_i - 1)(r_j - 1) r_k\right] \right\}$$

is defined as the dependency coefficient between variables $X_i$ and $X_j$. $\chi^2_{ij|k}$ denotes the statistic of variables $X_i$ and $X_j$ under the condition of a given variable $X_k$. $\chi^2_{\alpha}[(r_i - 1)(r_j - 1) r_k]$ denotes the value of the $\chi^2$ distribution with significance level $\alpha$ and degrees of freedom $(r_i - 1)(r_j - 1) r_k$.
According to the hypothesis test, if $\lambda_{ij} > 0$, then $X_i$ and $X_j$ are interdependent whether or not additional nodes are added. If $\lambda_{ij} \le 0$, then $X_i$ and $X_j$ may be independent given some variable $X_k \in X$.

Lemma 1. Given a sample data set D on a set of variables X = {X1, X2, …, Xn}, variables $X_i$ and $X_j$ are locally conditionally independent at the significance level $\alpha$ if and only if there exists a variable $X_k \in X$ for which $\chi^2_{ij|k} \le \chi^2_{\alpha}[(r_i - 1)(r_j - 1) r_k]$ holds.

Lemma 1 can be obtained directly from Definition 6 and the hypothesis test. According to Lemma 1, variables $X_i$ and $X_j$ are locally conditionally independent at the significance level $\alpha$ if and only if $\lambda_{ij} \le 0$. Variables $X_i$ and $X_j$ are globally conditionally independent at the significance level $\alpha$ if and only if, for every $X_k \in X \setminus \{X_i, X_j\}$, $\chi^2_{ij|k} \le \chi^2_{\alpha}[(r_i - 1)(r_j - 1) r_k]$ holds.

Matrix C = [c_{ij}] is defined as the dependency coefficient matrix of the variable set X, where

$$c_{ij} = \begin{cases} \lambda_{ij}, & \lambda_{ij} > 0, \\ 0, & \lambda_{ij} \le 0. \end{cases}$$

We took the dependency coefficient matrix C as the prior constraints and restricted the size of the candidate parent set of each node in the network. The number of family score calculations could then be reduced effectively, and the efficiency of the algorithm was improved as a result.
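A minimal sketch of how such a matrix could be computed (our own illustrative code; it assumes the Pearson-style conditional chi-square statistic as reconstructed above and uses scipy's chi2 quantile function; all names are ours) is:

```python
import numpy as np
from scipy.stats import chi2

def cond_chi2(data, i, j, k):
    """Chi-square statistic of X_i and X_j given X_k, plus its degrees of freedom."""
    ri, rj, rk = (int(data[:, c].max()) + 1 for c in (i, j, k))
    stat = 0.0
    for c in range(rk):
        sub = data[data[:, k] == c]
        m_c = len(sub)
        if m_c == 0:
            continue
        for a in range(ri):
            m_ac = np.sum(sub[:, i] == a)
            for b in range(rj):
                m_bc = np.sum(sub[:, j] == b)
                expected = m_ac * m_bc / m_c        # m_ac * m_bc / m_c
                if expected > 0:
                    m_abc = np.sum((sub[:, i] == a) & (sub[:, j] == b))
                    stat += (m_abc - expected) ** 2 / expected
    return stat, (ri - 1) * (rj - 1) * rk

def dependency_matrix(data, alpha=0.05):
    """Build the dependency coefficient matrix C of Definition 6 (sketch)."""
    n = data.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            lams = []
            for k in range(n):
                if k in (i, j):
                    continue
                stat, df = cond_chi2(data, i, j, k)
                lams.append(stat - chi2.ppf(1 - alpha, df))
            lam = min(lams)                          # dependency coefficient
            C[i, j] = C[j, i] = max(lam, 0.0)        # zero when locally independent
    return C
```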

(2) Search strategy.

Theorem 3. Assume that C is the dependency coefficient matrix of the network. CPa(X) refers to the set of variables that are interdependent with X according to C, and CNPa(X) refers to the set of variables that are independent of X according to C. U refers to the candidate parent set of X, and PSbest(X) refers to the optimal parent set of X. Then, the following relationships hold: (1) for any $X_i, X_j \in X$, if $c_{ij} \ne 0$, then $X_i \in CPa(X_j)$; if $c_{ij} = 0$, then $X_i \in CNPa(X_j)$. (2) For any $X \in X$, $U \subseteq CPa(X)$ and $PS_{best}(X) \subseteq CPa(X)$.

Theorem 3 can be obtained from the definition of the dependency coefficient. According to Theorem 3, the formula for calculating family scores can be rewritten from

$$BestScore(X, V \setminus \{X\}) = \max_{Pa(X) \subseteq V \setminus \{X\}} score(X, Pa(X)) \quad (7)$$

to

$$BestScore(X, V \setminus \{X\}) = \max_{Pa(X) \subseteq (V \setminus \{X\}) \cap CPa(X)} score(X, Pa(X)). \quad (8)$$

This transformation can reduce the total number of score calculations from $n \cdot 2^{n-1}$ to $\sum_{i=1}^{n} 2^{|CPa(X_i)|}$.
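As a hypothetical illustration, for a network with n = 8 nodes, standard dynamic programming evaluates $8 \times 2^7 = 1024$ family scores; if the constraints left each node with, say, $|CPa(X_i)| = 3$ candidate parents, only

$$\sum_{i=1}^{8} 2^{|CPa(X_i)|} = 8 \times 2^3 = 64$$

scores would remain.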

Supposing that c2,1 > 0, c3,1 > 0, and c4,1 = 0 hold in the dependency coefficient matrix—that is, X2 ∈ CPa(X1), X3 ∈ CPa(X1), and X4 ∈ CNPa(X1), as shown in Figure 4(a)—the search strategy of the parent graph of X1 is shown in Figure 4(b).

In general, the total number of family scores that need to be calculated in standard DP is $n \cdot 2^{n-1}$, that is, $2^{n-1}$ for each node. When the dependency constraints are considered, the total number of family scores that need to be calculated in DP is $\sum_{i=1}^{n} 2^{|CPa(X_i)|}$, that is, $2^{|CPa(X_i)|}$ for each node. Because $2^{|CPa(X)|}$ is generally far less than $2^{n-1}$, the efficiency of the algorithm after adding the dependency constraints was enhanced greatly, as calculating the scores is one of the most time-consuming parts. We next compared the candidate parent sets (CPS) and the number of family scores (NFS) of the Asia network that need to be considered in DP with and without the dependency constraints.

From Table 1, we can also see that the candidate parent sets obtained by the dependency constraints were not exactly the same as those of the standard BN, which means that the dependency constraints mined from the sample data are not always accurate.

4.2.2. Research on Dynamic Programming Algorithms Based on Breadth-First Search Strategy

(1) Search strategy. As mentioned earlier, the CBDP method restricts the size of candidate parent sets of each node by constraints, which then reduces the time for calculating the family score and improves the efficiency of the algorithm. However, the dependency constraint is mined from sample data, whose accuracy cannot be guaranteed, which may reduce the accuracy of the algorithm to a certain extent. As a result, the optimal network structure cannot be obtained. To address this problem, we proposed a breadth-first search strategy to replace the depth-first search strategy to avoid the backtracking operation. In this way, the search efficiency could be improved while guaranteeing the global optimum.

The core idea of the breadth-first search is as follows: we start from a vertex v in the graph and, after visiting v, visit each of the adjacent points of v once. Then, we visit the adjacent points of these points in turn, following the principle that "the adjacent points of the first visited vertex take precedence over those of the second visited vertex," until all of the adjacent vertices of the visited vertices in the graph have been accessed. If other vertices remain unvisited, we select another unvisited vertex as the starting point and repeat this process, until all of the vertices in the graph have been visited. Figure 5 shows an example of this search strategy and the corresponding node access order.
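Analogous to the earlier depth-first sketch, a minimal breadth-first traversal in Python (again our own illustrative code, not the BFSDP algorithm itself) is:

```python
from collections import deque

def breadth_first_order(graph, start):
    """Return the vertex visit order of a breadth-first search.

    graph: dict mapping each vertex to a list of adjacent vertices.
    """
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        v = queue.popleft()                # earliest-found vertex first
        order.append(v)
        for w in graph.get(v, []):
            if w not in visited:           # enqueue each neighbor exactly once
                visited.add(w)
                queue.append(w)
    return order

g = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(breadth_first_order(g, 1))  # [1, 2, 3, 4]
```

Unlike the depth-first version, the queue never revisits a branch, so no backtracking is needed.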

Figure 6, from left to right, shows the search strategy for the order graph of four nodes when taking X1 as the leaf node under the breadth-first search principle. The red lines in Figure 6 represent the algorithm's forward search. For other nodes as leaf nodes, or for other parent graphs, the search order is similar. The resulting node access order is illustrated in Figure 6.

(2) Search method. From the analysis in the preceding section, we found that the breadth-first search strategy was more efficient than the depth-first search strategy because no backtracking operation has to be executed. Therefore, we proposed the BFSDP algorithm. The execution steps of the BFSDP algorithm are shown in Algorithm 1.

(1)Obtain family scores of all nodes and store them.
(2)Obtain the best parent set and the corresponding score of each node and store them.
(3)Obtain the optimal network structure score and the optimal leaf nodes of each node combination and store them.
(4)Construct the network structure.

The algorithm may repeatedly query the family scores and the optimal network structure score of each combination during the execution process. Therefore, we constructed a hash table mapping each node set to a label to improve the query efficiency. Node sets are represented in binary encoding: for a network with n nodes, a binary array b with n digits is set. For a set U in the order graph, if Xi ∈ U, then position i of b is set to 1; otherwise, it is set to 0. We then convert b to a decimal label by the hash function $H(U) = \sum_{X_i \in U} 2^{i-1}$. In score lookup, we use the decimal label instead of the node set. Taking a network of four nodes as an example, the hash table is shown in Table 2.

Step 1: Calculate all of the family scores of each node in the network and store them in the hash table.

Step 2: For each node in the network, obtain the best parent set and the corresponding score according to the breadth-first search strategy in its parent graph and store them in the hash table.

Step 3: Obtain the optimal network structure score and the optimal leaf node of each node combination according to the breadth-first search strategy in the order graph and store them in the hash table.

Step 4: Starting from the full node combination, extract the optimal leaf node and the optimal parent set of the corresponding leaf node to construct part of the network structure. Update the current node set and repeat the process until the node set is empty. The pseudocode of Step 4 of the BFSDP algorithm is given in Algorithm 2.
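For illustration, the hash labeling can be implemented as a simple bitmask (a sketch under the encoding just described; the function names are ours):

```python
def set_to_label(node_set):
    """Hash function H(U): map a set of 1-based node indices to a decimal label."""
    return sum(1 << (i - 1) for i in node_set)

def label_to_set(label, n):
    """Inverse mapping: recover the node set from its decimal label."""
    return {i for i in range(1, n + 1) if label & (1 << (i - 1))}

# Example for a four-node network: U = {X1, X3} -> b = 0101 -> label 5
print(set_to_label({1, 3}))   # 5
print(label_to_set(5, 4))     # {1, 3}
```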

Input: The best parent sets of all nodes (PSbest), the number of nodes (n), and the best leaf node of each combination of nodes (Leaf).
Output: The optimal network structure (G).
Set G to be an n × n zero matrix, order to be an empty sequence of length n, and nodes to be a binary array of length n with all positions set to 1;
For m = n to 1 in steps of −1 do
 X = index of the current node set in the hash table, calculated by $X = \sum_{i=1}^{n} nodes(i) \cdot 2^{i-1}$;
 Y = Leaf(X);
 The m-th position in order is set to be Y;
 The Y-th position in nodes is set to be 0;
 Set raw to be the best parent set of Y, which is PSbest(Y) stored for the current node set;
 G(raw, Y) ← 1;
End for
Return G
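A runnable sketch of Algorithm 2 in Python (our own translation of the pseudocode; the dictionaries ps_best and leaf stand in for the hash tables built in Steps 2 and 3, and their layout is an assumption of ours):

```python
import numpy as np

def reconstruct_network(ps_best, leaf, n):
    """Trace back the optimal structure (Step 4 / Algorithm 2).

    ps_best[(y, label)]: optimal parent set of node y within the node set `label`.
    leaf[label]: optimal leaf node of the node set encoded by `label`.
    Nodes are 1-based; labels follow H(U) = sum of 2^(i-1) over X_i in U.
    """
    G = np.zeros((n + 1, n + 1), dtype=int)   # adjacency matrix (parent -> child)
    order = [0] * (n + 1)
    label = (1 << n) - 1                      # full node set, e.g. 1111 for n = 4
    for m in range(n, 0, -1):
        y = leaf[label]                       # best leaf of the current node set
        order[m] = y
        label &= ~(1 << (y - 1))              # remove y from the node set
        for p in ps_best[(y, label)]:         # parents come from the remaining nodes
            G[p, y] = 1
    return G, order[1:]
```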

5. Experiments

5.1. Experimental Setup

Sample data were generated by sampling from benchmark networks; each network generated 10 sets of sample data with a fixed sample size, and each algorithm was executed once on each set of sample data. Therefore, the experimental results listed in this section are the averages of 10 experiments. The experimental environment was Windows 10, an Intel® Core™ i5-6500 CPU @ 3.20 GHz, and 4.00 GB RAM, using the MATLAB R2014a software platform. The settings of the experimental parameters are given in Table 3.

5.2. Experimental Results

The experimental results of the time-consumption comparison of the three algorithms on the standard network data sets are shown in Table 4 and Figure 7. The corresponding experimental results of the accuracy comparison are shown in Table 5. The experimental results of the time-consumption comparison of the three algorithms on the UCI standard data sets are shown in Table 6. The corresponding experimental results of the accuracy comparison are shown in Table 7. "OT" in Tables 4, 6, and 7 indicates that the execution time of the algorithm exceeded the upper limit, which we set to 3 days in this study. "OM" indicates that the storage space required by the algorithm exceeded the memory of the computer. "Nodes" denotes the number of nodes in the network, "Edges" denotes the number of edges in the network, and "Tns" denotes the total number of states over all of the nodes. The significance level of the CBDP algorithm was set to 0.05 when computing the dependency constraints [33]. The DFSDP and CBDP algorithms both exceeded the time limit on all of the sample data sets when learning the Child network, and all three algorithms exceeded the memory limit on all of the sample data sets when learning the Insurance network. Thus, we did not compare the learning accuracy on these two networks in Table 5.

Taking the execution time of the DFSDP algorithm as the baseline, the ratios of the execution times of the CBDP algorithm and the BFSDP algorithm to this baseline are shown in Figure 7.

According to the time-consumption comparison results, because the DFSDP algorithm adopted the depth-first search strategy, which required repeated backtracking operations, it was the slowest algorithm on all of the data sets. Compared with the DFSDP algorithm, the CBDP algorithm constructed the parent graphs according to the dependency constraints; although it still ran under the depth-first strategy, its efficiency was enhanced significantly. The CBDP algorithm reduced the running time by −3.70% to 84.74% (57.10% on average) compared with the DFSDP algorithm. The BFSDP algorithm adopted the breadth-first search strategy, so no backtracking operation was required, and its efficiency was higher than that of the DFSDP algorithm on all of the data sets. BFSDP reduced the running time by 9.97% to 98.57% (50.02% on average) compared with DFSDP. The average efficiency improvement of the CBDP algorithm was higher than that of the BFSDP algorithm because, in small-scale network learning, the efficiency improvement of the CBDP algorithm was more significant. When the scale of the network was small (fewer than 6 nodes), the efficiency of the CBDP algorithm was higher than that of the BFSDP algorithm; as the scale of the network becomes larger, the BFSDP algorithm becomes more efficient.

When the scale of the network was small, the difference in efficiency between the depth-first and breadth-first search strategies was not obvious, and the effect of the constraints on the efficiency of the algorithm was more significant. As the scale of the network grew, the efficiency improvements made by the search strategy became more significant than those made by the constraints. When the scale of the network was enlarged further (more than 20 nodes), only the BFSDP algorithm could find the optimal network within the time limit, whereas the DFSDP and CBDP algorithms both exceeded the time limit defined in this study. When the number of nodes in the network was greater than 26, the storage space required by all three algorithms exceeded the computer memory.

According to the accuracy comparison results, the DFSDP and BFSDP algorithms always found the optimal network structure on all of the data sets and under different sample sizes. When the scale of the network was small (fewer than 6 nodes) and the sample size was sufficient (more than 1000 samples), the CBDP algorithm also found the optimal network structure. When the scale of the network grew larger, the CBDP algorithm was less accurate than the other two algorithms: the score of the structure obtained by the CBDP algorithm was reduced by 0.91% to 46.77% (17.14% on average). Because the CBDP algorithm pruned the search process using dependency constraints whose accuracy could not be guaranteed, and because the larger the scale of the network, the higher the accuracy of the constraints required (and hence the larger the sample size required), the accuracy of the CBDP algorithm decreased when learning large-scale networks.

The conclusions of the analysis based on the time-consumption and accuracy comparison results on the UCI data sets were similar to those obtained on the standard network data sets. In terms of efficiency, the CBDP algorithm reduced the running time by −2.02% to 81.11% (40.71% on average) compared with the DFSDP algorithm, and the BFSDP algorithm reduced the running time by 54.26% to 99.93% (81.78% on average) compared with the DFSDP algorithm. In terms of accuracy, the score of the structure obtained by the CBDP algorithm was reduced by 7.46% to 101.74% (60.58% on average). Note that when the sample size was small relative to the scale of the network structure, the accuracy of the CBDP algorithm was usually poor; the main reason was that the sample size was insufficient to support the accuracy of the dependency constraints at that network scale. In addition, although some data sets had fewer than 27 nodes, the number of value states of each node was large, which increased the number of family scores that needed to be stored, so the memory still exceeded the limit.

6. Conclusions and Future Work

To address the problem that traditional dynamic programming methods based on the depth-first search strategy are inefficient, we proposed pruning redundant score calculations with dependency constraints. However, because the accuracy of the constraints was difficult to guarantee, the accuracy of this method decreased. We then proposed a breadth-first-based strategy, which enhanced the efficiency significantly while also ensuring global optimality. The experiments comparing the three algorithms verified the validity of the proposed CBDP and BFSDP algorithms.

In this study, priors were integrated into the construction of the parent graphs, and the parent graphs were pruned with priors, which enhanced the search efficiency. The reduction in the space complexity of the algorithm, however, was insufficient: even with the addition of priors, it was still impossible to learn large-scale network structures. Pruning the order graph with priors could effectively reduce the space complexity and would make it possible to learn network structures with more nodes. Future work will focus on extending the learning scale based on prior constraints.

Data Availability

The data used in the experiments are 14 data sets from the UCI database and data sampled from 8 standard networks.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful to Prof. Wang and Dr. Guo for helpful discussions. This work was supported by the National Key Laboratory Fund (CEMEE2020Z0202B) and the Shaanxi Science Foundation (2020JQ-816 and 20JK0680).