Abstract
Dynamic programming is difficult to apply to large-scale Bayesian network structure learning. In view of this, this article proposes a BN structure learning algorithm based on dynamic programming, which integrates an improved MMPC (max-min parents and children) algorithm and the MWST (maximum weight spanning tree). First, we use the maximum weight spanning tree to obtain the maximum number of parent nodes of a network node. Second, the MMPC algorithm is improved through a symmetry check to reduce false-positive nodes and obtain the set of candidate parent-child nodes. Finally, with the maximum number of parent nodes and the set of candidate parent nodes as constraints, we prune the parent graph of dynamic programming to reduce the number of scoring calculations and the complexity of the algorithm. Experiments show that when an appropriate significance level is selected, the MMPC-DP algorithm can greatly reduce the number of scoring calculations and the running time while maintaining accuracy.
1. Introduction
In recent years, big data, machine learning, and deep learning have become topics of common concern in academia and industry across fields such as computer science, medicine, statistics, economics, and the social sciences [1–5]. However, artificial intelligence scholars have argued at AI conferences that future research should be oriented to uncertain environments and human-like mechanisms, and should focus on complex problems and learning from limited data. The "black box" design of deep neural network models makes it difficult to explain their internal operating mechanisms, and training such networks requires a large number of labeled samples, so they are not the only path to future intelligence [6]. Methods that can deal with uncertainty and incorporate domain knowledge to model complex problems, such as fuzzy neural networks [7, 8], Bayesian networks [9], and D-S evidence theory [10, 11], have therefore attracted renewed attention.
Among them, the Bayesian network has many advantages that other modeling methods lack. With its rigorous mathematical foundation, an intuitively understandable graphical topology, and a natural expression of real problems, it has become a powerful tool for uncertain information processing and posterior probabilistic reasoning, and has been widely used in genetic analysis [12], medical diagnosis [13], reliability analysis [14], and threat assessment [15].
2. Related Work
When a BN is used to deal with real problems, the structure and parameters of the model must be given as inputs. Learning the Bayesian network parameters is performed with a model structure obtained in advance, so it is crucial to learn the correct BN structure from the data. However, BN structure learning (BNSL) has been proved to be an NP-hard problem: for a BN with n nodes, the number of possible structures grows super-exponentially with n [16]. In the DAG search space, BNSL algorithms can be divided into approximate and exact learning algorithms according to learning accuracy. The former use heuristic algorithms (ant colony algorithm [17], genetic algorithm [18], particle swarm optimization [19], etc.) for structure search, which easily fall into local optima, so the structure finally learned is only approximate. The latter can guarantee an optimal network through methods such as integer linear programming (ILP) [20], the A* algorithm [21], and dynamic programming (DP) [22]. The ILP algorithm decomposes BN structure learning into a linear programming problem and then uses the SCIP solver to solve it; its core problem is how to choose cutting planes. The A* algorithm expresses BNSL as a shortest-path problem [23–25] and uses the h value of nodes as the basis of path expansion to reduce the search space and the memory consumption. The DP algorithm decomposes BNSL into the problem of finding optimal subnetworks, recursively traversing the order space and parent space; it has been shown to be applicable to networks with no more than 26 nodes [26].
The dynamic programming algorithm is effective for small and medium-scale BNSL, but it still has considerable complexity. To address this, scholars at home and abroad have carried out related research. Silander [26] proposed the SMDP algorithm, which reduces running time by constructing a P-caches storage structure and expands the computable network scale, but local scores still have to be calculated n2^{n−1} times. Malone [27] proposed the memory-efficient dynamic programming algorithm (MEDP), which uses the properties of scoring functions to bound the maximum number of parents of a node and then performs layer-by-layer storage and calculation for dynamic programming, improving efficiency by eliminating a small number of scoring calculations. Singh [28] proposed dynamic programming based on depth-first search (DFS-DP), which prunes easily to reduce storage space but has low computational efficiency. Ye [29] proposed a dynamic programming structure learning method that integrates prior information, introducing edge constraints and path constraints to prune the order graph and thus reduce time consumption; however, it has low fault tolerance because its accuracy depends on the correctness of the prior information. Tan et al. [30] proposed dynamic programming constrained with the Markov blanket (DPCMB), which uses the Markov blanket computed by the IAMB algorithm to constrain the scoring process and reduce the number of scoring calculations; however, the significance level affects its accuracy and efficiency. Combining a hierarchical idea with dynamic programming, Yang [31] proposed a hierarchical optimal BN structure learning algorithm, which uses conditional independence to layer the nodes and assumes that the parents of each node come only from its own layer and layers with higher priority.
Behjati and Beigy [32] proposed using a strongly connected graph to obtain the node block order, then using the DP algorithm to obtain the complete node order, and finally obtaining the BN structure through the K2 algorithm. Xu et al. [33] proposed a Bayesian network structure learning algorithm based on full permutation and extensible ordering-based search, which uses local learning and pruning to process datasets with small and large numbers of variables, respectively, and efficiently obtains the candidate parent sets of each variable. Wang et al. [34] proposed a novel approach to improve the capability of a local search by determining the search direction, which accelerates the convergence of the local search and yields higher-quality structures. Gao et al. [35] proposed an improved K-means block learning Bayesian network structure algorithm, which uses the K-means algorithm fused with mutual information to partition the network into blocks, adopts DP to learn the structure of each block, and finally uses combination rules to obtain the best network. Yuan et al. [36] proposed applying an improved A* algorithm in the dynamic programming space to reduce space and time complexity. Liu [37] proposed a new optimal path selection method based on a hybrid improved A* algorithm and reinforcement learning, which achieved stable and efficient results in the optimal path selection of intelligent driving vehicles. Tan et al. [38] proposed a bidirectional heuristic search algorithm (BiHS) based on one-way heuristic search, and the results showed that BiHS is more efficient than one-way heuristic search. Wang et al. [39] proposed the ancestor-constrained ordered graph (ACOG), which improves the efficiency of exact search by introducing ancestor constraints.
To sum up, the existing literature has improved the DP algorithm by adding constraints or adopting spatial search strategies, reducing time consumption to varying degrees. However, how to balance time consumption against algorithm accuracy remains an open problem. In this article, a dynamic programming structure learning algorithm for Bayesian networks integrating MWST and an improved MMPC is proposed. By analyzing the connection relationships of nodes in the maximum weight spanning tree, we take the maximum number of node connections in the tree structure as the maximum parent number t of a node and prune the parent graph of nodes in the DP algorithm horizontally. Consequently, the upper limit on the number of parents is reduced from n − 1 to t. Then, we take the candidate parent-child node set CPC obtained by the MMPC algorithm as the candidate parent set PS of the corresponding node and prune the parent graph of nodes in the DP algorithm longitudinally. Meanwhile, to avoid false-positive nodes in the CPC (candidate parents and children), the MMPC algorithm is improved with the symmetry principle. With these two operations, the time consumption and complexity of the algorithm are lowered because fewer local scores have to be calculated.
3. Principle of Dynamic Programming
From a mathematical point of view, a BN is a graph model representing the probability relationships among random variables V = {X_{1}, X_{2}, …, X_{n}}; it consists of a structure G and distribution parameters θ. The structure G = (V, E) is a directed acyclic graph (DAG). The random variables in V are regarded as the node set. The directed edge set E shows the dependency relationship between X_{i} and X_{j}: X_{i} is a parent of X_{j}, that is, X_{i} ∈ pa(X_{j}), and X_{j} is a child of X_{i}, that is, X_{j} ∈ ch(X_{i}). The parameter θ is a set of conditional probability distributions, and P(X_{i} | pa(X_{i})) represents the conditional probability distribution of X_{i} when pa(X_{i}), the parent set of X_{i}, is given. Based on the conditional independence hypothesis, the Bayesian network decomposes the joint probability distribution as P(X_{1}, X_{2}, …, X_{n}) = ∏_{i=1}^{n} P(X_{i} | pa(X_{i})), where n is the number of nodes and X_{i} is the ith node, which can take discrete or continuous values.
The BN structure learning algorithm based on dynamic programming is developed from the following idea: given a dataset D, find the structure G* in the search space such that G* = arg max_{G ∈ G_{n}} score(G | D), where G_{n} represents all possible structures over the variable set in the DAG search space and score(·) is a decomposable scoring function. In this article, the BIC scoring function is adopted: BIC(G | D) = ∑_{i=1}^{n} ∑_{j=1}^{q_i} ∑_{k=1}^{r_i} N_{ijk} log(N_{ijk}/N_{ij}) − (log N / 2) ∑_{i=1}^{n} q_{i}(r_{i} − 1), where q_{i} indicates the number of configurations of the parent set pa(X_{i}) of node X_{i}, r_{i} is the number of values X_{i} can take, and N_{ijk} is the number of samples in which node X_{i} takes its kth value and its parent set takes its jth configuration. The second term is a penalty added to avoid overfitting.
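To make the decomposable score concrete, the following is a minimal sketch of the BIC local score for one node over discrete data (the helper name `bic_local_score` and the dict-per-row data layout are our own illustrative assumptions, not the paper's code; q_{i} is taken as the number of parent configurations observed in the data):

```python
import math
from collections import Counter

def bic_local_score(child, parents, data):
    """BIC local score of `child` given `parents`.

    data: list of dicts mapping variable name -> discrete value.
    Log-likelihood term minus the (log N / 2) * q_i * (r_i - 1) penalty.
    """
    N = len(data)
    r_i = len({row[child] for row in data})                      # child cardinality
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    q_i = len(n_ij)                                              # observed parent configurations
    loglik = sum(c * math.log(c / n_ij[j]) for (j, _), c in n_ijk.items())
    penalty = 0.5 * math.log(N) * q_i * (r_i - 1)
    return loglik - penalty
```

On 100 samples where B deterministically copies A, the score of B with parent {A} exceeds the score of B with no parents, as the penalty is outweighed by the likelihood gain.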
The dynamic programming algorithm recursively traverses the DAG space to search for the optimal BN structure. Since the structure is acyclic, at least one node of the final structure has no outgoing arc; such a node is called a leaf node. Assume that the optimal BN structure has a leaf node X; the state transition equation of dynamic programming is then score(V) = max_{X ∈ V} [score(V \ {X}) + BestScore(X, V \ {X})], where BestScore(X, S) = max_{PS ⊆ S} score(X | PS) is the best local score of X over parent sets drawn from S.
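The recursion above can be sketched as follows (a toy illustration under our own naming assumptions; for clarity it enumerates parent sets by brute force rather than maintaining a parent graph, so it is only meant for very small n):

```python
from itertools import combinations

def best_parent_score(x, candidates, local_score):
    """BestScore(x, S): best local score of x over all parent sets from candidates."""
    best = local_score(x, ())
    for k in range(1, len(candidates) + 1):
        for pa in combinations(candidates, k):
            best = max(best, local_score(x, pa))
    return best

def optimal_score(n, local_score):
    """Order-graph DP: best[S] is the best score of a network over subset S.

    Each subset S is extended by choosing a leaf x in S, per the state
    transition equation score(S) = max_x [score(S \ {x}) + BestScore(x, S \ {x})].
    """
    best = {frozenset(): 0.0}
    nodes = range(n)
    for size in range(1, n + 1):
        for subset in map(frozenset, combinations(nodes, size)):
            best[subset] = max(
                best[subset - {x}]
                + best_parent_score(x, tuple(sorted(subset - {x})), local_score)
                for x in subset
            )
    return best[frozenset(nodes)]
```

With a toy local score that rewards the chain 0 → 1 → 2, the DP recovers the chain's total score.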
The order graph composed of the subsets of V and the parent graph of each node together represent the whole process of dynamic programming. Figure 1 shows the order graph for n = 4 nodes and the parent graph of X_{1}. When the DP algorithm works from top to bottom, it first determines the root node and then gradually adds leaf nodes until the remaining node set is the full set. Conversely, when the DP algorithm works from bottom to top, it first determines the leaf nodes and then gradually adds root nodes until the remaining node set is empty. In Figure 1, the bolded lines indicate one particular node order.
It can be seen from Figure 1 that the BNSL algorithm based on dynamic programming needs to calculate and store all the local scores in advance. For a BN with n nodes, any node X_{i} has 2^{n−1} possible parent sets, requiring n2^{n−1} local score calculations in total. Therefore, the time complexity and space complexity of the dynamic programming algorithm are both O(n2^{n−1}).
4. Algorithm Idea
The main idea of the proposed algorithm is to use the maximum number of parents obtained with the maximum weight spanning tree, together with the CPC of each node obtained with the MMPC algorithm as the candidate parent set PS of the node, to place double constraints on the scoring process of the dynamic programming algorithm. This further reduces the number of local scoring calculations and the time consumption of the algorithm. In addition, we propose the S-MMPC algorithm, which uses the symmetry principle to reduce the occurrence of false-positive nodes in the CPC. Before explaining the details of the algorithm, the relevant concepts are given first.
Definition 1 (mutual information [40]). The mutual information between two random variables X_{i} and X_{j} can be calculated as I(X_{i}, X_{j}) = ∑_{k=1}^{r_i} ∑_{l=1}^{r_j} P(x_{ik}, x_{jl}) log [P(x_{ik}, x_{jl}) / (P(x_{ik})P(x_{jl}))], where r_{i} is the number of states of node X_{i} and x_{ik} denotes its kth value. If I(X_{i}, X_{j}) > 0, an undirected edge may exist directly between variables X_{i} and X_{j}; the larger I(X_{i}, X_{j}) is, the higher the degree of dependence between the two variables. If I(X_{i}, X_{j}) = 0, variables X_{i} and X_{j} are independent of each other. Equation (7) shows that the mutual information matrix is a symmetric matrix whose diagonal is set to 0.
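For illustration, a minimal empirical estimate of this mutual information from two paired samples of discrete values (the function name is our own; probabilities are replaced by sample frequencies):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information between two equal-length discrete samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )
```

Independent samples give 0, and identical binary samples give log 2, matching the stated interpretation of I(X_{i}, X_{j}).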
Theorem 1 (d-separation and conditional independence). In a BN, if all paths between node X_{i} and node X_{j} are blocked by a node set Z, then we say that Z d-separates X_{i} and X_{j}, or that X_{i} and X_{j} are independent of each other given Z. This is expressed as Ind(X_{i}, X_{j} | Z).
Definition 2 (association degree assoc(X; T | S)). It indicates the degree of association of X and T given S. Its value is determined by the G^{2} statistic under the null hypothesis of conditional independence. Assume that N_{abc} is the number of samples in which X = a, T = b, and S = c. Then, the G^{2} statistic is defined as G^{2} = 2 ∑_{a,b,c} N_{abc} ln (N_{abc}N_{c} / (N_{ac}N_{bc})). Given a significance level α, if the p value calculated from the G^{2} test is less than α, the null hypothesis is rejected; that is, variables X and T are conditionally dependent. Otherwise, they are conditionally independent.
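The G^{2} statistic itself can be computed directly from the count tables (a sketch under our own naming; `ss` holds the joint value of the condition set, e.g. a tuple per sample; converting G^{2} to a p value against a chi-square quantile is omitted here):

```python
import math
from collections import Counter

def g_squared(xs, ts, ss):
    """G^2 statistic for the conditional independence of X and T given S.

    Computes 2 * sum_{a,b,c} N_abc * ln(N_abc * N_c / (N_ac * N_bc)) over
    the observed cells; larger values indicate stronger conditional dependence.
    """
    n_abc = Counter(zip(xs, ts, ss))
    n_ac = Counter(zip(xs, ss))
    n_bc = Counter(zip(ts, ss))
    n_c = Counter(ss)
    return 2.0 * sum(
        c * math.log(c * n_c[s] / (n_ac[(a, s)] * n_bc[(b, s)]))
        for (a, b, s), c in n_abc.items()
    )
```

Perfectly dependent X = T gives a large statistic, while a sample in which X and T are balanced within each stratum of S gives 0.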
Definition 3 (minimum association MinAssoc(X; T | Z)). Taking all subsets of a given condition set Z as condition sets, the minimum value of the association degree of the target variables X and T is MinAssoc(X; T | Z) = min_{S ⊆ Z} assoc(X; T | S).
Theorem 2 (false-positive node). If two nodes are conditionally dependent given every subset of the candidate node set V \ X, then the two nodes are adjacent. Otherwise, one node is called a false-positive node of the other.
4.1. MWST Algorithm
The steps of the MWST algorithm [41] are as follows: first, start from the node set Y = {X_{i}}. Then, find the node X_{j} in the set V \ Y that has the largest mutual information with some node y in Y, and connect y and X_{j} with an undirected edge. Repeat this operation until Y = V. At this point, the sum of the mutual information over the tree edges is the largest. Figure 2 shows the structure of the Asia network and its MWST structure when the data amount is N = 1000.
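The Prim-style growth described above, together with the extraction of the maximum connection count t, can be sketched as follows (function names and the dense weight-matrix representation are our own assumptions):

```python
def max_weight_spanning_tree(n, weight):
    """Prim-style MWST: grow Y from node 0, at each step attaching the outside
    node with the largest weight (mutual information) to any node already in Y."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        y, x, _ = max(
            ((a, b, weight[a][b]) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: e[2],
        )
        edges.append((y, x))
        in_tree.add(x)
    return edges

def max_parent_bound(n, weight):
    """t: the largest number of connections of any node in the MWST."""
    deg = [0] * n
    for a, b in max_weight_spanning_tree(n, weight):
        deg[a] += 1
        deg[b] += 1
    return max(deg)
```

For a star-shaped weight matrix where node 0 has the strongest links to the other three nodes, the tree attaches all of them to node 0, giving t = 3.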
It can be seen that the edges in the tree structure are undirected, so the exact parent set and child set of a node cannot be determined from the tree alone. The weight of an edge in the tree structure represents the dependency between two variables; mapped onto the BN structure, it reflects the relationship between parent nodes and child nodes. In this case, the maximum number of node connections in the tree structure can be taken as the maximum number of parent nodes in the network structure. In the Asia network, the maximum number of node connections is 4, so the maximum parent number of the network can be set to 4, and the candidate parent set of X_{6} can be regarded as {X_{3}, X_{4}, X_{7}, X_{8}}. It is worth noting that if the number of nodes n is large, the tree structure will omit edges of the true network. Therefore, the maximum number of parents of each individual node cannot be determined from the tree structure. For example, if the maximum number of parents of X_{3} were fixed at 2 and an omitted edge happened to lie between X_{3} and some node, the search for the optimal parent set of X_{3} would be affected, and the accuracy of the algorithm could not be guaranteed.
Based on the above analysis, we use the maximum number of parents obtained from the maximum weight spanning tree to prune the parent graph of the DP algorithm horizontally, reducing the upper limit on the number of parents from n − 1 to t and thus decreasing the number of local scoring calculations. Figure 3 is an example of horizontal pruning of the parent graph using the value t in the Weather network, where the gray sets do not need to be calculated. If t = 1, only two layers of scores need to be calculated; if t = 2, only three layers. If and only if t = n − 1 is the constraint invalid, in which case the number of local scoring calculations of the proposed algorithm equals that of the DP algorithm.
In general, when the network has n nodes, the parent graph of the DP algorithm has n layers. After the constraint with the value t, n − t − 1 layers of the parent graph are pruned and t + 1 layers remain. In this case, the number of local scores to be calculated is n ∑_{k=0}^{t} C(n−1, k). After the constraint, the local score calculation of X becomes BestScore(X, S) = max_{PS ⊆ S, |PS| ≤ t} score(X | PS).
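The score count after horizontal pruning reduces to a simple binomial sum (the function name is our own):

```python
from math import comb

def local_scores_after_pruning(n, t):
    """Local scores left after horizontal pruning: each of the n nodes keeps
    only candidate parent sets of size at most t chosen from the other n - 1 nodes."""
    return n * sum(comb(n - 1, k) for k in range(t + 1))
```

For the 4-node Weather network this gives 32 scores with t = 3 (no effective pruning, i.e. n2^{n−1}) and 28 with t = 2, matching the counts discussed for Figures 3 and 5.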
4.2. S-MMPC Algorithm
Before introducing the S-MMPC algorithm proposed in this article, we briefly explain the MMPC algorithm. The MMPC algorithm is divided into two stages. In the first stage, it uses the max-min heuristic strategy to let variables enter the CPC set of the target variable T. In the second stage, it removes variables that should not have entered the CPC in the first stage as follows: for the target variable T, if there is a subset S and a variable X in its CPC set such that T and X are conditionally independent given S, i.e., Ind(T, X | S), then X is removed from the CPC of T. However, the parent-child node set of the target node T obtained by the MMPC algorithm may still contain false-positive nodes. As shown in Figure 4, when obtaining the parent-child node set of the target node T, nodes B and T are independent when the size of the condition set is 0, so node B is deleted from ADJ(T). However, node T and node C are conditionally independent only under the condition set {A, B}. Therefore, deleting node B too early makes it impossible to test the conditional independence of T and C in subsequent tests, and node C may still remain in the parent-child node set of T after the MMPC processing.
To solve this problem, this article proposes the S-MMPC algorithm, which uses the principle of symmetry to improve the MMPC algorithm: two nodes must each be in the parent-child node set of the other; otherwise, one is a false-positive node of the other and can be deleted. As shown in Figure 4, processing node T with the MMPC algorithm gives the parent-child node set CPC(T) = {A, C}. Calling MMPC on C gives CPC(C) = {A, B}, but T ∉ CPC(C), which does not satisfy the symmetric relationship between the nodes. Therefore, C is a false-positive node of T, and C should be deleted from the parent-child node set of T. The pseudocode of the S-MMPC algorithm, in which lines 16–20 apply the symmetry principle to delete false-positive nodes, is shown as follows:

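The symmetry step alone can be sketched compactly (an illustrative helper under our own naming, operating on CPC sets already produced by MMPC):

```python
def symmetry_prune(cpc):
    """S-MMPC symmetry step: keep X in CPC(T) only if T is also in CPC(X);
    otherwise X is treated as a false-positive node of T and removed.

    cpc: dict mapping each node to its MMPC candidate parent-child set.
    """
    return {
        t: {x for x in neighbours if t in cpc.get(x, set())}
        for t, neighbours in cpc.items()
    }
```

Replaying the Figure-4 example, CPC(T) = {A, C} and CPC(C) = {A, B}: since T ∉ CPC(C), the pruned CPC(T) keeps only A.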
After the constraint from the maximum weight spanning tree alone, the possible parent sets of X are still drawn from V \ X; the candidate subsets of X only need to satisfy PS ⊆ V \ X and |PS| ≤ t, which eliminates only a modest number of scoring calculations. Therefore, this article additionally uses the S-MMPC algorithm to constrain the candidate parent set of each node. After this constraint, the local score calculation of X becomes BestScore(X, S) = max_{PS ⊆ S ∩ CPC(X)} score(X | PS).
According to equation (11), each node only needs to satisfy |CPC(X)| ≤ n − 2; that is, at least one element that does not belong to the candidate parent-child set is deleted from the candidate parent set of each node, and then only the local scores over subsets of CPC(X) need to be calculated, reducing the number of score calculations. The number of score calculations in this case is n2^{|CPC(X)|}. Figure 5 is an example of pruning the parent graph of the Weather network with CPC(X_{1}). If CPC(X_{1}) = {X_{2}, X_{3}}, then the node sets containing X_{4} can be omitted from the calculation; they are represented by the gray sets in Figure 5. Comparing Figures 3 and 5, it can be seen that in the Weather network the DP algorithm needs to calculate local scores 32 times, the constraint with t = 2 reduces this to 28 times, and the constraint with the S-MMPC algorithm reduces it to only 16 times.
4.3. MMPC-DP Algorithm
The core idea of the MMPC-DP algorithm is to use the maximum number of parent nodes obtained from the maximum weight spanning tree and the candidate parent sets obtained by the MMPC algorithm to constrain the dynamic programming process, thereby reducing the search space and improving the efficiency of the algorithm. The block diagram of MMPC-DP is shown in Figure 6, and the complete steps of the algorithm are as follows:
Step 1. Use formula (6) to calculate the mutual information matrix between nodes according to the data.
Step 2. Using mutual information as the edge weight, run the Prim algorithm to build the maximum weight spanning tree.
Step 3. Traverse the maximum weight spanning tree to get the maximum number of connections of any node, and use it as the maximum number of parent nodes.
Step 4. Use formula (8) for the conditional independence tests, run the MMPC algorithm improved by the symmetry principle to obtain the candidate parent-child set CPC of each node, and use it as the candidate parent set PS of the node.
Step 5. Recursively traverse the node order space for dynamic programming according to formulas (4) and (5), use the maximum number of parent nodes and the candidate set PS to constrain the dynamic programming process, and calculate the structure score according to formula (3).
Step 6. The structure with the highest score is the final BN structure.
The pseudocode of the MMPC-DP algorithm is shown in Algorithm 2. In Algorithm 2, Function3 calculates and stores all the local scores that need to be computed. Lines 1–3 obtain the maximum number of parent nodes of the network from the tree structure. Line 4 calls Function1 of the S-MMPC algorithm in Section 4.2 and uses the G^{2} test to verify conditional independence between nodes, yielding the candidate parent set of each node. Lines 5–7 call Function4 to calculate the possible parent sets and corresponding scores for each node. In Function4, lines 1–3 generate the possible parent set combinations with the function nchoosek and store them in the set pa_comb. Lines 5–13 use a double loop to obtain the scores of the possible parent sets of all nodes: in line 5, the value t is used to prune the parent graph horizontally, and in line 6, the candidate parent set CPC is used to prune it longitudinally. Line 14 sorts the resulting key-value pairs [Score, cpccodei] by score in descending order to support the search of the node order graph.
In this article, after the double constraints described above, the final local score calculation of X is BestScore(X, S) = max_{PS ⊆ S ∩ CPC(X), |PS| ≤ t} score(X | PS). It can be seen that the proposed algorithm reduces the overall number of score calculations from n2^{n−1} to ∑_{i=1}^{n} ∑_{k=0}^{min(t, |CPC(X_i)|)} C(|CPC(X_{i})|, k).

4.4. Algorithm Time Complexity Analysis
The time complexity of an algorithm is measured by the number of executions of the basic operations required during its execution. Assume that n is the number of network nodes and N is the number of samples. The time complexity of calculating the mutual information of one pair of nodes is O(N); a total of n(n − 1)/2 mutual information calculations are required, so the time complexity of calculating the mutual information of the entire network is O(n^{2}N). The time complexity of using mutual information to construct the maximum weight spanning tree is O(n^{2}), and the time complexity of traversing the tree to obtain the maximum parent number t is O(n). The time complexity of the dynamic programming constrained with t and CPC is proportional to the reduced number of local score calculations, ∑_{i=1}^{n} ∑_{k=0}^{min(t, |CPC(X_i)|)} C(|CPC(X_{i})|, k). The overall time complexity of the MMPC-DP algorithm is the sum of these terms.
5. Simulation Experiment
5.1. Experiment Conditions
To verify the effectiveness of the MMPC-DP algorithm, self-constructed networks of different scales with data amount N = 10,000 (https://pan.baidu.com/s/1FbrobC5j7y5bVlz1WutY4Q) are used to analyze its performance based on the Bayesian toolbox FullBNT-1.0.7 [42]. The software environment is the Windows 10 64-bit operating system with an Intel(R) Core(TM) i7-3615QM CPU @ 2.30 GHz processor and MATLAB R2014a. To avoid randomness in data sampling, which can lead to large differences in the t values and affect algorithm accuracy, each experiment samples 10 groups of data, and each group is run independently 10 times and averaged: the t values are averaged and rounded up, and the running times are averaged. The self-constructed network parameters are shown in Table 1, where n is the number of nodes, E is the number of edges, P indicates the number of parameters, and the last column gives the maximum in-degree.
5.2. Performance of S-MMPC Algorithm
According to Definition 2, the significance level α affects the performance of the MMPC algorithm to a great extent. If α decreases, the test becomes stricter and the obtained CPC(T) becomes smaller; if α increases, the test becomes looser and the obtained CPC(T) becomes larger. In this article, sensitivity and specificity are used to evaluate the CPC(T) of target variables obtained by the S-MMPC and MMPC algorithms and to analyze the influence of different values of α on their accuracy. Sensitivity is the proportion of variables in the true parent-child set of the target variable that the algorithm correctly includes in CPC(T). Specificity is the proportion of variables outside the true parent-child set that the algorithm correctly excludes from CPC(T).
Assume that PC(X_{i}) is the parent-child node set of X_{i} in the standard network and CPC(X_{i}) is the parent-child node set of X_{i} obtained by the algorithm. The sensitivity and specificity are then defined as sensitivity = |CPC(X_{i}) ∩ PC(X_{i})| / |PC(X_{i})| and specificity = |(V \ {X_{i}} \ PC(X_{i})) \ CPC(X_{i})| / |V \ {X_{i}} \ PC(X_{i})|.
The two statistics together provide a measure for evaluating the performance of S-MMPC and MMPC; neither statistic alone is sufficient. In this article, the Euclidean distance to the optimal point (sensitivity, specificity) = (1, 1) is used as a combined measure: d = sqrt((1 − sensitivity)^{2} + (1 − specificity)^{2}).
The smaller the distance is, the closer the output of the algorithm is to the true parentchild node set.
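The three measures can be computed together as follows (an illustrative helper under our own naming; `candidates` is the node set excluding the target variable itself):

```python
import math

def pc_metrics(found, true_pc, candidates):
    """Sensitivity, specificity, and Euclidean distance to the ideal (1, 1).

    found: CPC set returned by the algorithm; true_pc: parent-child set in
    the standard network; candidates: all nodes except the target variable.
    """
    negatives = candidates - true_pc
    sensitivity = len(found & true_pc) / len(true_pc)
    specificity = len(negatives - found) / len(negatives)
    distance = math.hypot(1 - sensitivity, 1 - specificity)
    return sensitivity, specificity, distance
```

For example, an output {A} against a true parent-child set {A, B} among candidates {A, B, C, D} gives sensitivity 0.5, specificity 1.0, and distance 0.5.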
In this article, the Asia, Sachs, Child, and Insurance networks are selected from the BN repository to evaluate the performance of the S-MMPC algorithm, with N = 10,000. Figure 7 shows the sensitivity, specificity, and Euclidean distance of S-MMPC and MMPC under four different values of α.
From Figure 7, it can be seen that in the above four standard networks of different scales, when the amount of data is constant, the Euclidean distance increases with the significance level α. For the same significance level α, the Euclidean distance of the proposed S-MMPC algorithm is smaller than that of the MMPC algorithm, which means that S-MMPC performs better: it ensures that the output parent-child set of a node is close to the real one and reduces the error rate of the longitudinal pruning of the parent graph. The reason for the improvement is that, with the same sensitivity as the MMPC algorithm, the S-MMPC algorithm improves specificity by deleting false-positive nodes according to the symmetry principle, thus reducing the Euclidean distance.
5.3. Performance of MMPC-DP Algorithm
To avoid traversing equivalent structures, we evaluate the accuracy of the algorithm by the percentage error between the network structure score calculated by the algorithm and the original network structure score: η = |score(G′) − score(G)| / |score(G)| × 100%, where G′ is the network structure learned by the proposed algorithm, G represents the original network structure, and the score function is BIC. Equation (15) shows that the smaller the score error percentage, the higher the accuracy of the algorithm.
5.3.1. Analysis of Transverse Pruning
Different networks generate different spanning tree structures and t values. This section compares the maximum in-degree of each network with the t obtained from the MWST and then judges whether horizontally pruning the parent graph with the MWST affects the accuracy of the DP algorithm. The statistical results are shown in Figure 8.
It can be seen from the comparison that t is always greater than or equal to the maximum in-degree of the network. This means that every parent graph after horizontal pruning with the value t must still contain the optimal parent set of the node, so the accuracy of the algorithm is not affected. Therefore, the accuracy of the proposed algorithm depends on the longitudinal pruning.
5.3.2. Analysis of Longitudinal Pruning
In this article, the CPC obtained by the S-MMPC algorithm is used as the candidate parent set of each node, and the parent graph is pruned longitudinally. However, according to Definition 2, the significance level α affects the performance of the S-MMPC algorithm to a great extent. A detailed analysis is shown in Table 2.
Therefore, this section takes a network with n = 14 as an example (Figure 9) and analyzes the influence of different values of α on the accuracy of the MMPC-DP algorithm. Figure 10 shows the network structures obtained by the MMPC-DP algorithm under the tested values of α.
It can be seen from Figure 11 that the running time of the MMPC-DP algorithm increases rapidly with the significance level α, while the score error percentage decreases as α tends to 1, finally stabilizing at 0. Therefore, accuracy and time are two indexes that cannot be optimized simultaneously. To strike a balance between the accuracy and the time of the algorithm, Section 5.3.3 fixes a moderate value of α to measure the performance of the proposed algorithm.
5.3.3. Analysis of Time Consumption and Accuracy
Fixing the significance level α at the balanced value chosen above, we apply the proposed algorithm, the standard DP algorithm, the SMDP algorithm, the MEDP algorithm, and the DPCMB algorithm to the networks in Section 5.1. Here, the importance threshold ε of the DPCMB algorithm is set to 0.50. We then compare the algorithms' running times to reflect time consumption and their score error percentages to show accuracy. The results are given in Table 3, where "—" indicates that the storage space required by the algorithm exceeds the memory of our computer. Figure 12 shows the time consumption of SMDP, MEDP, DPCMB, and MMPC-DP as ratios of the time consumption of the DP algorithm.
For the parts marked in bold in Table 3 (n = 13, 14, 16, and 21), the accuracy of the proposed MMPC-DP algorithm drops and does not reach the near-zero error level of the SMDP algorithm. For these cases, we conduct separate experiments and gradually increase the value of α so that the accuracy approaches that of the SMDP algorithm; the results for the various indicators are shown in Table 4.
Compared with the other four algorithms, the MMPC-DP algorithm not only ensures accuracy but also reduces time consumption. Figure 12 shows a consistent ranking of the average running times of the five algorithms, and with the increase of n, the effectiveness of the constraints becomes more and more obvious. As shown in Table 4, by increasing the significance level α, the score error percentage of the MMPC-DP algorithm can in most cases be brought to the same level as that of the SMDP algorithm; that is, as α gradually approaches 1, the accuracy of the MMPC-DP algorithm approaches that of the SMDP algorithm and eventually stabilizes.
5.4. Discussion
From the perspective of local scoring counts, constructing the Pcaches storage structure with the SMDP algorithm still requires n·2^{n−1} score calculations. The MEDP algorithm uses the MDL scoring function to reduce the maximum number of parent nodes from n − 1 to m, which lowers the required number of local score calculations accordingly, although the sample size then affects the accuracy of the final structure. The DPCMB algorithm uses the Markov blanket to constrain the calculation process and further reduces the number of local scores. The algorithm proposed in this article uses double constraints to cut the total number of score calculations well below n·2^{n−1}. Hence, our algorithm reduces its time consumption and complexity by reducing the number of scoring calculations. The accuracy of the algorithm is subject to the significance level α, but is not affected by the horizontal pruning with t.
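To make the effect of the double constraints concrete, the following minimal Python sketch contrasts the n·2^{n−1} local score evaluations of the unconstrained DP with the count obtained when each node's parents are restricted to its candidate set PS and to at most t members. The candidate sets and the value of t used here are hypothetical, chosen only to illustrate the counting.

```python
from math import comb

def dp_score_calls(n: int) -> int:
    """Local score evaluations of the unconstrained DP algorithm: n * 2^(n-1)."""
    return n * 2 ** (n - 1)

def pruned_score_calls(candidate_parents: dict, t: int) -> int:
    """Score evaluations after double pruning: for each node, only subsets of
    its candidate parent set PS with size at most t are scored."""
    total = 0
    for ps in candidate_parents.values():
        total += sum(comb(len(ps), k) for k in range(min(t, len(ps)) + 1))
    return total

# Hypothetical 6-node network: each node has 3 candidate parents, t = 2.
cand = {v: {"p1", "p2", "p3"} for v in range(6)}
print(dp_score_calls(6))         # 6 * 2^5 = 192
print(pruned_score_calls(cand, 2))  # 6 * (1 + 3 + 3) = 42
```

Even on this toy example the pruned count is under a quarter of the unconstrained one, and the gap widens exponentially with n.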
6. UAV Intelligent Decision-Making Application Based on MMPC-DP
6.1. Air Combat Scene Description
First, consider the mission scenario: multiple enemy air targets are invading, and our side dispatches manned aircraft and drones for air combat. Owing to strong enemy electromagnetic interference, communication between the manned aircraft and the drones is interrupted. To ensure the safety of the manned aircraft, we withdraw them and let the unmanned aerial vehicles (UAVs) conduct autonomous operations to complete the air combat task. When a UAV then faces multiple enemy targets, it must choose one target to attack, so it needs to make an attack decision. The task scenario is shown in Figure 13.
Different UAVs are equipped with different sensors, so the situation information they can acquire differs. This article assumes that a UAV can acquire the following target information: height (H), distance (Dis), velocity (V), angle (A), radiation (R), radar cross section (RCS), track state (TS), warning information (WI), type (T), intention (I), capacity (C), importance (Im), and task point (TP), as well as the virtual nodes threat value (TV) and decision variable (D). The discretization of these variables is given in Table 5.
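Since Table 5 is not reproduced here, the sketch below illustrates the kind of discretization step described above with hypothetical cut points. The variable names follow the abbreviations in the text, but the thresholds, units, and number of levels are assumptions, not the paper's actual scheme.

```python
def discretize(value: float, cut_points: list) -> int:
    """Map a continuous reading to a discrete state index (0, 1, 2, ...).
    cut_points must be sorted ascending; these are hypothetical thresholds."""
    state = 0
    for cp in cut_points:
        if value >= cp:
            state += 1
    return state

# Hypothetical cut points for a few situation variables (illustrative only).
CUTS = {
    "H":   [1000.0, 5000.0],   # height in metres  -> low / medium / high
    "Dis": [20.0, 60.0],       # distance in km    -> near / medium / far
    "V":   [0.8, 1.2],         # velocity in Mach  -> slow / medium / fast
}

sample = {"H": 3200.0, "Dis": 75.0, "V": 0.6}
states = {var: discretize(val, CUTS[var]) for var, val in sample.items()}
print(states)  # {'H': 1, 'Dis': 2, 'V': 0}
```

In practice each raw simulation record would be mapped through the actual Table 5 thresholds before structure learning.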
6.2. Intelligent Decision-Making Structure Construction
This article uses a simulation platform to simulate the combat process and obtain data, which are then processed according to the discretization method shown in Table 5. A portion of the processed data is shown in Table 6.
Then, the MMPC-DP algorithm proposed in this article is used to learn the Bayesian network structure, thereby constructing a structure for UAV intelligent decision-making. Figure 14 shows the BN structure constructed from expert knowledge, and the final learned structure is shown in Figure 15. Because dynamic programming is adopted, this structure is the globally optimal solution for the data.
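As a rough illustration of how the pruned dynamic program reaches a globally optimal structure, the sketch below runs the standard subset DP over sink orderings while restricting each node's parents to its candidate set and to at most t members. The three-variable example and the local score function are toy stand-ins, not the paper's scoring function or data.

```python
from itertools import combinations

def best_parent_choice(v, avail, cand, t, local_score):
    """Best-scoring parent set for v, drawn only from cand[v] & avail, size <= t."""
    choices = [frozenset()]
    pool = sorted(cand[v] & avail)
    for k in range(1, min(t, len(pool)) + 1):
        choices += [frozenset(c) for c in combinations(pool, k)]
    return max(choices, key=lambda s: local_score(v, s))

def dp_structure(variables, cand, t, local_score):
    """Subset DP over sink orderings, with the parent graph pruned by the
    candidate sets cand and the parent-count bound t."""
    variables = frozenset(variables)
    best = {frozenset(): (0.0, {})}   # subset -> (total score, parent map)
    order = sorted((frozenset(s) for r in range(1, len(variables) + 1)
                    for s in combinations(variables, r)), key=len)
    for subset in order:              # subsets in order of increasing size
        options = []
        for sink in subset:
            rest = subset - {sink}
            pa = best_parent_choice(sink, rest, cand, t, local_score)
            sub_score, sub_pa = best[rest]
            options.append((sub_score + local_score(sink, pa),
                            {**sub_pa, sink: pa}))
        best[subset] = max(options, key=lambda c: c[0])
    return best[variables]

# Toy score: reward parents of a hypothetical true chain A -> B -> C.
true_pa = {"A": set(), "B": {"A"}, "C": {"B"}}
def toy_score(v, pa):
    return 2 * len(pa & true_pa[v]) - len(pa)

cand = {"A": set(), "B": {"A", "C"}, "C": {"B"}}
score, parents = dp_structure(["A", "B", "C"], cand, 1, toy_score)
# Recovers the chain: B <- A and C <- B, with no cycle.
```

The DP guarantees acyclicity because each sink's parents are drawn only from variables that precede it in the implicit ordering.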
Comparing the two structures shows that, while retaining the expert knowledge, Figure 15 uncovers more variable information and is more consistent with the objective facts. For example, the parent set of the decision node gains the target intention in addition to the threat value included in the expert knowledge, and the parent set of the target threat value node gains the target intention and target capacity in addition to the target task point from the expert knowledge. This demonstrates the practical feasibility of the proposed algorithm.
7. Conclusion
The DP algorithm is an exact algorithm for learning BN structures, but its high complexity limits its application. This article therefore proposes a dynamic programming structure learning algorithm for Bayesian networks that integrates the MWST and an improved MMPC. The algorithm uses mutual information to build the MWST and takes the maximum number of node connections in the MWST structure as the maximum number t of parent nodes in the network. The CPC set obtained by the SMMPC algorithm is used as the candidate parent set PS of each node. By pruning the parent graph horizontally and longitudinally, the number of parent sets that must be visited is reduced, thereby reducing the number of local score calculations, the memory required for scores, and the running time of the algorithm. Compared with the SMDP algorithm, which relies on a new storage structure, the MMPC-DP algorithm uses constraints to reduce the time consumption of the DP algorithm. The MEDP and DPCMB algorithms share this article's improvement idea of reducing local scores to cut the time consumption of the DP algorithm, but theoretical analysis shows that the MMPC-DP algorithm requires the fewest local scores. Simulation experiments further show that, when an appropriate significance level is selected, the MMPC-DP algorithm can reduce the number of scoring calculations and the running time while ensuring accuracy. The constraints are invalid if and only if t = |CPC| = n − 1. In future work, we will study how to adjust the value of α effectively to meet the accuracy requirements.
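The first step summarized above, deriving the parent-count bound t from a maximum weight spanning tree over pairwise mutual information, can be sketched as follows. The empirical MI estimate and the Kruskal-style tree construction are illustrative choices; the paper does not prescribe this particular implementation.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (nats) between two discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def max_parent_bound(data):
    """Build a maximum weight spanning tree over the variables (Kruskal on
    MI-weighted edges) and return its maximum node degree, used as t."""
    variables = list(data.keys())
    edges = sorted(((mutual_information(data[a], data[b]), a, b)
                    for a, b in combinations(variables, 2)), reverse=True)
    parent = {v: v for v in variables}       # union-find forest
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    degree = Counter()
    for w, a, b in edges:                    # heaviest edges first
        ra, rb = find(a), find(b)
        if ra != rb:                         # edge keeps the graph acyclic
            parent[ra] = rb
            degree[a] += 1
            degree[b] += 1
    return max(degree.values())

# Toy dataset: C copies A, B is weakly related noise -> tree has max degree 2.
data = {"A": [0, 0, 1, 1, 0, 1],
        "B": [1, 0, 1, 0, 0, 1],
        "C": [0, 0, 1, 1, 0, 1]}
print(max_parent_bound(data))  # 2
```

The returned bound then caps the horizontal pruning of the parent graph, while the SMMPC candidate sets drive the longitudinal pruning.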
Data Availability
Data used in the experiments are synthetically generated from the networks.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors are grateful to Dr. Zhigao Guo for helpful discussions. This work was supported by the National Key Laboratory Fund (CEMEE2020Z0202B), Equipment Development Fund (80902010302), and the Shanxi Science Foundation (2020JQ816 and 20JK0680).