Abstract

Aiming at the common problem that a single run of Bayesian network structure learning yields a poor result, a new algorithm, EF-BNSL, integrating ensemble learning and frequent item mining is proposed. Firstly, a sample set is obtained by sampling the original dataset with Bootstrap and is mined with the Apriori algorithm to derive the maximal frequent items and association rules, from which the black and white lists are determined. Secondly, considering that the black and white lists may contain wrong edges, they are used as a penalty term on the BDeu score, and an initial network is obtained with the hill climbing algorithm. Finally, the above steps are repeated 10 times to obtain 10 initial networks, which are combined by an integrated strategy function to obtain the final Bayesian network. Experiments were carried out on six standard networks, and the F1 score and the Hamming distance (HD) were calculated. The results show that the EF-BNSL algorithm can effectively improve the F1 score, reduce the HD, and learn a network structure that is closer to the real network.

1. Introduction

Bayesian network (BN), a probabilistic graphical model, was proposed by Pearl [1] in 1988. It is capable of effective inference and analysis of uncertain knowledge and is one of the hot spots of research in machine learning. BN has a solid theoretical basis and has been broadly applied in many industries, including transportation [2], industrial production [3–5], economy [6–8], medicine [9], and agriculture [10, 11]. It is obvious that innovative research on BN theories and methodologies, as well as the scientific construction of more effective algorithms, will promote problem-solving ability in practical areas [12, 13].

The fundamental theory of BN mainly covers structure learning, parameter learning, and Bayesian inference. Parameter learning learns the network parameters given a known network structure, while structure learning learns the network structure itself from data and is the focus and difficulty of BN learning. Structure learning is also the basis of parameter learning and Bayesian inference. However, finding the optimal structure of a BN is an NP-hard problem [14]: the computational complexity grows exponentially as the number of nodes increases.

There are three main structure learning methods: constraint-based, score-based, and hybrid methods. Constraint-based methods usually use conditional independence tests or mutual information to identify dependency relationships between variables. Spirtes and Meek [15] proposed the SGS algorithm, the first structure learning algorithm, which determines the network structure mainly through conditional independence tests between nodes, but whose computational cost grows exponentially with the number of nodes. Spirtes et al. [16] improved the SGS algorithm by proposing the PC algorithm, which can construct BNs for sparse networks and allows chi-square tests to be used without requiring a specific independence test. Qi et al. [17] proposed the WMIF algorithm, which learns the network structure using a weakest-first strategy. The core concept of the score-based approach is to traverse the possible structures and find the best one according to a scoring function. Cooper and Herskovits [18] proposed the K2 algorithm, but the algorithm requires a prior node ordering and an upper limit on the number of parents per node. Lee and van Beek [19] proposed a local greedy search method coupled with a perturbation factor and used a metaheuristic to improve the performance of the local greedy search. The scoring functions commonly used are MDL [20], AIC [21], BIC [22], and BDeu [23]. The common search algorithms are the K2 algorithm [18], the hill climbing algorithm [24], and the genetic algorithm [25]. The essential idea of hybrid learning is to first decrease the size of the search space with independence tests and then use a score-based search to obtain the optimal network structure. The first hybrid learning algorithm was the CB algorithm proposed by Singh and Valtorta [26], which first runs the constraint-based PC algorithm to identify an ordering of the nodes and then learns the structure with the K2 algorithm. Alonso-Barba et al. [27] proposed the I-ACO-B algorithm, which first reduces the complexity of the search space using independence tests and then uses an ant colony algorithm for the scoring search to obtain the optimal network structure.

Recently, Eggeling et al. [28] proposed introducing structural priors and Wang et al. [29] proposed the use of constraints, both of which can greatly decrease the search space of structure learning and offer a way to improve existing structure learning algorithms. Xiao et al. [30] proposed an algorithm that merges association rules and knowledge for network structure learning. Sun et al. [31] proposed a new hybrid approach integrating the PC and PSO algorithms, which takes part of the output of the PC algorithm as a structure prior to improve the initial solutions of BN structure learning. Li et al. [32] proposed BN structure learning based on frequent item mining. Wang and Qin [33] proposed BN structure learning based on ensemble and feedback strategies, but it only integrates multiple BNs obtained through the BDeu score. Building on these works, a new BN structure learning algorithm, EF-BNSL, which combines ensemble learning and frequent item mining, is designed in this paper. The EF-BNSL algorithm flowchart is shown in Figure 1.

2. Materials and Methods

2.1. Bayesian Network

A BN can be represented by $B = (G, P)$, where $G = (V, E)$ denotes the directed acyclic graph (DAG), $V$ represents the set of nodes, $E$ represents the set of directed edges, and $P$ is the probability distribution among the nodes, which is commonly represented by conditional probability tables (CPTs).

The joint probability distribution of the node set $V = \{X_1, X_2, \ldots, X_n\}$ can be represented as

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}(X_i)\right),$$

where $\mathrm{Pa}(X_i)$ is the set of parent nodes of $X_i$ and $P(X_i \mid \mathrm{Pa}(X_i))$ is a conditional probability. For each node $X_i$, only its relationship with $\mathrm{Pa}(X_i)$ is considered, so the number of parameters is much smaller than when computing the joint probability directly.

As shown in Figure 2, nodes without parents have their probabilities specified directly as marginal probabilities, whereas a node with parents has its probability specified as a conditional probability given those parents; a node whose parents are two other nodes, for example, has a CPT conditioned on both of them, and a node with a single parent has a CPT conditioned on that parent alone. In this way, we can obtain the CPTs of all nodes.
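For concreteness, and with hypothetical node labels since the labels of Figure 2 are not reproduced here, a four-node DAG of this shape (two root nodes, one common child, and one grandchild) factorizes as

$$P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1, X_2)\,P(X_4 \mid X_3),$$

so only one marginal table per root node and one conditional table per child node need to be stored instead of the full joint distribution.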

The goal of Bayesian network structure learning is to obtain the network structure that fits the dataset best. The scoring function measures how well a BN structure fits the dataset, and the search algorithm is the strategy that finds the best-scoring structure once the scoring function is fixed. In this paper, we chose the BDeu score [23], given as follows:

$$\mathrm{BDeu}(G \mid D) = \log P(G) + \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left[ \log \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \right],$$

where $P(G)$ is the prior probability of the structure, $n$ is the number of nodes, $q_i$ is the number of configurations of the parents of node $X_i$, $r_i$ is the number of values taken by node $X_i$, $N_{ijk}$ is the number of samples in the dataset in which node $X_i$ takes its $k$-th value while its parents take their $j$-th configuration, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$, $\alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk}$. Usually, we assume that $\alpha_{ijk} = \alpha / (r_i q_i)$, where $\alpha$ is the given equivalent sample size.

After choosing the scoring function, the problem of structure learning is transformed into an optimization problem in which we must find the structure with the highest score among all possible structures. In this paper, we chose the hill climbing search algorithm [24].
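As an illustrative sketch of this score-and-search step (not the paper's exact code, and assuming the pgmpy 0.x API in which the BDeu score is exposed as BDeuScore), hill climbing guided by the BDeu score can be run on a pandas DataFrame of discrete samples as follows:

import pandas as pd
from pgmpy.estimators import BDeuScore, HillClimbSearch

def learn_structure(data: pd.DataFrame, ess: float = 10):
    # BDeu score with a given equivalent sample size
    score = BDeuScore(data, equivalent_sample_size=ess)
    # Greedy hill climbing over DAGs, guided by the score
    dag = HillClimbSearch(data).estimate(scoring_method=score)
    return list(dag.edges())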

2.2. Analysis of Association Rules

Association rules can be applied to identify association relationships between variables. An association rule has the form $X \Rightarrow Y$, which indicates an association of the variable set $X$ with the variable set $Y$; $X$ is called the premise of the association rule, and $Y$ is called the result of the association rule. Defining association rules requires the notions of support and confidence, which are defined as follows:

$$\mathrm{Support}(X \Rightarrow Y) = P(X \cup Y), \qquad \mathrm{Confidence}(X \Rightarrow Y) = P(Y \mid X) = \frac{P(X \cup Y)}{P(X)},$$

where $\mathrm{Support}(X \Rightarrow Y)$ is the support of the association rule $X \Rightarrow Y$, that is, the probability of $X$ and $Y$ occurring simultaneously, and $\mathrm{Confidence}(X \Rightarrow Y)$ is the confidence of the rule, that is, the conditional probability of $Y$ occurring when $X$ occurs.

The basic idea of the Apriori algorithm is to calculate the support of item sets by scanning the dataset several times and to find all frequent item sets from which association rules are generated. An item set $B$ is said to be a superset of an item set $A$ if every element of $A$ is in $B$. If all supersets of a frequent item set are nonfrequent, the frequent item set is a maximal frequent item (MFI) set. The Apriori algorithm mines the frequent item sets whose support is greater than the minimum support, finds the MFIs, and filters the association rules whose confidence is greater than the minimum confidence to obtain the strongly associated rule set $R$.
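A minimal sketch of this mining step with the mlxtend package used later in the experiments (the input bool_df is assumed to be a Boolean-encoded DataFrame; the thresholds are the values chosen in Section 4):

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

def mine_rules(bool_df: pd.DataFrame, min_support: float = 0.75, min_confidence: float = 0.95):
    # Frequent item sets whose support exceeds the minimum support
    freq = apriori(bool_df, min_support=min_support, use_colnames=True)
    itemsets = list(freq["itemsets"])
    # Maximal frequent item sets: frequent item sets with no frequent proper superset
    maximal = [s for s in itemsets if not any(s < t for t in itemsets)]
    # Strong association rules: confidence above the minimum confidence
    rules = association_rules(freq, metric="confidence", min_threshold=min_confidence)
    return maximal, rules[["antecedents", "consequents", "support", "confidence"]]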

2.3. Ensemble Learning

Ensemble learning is a machine learning method widely used for classification and regression tasks. Ensemble learning means using multiple identical or different learning algorithms, combined in some way, to solve the same problem [33].

The most classic ensemble learning methods are boosting and bagging. The boosting algorithm trains models sequentially and reweights the training samples so that each new model focuses on the samples that the previous models handled poorly, while the bagging algorithm generates multiple datasets by sampling the original dataset with replacement (bootstrap), trains a model on each of them separately, and then combines the multiple models to obtain a more stable model. The primary aim of BN structure learning is to determine the directed edges between nodes, and bagging can effectively reduce the possible extra-edge and reversed-edge problems, making the learning results more stable and reliable [34, 35].
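For reference, one bagging-style bootstrap sample can be drawn with the sklearn utility mentioned in Section 4.1 (a sketch; the DataFrame data and the sample size of 1000 are assumptions taken from the experiments):

from sklearn.utils import resample

# data: the original pandas DataFrame of samples
sample = resample(data, replace=True, n_samples=1000, random_state=0)  # sampling with replacement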

3. EF-BNSL Algorithm

3.1. Building the Initial Network

Since the K2 algorithm is sensitive to the assumed node ordering, a single incorrect result may mislead the construction of the whole BN; therefore, this study uses the strong association rule set $R$ to correct the BN structures learned on the MFI sets and thus increase robustness.

The process is as follows: firstly, for each MFI $I_i$ in the MFI set, the columns of its variables are selected from the dataset $D$ to form a sub-dataset. Secondly, a BN structure $G_i$ is learned from each sub-dataset using the K2 algorithm. Lastly, if a node pair appears both in a learned structure $G_i$ and in the strong association rule set $R$, the node pair is added to the white list. Otherwise, if a node pair of MFI variables is supported by neither the learned structures nor $R$, it is considered impossible to have a dependency and is added to the black list.

The pseudocode of the black and white list algorithm is listed in Algorithm 1.

(i)Input: Dataset D, MFI set F = {I1, ..., Im},
    strong association rule set R
(ii)Output: white list WL, black list BL
(1)  For i in range(len(F)):
(2)    Data_node = D[Ii] //Select the columns of the variables in the i-th MFI
(3)    Gi = K2(Data_node) //BN obtained using the K2 algorithm
(4)    For edge in Gi: //Loop over the edges of the BN
(5)     If edge in R: //Determine whether the edge of the BN is in R
(6)      WL.append(edge) //Get the white list
(7)    G = G ∪ {Gi} //Aggregate the set of BN structures
(8)    If a node pair of Ii is in neither G nor R: //No evidence of dependency
(9)     BL.append(node pair) //Get the black list
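A Python sketch of Algorithm 1 is given below. The K2 learner is injected as a callable, and the black-list criterion (node pairs of MFI variables supported by neither the learned structures nor R) is an assumption based on the description above, not the paper's exact rule:

from itertools import permutations

def build_black_white_lists(data, mfi_sets, strong_rules, learn_bn_edges):
    # mfi_sets: list of variable sets (maximal frequent items)
    # strong_rules: set of (premise, result) variable pairs from the rule set R
    # learn_bn_edges: callable returning the edge list of a BN learned on a sub-dataset (e.g. K2)
    white, black, learned = set(), set(), set()
    for mfi in mfi_sets:
        sub = data[list(mfi)]               # columns of the variables in this MFI
        edges = set(learn_bn_edges(sub))    # BN learned on the sub-dataset
        learned |= edges
        white |= {e for e in edges if e in strong_rules}   # edge confirmed by a strong rule
    for mfi in mfi_sets:
        for u, v in permutations(mfi, 2):
            if (u, v) not in learned and (u, v) not in strong_rules:
                black.add((u, v))           # no evidence of dependency -> black list
    return white, black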

Since the learned edges in the black and white lists may contain errors, a penalty term is introduced to give the model a certain error tolerance. A new score function is obtained by incorporating the black and white lists as a penalty term, and the initial BN is then obtained with the hill climbing search algorithm. The new score function is given as follows:

$$\mathrm{Score}(G \mid D) = \mathrm{BDeu}(G \mid D) + \lambda \left( \frac{m_{W}}{n_{W}} - \frac{m_{B}}{n_{B}} \right),$$

where $\mathrm{BDeu}(G \mid D)$ is the BDeu score, $\lambda$ is the given weight, the bracketed term is the penalty term, $n_{W}$ is the number of node pairs in the white list, $n_{B}$ is the number of node pairs in the black list, $m_{W}$ is the number of white-list node pairs contained in $G$, and $m_{B}$ is the number of black-list node pairs contained in $G$.
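One decomposable way to realize such a penalty with pgmpy is to wrap the BDeu score and add or subtract a weight for every parent-child pair found in the white or black list. This is a sketch under the assumption that the penalty can be applied edge by edge; it is not the paper's exact formulation:

from pgmpy.estimators import BDeuScore, HillClimbSearch, StructureScore

class PenalizedBDeu(StructureScore):
    # BDeu score plus an edge-wise white/black-list penalty (illustrative)
    def __init__(self, data, white_list, black_list, weight=0.5, ess=10):
        super().__init__(data)
        self.bdeu = BDeuScore(data, equivalent_sample_size=ess)
        self.white = set(white_list)    # set of (parent, child) pairs
        self.black = set(black_list)
        self.weight = weight

    def local_score(self, variable, parents):
        score = self.bdeu.local_score(variable, list(parents))
        for p in parents:
            if (p, variable) in self.white:
                score += self.weight    # reward edges on the white list
            elif (p, variable) in self.black:
                score -= self.weight    # penalize edges on the black list
        return score

# Usage (WL and BL are the lists from Algorithm 1):
# initial_dag = HillClimbSearch(sample).estimate(scoring_method=PenalizedBDeu(sample, WL, BL))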

3.2. Ensemble Learning

In this paper, we adopt the idea of ensemble learning and use an integrated strategy function to calculate the final score of each edge. The integrated strategy function combines the adjacency matrices $A_1, A_2, \ldots, A_T$ of the initial BNs into a single edge score matrix by counting how often each edge is learned and weighting the individual networks, where $T$ is the number of bootstrap samples, $e_t$ is the number of edges of the BN learned from the $t$-th training sample, $n$ is the number of nodes in the dataset, $A_t$ is the adjacency matrix of the BN obtained from the $t$-th training sample, $m$ is the sample size, and $\odot$ denotes the Hadamard product.

The EF-BNSL algorithm flow is as follows: firstly, the original dataset is sampled 10 times using bootstrap, and structure learning is performed on each of the 10 sample sets to obtain 10 initial BNs, which are denoted by the adjacency matrices $A_1, A_2, \ldots, A_{10}$.

Secondly, the 10 adjacency matrices are combined by the integrated strategy function to obtain the score matrix $S$, which contains a score for each candidate edge. The score matrix is then normalized by max-min normalization.

Thirdly, a threshold $\theta$ is set: if $S_{pq} < \theta$, we set $M_{pq} = 0$; if $S_{pq} \geq \theta$, we set $M_{pq} = 1$. In this way, the adjacency matrix $M$ is obtained.

Lastly, we iterate over the adjacency matrix $M$: if $M_{pq} = 1$, the directed edge $p \to q$ is added, and the ensemble learning BN is finally obtained. The number of edges of the BN can be effectively controlled by setting the threshold $\theta$: if the threshold is too small, many redundant edges are kept, and if it is too large, too many true edges are missed.

The pseudocode of the EF-BNSL algorithm is listed in Algorithm 2.

Input: Dataset D, the scoring function Score, threshold θ
Output: Bayesian network G
(1)For t in range (10): //Loop 10 times
(2)  Dt = Bootstrap(D) //Bootstrap sampling
(3)  Gt = HillClimbing(Dt, Score) //Obtain the initial BN
(4)  At = Adjacency(Gt) //Represented by adjacency matrices
(5)  //Count the number of times each edge has been learned
(6)  //Assign weights to each edge
(7)  //Calculate the total weight value of each edge
(8)S = f(A1, ..., A10) //Calculate the integrated strategy function
(9)S = MaxMinNormalize(S) //Normalized by the maximum-minimum
(10)For p in range (n):
(11)   For q in range (n):
(12)    If S[p][q] < θ: //Control the number of edges by the threshold
(13)      M[p][q] = 0 //0 means that no edge exists
(14)    Else:
(15)      M[p][q] = 1 //1 denotes the existence of a directed edge
(16)G = BN(M) //The adjacency matrix is transformed into a BN
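A compact Python sketch of Algorithm 2 follows. The helper learn_initial_bn is assumed to return the edge list of an initial network learned with the penalized score, the integrated strategy function is simplified here to the normalized edge frequency, and the default sample size and threshold are illustrative values, not the paper's exact settings:

import numpy as np
from sklearn.utils import resample

def ef_bnsl(data, nodes, learn_initial_bn, n_rounds=10, n_samples=1000, theta=0.5):
    idx = {v: i for i, v in enumerate(nodes)}
    S = np.zeros((len(nodes), len(nodes)))
    for t in range(n_rounds):                                         # loop 10 times
        sample = resample(data, replace=True, n_samples=n_samples)    # bootstrap sampling
        A = np.zeros_like(S)
        for u, v in learn_initial_bn(sample):                         # adjacency matrix of the t-th initial BN
            A[idx[u], idx[v]] = 1
        S += A                                                        # accumulate edge scores (simplified strategy)
    S = (S - S.min()) / (S.max() - S.min() + 1e-12)                   # max-min normalization
    M = (S >= theta).astype(int)                                      # threshold controls the number of edges
    return [(nodes[p], nodes[q])                                      # transform the adjacency matrix into edges
            for p in range(len(nodes)) for q in range(len(nodes)) if M[p, q] == 1]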

4. Experiments and Result Analysis

4.1. Experiment Preparation

In this paper, we use the Python distribution Anaconda3 with Python 3.8.5 for programming. Six standard networks of different sizes were downloaded from the Bayesian Network Repository [36]. Each BN was loaded through the pyAgrum package and a dataset of the corresponding size was generated; frequent item mining and association rule analysis were performed with the mlxtend package, the sklearn package was used for bootstrap sampling, and the pgmpy package was used for BN structure learning.
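For instance, a standard network can be loaded and a dataset generated as in the following sketch (the file path is hypothetical, and pgmpy's BIFReader and simulate are used here as a stand-in for the pyAgrum-based generation described above):

from pgmpy.readwrite import BIFReader

model = BIFReader("asia.bif").get_model()                     # load the standard network from a BIF file
data = model.simulate(n_samples=20000, show_progress=False)   # generate 20000 discrete samples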

Six networks of three different sizes were chosen for the experiment: the small networks (up to 20 nodes) Asia and Sachs, the medium networks (20–50 nodes) Alarm and Insurance, and the large networks (50–100 nodes) Hailfinder and Hepar2. The six standard networks are shown in Table 1.

4.2. Evaluation Indicators

To verify the effectiveness of the EF-BNSL algorithm, the F1 score and the Hamming distance (HD) are chosen to evaluate the generated BN; they are defined as follows:

$$R = \frac{TP}{TP + FN}, \qquad P = \frac{TP}{TP + FP}, \qquad F1 = \frac{2 P R}{P + R}, \qquad HD = FP + FN,$$

where $TP$ is the number of edges in both the current network and the standard network, $FP$ is the number of edges in the current network but not in the standard network, $FN$ is the number of edges that do not exist in the current network but exist in the standard network, $R$ is the recall rate, and $P$ is the precision rate.

The F1 score combines the recall rate and the precision rate; a larger F1 score means that the learned BN is closer to the real network. The HD is the sum of $FP$ and $FN$; a smaller HD means fewer erroneous edges and indicates that the learned BN is closer to the standard network.
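Both indicators can be computed directly from the edge sets of the learned and standard networks, as in this sketch:

def f1_and_hd(learned_edges, true_edges):
    learned, true = set(learned_edges), set(true_edges)
    tp = len(learned & true)    # edges in both the current and the standard network
    fp = len(learned - true)    # edges only in the current network
    fn = len(true - learned)    # edges only in the standard network
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, fp + fn          # HD is the total number of erroneous edges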

4.3. Results

To verify the performance of the EF-BNSL algorithm, we conducted experiments on six networks. The following takes the Asia network as an example. The Asia network is a small BN used as a fictional medical example: it models whether a patient has tuberculosis, lung cancer, or bronchitis and consists of eight nodes and eight edges. Each random variable is discrete and can take two states. The original Asia standard network is shown in Figure 3.

The experimental process is divided into three stages. The first stage is to perform frequent item mining and association rule analysis on the sample set. Firstly, we load the BIF file and generate a dataset with a sample size of 20000, and then we draw a bootstrap sample of size 1000. The variables of the sample set are transformed into Boolean variables. The minimum support and minimum confidence were set to 0.75 and 0.95 [32], respectively. The minimum support and minimum confidence control the number of edges in the black and white lists and can be selected according to the distribution characteristics of the dataset.
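For instance (a sketch; the resulting column names follow whatever variables and states the sample contains), the discrete bootstrap sample can be converted to the Boolean form expected by mlxtend with one-hot encoding:

import pandas as pd

# sample: the bootstrap sample of discrete variables drawn above
bool_df = pd.get_dummies(sample.astype(str)).astype(bool)   # one Boolean column per variable state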

By running the Apriori algorithm with the given minimum support, the MFIs and association rules can be obtained. An association rule whose confidence is greater than the given minimum confidence is a strong association rule.

The MFI set is obtained accordingly, and the strong association rule results are shown in Table 2.

On the basis of the MFI set and the strong association rule results, the black and white lists can be obtained with Algorithm 1 proposed above. The black and white list results are shown in Table 3.

In this experiment, the edges in the white list are all in the original Asia standard network, so the accuracy of the white list reaches 100%. On the other hand, all the edges in the black list are judged correctly except for one edge, so the accuracy rate of the black list reaches 85.7%. To give the model some fault tolerance, we do not use these lists directly as a hard black and white list for the BN. Instead, they are given certain weight scores and added to the scoring function as penalty terms. We also find in the experiment that the number of edges in the white list is smaller than the number of edges in the black list. Different weights can be set for the edges of the black and white lists in future studies.

The second stage of the experiment is to update the scoring function with the black and white lists and obtain the initial network. Firstly, following the new score function in Section 3.1, the black and white lists are added to the scoring function as a penalty term, with the penalty term weight set to 0.5 [32]. If an edge in the black list is contained in the BN, the new scoring function becomes smaller; conversely, if an edge in the white list is contained in the BN, the new scoring function becomes larger. The network with the maximum score found by the hill climbing search algorithm is the initial BN.

As shown in Figure 4(a), the BN obtained by direct structure learning with the BDeu score has 14 edges. Figure 4(b) shows the CPTs of this BN structure; for example, a node with only a single binary parent has four entries in its CPT. As shown in Figure 5(a), the BN obtained by structure learning with the updated scoring function has 12 edges. Figure 5(b) shows the CPTs of this BN structure.

By comparing with the original Asia standard network, we can calculate the F1 score and HD for these two BNs using the definitions in Section 4.2. The F1 score and HD of the BN in Figure 4(a) are 0.45 and 9, respectively. The F1 score and HD of the BN in Figure 5(a) are 0.5 and 7, respectively. The BN learned with the improved scoring function therefore outperforms the BN obtained by direct learning with the BDeu score.

The third stage of the experiment is to use Algorithm 2 to obtain the final ensemble learning BN. Firstly, bootstrap was used to sample the dataset 10 times, and the 10 sample sets were learned separately to obtain 10 initial BNs.

Secondly, the 10 initial BNs were denoted by the adjacency matrices $A_1, A_2, \ldots, A_{10}$, and the 10 adjacency matrices were combined by the integrated strategy function to obtain the score matrix $S$. Meanwhile, the score matrix was normalized by max-min normalization.

Thirdly, the threshold $\theta$ was set as in [33]; if $S_{pq} < \theta$, we set $M_{pq} = 0$, and if $S_{pq} \geq \theta$, we set $M_{pq} = 1$. In this way, the adjacency matrix $M$ was obtained.

Lastly, the adjacency matrix $M$ was traversed: if $M_{pq} = 1$, the directed edge $p \to q$ was added to the network, and the final ensemble learning BN was obtained.

The final ensemble learning BN is shown in Figure 6(a) and has 10 edges. Figure 6(b) shows the CPTs of this BN structure. For example, one node has only a single binary parent node, so its CPT has four entries; if the parent node takes the value 0, the probability that the child node is equal to 0 is 0.9706. The F1 score and HD of the final ensemble learning BN are 0.45 and 9, respectively. Together with the repeated experiments below, these two evaluation indicators show that the EF-BNSL algorithm is better than direct learning with the BDeu score.

Because of the randomness of the data and the uncertainty of the search process, the results of BN structure learning can vary. Therefore, to improve the reliability of the experimental results, the above experimental procedure was repeated 10 times, and the resulting F1 scores and HD values are shown in Table 4. Finally, the averages of F1 and HD over the 10 runs were calculated.

It is obvious from Table 4 that the BN obtained with the EF-BNSL algorithm has a larger F1 score and a smaller HD than the BN obtained directly with the BDeu score. The EF-BNSL algorithm on the Asia network therefore outperforms the algorithm that uses the BDeu scoring function directly and learns a structure closer to the original standard Asia network.

The same experimental method was applied to the three types of standard networks, including the small networks (Asia and Sachs), the medium networks (Alarm and Insurance), and the large networks (Hailfinder and Hepar2); the comparison results for the six standard networks are shown in Table 5.

From the experimental results, it can be seen that the EF-BNSL algorithm proposed in this paper achieves better learning performance than the algorithm that directly uses the BDeu score. The EF-BNSL algorithm does not show a great advantage when the sample is small: small samples carry little information, so it is difficult to mine many association rules. When the sample size is larger, the EF-BNSL algorithm has a better learning effect, since more association rules can be mined and the black and white lists become more accurate. With the EF-BNSL algorithm, there is a significant performance improvement on the small networks Asia and Sachs, and no obvious progress on the Alarm and Insurance networks. In the experiment, the number of edges in the white list becomes much smaller than the number of edges in the black list as the number of nodes increases, which greatly reduces the scoring function. In future research, different weights can be set for the edges in the black and white lists. In addition, when there are more network nodes, the network search space increases exponentially. To balance search time and memory usage, the experiment reduced the weight of the black and white list penalty term for the large networks, which also affects the experimental results.

5. Conclusion and Future Research

In this paper, we combined the ideas of frequent item mining and ensemble learning in BN structure learning and proposed the EF-BNSL algorithm. We conducted experiments on six BNs of three sizes: small, medium, and large. We used two evaluation metrics, the F1 score and the HD. The results show that the EF-BNSL algorithm can effectively improve the F1 score, reduce the HD, and learn a network structure that is closer to the real network.

In the experiment, as the number of nodes increases, there are more edges in the black list than in the white list. In fact, the edges in the white list may be the more important ones, so assigning different weights to the black and white lists will be a key direction of future research. In addition, when there are more network nodes, the network search space increases exponentially; future research could consider using distributed computing to improve efficiency. Ensemble learning over different structure learning algorithms, making full use of the advantages of each algorithm for BN structure learning, will also be a focus of future research.

Data Availability

Six standard networks can be downloaded from the Bayesian Network Repository (https://www.bnlearn.com/bnrepository/).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Guoxin Cao contributed to the conception or design of the work. Guoxin Cao and Haomin Zhang contributed to the acquisition, analysis, or interpretation of the data and drafted the manuscript. All authors reviewed the manuscript.

Acknowledgments

This project was supported by the National Natural Science Foundation of China (61763008, 71762008, and 62166015) and the Guangxi Science and Technology Planning Project (2018GXNSFAA294131 and 2018GXNSFAA050005).