Abstract
The bird swarm algorithm is a recently proposed swarm intelligence algorithm. However, the original bird swarm algorithm has some drawbacks, such as a tendency to fall into local optima and slow convergence. To overcome these shortcomings, a dynamic multiswarm differential learning quantum bird swarm algorithm that combines three hybrid strategies is established. First, a dynamic multiswarm bird swarm algorithm is established, and the differential evolution strategy is adopted to enhance the randomness of the foraging behavior's movement, which gives the bird swarm algorithm a stronger global exploration capability. Next, quantum behavior is introduced into the bird swarm algorithm to search the solution space more efficiently. Then, the improved bird swarm algorithm is used to optimize the number of decision trees and the number of predictor variables of the random forest classification model. In the experiments, 18 benchmark functions, 30 CEC2014 functions, and 8 UCI datasets are tested to show that the improved algorithm and model are very competitive and outperform the other algorithms and models. Finally, the resulting random forest classification model is applied to actual oil logging prediction. As the experimental results show, the three strategies can significantly boost the performance of the bird swarm algorithm, and the proposed learning scheme yields a more stable random forest classification model with higher accuracy and efficiency than the compared models.
1. Introduction
The concept of swarm intelligence was first proposed by Hackwood and Beni in 1992 [1]. Swarm intelligence algorithms have been shown to solve nondifferentiable problems, NP-hard problems, and difficult nonlinear problems that traditional techniques cannot solve. For this reason, swarm intelligence algorithms are intensively researched in computer science and have been updated from generation to generation. Among classic swarm intelligence algorithms, particle swarm optimization (PSO) [2] defines the basic principle and equations of swarm intelligence algorithms. In recent years, many new swarm intelligence algorithms have been proposed, such as the artificial bee colony (ABC) algorithm [3], which is inspired by the food-source locating behavior of bees. The artificial fish school algorithm (AFSA) [4] and the firefly algorithm (FA) [5] are inspired by the foraging process of fish and fireflies, and cat swarm optimization (CSO) [6] is developed based on the vigilance and foraging behavior of cats in nature. According to the foraging behavior, vigilance behavior, and flight behavior of bird swarms in nature, Meng et al. proposed a novel swarm intelligence algorithm called the bird swarm algorithm (BSA) [7]. Owing to these advantages, swarm intelligence algorithms have been applied to optimization in various fields, such as PSO for mutation testing problems [8], the genetic algorithm (GA) for convolutional neural network parameters [9], FA for convolutional neural network problems [10], and the whale optimization algorithm (WOA) for cloud computing environments [11]. Likewise, BSA, which is used in this paper, has been widely applied to engineering optimization problems.
However, the original swarm intelligence algorithms have limitations in solving some practical problems. Hybridization, one of the main research directions for improving the performance of swarm intelligence algorithms, has become a research hotspot in machine learning. Tuba and Bacanin [12] modified the exploitation process of the original seeker optimization algorithm (SOA) by hybridizing it with FA, which overcame its shortcomings and outperformed other algorithms. Strumberger et al. [13] proposed a dynamic search tree growth algorithm (TGA) and hybridized elephant herding optimization (EHO) with ABC, and the simulation results showed that the proposed approach was viable and effective. Yang [14] analyzed swarm intelligence algorithms by using differential evolution, dynamic systems, self-organization, and a Markov chain framework. The discussion demonstrates that hybrid algorithms have some advantages over traditional algorithms. Bacanin and Tuba [15] proposed a modified ABC based on GA, and the obtained results show that the hybrid ABC provides competitive results and outperforms its counterparts. Liu et al. [16] presented a multistrategy brain storm optimization (BSO) with dynamic parameter adjustment, which is more competitive than other related algorithms. Peng et al. [17] proposed FA with a luciferase inhibition mechanism to improve the effectiveness of selection. The simulation results showed that the proposed approach has the best performance on some complex functions. Peng et al. [18] also developed a hybrid approach that uses the best neighbor-guided solution search strategy in the ABC algorithm. The experimental results indicate that the proposed ABC is very competitive and outperforms the other algorithms. These studies show that hybridization is a successful way to improve swarm intelligence algorithms, so BSA is improved by hybrid strategies in this paper.
Similarly, BSA can be applied to multiple fields, especially parameter estimation, and hybrid strategies are also the main improvement method for BSA. In 2017, Xu et al. [19] proposed an improved boundary BSA (IBBSA) for chaotic system optimization of the Lorenz system and the coupling motor system. However, the improved boundary learning strategy is random, which limits the generalization performance of IBBSA. Yang and Liu [20] introduced a dynamic weight into the foraging formula of BSA (IBSA), which provides a solution to the problem of anti-same-frequency interference of shipborne radar. However, the dynamic weight is introduced only into the foraging formula of BSA, and IBSA ignores the impact of population initialization. Wang et al. [21] designed a strategy named “disturbing the local optimum” to help the original BSA converge to the global optimal solution faster and more stably. However, “disturbing the local optimum” is also random, which limits the generalization performance of the improved BSA.
Like many swarm intelligence algorithms, BSA suffers from being trapped in local optima and from slow convergence. These disadvantages limit the wider application of BSA. In this paper, a dynamic multiswarm differential learning quantum BSA called DMSDLQBSA is proposed, which introduces three hybrid strategies into the original BSA to improve its effectiveness. First, motivated by the insufficient generalization ability of the methods in [19, 21], we establish a dynamic multiswarm bird swarm algorithm (DMSBSA) and merge the differential evolution operator into each subswarm of the DMSBSA, which improves the local and global search capability of foraging behavior. Second, because the impact of population initialization was neglected in [20], and because quantum behavior has been used to give particle swarm optimization a good ability to jump out of local optima, we use the quantum system to initialize the search space of the bird. Consequently, the convergence rate of the whole population improves and BSA avoids falling into a local optimum. To validate the effectiveness of the proposed method, we have evaluated the performance of DMSDLQBSA on classical benchmark functions and CEC2014 functions, including unimodal and multimodal functions, in comparison with state-of-the-art methods and new popular algorithms. The experimental results show that the three improvement strategies significantly boost the performance of BSA.
Based on the DMSDLQBSA, an effective hybrid random forest (RF) model for actual oil logging prediction is established, called the DMSDLQBSARF approach. RF is nonlinear and resistant to interference [22]. In addition, it can decrease the possibility of overfitting, which often occurs in actual logging. RF has been widely used in various classification problems, but it has not yet been applied to actual logging. Parameter estimation is a prerequisite for building the RF classification model. The two key parameters of RF are the number of decision trees and the number of predictor variables. Estimating these parameters is a complex optimization problem that traditional methods might fail to solve. Many works have proposed using swarm intelligence algorithms to find the best parameters of the RF model. Ma and Fan [23] adopted AFSA and PSO to optimize the parameters of RF. Hou et al. [24] used DE to obtain an optimal set of initial parameters for RF. Liu et al. [25] compared genetic algorithms, simulated annealing, and hill climbing algorithms for optimizing the parameters of RF. These papers suggest that metaheuristic algorithms are suitable for this problem. In this study, DMSDLQBSA is used to optimize the two key parameters so as to improve the accuracy of RF without overfitting. To investigate the performance of the DMSDLQBSARF classification model, it is compared with 3 swarm intelligence algorithm-based RF methods on 8 two-dimensional UCI datasets. As the experimental results show, the proposed learning scheme yields a more stable RF classification model with higher predictive accuracy than its counterparts.
The main contributions of this paper are as follows:
(i) In order to achieve a better balance between efficiency and velocity for BSA, we have studied the effects of three hybrid strategies (the dynamic multiswarm method, differential evolution, and quantum behavior) on the performance of BSA.
(ii) The proposed DMSDLQBSA has successfully solved the parameter setting problem of RF. The resulting hybrid classification model has been rigorously evaluated on oil logging prediction.
(iii) The proposed hybrid classification model delivers better classification performance and offers more accurate and faster results than other swarm intelligence algorithm-based RF models.
2. Bird Swarm Algorithm and Its Improvement
2.1. Bird Swarm Algorithm Principle
BSA, proposed by Meng et al. in 2015 [7], is an intelligent bionic algorithm based on multigroup and multisearch methods. It mimics the foraging behavior, vigilance behavior, and flight behavior of birds and employs this swarm intelligence to solve optimization problems. The bird swarm algorithm can be summarized by the following five rules:
Rule 1: each bird can switch between vigilance behavior and foraging behavior, and whether a bird forages or keeps vigilance is modeled as a random decision.
Rule 2: when foraging, each bird records and updates its own previous best experience and the swarm's previous best experience with food patches. This experience can also be used to search for food. Social information is shared instantly across the group.
Rule 3: when keeping vigilance, each bird tries to move towards the center of the swarm. This behavior may be influenced by disturbances caused by swarm competition. Birds with more reserves are more likely to be near the swarm's center than birds with fewer reserves.
Rule 4: birds fly to another place regularly. When flying to another location, birds often switch between producing and scrounging. The bird with the most reserves is a producer, and the bird with the least is a scrounger. Birds with reserves between the highest and lowest are randomly selected to be producers or scroungers.
Rule 5: producers actively seek food. Scroungers randomly follow producers looking for food.
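According to Rule 1, we define the time interval of each bird's flight behavior $FQ$, the probability of foraging behavior $P$ ($P \in (0, 1)$), and a uniform random number $\delta \in (0, 1)$. The notation below follows the original BSA [7].
(1) Foraging behavior. If the current iteration is not a flight iteration (its number is not a multiple of $FQ$) and $\delta < P$, the bird forages. Rule 2 can be written mathematically as follows:
$$x_{i,j}^{t+1} = x_{i,j}^{t} + \left(p_{i,j} - x_{i,j}^{t}\right) \times C \times \mathrm{rand}(0,1) + \left(g_{j} - x_{i,j}^{t}\right) \times S \times \mathrm{rand}(0,1), \tag{1}$$
where $C$ and $S$ are two positive numbers; the former is called the cognitive accelerated coefficient, and the latter is called the social accelerated coefficient. Here, $p_{i,j}$ is the $i$th bird's best previous position and $g_{j}$ is the swarm's best previous position.
(2) Vigilance behavior. If the current iteration is not a flight iteration and $\delta \ge P$, the bird keeps vigilance. Rule 3 can be written mathematically as follows:
$$x_{i,j}^{t+1} = x_{i,j}^{t} + A_1 \left(mean_{j} - x_{i,j}^{t}\right) \times \mathrm{rand}(0,1) + A_2 \left(p_{k,j} - x_{i,j}^{t}\right) \times \mathrm{rand}(-1,1), \tag{2}$$
$$A_1 = a_1 \times \exp\left(-\frac{pFit_i}{sumFit + \varepsilon} \times N\right), \tag{3}$$
$$A_2 = a_2 \times \exp\left(\left(\frac{pFit_i - pFit_k}{\left|pFit_k - pFit_i\right| + \varepsilon}\right) \frac{N \times pFit_k}{sumFit + \varepsilon}\right), \tag{4}$$
where $a_1$ and $a_2$ are two positive constants in $[0, 2]$, $pFit_i$ is the best fitness value of the $i$th bird, $sumFit$ is the sum of the swarm's best fitness values, $N$ is the number of birds, and $k$ ($k \ne i$) is a randomly chosen bird. Here, $\varepsilon$, which is used to avoid a zero-division error, is the smallest constant in the computer. $mean_{j}$ denotes the $j$th element of the whole swarm's average position.
(3) Flight behavior. If the current iteration is a flight iteration (its number is a multiple of $FQ$), the bird flies, and the birds are divided into producers and scroungers by fitness. Rules 4 and 5 can be written mathematically as follows:
$$x_{i,j}^{t+1} = x_{i,j}^{t} + \mathrm{randn}(0,1) \times x_{i,j}^{t}, \tag{5}$$
$$x_{i,j}^{t+1} = x_{i,j}^{t} + \left(x_{k,j}^{t} - x_{i,j}^{t}\right) \times FL \times \mathrm{rand}(0,1), \tag{6}$$
where equation (5) applies to producers, equation (6) applies to scroungers, and $FL$ ($FL \in [0, 2]$) means that the scrounger follows the producer to search for food.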
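As a rough illustration of the three behaviors, the sketch below implements one position update per behavior in plain Python. It assumes the update rules and symbol names of the original BSA [7] (`C`, `S`, `a1`, `a2`, `FL`), a minimization problem with nonnegative fitness values, and illustrative default coefficients; the function names are hypothetical and this is not the authors' implementation.

```python
import math
import random

EPS = 2.0 ** -52  # small constant that guards against division by zero


def forage(x, p_best, g_best, C=1.5, S=1.5):
    """Foraging: move toward the bird's own best and the swarm's best."""
    return [xj + (pj - xj) * C * random.random() + (gj - xj) * S * random.random()
            for xj, pj, gj in zip(x, p_best, g_best)]


def vigilance(x, mean_pos, p_other, fit_i, fit_k, sum_fit, n, a1=1.0, a2=1.0):
    """Vigilance: drift toward the swarm centre, disturbed by another bird k."""
    A1 = a1 * math.exp(-fit_i / (sum_fit + EPS) * n)
    A2 = a2 * math.exp((fit_i - fit_k) / (abs(fit_k - fit_i) + EPS)
                       * n * fit_k / (sum_fit + EPS))
    return [xj + A1 * (mj - xj) * random.random()
            + A2 * (pk - xj) * random.uniform(-1.0, 1.0)
            for xj, mj, pk in zip(x, mean_pos, p_other)]


def fly_producer(x):
    """Producer: Gaussian perturbation proportional to the current position."""
    return [xj + random.gauss(0.0, 1.0) * xj for xj in x]


def fly_scrounger(x, x_producer, FL=0.5):
    """Scrounger: follow a producer with following coefficient FL."""
    return [xj + (xk - xj) * FL * random.random()
            for xj, xk in zip(x, x_producer)]
```

Each function returns a new position and leaves its inputs unchanged, so a driver loop can decide per bird and per iteration which behavior to apply.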
2.2. The Bird Swarm Algorithm Based on Dynamic Multiswarm Method, Differential Evolution, and Quantum Behavior
2.2.1. The Foraging Behavior Based on Dynamic Multiswarm Method
The dynamic multiswarm method has been widely used in real-world applications because it is efficient and easy to implement. In addition, it is very common in the improvement of swarm intelligence optimization, such as the coevolutionary algorithm [26], the framework of evolutionary algorithms [27], multiobjective particle swarm optimization [28], a hybrid dynamic robust method [29], and the PSO algorithm [30, 31]. However, the PSO algorithm easily falls into local optima and its generalization performance is not high. Consequently, motivated by these studies, we establish a dynamic multiswarm bird swarm algorithm (DMSBSA), which improves the local search capability of foraging behavior.
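In the dynamic multiswarm PSO (DMSPSO), the whole population is divided into many small swarms, which are often regrouped by using various reorganization plans to exchange information. The velocity update strategy is
$$v_{i,j}^{t+1} = \omega v_{i,j}^{t} + c_1 r_1 \left(pbest_{i,j} - x_{i,j}^{t}\right) + c_2 r_2 \left(lbest_{i,j} - x_{i,j}^{t}\right), \tag{7}$$
where $lbest_{i}$ is the best historical position achieved within the local community of the $i$th particle, $pbest_{i}$ is the particle's own best historical position, $\omega$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $r_1$ and $r_2$ are uniform random numbers in $[0, 1]$.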
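The regrouping idea can be sketched as follows. The subswarm size, the regrouping period, and all function names are assumptions made for illustration; the source does not state which reorganization plan is used, so a uniformly random partition is shown.

```python
import random


def regroup(n_birds, sub_size, rng=random):
    """Randomly partition bird indices 0..n_birds-1 into subswarms."""
    indices = list(range(n_birds))
    rng.shuffle(indices)
    return [indices[k:k + sub_size] for k in range(0, n_birds, sub_size)]


def local_bests(groups, best_fitness):
    """Map every bird to the index of the best bird in its subswarm."""
    lbest = {}
    for group in groups:
        # Minimization: the leader has the smallest best-so-far fitness.
        leader = min(group, key=lambda i: best_fitness[i])
        for i in group:
            lbest[i] = leader
    return lbest


def maybe_regroup(t, period, groups, n_birds, sub_size, rng=random):
    """Regroup every `period` iterations; otherwise keep the subswarms."""
    if t % period == 0:
        return regroup(n_birds, sub_size, rng)
    return groups
```

With a population of 30 and a hypothetical subswarm size of 3, `regroup(30, 3)` yields 10 subswarms, and `local_bests` gives each bird the index whose best position plays the role of the local best.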
According to equation (1), the foraging behavior formula of BSA is similar to the particle velocity update formula of PSO. Therefore, by analogy with equation (7), we obtain the improved foraging behavior formula, equation (8), which introduces a new term called the guiding vector.
The dynamic multiswarm method is used to improve the local search capability, while the guiding vector can enhance the global search capability of foraging behavior. Obviously, we need to build a good guiding vector.
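One plausible reading of the improved foraging step is sketched below. It assumes that the update keeps the shape of the original foraging formula, with the bird's own best position replaced by the guiding vector and the swarm best replaced by the subswarm's local best; this form, the default coefficients, and the function name are assumptions rather than a reproduction of equation (8).

```python
import random


def guided_forage(x, guiding, lbest, C=1.5, S=1.5, rng=random):
    """Foraging step pulled toward a guiding vector and the subswarm best."""
    return [xj + (gvj - xj) * C * rng.random() + (lj - xj) * S * rng.random()
            for xj, gvj, lj in zip(x, guiding, lbest)]
```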
2.2.2. The Guiding Vector Based on Differential Evolution
Differential evolution (DE) is a powerful evolutionary algorithm with three differential evolution operators for solving tough global optimization problems [32]. Owing to its excellent global search capability, DE has attracted increasing attention in evolutionary computation, leading to improvements such as hybrid multiple crossover operations [33] and the DE/neighbor/1 strategy [34]. Since these studies show that DE has a good global search capability, we build the guiding vector with differential evolution operators to improve the global search capability of foraging behavior. The detailed implementation of the guiding vector is as follows:
(1) Differential mutation. According to the characteristic of equation (8), the “DE/best/1,” “DE/best/2,” and “DE/current-to-best/1” mutation strategies are suitable. The experiments in [31] showed that the “DE/best/1” mutation strategy is the most suitable in DMSPSO, so we choose this mutation strategy in BSA and apply it within each subswarm as “DE/lbest/1” (equation (9)). Note that some components of the mutant vector may violate predefined boundary constraints. In this case, boundary processing is applied (equation (10)).
(2) Crossover. After differential mutation, a binomial crossover operation exchanges some components of the mutant vector with the best previous position to generate the target vector (equation (11)).
(3) Selection. Because the purpose of BSA is to find the best fitness, a selection operation chooses the vector with the better fitness to enter the next generation. The selected vector is the guiding vector (equation (12)).
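The three operators above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the difference vector is built from two other birds' best positions, out-of-range components are resampled uniformly, and the scale factor `F`, crossover rate `CR`, and the function name are hypothetical choices that the source does not specify.

```python
import random


def guiding_vector(i, p_best, lbest, fitness, bounds, F=0.5, CR=0.9, rng=random):
    """Build a guiding vector for bird i with DE/lbest/1 (minimization)."""
    dim = len(p_best[i])
    others = [k for k in range(len(p_best)) if k != i]
    r1, r2 = rng.sample(others, 2)

    # Differential mutation around the subswarm best.
    mutant = [lbest[j] + F * (p_best[r1][j] - p_best[r2][j]) for j in range(dim)]

    # Boundary processing: resample components that leave the search range.
    for j, (lo, hi) in enumerate(bounds):
        if not lo <= mutant[j] <= hi:
            mutant[j] = rng.uniform(lo, hi)

    # Binomial crossover with the bird's best previous position.
    j_rand = rng.randrange(dim)
    trial = [mutant[j] if (rng.random() <= CR or j == j_rand) else p_best[i][j]
             for j in range(dim)]

    # Selection: keep whichever vector has the better (smaller) fitness.
    return trial if fitness(trial) <= fitness(p_best[i]) else list(p_best[i])
```

Because of the selection step, the returned vector is never worse than the bird's own best previous position.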
2.2.3. The Initialization of Search Space Based on Quantum Behavior
Quantum behavior is modeled as a nonlinear superposition system. Because it is simple and effective and performs well in global optimization, it has been used to improve many algorithms, such as particle swarm optimization [35] and the pigeon-inspired optimization algorithm [36]. Consequently, given these studies and its excellent global optimization performance, we use the quantum system to initialize the search space of the bird.
The quantum-behaved particle position is written mathematically in equations (13)–(15).
According to the characteristics of equations (13)–(15), we obtain the improved search space initialization formula, equations (16) and (17). These use a positive number called the contraction-expansion factor, the position of the particle at the previous moment, and the average value of the best previous positions of all the birds.
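For orientation, the sketch below shows the quantum-behaved position update as it is commonly stated in the quantum-behaved PSO literature: a local attractor between the personal best and the global best, plus a step scaled by the contraction-expansion factor and the distance to the mean best position. It is not a reproduction of the paper's equations; the value `beta=0.75` and the function names are assumptions.

```python
import math
import random


def mean_best(p_best):
    """Average of the best previous positions of all particles."""
    n = len(p_best)
    return [sum(col) / n for col in zip(*p_best)]


def quantum_position(x, p_best_i, g_best, m_best, beta=0.75, rng=random):
    """One quantum-behaved position update for a single particle."""
    new_x = []
    for xj, pj, gj, mj in zip(x, p_best_i, g_best, m_best):
        phi = rng.random()
        attractor = phi * pj + (1.0 - phi) * gj  # local attractor
        u = 1.0 - rng.random()                   # u in (0, 1], so log is finite
        step = beta * abs(mj - xj) * math.log(1.0 / u)
        if rng.random() < 0.5:
            new_x.append(attractor + step)
        else:
            new_x.append(attractor - step)
    return new_x
```

The step shrinks to zero as the particle approaches the mean best position, which is what lets the contraction-expansion factor trade exploration against convergence.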
2.2.4. Procedures of the DMSDLQBSA

In Sections 2.2.1–2.2.3, in order to improve the local and global search capability of BSA, this paper has improved BSA in three parts:
(1) To improve the local search capability of foraging behavior in BSA, we put forward equation (8) based on the dynamic multiswarm method.
(2) To obtain the guiding vector that improves the global search capability of foraging behavior in BSA, we put forward equations (9), (11), and (12) based on differential evolution.
(3) To expand the initialization search space of the bird and improve the global search capability of BSA, we put forward equations (16) and (17) based on quantum behavior.
Finally, the steps of DMSDLQBSA are shown in Algorithm 1.
2.3. Simulation Experiment and Analysis
This section presents the evaluation of DMSDLQBSA using a series of experiments on benchmark functions and CEC2014 test functions. All experiments in this paper are implemented in the following environment: MATLAB R2014b; Windows 7 (64-bit); Intel(R) Core(TM) i5-2450M CPU @ 2.50 GHz; 4.00 GB RAM. To obtain fair results, all the experiments were conducted under the same conditions. The population size is set to 30 in all algorithms, and each algorithm runs 30 times independently on each function.
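The protocol of repeated independent runs can be sketched as below. The optimizer here is a plain random search used only as a stand-in, and the use of population variance is an assumption, since the source does not say which variance definition is reported; all names are hypothetical.

```python
import random
import statistics


def random_search(f, dim, lo, hi, evaluations, rng):
    """Placeholder optimizer: best value among `evaluations` uniform samples."""
    return min(f([rng.uniform(lo, hi) for _ in range(dim)])
               for _ in range(evaluations))


def summarize(f, dim, lo, hi, runs=30, evaluations=1000, seed=0):
    """Run the optimizer `runs` times independently and report four statistics."""
    results = [random_search(f, dim, lo, hi, evaluations, random.Random(seed + r))
               for r in range(runs)]
    return {
        "Max": max(results),
        "Min": min(results),
        "Mean": statistics.mean(results),
        # Population variance is an assumption; the source does not say
        # which variance definition is used.
        "Var": statistics.pvariance(results),
    }


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(summarize(sphere, dim=10, lo=-100.0, hi=100.0))
```

Seeding each run separately keeps the runs independent while making the whole experiment repeatable.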
2.3.1. Benchmark Functions and CEC 2014 Test Functions
To investigate the effectiveness and generality of DMSDLQBSA in comparison with several hybrid algorithms and popular algorithms, 18 benchmark functions and the CEC2014 test functions are applied. The 18 benchmark functions [37] all have an optimal value of 0. The benchmark functions and their search ranges are shown in Table 1. In this test suite, nine functions are unimodal. Unimodal functions are usually used to test whether the proposed algorithm has good convergence performance. The other nine functions are multimodal. Multimodal functions are used to test the global search capability of the proposed algorithm. The smaller the fitness value, the better the algorithm performs. Furthermore, to verify the performance of DMSDLQBSA more comprehensively, another 30 complex CEC2014 benchmarks are used. The CEC2014 benchmark functions are briefly described in Table 2.
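To make the unimodal and multimodal distinction concrete, the sketch below gives one textbook function of each kind, both with an optimal value of 0 at the origin. Whether these two particular functions appear in Table 1 is an assumption; they are shown only as familiar examples.

```python
import math


def sphere(x):
    """Unimodal: a single minimum of 0 at the origin."""
    return sum(v * v for v in x)


def rastrigin(x):
    """Multimodal: many local minima, global minimum of 0 at the origin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)
```

On the sphere function any descent direction leads to the optimum, whereas on Rastrigin an algorithm must escape a grid of local minima, which is why the second kind probes global search capability.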
2.3.2. Parameter Settings
To verify the effectiveness and generalization of the proposed DMSDLQBSA, it is compared with several hybrid algorithms: BSA [7], DE [32], DMSDLPSO [31], and DMSDLBSA. Another 5 popular intelligence algorithms, namely the grey wolf optimizer (GWO) [38], whale optimization algorithm (WOA) [39], sine cosine algorithm (SCA) [40], grasshopper optimization algorithm (GOA) [41], and sparrow search algorithm (SSA) [42], are also compared with DMSDLQBSA. These state-of-the-art algorithms allow the performance of DMSDLQBSA to be verified more comprehensively. For a fair comparison, the population size of all algorithms is set to 30, and the other parameters of all algorithms are set according to their original papers. The parameter settings of the algorithms involved are shown in detail in Table 3.
2.3.3. Comparison on Benchmark Functions with Hybrid Algorithms
According to Section 2.2, three hybrid strategies (dynamic multiswarm method, DE, and quantum behavior) have been combined with the basic BSA method. To investigate the effectiveness of DMSDLQBSA in comparison with several hybrid algorithms (BSA, DE, DMSDLPSO, and DMSDLBSA), 18 benchmark functions are applied. Unlike DMSDLQBSA, the dynamic multiswarm differential learning bird swarm algorithm (DMSDLBSA) does not use quantum behavior. The number of function evaluations (FEs) is 10000. We selected two different dimension sizes (Dim). Dim = 10 is the typical dimension for the benchmark functions. Dim = 2 is used because RF has two parameters that need to be optimized, which means that the optimization function is 2-dimensional.
The fitness value curves of one run of the algorithms on eight different functions are shown in Figures 1 and 2, where the horizontal axis represents the number of iterations and the vertical axis represents the fitness value. The curves show the convergence speeds of the different algorithms. The maximum value (Max), the minimum value (Min), the mean value (Mean), and the variance (Var) obtained by the algorithms are shown in Tables 4–7, where the best results are marked in bold. Tables 4 and 5 show the performance of the algorithms on unimodal functions when Dim = 10 and 2, and Tables 6 and 7 show their performance on multimodal functions when Dim = 10 and 2.
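Figure 1: Fitness value curves of the algorithms, panels (a)–(d) (horizontal axis: number of iterations; vertical axis: fitness value).
Figure 2: Fitness value curves of the algorithms on four multimodal functions when Dim = 2, panels (a)–(d) (horizontal axis: number of iterations; vertical axis: fitness value).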
The evolution curves of these algorithms on four multimodal functions when Dim = 2 are depicted in Figure 2. We can see that DMSDLQBSA finds the optimal solution within the same number of iterations. On two of the functions, DMSDLQBSA continues to decline, whereas the original BSA and DE flatten into parallel straight lines because of their poor global convergence ability. On the other two functions, although DMSDLQBSA is also trapped in a local optimum, it finds a smaller minimum value than the other algorithms. The convergence speed of DMSDLQBSA is significantly faster than that of the other algorithms in the early stage, and the solution it eventually finds is the best. In general, because it enhances the diversity of the population, DMSDLQBSA has a relatively balanced global search capability when Dim = 2.
Furthermore, the numerical results on the nine multimodal functions in Table 7 show that DMSDLQBSA has the best performance on seven of them and reaches the minimum value of 0 on four. BSA reaches the minimum value of 0 on three functions, and DMSDLBSA reaches it on two. DE does not reach the minimum value of 0 on any function. In summary, DMSDLQBSA has a superior global search capability on most multimodal functions when Dim = 2. Because of this global search capability, DMSDLQBSA is well suited to finding the best two parameters for RF.
It can be seen from Figures 1 and 2 and Tables 4–7 that DMSDLQBSA obtains the best function values in most cases. This indicates that the hybrid strategies, which combine BSA with the dynamic multiswarm method, DE, and quantum behavior operators, lead the birds to move towards the best solutions. DMSDLQBSA is therefore well able to search for the best two parameters for RF with high accuracy and efficiency.
2.3.4. Comparison on Benchmark Functions with Popular Algorithms
To compare the timeliness and applicability of DMSDLQBSA with several popular algorithms (GWO, WOA, SCA, GOA, and SSA), 18 benchmark functions are applied. Of these, GWO, WOA, GOA, and SSA are swarm intelligence algorithms. In this experiment, the dimension size of these functions is 10. The number of function evaluations (FEs) is 100000. The maximum value (Max), the minimum value (Min), the mean value (Mean), and the variance (Var) obtained by the different algorithms are shown in Tables 8 and 9, where the best results are marked in bold.
From the test results in Table 8, we can see that DMSDLQBSA has the best performance on each unimodal function. GWO finds the value 0 on five functions, WOA obtains 0 on three functions, and SSA works the best on two functions. For the multimodal functions, Table 9 shows that DMSDLQBSA has the best performance on seven functions. SSA has the best performance on one function, GWO gets the minimum on one function, and WOA and SCA obtain the optimal value on two functions. Compared with these popular algorithms, DMSDLQBSA is a competitive algorithm for solving several functions, and the swarm intelligence algorithms perform better than the other algorithms. Overall, Tables 8 and 9 show that DMSDLQBSA has the best performance on most of the benchmark functions.
2.3.5. Comparison on CEC2014 Test Functions with Hybrid Algorithms
To compare the comprehensive performance of the proposed DMSDLQBSA with several hybrid algorithms (BSA, DE, DMSDLPSO, and DMSDLBSA), 30 CEC2014 test functions are applied. In this experiment, the dimension size (Dim) is set to 10. The number of function evaluations (FEs) is 100000. The maximum value (Max), the minimum value (Min), the mean value (Mean), and the variance (Var) are given in Tables 10 and 11, where the best results are marked in bold.
Based on the mean value (Mean) on the CEC2014 test functions, DMSDLQBSA has the best performance on F_{2}, F_{3}, F_{4}, F_{6}, F_{7}, F_{8}, F_{9}, F_{10}, F_{11}, F_{15}, F_{16}, F_{17}, F_{21}, F_{26}, F_{27}, F_{29}, and F_{30}. DMSDLBSA shows an advantage on F_{1}, F_{12}, F_{13}, F_{14}, F_{18}, F_{19}, F_{20}, F_{24}, F_{25}, and F_{28}. According to the results, DMSDLQBSA finds the minimal value on 17 CEC2014 test functions. DMSDLBSA gets the minimum value on F_{1}, F_{12}, F_{13}, F_{14}, F_{18}, F_{19}, F_{24}, and F_{30}, and DMSDLPSO obtains the minimum value on F_{4}, F_{7}, and F_{23}. Because it enhances the capability of exploitation, DMSDLQBSA is better than DMSDLBSA and DMSDLPSO on most functions. Overall, the test results show that DMSDLQBSA performs better than BSA, DE, DMSDLPSO, and DMSDLBSA. It can be concluded that DMSDLQBSA has better global search ability and better robustness on these test suites.
3. Optimize RF Classification Model Based on Improved BSA Algorithm
3.1. RF Classification Model
RF, as proposed by Breiman et al., is an ensemble learning model based on bagging and random subspace methods. The whole modeling process includes building decision trees and making decisions. The forest is composed of a number of decision trees, each of which consists of nonleaf nodes and leaf nodes. A leaf node is a child node of a node branch. Suppose that the dataset has M attributes. When a leaf node of a decision tree needs to be split, a subset of attributes is randomly selected from the M attributes as the candidate splitting variables of this node. This process defines the splitting variable of each leaf node of each decision tree, together with the probability that each candidate attribute is selected as the splitting attribute of the node.
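The two sources of randomness named above, bagging and the random subspace method, can be sketched as follows. The function names are hypothetical, and the sketch covers only the sampling steps, not tree construction.

```python
import random


def bootstrap_rows(n_rows, rng=random):
    """Bagging: sample row indices with replacement for one decision tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_attributes(n_attributes, m, rng=random):
    """Random subspace: pick m distinct attribute indices for one node split."""
    if not 1 <= m <= n_attributes:
        raise ValueError("m must be between 1 and the number of attributes")
    return rng.sample(range(n_attributes), m)
```

A small `m` makes the trees more diverse, while `m` equal to the number of attributes removes the random subspace effect entirely.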
A nonleaf node is a parent node that assigns training data to its left or right child node. Each decision tree outputs 0 or 1 for each row of data, where 0 indicates that the row is classified with a negative label and 1 indicates that it is classified with a positive label. The output is obtained by comparing the training function of the decision tree, which is based on the splitting variable, with a positive constant that serves as the threshold value of the training decision. Each row of data is drawn from the dataset by random sampling with replacement.
When the decision process is carried out, each row of data is input into a leaf node of each decision tree. The average of the decision tree classification results is used as the final classification result, which depends on the number of decision trees that assign the row to each label.
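A minimal sketch of this aggregation step is given below, assuming binary labels 0 and 1 and a cut-off of 0.5 on the average vote. The cut-off, the tie rule, and the function name are assumptions.

```python
def forest_predict(tree_votes, threshold=0.5):
    """Combine the 0/1 votes of the decision trees for one row of data."""
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    positive_share = sum(tree_votes) / len(tree_votes)
    # A tie at the threshold is resolved in favour of the positive label.
    return 1 if positive_share >= threshold else 0
```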
From the above principle, we can see that two parameters mainly need to be determined in the RF modeling process: the number of decision trees and the number of predictor variables. To verify the influence of these two parameters on the classification accuracy of the RF classification model, the Ionosphere dataset is used, as shown in Figure 3, where the horizontal axis represents the number of decision trees and the number of predictor variables, respectively, and the vertical axis represents the accuracy of the RF classification model.
(1) Parameter analysis of the number of decision trees. When the number of predictor variables is set to 6, the number of decision trees is varied from 0 to 1000 at intervals of 20. The resulting change in RF classification accuracy is shown in Figure 3(a). From the curve in Figure 3(a), we can see that the accuracy of RF gradually improves as the number of decision trees increases. However, once the number of decision trees exceeds a certain value, the accuracy levels off without obvious improvement, while the running time becomes longer.
(2) Parameter analysis of the number of predictor variables. When the number of decision trees is set to 500, the number of predictor variables is varied from 1 to 32. The limit is set to 32 because the number of attributes of the Ionosphere dataset is 32. The resulting curve of RF classification accuracy is shown in Figure 3(b). As the number of candidate splitting attributes increases, the classification performance of RF gradually improves, but when the number of predictor variables is greater than 9, RF overfits and its accuracy begins to decrease. The main reason is that when too many splitting attributes are selected, a large number of decision trees share the same splitting attributes, which reduces the diversity of the decision trees.
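Figure 3: Accuracy of the RF classification model on the Ionosphere dataset versus (a) the number of decision trees and (b) the number of predictor variables.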
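The two one-at-a-time sweeps can be sketched as below. The `accuracy` argument is a hypothetical callable that trains and scores an RF model for a given pair of parameters; the grid starts at 20 trees instead of 0 because a forest needs at least one tree, which is an assumption about how the sweep was actually run.

```python
def sweep(accuracy, fixed_predictors=6, fixed_trees=500, max_predictors=32):
    """Evaluate accuracy along each parameter while the other is held fixed."""
    tree_grid = list(range(20, 1001, 20))
    predictor_grid = list(range(1, max_predictors + 1))
    by_trees = [(n, accuracy(n, fixed_predictors)) for n in tree_grid]
    by_predictors = [(m, accuracy(fixed_trees, m)) for m in predictor_grid]
    return by_trees, by_predictors
```

Sweeping one parameter at a time shows each trend in isolation but cannot reveal interactions between the two parameters, which is the motivation for optimizing them jointly.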
In summary, for the RF classification model to reach its best solution, the choice of the number of decision trees and the number of predictor variables is very important, and the classification accuracy of the model can only be optimized by tuning these two parameters jointly. It is therefore necessary to use the proposed algorithm to find a suitable set of RF parameters. Next, we optimize the RF classification model with the improved BSA proposed in Section 2.
3.2. RF Model Based on an Improved Bird Swarm Algorithm
The RF classification model optimized by the improved bird swarm algorithm (DMSDLQBSARF) uses DMSDLQBSA to optimize the RF classification model: the training dataset is introduced into the training process of the RF classification model, and the DMSDLQBSARF classification model is finally obtained. The main idea is to construct a two-dimensional fitness function of RF's two parameters, the number of decision trees and the number of predictor variables, as the optimization target of DMSDLQBSA, so as to obtain the parameter pair with which the RF classification model achieves the best classification accuracy. The specific steps are shown in Algorithm 2.
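A two-dimensional fitness function of this kind might look like the sketch below. It assumes that a continuous bird position is rounded and clamped to valid integer parameters and that the optimizer minimizes the classification error; `evaluate_accuracy` is a hypothetical callable that trains and scores an RF model, and the upper limits are illustrative.

```python
def make_fitness(evaluate_accuracy, max_trees=1000, max_predictors=32):
    """Wrap an accuracy evaluator as a 2-D objective to be minimized."""
    def fitness(position):
        # Decode the continuous position into valid integer RF parameters.
        n_trees = min(max(int(round(position[0])), 1), max_trees)
        n_predictors = min(max(int(round(position[1])), 1), max_predictors)
        # Minimizing the error is equivalent to maximizing the accuracy.
        return 1.0 - evaluate_accuracy(n_trees, n_predictors)
    return fitness
```

Clamping inside the fitness function keeps the optimizer free to move in a continuous space while the RF model only ever sees valid settings.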

3.3. Simulation Experiment and Analysis
To test the performance of the improved DMSDLQBSARF classification model, we compare it with the standard RF model, the BSARF model, and the DMSDLBSARF model on 8 two-dimensional UCI datasets. The DMSDLBSARF classification model is an RF classification model optimized by the dynamic multiswarm differential learning BSA without quantum behavior. In our experiment, each dataset is divided into two parts: 70% of the dataset is used as the training set and the remaining 30% as the test set. The average classification accuracies over 10 independent runs of each model are recorded in Table 12, where the best results are marked in bold.
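The evaluation protocol just described can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the text does not say whether the 70/30 split is random or redrawn for every run, so the sketch shuffles with a different seed per run. The names `split_70_30` and `mean_accuracy` are hypothetical, and `train_and_score` stands for any routine that trains a model on the training part and returns its accuracy on the test part.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def split_70_30(samples: Sequence, seed: int) -> Tuple[List, List]:
    """Shuffle the samples and split them into 70% training and 30% test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_accuracy(
    samples: Sequence,
    train_and_score: Callable[[List, List], float],
    runs: int = 10,
) -> float:
    """Average test accuracy over `runs` independent runs (one split per run)."""
    return mean(train_and_score(*split_70_30(samples, seed)) for seed in range(runs))
```

Averaging over several runs reduces the influence of any single split and of the randomness inside both the RF and the swarm optimizer.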
From the accuracy results in Table 12, we can see that the DMSDLQBSARF classification model achieves the best accuracy on every UCI dataset except the magic dataset, on which the DMSDLBSARF classification model performs best. Compared with the standard RF model, the DMSDLQBSARF classification model improves accuracy by about 10%. Its best accuracy, 93.55%, is obtained on the appendicitis dataset. In summary, the DMSDLQBSARF classification model is effective and performs well on most datasets.
4. Oil Layer Classification Application
4.1. Design of Oil Layer Classification System
The block diagram of the oil layer classification system based on the improved DMSDLQBSARF is shown in Figure 4. The oil layer classification can be summarized in the following five steps:
Step 1. The actual logging datasets selected should be intact and full-scale. At the same time, the datasets should be closely related to the rock sample analysis and should be relatively independent. Each dataset is randomly divided into training and testing samples.
Step 2. To better understand the relationship between independent variables and dependent variables and to reduce the sample information attributes, the continuous attributes of the dataset are discretized by using a greedy algorithm.
Step 3. To improve the calculation speed and classification accuracy, we use the covering rough set method [43] to realize the attribute reduction. After attribute reduction, the actual logging datasets are normalized to avoid computational saturation.
Step 4. The actual logging dataset after attribute reduction is input into the DMSDLQBSARF layer classification algorithm for training, which finally yields the DMSDLQBSARF layer classification model.
Step 5. The whole oil section is identified by the trained DMSDLQBSARF layer classification model, and the classification results are output.
In order to verify the application effect of the DMSDLQBSARF layer classification model, we select three actual logging datasets of oil and gas wells to train and test.
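The normalization in Step 3 is not specified further in the text. As a minimal sketch, assuming ordinary min-max scaling of each attribute to the interval [0, 1], it could look like this (the function names are hypothetical, and the rows are taken to be numeric attribute vectors, one per depth sample):

```python
from typing import List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one attribute to [0, 1]; a constant attribute maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def normalize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize every attribute (column) of a table of samples (rows)."""
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

Scaling each attribute separately keeps logging curves with large numeric ranges from dominating those with small ranges. To avoid leaking test information, the minimum and maximum would normally be computed on the training samples only; the sketch omits that detail for brevity.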
4.2. Practical Application
In Section 2.3, the performance of the proposed DMSDLQBSA is simulated and analyzed on benchmark functions. In Section 3.3, the effectiveness of the improved RF classification model optimized by the proposed DMSDLQBSA is tested and verified on two-dimensional UCI datasets. To test the application effect of the improved DMSDLQBSARF layer classification model, three actual logging datasets are adopted and recorded as W1, W2, and W3. W1 is a gas well in Xian (China), W2 is a gas well in Shanxi (China), and W3 is an oil well in Xinjiang (China). The depths and the corresponding rock sample analysis samples of the three wells selected in the experiment are shown in Table 13.
Attribute reduction on the actual logging datasets is performed before the training of the DMSDLQBSARF classification model on the training dataset, as shown in Table 14. Then, these attributes are normalized as shown in Figure 5, where the horizontal axis represents the depth and the vertical axis represents the normalized value.
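Figure 5: Normalized curves of the reduced attributes, panels (a) to (f); the horizontal axis represents the depth and the vertical axis represents the normalized value.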
The logging dataset after attribute reduction and normalization is used to train the oil and gas layer classification model. To measure the performance of the DMSDLQBSARF classification model, we compare it with several popular oil and gas layer classification models: the standard RF model, the SVM model, the BSARF model, and the DMSDLBSARF model. Here, the RF classification model is applied to the field of logging for the first time. To evaluate the performance of the recognition model, we select the root mean square error (RMSE) and the mean absolute error (MAE) as performance indicators, both computed from the classification output value and the expected output value of each sample.
RMSE is used to evaluate the accuracy of each classification model. MAE is used to show actual forecasting errors. Table 15 records the performance indicator data of each classification model, and the best results are marked in bold. The smaller the RMSE and MAE, the better the classification model performs.
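For reference, the two indicators can be computed as in the following sketch, which uses the standard definitions: RMSE is the square root of the mean squared difference between output and expected values, and MAE is the mean absolute difference. The function and argument names are illustrative and not taken from the paper.

```python
import math
from typing import Sequence


def rmse(outputs: Sequence[float], expected: Sequence[float]) -> float:
    """Root mean square error between classification and expected outputs."""
    if len(outputs) != len(expected) or not outputs:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(outputs, expected)) / len(outputs))


def mae(outputs: Sequence[float], expected: Sequence[float]) -> float:
    """Mean absolute error between classification and expected outputs."""
    if len(outputs) != len(expected) or not outputs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - y) for f, y in zip(outputs, expected)) / len(outputs)
```

When both outputs and expected values are 0/1 labels, every absolute difference is either 0 or 1, so MAE equals the fraction of misclassified samples and RMSE equals its square root.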
From the performance indicator data of each classification model in Table 15, we can see that the DMSDLQBSARF classification model achieves the best recognition accuracy, with all accuracies above 90%. The recognition accuracy of the proposed classification model for W3 reaches 99.73%, and the model also performs well on the other performance indicators and on the other wells. Secondly, DMSDLQBSA improves the performance of RF: the parameters found by DMSDLQBSA improve the classification accuracy of the RF classification model while keeping the running speed relatively fast. For example, the DMSDLQBSARF classification model runs 0.0504 seconds and 1.9292 seconds faster than the original RF classification model on W1 and W2, respectively. Based on the above results, the proposed classification model is better than the traditional RF and SVM models in oil layer classification. The comparison of oil layer classification results is shown in Figure 6, where (a), (c), and (e) represent the actual oil layer distribution and (b), (d), and (f) represent the DMSDLQBSARF oil layer distribution. In addition, 0 means that the depth has no oil or gas, and 1 means that the depth has oil or gas.
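Figure 6: Comparison of oil layer classification results; (a), (c), and (e) show the actual oil layer distribution, and (b), (d), and (f) show the DMSDLQBSARF oil layer distribution.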
From Figure 6, we can see that the oil layer distribution identified by the DMSDLQBSARF classification model differs little from the oil test results. The model can accurately identify the distribution of oil and gas in a well. The DMSDLQBSARF model is therefore suitable for petroleum logging applications, where it can greatly reduce the difficulty of oil exploration, and it has good application prospects.
5. Conclusion
This paper presents an improved BSA called DMSDLQBSA, which employs the dynamic multiswarm method, differential evolution, and quantum behavior to enhance the global and local exploration capabilities of the original BSA. First, 18 classical benchmark functions are used to verify the effectiveness of the improved method. The experimental study of the effects of these three strategies on the performance of DMSDLQBSA revealed that the hybrid method substantially improves on the original BSA. Second, compared with popular intelligence algorithms such as GWO, WOA, SCA, GOA, and SSA, DMSDLQBSA provides more competitive results on the 18 classical benchmark functions. Additionally, 30 complex CEC2014 test functions are used to verify the performance of DMSDLQBSA more comprehensively. DMSDLQBSA shows better performance on the 18 classical benchmark functions. Finally, the improved DMSDLQBSA is used to optimize the parameters of RF. Experimental results on the actual oil logging prediction problem show that the classification accuracy of the established DMSDLQBSARF classification model reaches 94.00%, 94.24%, and 99.73% on the three wells, which is much higher than that of the original RF model. At the same time, it runs faster than the other four advanced classification models on most wells.
Although the proposed DMSDLQBSA has been shown to be effective in solving general optimization problems, it has some shortcomings that warrant further investigation. In particular, because it hybridizes three strategies, DMSDLQBSA needs more running time than the classical BSA. Therefore, deploying the proposed algorithm in a way that increases recognition efficiency is a worthwhile direction. In future research, the method presented in this paper can also be extended to discrete optimization problems and multiobjective optimization problems. Furthermore, applying the proposed DMSDLQBSARF model to other fields, such as financial prediction and biomedical diagnosis, is also interesting future work.
Data Availability
All data included in this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. U1813222), Tianjin Natural Science Foundation, China (no. 18JCYBJC16500), Key Research and Development Project from Hebei Province, China (nos. 19210404D and 20351802D), Key Research Project of Science and Technology from the Ministry of Education of Hebei Province, China (no. ZD2019010), and Project of Industry-University Cooperative Education of Ministry of Education of China (no. 201801335014).