Abstract

Semisupervised support vector machine (S3VM) algorithms depend mainly on the predicted accuracy of the unlabeled samples; if many misclassified unlabeled samples are added to the training set, the performance of the trained model degrades. Thus, the cuckoo search (CS) algorithm is used to optimize the S3VM, which also enhances its model performance. Considering that the cuckoo search algorithm is prone to falling into local optima, a new cuckoo search algorithm based on chaotic catfish effect optimization is proposed. First, a chaotic mechanism with high randomness is used to initialize the nests and expand the search range. Second, chaotic catfish nests are introduced as an effective competition and coordination mechanism once the population falls into a local optimum, so that candidate nests can jump out of the local optimal solution and convergence is accelerated. Experimental results show that the improved cuckoo search algorithm is effective and outperforms the particle swarm optimization (PSO) algorithm and the cuckoo search algorithm on the benchmark functions. Finally, the improved cuckoo search algorithm is used to optimize the semisupervised SVM, which is applied to oil layer recognition. Results show that this optimized model is superior to the semisupervised SVM in terms of recognition rate and time.

1. Introduction

Semisupervised learning [1] studies how to improve learning performance by using labeled and unlabeled samples simultaneously. It has become a research focus and hotspot in pattern recognition and machine learning. In such a mixed-data learning process, the sample distribution information of the unlabeled dataset is transferred to the final learning model, so that the trained model has better performance. Literature [2] proposes a novel graph for semisupervised discriminant analysis, called the combined low-rank and k-nearest neighbor (LRKNN) graph, which maps the data to the LR feature space and then adopts KNN to satisfy the algorithmic requirements of Semisupervised Discriminant Analysis (SDA). The semisupervised support vector machine was first proposed by Professor V. Vapnik [3]: when the labeled samples are not sufficient and it is difficult to achieve satisfactory performance, the support vector machine (SVM) can use transductive learning to improve performance. It can be regarded as the generalization of the SVM to unlabeled samples. S3VM has received extensive attention in recent years. Qing Wu et al. [4] described a smooth piecewise function, studied a smooth piecewise semisupervised SVM, and used a converging linear PSO to train the semisupervised SVM to obtain better classification accuracy. Luo et al. [5] introduced semisupervised learning into the least squares support vector machine (LSSVM) algorithm to improve the accuracy of model predictions and used it to predict the distribution of porosity and sandstone in the Jingzhou study area. Literature [6] puts forward a method named Ensemble S3VM, which deals with the unknown distribution by ensemble learning and applies the algorithm to ground cover classification for polarimetric synthetic aperture radar images.

The cuckoo search (CS) [7] algorithm is a nature-inspired metaheuristic based on the obligate brood parasitic behavior of cuckoo species [8] and the Levy flight search mechanism, which effectively solves optimization problems [9]. To further improve the CS algorithm, many experts and scholars have studied it. Srivastava et al. [10] introduced the tabu search idea into the local search of the Levy flights to avoid falling into local optima and successfully applied it to the automatic generation of software test data. Yang and Deb [11] proposed a multiobjective cuckoo search, which was applied to engineering optimization and achieved good results. Zhang et al. [12] proposed a modified adaptive cuckoo search (MACS) to improve the performance of CS. Wang et al. [13] used a dimension-by-dimension evaluation and update strategy during the iterations and proposed an enhanced CS algorithm. In literature [14], a new local-enhanced cuckoo search algorithm is designed to deal with some multimodal numerical problems. Literature [15] considers utilizing multiple chaotic maps simultaneously to perform the local search within the neighborhood of the global best solution found by the chaotic cuckoo search algorithm.

The above improved versions of the CS algorithm help it jump out of local optima or improve its convergence speed. However, toward the end of the iterations of the CS algorithm, the population tends to converge too early and fall into a local optimum. First, we use chaotic mapping instead of general randomization to initialize the nest positions, so that the initial nest locations are not only randomly distributed but also more diverse. Second, following literature [16], which used the catfish effect to optimize the artificial bee colony (ABC) algorithm and obtained a good ability to jump out of local optima, we apply the catfish effect to the CS algorithm and add it to the nests, obtaining the chaotic catfish nests. Consequently, this improves the convergence rate of the whole population and alleviates the algorithm's tendency to fall into local optima. Finally, the CS algorithm based on the above improved strategy is used to optimize the S3VM algorithm, which is applied to oil layer recognition to establish a new semisupervised oil layer recognition model, with the expectation of better recognition of oil layers.

2. Cuckoo Search Algorithm and Its Improvement

2.1. Cuckoo Search Algorithm Principle
2.1.1. Cuckoo Breeding Behavior

According to the long-term observations of entomologists, the cuckoo has adopted a special breeding strategy, brood parasitism [8]. It lays its eggs in the nests of other birds and lets those birds hatch them. To reduce the possibility of being discovered by the host birds, the cuckoo chooses host birds that are similar in feeding habits and whose eggs are easy to imitate in shape and color. When it flies to a nest, it lays only one egg and removes one of the host's eggs before laying, or pushes all of them out of the nest, forcing the host to lay eggs again. Once the cuckoo's hatchling hatches, it has the habit of pushing the host's chicks out of the nest, thus enjoying the host bird's care alone. But when the host birds find foreign eggs in their nests, they either throw out the parasitic eggs or abandon their nests and build new ones elsewhere.

2.1.2. Levy Flights

Various studies have shown that the flight behavior of many animals and insects demonstrates the typical characteristics of Levy flights [17-19]. A study by Reynolds and Frye shows that fruit flies, or Drosophila melanogaster, explore their landscape using a series of straight flight paths punctuated by sudden turns, leading to a Levy-flight-style intermittent scale-free search pattern. Studies on human behavior, such as the Ju/'hoansi hunter-gatherer foraging patterns, also show the typical features of Levy flights. Even light can be related to Levy flights [20]. Moreover, when the target locations are random and sparsely distributed, Levy flight is the best search strategy for M independent searchers.

Levy flight is a type of random walk whose step lengths follow a heavy-tailed stable distribution. In this form of walking, short-distance exploration alternates with occasional long-distance jumps. In intelligent optimization algorithms, Levy flight can expand the search scope, increase population diversity, and make it easier to jump out of local optima.

Subsequently, such behavior has been applied to optimization and optimal search, and preliminary results show its promising capability [18].

2.1.3. Cuckoo Search

The cuckoo search algorithm is based on the parasitic propagation mechanism of the cuckoo and the Levy flight search principle. It relies on the following three idealized rules:
(1) Each cuckoo lays one egg at a time and selects a nest randomly to hatch it.
(2) The best nests are preserved to the next generation among a randomly selected group of nests.
(3) The number of available nests is fixed, and the probability that the host bird of a nest discovers the exotic egg is p_a in [0, 1].

On the basis of the above three idealized rules, the path and location updating formula of the cuckoo's nest is

$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \mathrm{Levy}(\lambda), \quad (1)$$

where $\alpha > 0$ is the step size ($\alpha = 1$ in general), $\oplus$ denotes entrywise (point) multiplication, and $\mathrm{Levy}(\lambda)$ is the search path.

The Levy flight essentially provides a random walk whose random step lengths are drawn from a Levy distribution,

$$\mathrm{Levy} \sim u = t^{-\lambda}, \quad 1 < \lambda \le 3, \quad (2)$$

which has an infinite variance and an infinite mean. Here the steps essentially form a random walk process with a power-law, heavy-tailed step-length distribution. Some of the new solutions should be generated by a Levy walk around the best solution obtained so far; this speeds up the local search. However, a substantial fraction of the new solutions should be generated by far-field randomization, with locations far enough from the current best solution; this makes sure the system does not get trapped in a local optimum.
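
For readers who want to experiment, the following minimal Python sketch (an illustration, not the authors' code) draws a Levy-distributed step with Mantegna's algorithm and applies the position update of formula (1) to a single nest. The step size alpha, the exponent beta, and the scaling of the step by the distance to the current best nest are common choices in CS implementations and are assumptions of this sketch.

    import numpy as np
    from math import gamma

    def levy_step(dim, beta=1.5):
        # Mantegna's algorithm: heavy-tailed step vector approximating a Levy distribution
        sigma_u = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
                   (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
        u = np.random.normal(0.0, sigma_u, dim)
        v = np.random.normal(0.0, 1.0, dim)
        return u / np.abs(v) ** (1 / beta)

    def update_nest(nest, best_nest, alpha=1.0, beta=1.5):
        # Formula (1): x(t+1) = x(t) + alpha (+) Levy(lambda).
        # Scaling by (nest - best_nest) is a common CS convention, assumed here.
        return nest + alpha * levy_step(nest.size, beta) * (nest - best_nest)

    nest = np.array([0.5, -1.2])
    best = np.array([0.1, 0.3])
    print(update_nest(nest, best))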

To sum up, the main steps of the CS algorithm can be described as follows.

Step 1. The objective function is f(x); initialize the population and randomly generate the initial positions of N nests.

Step 2. Calculate the objective function value of each nest and record the current best solution.

Step 3. Keep the location of the best nest from the previous generation, and update the positions of the other nests according to formula (1).

Step 4. Compare the existing nests with those of the previous generation; if a nest is better, take it as the current best position.

Step 5. Compare a random number R with p_a; if R > p_a, change the nest position randomly to obtain a new set of nest positions.

Step 6. If the end condition is not met, return Step 2.

Step 7. Output global optimal position.
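
As a compact end-to-end illustration of Steps 1-7, the Python sketch below runs a plain CS loop on the Sphere function. The population size, bounds, and the per-nest abandonment with probability p_a follow common CS implementations and are assumptions of the sketch, not values taken from this paper.

    import numpy as np
    from math import gamma

    def levy(dim, beta=1.5):
        # Mantegna's algorithm for Levy-distributed steps
        sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
                 (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
        return np.random.normal(0, sigma, dim) / np.abs(np.random.normal(0, 1, dim)) ** (1 / beta)

    def cuckoo_search(f, dim=2, n=25, lb=-10.0, ub=10.0, pa=0.25, alpha=1.0, max_iter=1000):
        nests = lb + (ub - lb) * np.random.rand(n, dim)            # Step 1: random initial nests
        fit = np.array([f(x) for x in nests])                      # Step 2: evaluate nests
        best, best_val = nests[fit.argmin()].copy(), fit.min()     # record the current best
        for _ in range(max_iter):
            steps = np.array([levy(dim) for _ in range(n)])        # Step 3: Levy flight, formula (1)
            new = np.clip(nests + alpha * steps * (nests - best), lb, ub)
            new_fit = np.array([f(x) for x in new])
            better = new_fit < fit                                  # Step 4: keep improved nests
            nests[better], fit[better] = new[better], new_fit[better]
            abandon = np.random.rand(n) < pa                        # Step 5: abandon discovered nests
            k = int(abandon.sum())
            nests[abandon] = lb + (ub - lb) * np.random.rand(k, dim)
            fit[abandon] = np.array([f(x) for x in nests[abandon]])
            if fit.min() < best_val:                                # Steps 6-7: keep the global best
                best, best_val = nests[fit.argmin()].copy(), fit.min()
        return best, best_val

    best, val = cuckoo_search(lambda x: float(np.sum(x ** 2)))      # Sphere function as a toy objective
    print(best, val)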

2.2. The Cuckoo Search Algorithm Based on Chaotic Catfish Effect
2.2.1. Initial Chaotic Nest

The initial nest locations are an important link in the CS algorithm: they not only affect the convergence speed of the algorithm but also restrict the quality of the final solution. Exploiting the randomness, regularity, and ergodicity of chaotic systems, we use chaotic mapping to extract and capture information in the solution space and enhance the diversity of the initial nest locations. In this paper, the logistic chaotic map is adopted, with the iterative equation

$$z_{k+1} = \mu z_k (1 - z_k), \quad k = 0, 1, \ldots, T, \quad (3)$$

where $T$ is a preset maximum number of chaotic iterations and $z_0$ is a random number distributed uniformly on the interval $(0, 1)$. The initial value cannot take any of the fixed points; otherwise the orbit is stable and no chaos is generated, so $z_0 \notin \{0, 0.25, 0.5, 0.75, 1\}$. $\mu$ is the chaos control parameter; the system is in a state of complete chaos when $\mu = 4$.

First, a set of chaotic variables $z_{i,j}$ is produced by using formula (3); then the chaotic sequence is mapped once onto the search interval of the $n$ nests of dimension $d$ according to

$$x_{i,j} = x_{\min,j} + z_{i,j}\,(x_{\max,j} - x_{\min,j}), \quad (4)$$

where $x_{\max,j}$ and $x_{\min,j}$ represent the largest and the smallest boundary of the $j$-th dimension, respectively. The average accuracy, computed by the k-fold cross-validation approach, is used as the fitness function. The distribution of the nests is shown in Figure 1.

Finally, the fitness of each nest was calculated.
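
A minimal Python sketch of this chaotic initialization is given below; it iterates the logistic map of formula (3) for each nest and dimension and then maps the chaotic variables onto the search bounds as in formula (4). The number of warm-up iterations is an illustrative assumption.

    import numpy as np

    def chaotic_init(n, d, x_min, x_max, warmup=300, mu=4.0):
        """Chaotic initialization of n nests of dimension d.

        x_min, x_max: arrays of length d with the per-dimension search bounds.
        """
        # start strictly inside (0, 1); the fixed points 0, 0.25, 0.5, 0.75, 1 are
        # hit with negligible probability by a continuous uniform draw
        z = np.random.uniform(0.001, 0.999, (n, d))
        for _ in range(warmup):              # formula (3): z <- mu * z * (1 - z)
            z = mu * z * (1.0 - z)
        return x_min + z * (x_max - x_min)   # formula (4): map chaos onto the bounds

    nests = chaotic_init(25, 2, np.array([-10.0, -10.0]), np.array([10.0, 10.0]))
    print(nests[:3])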

2.2.2. Updating the Catfish New Nest

The catfish effect derives from the practice of Norwegian fishermen who placed a few catfish into tanks of sardines to drive and stir the sardine group, making the sardines more likely to survive. In other words, the catfish nests can guide nests trapped in a local optimum to a new region of the search space and thus to potentially better nest solutions. In literature [21], the catfish effect is applied to particle swarm optimization, improving the algorithm's solving ability. In literature [16], the catfish effect is fused into the artificial bee colony algorithm to enhance the probability of obtaining the global optimal solution. Literature [22] combined the catfish effect with the cloud model in particle swarm optimization, which increased particle diversity and improved accuracy, and applied it to circuit breaker design, effectively improving design efficiency. Literature [23] uses the catfish effect to improve the bat algorithm so that the population jumps out of the current local optimal solution, thereby maintaining population diversity, obtaining better global search ability, and achieving a high convergence speed.

In the CS algorithm, if the nest positions have not been updated for "limit" consecutive times, the algorithm can be regarded as trapped in a local optimal solution; it then fails to reach the globally optimal solution and stagnates at this local solution. Following the idea of the chaotic catfish bee in literature [16], this paper proposes chaotic catfish nests, so that a population trapped in a local optimal solution can jump out of it and finally converge to the global optimal solution. The concrete steps are as follows (a code sketch is given after the steps).

Step 1 (sort and mark). The fitness values of the current n nests are arranged in ascending order, and the last 10% of the nests are marked as the worst nest positions.

Step 2 (generate the chaotic catfish nests). Set the initial position vector of the chaotic catfish nests according to

$$x_{m,j}^{c} = \begin{cases} x_{\max,j}, & r \ge 0.5, \\ x_{\min,j}, & r < 0.5, \end{cases} \quad (5)$$

where $x_{\max,j}$ and $x_{\min,j}$ are the maximum and minimum values of the $j$-th dimension, respectively, $r$ is a random number in the interval $[0, 1]$, and 0.5 is the boundary point that places the components of $x^{c}$ at the two extreme values.

Step 3 (update the chaotic catfish nests). The positions of the chaotic catfish nests are updated with formula (6), in which T is the chaotic iteration threshold.

Step 4 (introduce the chaotic catfish nests). The nests marked as the worst nest positions are abandoned, and the same number of chaotic catfish nests are placed at the positions of the abandoned worst nests.

Step 5 (update the nest locations). The fitness values of the chaotic catfish nests are calculated using formula (4), and the new nest locations and the corresponding fitness values are updated before entering the next iteration.
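
The following Python fragment sketches Steps 1-4 above, under the interpretation that formula (5) places each catfish nest component at either the upper or the lower search bound with equal probability. The 10% replacement ratio comes from Step 1; the array layout and the assumption of a minimization fitness are illustrative choices.

    import numpy as np

    def catfish_nests(m, x_min, x_max):
        # Formula (5): each component of a catfish nest jumps to one of the two
        # boundaries of its dimension with equal probability.
        r = np.random.rand(m, x_min.size)
        return np.where(r < 0.5, x_min, x_max)

    def inject_catfish(nests, fitness, x_min, x_max, ratio=0.10):
        # Replace the worst `ratio` of nests by catfish nests (minimization assumed).
        n = len(nests)
        m = max(1, int(round(ratio * n)))
        worst = np.argsort(fitness)[-m:]                  # Step 1: mark the worst 10%
        nests[worst] = catfish_nests(m, x_min, x_max)     # Steps 2-4: abandon and replace them
        return nests, worst

    nests = np.random.uniform(-10, 10, (20, 2))
    fit = np.sum(nests ** 2, axis=1)                      # Sphere fitness used as a toy example
    nests, replaced = inject_catfish(nests, fit, np.full(2, -10.0), np.full(2, 10.0))
    print(replaced)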

Finally, the steps of the cuckoo search algorithm based on the chaotic catfish effect (CCCS) are summarized in Algorithm 1.

(01) Input: Dimension of the nest: d, Total number of nests: n, Maximum number of
  iterations: time, The probability of being discovered by the host: p_a, The maximum number
  of updates of the bird's nest: limit, Maximum chaotic iterations: T
(02) Output: The best nest x_best
(03) Begin
(04)  Initialize the chaotic nests x_i, i = 1, ..., n;
(05)  Calculate fitness f(x_i);
(06)  While (present iterations < time)
(07)   Generate a new solution x_i' by formula (1);
(08)   If (c >= limit)
(09)    Sort the nests from small to large in accordance with f(x_i);
(10)    Delete the worst 10% of the solutions;
(11)    Generate m new catfish nests by formula (5) and place them at the tail of the population;
(12)    Output the new catfish solutions;
(13)   End
(14)   Select a candidate solution x_j;
(15)   If the new solution x_i' is better than x_j
(16)    The new solution is used instead of the candidate solution;
(17)    Count(present iterations) = 1;
(18)   Else
(19)    Count(present iterations) = 0;
(20)   End
(21)   Discard the worst solutions according to the probability p_a;
(22)   New solutions are generated to replace the discarded solutions by a preference random walk;
(23)   Keep the best solution;
(24)  End
(25) End

Here, Count is a column vector of length time. If the optimal solution is updated at iteration t, then Count(t) = 1; otherwise Count(t) = 0. c is the number of successive zeros in the vector Count; if c >= limit, the algorithm is considered to be in a local optimum, and the new catfish nests are added to replace some inferior nests in order to help the algorithm jump out of the local optimum.
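
As a small illustration of this stagnation test (reading "successive zeros" as the trailing run of zeros in Count, which is one natural interpretation), the following sketch decides when the catfish nests of Section 2.2.2 should be injected:

    def tail_zero_run(count):
        # length of the run of consecutive 0s at the end of the Count vector
        c = 0
        for v in reversed(count):
            if v != 0:
                break
            c += 1
        return c

    limit = 5
    count = [1, 0, 1, 0, 0, 0, 0, 0]   # 1 = the best nest improved at that iteration
    if tail_zero_run(count) >= limit:
        print("stagnation detected: inject the chaotic catfish nests")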

2.3. Simulation Test and Analysis
2.3.1. Benchmark Functions

To verify the effectiveness and generality of the improved algorithm, we compare it with the CS algorithm [7], the particle swarm optimization (PSO) algorithm [24], and the chaotic gravitational search algorithm (CGSA), the first method to incorporate chaos into the gravitational search algorithm (GSA) in literature [25], on 8 benchmark functions. The 8 benchmark functions and their parameter settings are shown in Table 1.

2.3.2. Parameter Settings

In our experiment, the dimension (Dim) of each function was 2, and the corresponding maximum number of iterations was 1000. We carried out 30 independent runs for CS, CCCS, PSO, and CGSA.

The parameters of CS, CCCS, PSO, and CGSA were set as follows: population/nest size n, maximum number of iterations 1000, p_a = 0.25, maximum nest renewal count limit = 5, and maximum number of chaotic iterations T = 300.

2.3.3. Experimental Results and Discussion

The evolution curves of the best fitness values of the standard CS, CCCS, PSO, and CGSA with a maximum of 1000 iterations are shown in Figure 2; for each algorithm, the program was run 30 times independently.

The experiment selected four unimodal functions (Schaffer, Schwefel, Sphere, and Sum of Different Powers) and four multimodal functions (Ackley, Rastrigin, Branin, and Griewank). First, Figure 2 shows that on the unimodal functions the CCCS algorithm can find the optimal solution for most benchmark functions in far fewer than 1000 iterations; for example, on the Sphere function the optimal solution is obtained by the 750th iteration. Second, the CCCS algorithm obtains the best solution on seven of the eight benchmark functions; for example, on the Griewank function the optimal solution is obtained at the 957th iteration, whereas CS, PSO, and CGSA cannot find the optimal solution within 1000 iterations. Third, on the multimodal functions, despite their complexity and multiple local minima, the CCCS algorithm still has a strong ability to find the global optimal solution, and its convergence speed is better than that of the other three algorithms; for instance, the CCCS algorithm finds the optimal solution for all multimodal functions, obtaining the optimal solution of the Ackley function at the 824th iteration. Finally, compared with the CS algorithm, the CCCS algorithm always converges more quickly and finds the global optimal solution within the same number of iterations.

Algorithm performance is further evaluated by the statistics of the maximum, minimum, average, and variance of the results; the numerical test results are shown in Table 2.

It can be seen from the numerical results in Table 2 that the CCCS algorithm has superior search performance to the other three algorithms on all performance indices and all functions.

3. Improved CS Algorithm Optimized Semisupervised SVM Model

3.1. Semisupervised SVM Basic Model

The semisupervised SVM improves the prediction accuracy of the SVM-based algorithm by using both labeled and unlabeled samples. It can be considered a realization of the low-density separation and clustering assumptions [26]. Recently, it has become a hotspot in the field of machine learning. The main principles are as follows.

Given a labeled dataset

$$L = \{(x_i, y_i)\}_{i=1}^{l}$$

and an unlabeled dataset

$$U = \{x_j\}_{j=l+1}^{l+u},$$

find an effective method to predict the labels of the unlabeled samples and obtain the semilabeled samples

$$U^{*} = \{(x_j, y_j^{*})\}_{j=l+1}^{l+u}.$$

The resulting hybrid set containing labeled and semilabeled samples, $L \cup U^{*}$, can then be separated at the maximum interval, and a larger classification interval means that the classifier has better generalization ability.

When the optimal hyperplane

$$w^{T}\varphi(x) + b = 0$$

separates the mixed dataset, the classification result maximizes the interval, where $w$ is the vector normal to the hyperplane and $b$ is a constant such that $|b|/\|w\|$ is the distance of the hyperplane from the origin. The above problem can be formalized as the following optimization problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\xi_{i} + C^{*}\sum_{j=l+1}^{l+u}\xi_{j}^{*}$$
$$\text{s.t.}\quad y_{i}\bigl(w^{T}\varphi(x_{i}) + b\bigr) \ge 1 - \xi_{i},\ \xi_{i} \ge 0,\ i = 1,\ldots,l,$$
$$\qquad\ y_{j}^{*}\bigl(w^{T}\varphi(x_{j}) + b\bigr) \ge 1 - \xi_{j}^{*},\ \xi_{j}^{*} \ge 0,\ j = l+1,\ldots,l+u,$$

where $C$ and $C^{*}$ are the regularization parameters associated with the labeled and semilabeled samples, respectively, and $\xi_{i}$ and $\xi_{j}^{*}$ are the corresponding slack variables. The kernel function is the radial basis function (RBF) kernel, $K(x_{i},x_{j}) = \exp\bigl(-\|x_{i}-x_{j}\|^{2}/(2\sigma^{2})\bigr)$, where $\sigma$ is the kernel parameter.

The S3VM is retrained with the new training set, and this procedure continues iteratively until a stopping criterion is reached (e.g., a maximum number of iterations). At each iteration, new semilabeled samples are added to the training set constructed in the previous iteration, while the semilabeled samples that change their label are removed from the semilabeled set and moved back to the unlabeled set.
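
The retraining loop described above can be sketched as a simple self-training-style procedure. The fragment below is an illustration built on scikit-learn's SVC rather than the authors' implementation; stopping when the semilabels stop changing, and simply carrying the new predictions into the next round, are simplifying assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def semisupervised_svm(X_lab, y_lab, X_unl, C=1.0, gamma=0.5, max_iter=10):
        # Train on labeled data, label the unlabeled pool, then retrain iteratively.
        clf = SVC(C=C, kernel="rbf", gamma=gamma)
        clf.fit(X_lab, y_lab)
        semi_y = clf.predict(X_unl)                       # initial semilabels
        for _ in range(max_iter):
            clf.fit(np.vstack([X_lab, X_unl]),            # retrain on labeled + semilabeled data
                    np.concatenate([y_lab, semi_y]))
            new_y = clf.predict(X_unl)
            if np.array_equal(new_y, semi_y):             # stopping criterion: labels are stable
                break
            semi_y = new_y                                # changed samples are relabeled next round
        return clf, semi_y

    rng = np.random.RandomState(0)
    X_lab, y_lab = rng.randn(20, 4), rng.choice([-1, 1], 20)   # toy labeled data
    X_unl = rng.randn(80, 4)                                   # toy unlabeled data
    model, labels = semisupervised_svm(X_lab, y_lab, X_unl)
    print(labels[:10])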

3.2. Semisupervised SVM Model Based on an Improved Cuckoo Search Algorithm

The semisupervised SVM model optimized by the improved cuckoo search algorithm (CCCS-S3VM) is based on the SVM optimized by the improved cuckoo search algorithm (CCCS-SVM): semisupervised learning introduces unlabeled samples into the training process, and they are then trained together with the labeled samples, finally giving the CCCS-S3VM model. The main idea is to use the SVM regularization parameter and the RBF kernel parameter as the optimization targets of the CCCS algorithm, in order to obtain the parameter combination that gives the SVM the best classification accuracy. The specific algorithm steps are shown in Algorithm 2.

(01) Input:
(02)  Labeled dataset L
(03)  Unlabeled dataset U
(04)  Regularization parameter of the unlabeled dataset C*
(05) Output:
(06)  The labels y* of the unlabeled samples;
(07)  Final model CCCS-S3VM;
(08) Begin
(09)  Obtain the regularization parameter C and the kernel parameter σ of the labeled
   samples by the CCCS-SVM algorithm;
(10)  Train the initial model SVM0 with the labeled dataset L;
(11)  Predict the labels of the unlabeled dataset U using SVM0;
(12)  Initialization: set a temporary regularization parameter C*' to a small value;
(13)  While C*' < C* do
(14)   Solve the S3VM problem (formula (12)) based on L, U, C, C*', and σ; obtain w and b;
(15)   If there is a pair of unlabeled samples x_i and x_j whose labels y_i* and y_j* are opposite
    and whose slack variables satisfy ξ_i* + ξ_j* > 2, then x_i and x_j are likely to be wrongly
    labeled. The two labels are exchanged and the SVM problem is solved again; an approximate
    solution of the minimization of the objective function is obtained after each round of iteration.
(16)  While such a pair (x_i, x_j) exists do
(17)   y_i* = -y_i*, y_j* = -y_j*;    label exchange;
(18)   Solve the S3VM problem (formula (12)) based on L, U, C, C*'; obtain w and b;
(19)  End while
(20)  C*' = min(2C*', C*);
(21)  End while
(22) End
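
Line (09) of Algorithm 2 relies on CCCS-SVM to choose the SVM parameters; as stated in Section 2.2.1, the fitness of a candidate nest is the average k-fold cross-validation accuracy. A minimal sketch of such a fitness function is shown below; encoding a nest as the pair [C, gamma] and using scikit-learn are assumptions of the illustration, not details taken from the paper.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def nest_fitness(nest, X, y, k=5):
        # Fitness of one nest = mean k-fold cross-validation accuracy of an RBF-SVM.
        # nest: array [C, gamma]; both are clipped to stay positive.
        C, gamma = np.clip(nest, 1e-3, None)
        clf = SVC(C=C, kernel="rbf", gamma=gamma)
        return cross_val_score(clf, X, y, cv=k, scoring="accuracy").mean()

    rng = np.random.RandomState(1)
    X = rng.randn(60, 4)
    y = rng.choice([0, 1], 60)                       # toy labels
    print(nest_fitness(np.array([10.0, 0.1]), X, y))
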
3.3. Oil Layer Recognition Application
3.3.1. Design of Oil Layer Recognition System

Block diagram of the oil layer recognition system based on CCCS-S3VM is shown in Figure 3.

The oil layer recognition mainly has the following steps.

Step 1 (data selection and preprocessing). The selected dataset should be complete and comprehensive and closely related to layer evaluation. The dataset is divided into training and testing samples, which are normalized to avoid the calculation saturation phenomenon (a normalization sketch is given after these steps).

Step 2 (attribute discretization and generalization). To reduce the sample information attributes, first extract the sample information, generalize the decision attributes, and use the curve inflection point approach to discretize the continuous attributes.

Step 3 (attribute reduction of data information). Oil logging data include acoustic, electrical, nuclear, and other logging information. There are more than 10 kinds of logging properties in the logging series, but some of them are not important, so attribute reduction must be carried out to eliminate redundant attributes in the data. We use attribute reduction based on the consistent covering rough set [27].

Step 4 (CCCS-S3VM modeling). In the CCCS-S3VM model, the sample information after attribute reduction is input, the S3VM algorithm is trained, and the unlabeled samples are labeled. The CCCS algorithm is used to speed up the solution and improve accuracy, finally giving the CCCS-S3VM classification model.

Step 5 (recognition output). The layers of the entire well section are recognized by the trained CCCS-S3VM model, and the results are output.
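
A minimal sketch of the min-max normalization mentioned in Step 1 (one common choice; the exact scheme used by the authors is not specified) is:

    import numpy as np

    def min_max_normalize(X, eps=1e-12):
        # scale each logging attribute (column) of X into [0, 1]
        x_min, x_max = X.min(axis=0), X.max(axis=0)
        return (X - x_min) / (x_max - x_min + eps)   # eps guards constant columns

    X = np.array([[120.0, 2.3], [150.0, 2.9], [135.0, 2.5]])   # toy logging samples
    print(min_max_normalize(X))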

In order to verify the application effect of the S3VM layer recognition model based on CCCS, we select logging data from three wells for training and testing.

3.3.2. Practical Application

Hardware environment: MacOS10.13.3/Matlab 2016a/2.8GHz/Intel Core i7/16GB.

In Section 2.3, the effectiveness of the improved CCCS algorithm was verified on the benchmark functions. To verify the application effect of the semisupervised model optimized by the improved algorithm, we select logging data from three wells for training and testing. The attribute reduction results and the data distributions of these logging datasets are given in Tables 3 and 4, respectively.

The CCCS-S3VM model trained on the training dataset is used to identify the oil layers of the test samples. Table 5 shows the normalized ranges of the four sample attributes of the whole well after reduction. These attributes are normalized as shown in Figure 4, where the horizontal axis represents the depth and the vertical axis represents the normalized value.

In the CCCS-S3VM model, the sample information after attribute reduction is input, and the CCCS algorithm is used to find the optimal C and σ and obtain the trained S3VM forecasting model. Three test datasets from three wells in different well sections are used for oil layer recognition by the trained prediction model; we also establish an SVM model, an S3VM model, and a QPSO-S3VM model, and their recognition results are compared with the CCCS-S3VM model. The supervised SVM model is trained with the full available training set, whereas the training set of the semisupervised models consists of a labeled set and an unlabeled set: the labeled set consists of 10% of the training set, and the unlabeled set consists of the remaining data.

In order to measure the performance of the recognition models, we define the following performance indices:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{y}_i - y_i\bigr)^{2}}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\bigl|\hat{y}_i - y_i\bigr|.$$

Here, $\hat{y}_i$ and $y_i$ are the recognition output value and the desired output value, respectively, and $N$ is the number of test samples.

RMSE measures the deviation between the recognition output value and the desired output value and reflects the precision of the recognition. MAE is the mean of the absolute values of the individual deviations. The smaller the RMSE and MAE, the better the performance of the algorithm model. Therefore, RMSE is used as the standard to evaluate the accuracy of each algorithm model, and MAE reflects the actual size of the prediction error. The performance indices of each algorithm model are shown in Table 6.
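
For completeness, the two indices can be computed directly from the standard definitions given above:

    import numpy as np

    def rmse(y_pred, y_true):
        return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

    def mae(y_pred, y_true):
        return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))

    y_hat = [1, 1, -1, 1]     # recognition output
    y     = [1, -1, -1, 1]    # desired output
    print(rmse(y_hat, y), mae(y_hat, y))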

Table 6 clearly shows that the recognition performance of the semisupervised algorithms (S3VM, PSO-S3VM, CCCS-S3VM) is very close to that of the supervised algorithm (SVM), and the proposed CCCS-S3VM model has higher accuracy than the SVM algorithm in two of the three wells. Second, both PSO and the improved CS algorithm can improve the performance of the semisupervised SVM, but the improvement in accuracy from PSO comes with a longer running time and larger computational complexity. The CS algorithm based on the chaotic catfish effect, however, allows the semisupervised SVM algorithm to improve the classification accuracy while keeping the running speed fast. In summary, the CCCS-S3VM model, which incorporates the catfish effect and chaos theory, is better than the traditional SVM and S3VM models in oil layer recognition. The classification results are shown in Figure 5, in which (a), (c), and (e) represent the actual oil layer distribution and (b), (d), and (f) represent the CCCS-S3VM oil layer distribution.

According to the comparison of recognition results in Figure 5, first, the recognition accuracies in all three wells reach 90%, so the model can effectively and accurately recognize the oil layer distribution. Second, the CCCS-S3VM model can predict the distribution of a large number of reservoir samples using only a small number of labeled samples; this greatly reduces the need for labeled samples and gives the model good application prospects.

4. Conclusion

In view of the tendency of the cuckoo search algorithm to fall into local optima, the chaotic catfish effect is introduced into the algorithm, which enhances its ability to search for the global optimal solution; its effectiveness is validated on the benchmark functions. The improved algorithm is then used to optimize the semisupervised SVM algorithm. Finally, it is applied to oil layer recognition to address the problem that labeled data are difficult to obtain during logging. The method has a good classification effect: it improves the recognition rate, reduces the data extraction cost when only a small number of labeled data are used, and runs faster, which makes it clearly better than the other classical algorithms compared. It has good prospects.

Data Availability

All data included in this study are available upon request by contact with the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by Tianjin Natural Science Foundation (no. 18JCYBJC16500) and Hebei Province Natural Science Foundation (no. E2016202341).