Abstract

Slime mould algorithm (SMA) is a new metaheuristic algorithm that simulates the behavior and morphology changes of slime mould during foraging. Although the basic SMA performs well, it still has some problems: when faced with complex problems, it may fall into local optima and fail to find the optimal solution. To address this problem, an improved SMA is proposed to alleviate these disadvantages. Building on the original SMA, Gaussian mutation and Levy flight are introduced to improve its global search performance: Gaussian mutation improves the diversity of the population, and Levy flight alleviates the local-optimum problem of SMA so that the algorithm can find the optimal solution more quickly. To verify the effectiveness of the proposed algorithm, a continuous version, GLSMA, is tested on 33 classical continuous optimization problems. Then, on 14 high-dimensional gene datasets, the effectiveness of the proposed discrete version, BGLSMA, is verified by comparison with other feature selection algorithms. Experimental results reveal that the continuous version performs better than the original algorithm and alleviates its defects, and that the discrete version achieves higher classification accuracy while selecting fewer features. This proves that the improved algorithm has practical value in high-dimensional gene feature selection.

1. Introduction

With the development of modern social science and technology, a variety of problems have arisen in society, requiring researchers to design more efficient and novel methods and to put forward feasible solutions. In recent years, metaheuristic algorithms have been developed to solve various optimization problems, and some studies show that metaheuristic methods are more effective than traditional gradient-based methods [1]. Metaheuristic algorithms can be divided into several categories according to their sources of inspiration: evolutionary algorithms (EAs), such as the genetic algorithm (GA) [2] and differential evolution (DE) [3], and swarm intelligence (SI) algorithms, such as particle swarm optimization (PSO) [4], Harris hawks optimization (HHO) [5], RUNge Kutta optimizer (RUN) [6], hunger games search (HGS) [7], slime mould algorithm (SMA) [8], monarch butterfly optimization (MBO) [9], moth search algorithm (MSA) [10], colony predation algorithm (CPA) [11], and weighted mean of vectors (INFO) [12]. In addition, they have been widely used in various fields, such as solar cell parameter identification [13], the economic emission dispatch problem [14], image segmentation [15, 16], plant disease recognition [17], medical diagnosis [18, 19], scheduling problems [20-22], optimization of machine learning models [23], multiobjective problems [24, 25], fault diagnosis [26], object tracking [27, 28], expensive optimization problems [29, 30], medical diagnosis [31, 32], combinatorial optimization problems [33], feature selection [34, 35], practical engineering problems [36, 37], and robust optimization [38, 39].

Among all these algorithms, SMA is a new one proposed in recent years. Owing to its simple implementation, its excellent performance on complex problems, and its exploration and exploitation capabilities, SMA has been widely applied in various fields to solve specific practical problems. For example, Kouadri et al. [40] applied SMA to a real power system to solve the optimal power flow problem and minimize the total cost of conventional and stochastic power generation under the constraints of the power system. Khunkitti et al. [41] proposed an SMA-based formulation of the multiobjective optimal power flow (MOOPF) problem, taking cost, emission, and transmission line loss as parts of the objective function of the power system; simulation results show that SMA finds better solutions than other algorithms in the literature. Jafari-Asl et al. [42] proposed a method combining the line sampling (LS) method with the slime mould algorithm to solve reliability problems under highly nonlinear and implicit limit states. Izci and Ekinci [43] evaluated the optimization ability of SMA by using it to tune a proportional-integral-derivative (PID) controller for regulating the speed of a DC motor and maintaining the terminal output of an automatic voltage regulator (AVR) system, and compared the performance of SMA with that of controllers designed by competing algorithms. Houssein et al. [44] proposed a multiobjective optimization algorithm based on SMA; the reliability of the proposed MOSMA was verified on the real multiobjective optimization of automotive helical springs, and its effectiveness was evaluated by the Wilcoxon test and performance indicators. Houssein et al. [45] combined SMA with the adaptive guided differential evolution algorithm (AGDE), yielding SM-AGDE, to address some of the defects of SMA. Gupta et al. [46] applied SMA to the parameter estimation problem of the proton exchange membrane fuel cell (PEMFC) model; it showed good ability to jump out of local optima, and the predicted results were largely consistent with the actual ones, so SMA can be used for fuel cell problems. Elsayed et al. [47] used SMA to identify the parameters of transformer equivalent circuits and verified its ability and accuracy in parameter estimation of single- and three-phase transformers, as well as its high performance and stability in determining the optimal parameters.

Hassan et al. [48] proposed an improved SMA (ISMA) to solve the single- and biobjective economic and emission dispatch (EED) problem considering the valve point effect, in which the best solution is obtained by updating the positions of solutions with two equations from the sine cosine algorithm. On the basis of the Pareto dominance concept and fuzzy decision-making, a multiobjective ISMA with good performance and robustness was also proposed. Jia et al. [49] optimized SMA by introducing a compound mutation strategy (CMS) and a restart strategy (RS): CMS increases population diversity, and RS avoids local optima. The effectiveness of the proposed CMSRSSMA was tested on benchmark functions, and a CMSRSSMA-SVM model was proposed for feature selection and parameter optimization. Altay [50] utilized 10 different chaotic maps to generate chaotic rather than random values in SMA; the chaotic maps improve the global convergence rate of SMA and help it avoid local solutions. Abdel-Basset et al. [51] integrated the SMA and WOA algorithms to maximize Kapur's entropy and applied them in the field of image segmentation, achieving good results. Chauhan et al. [52] combined the arithmetic optimization algorithm (AOA) and the slime mould algorithm (SMA) into the HAOASMA algorithm, which addresses SMA's insufficient memory and slow local convergence speed.

Since SMA was proposed, it has been applied in various fields and used to solve various problems, showing good performance. However, on some complex optimization tasks, it still suffers from falling into local optima and slow convergence. To cope with this situation and improve the performance of the algorithm, a combined optimization method (GLSMA) based on Gaussian mutation and Levy flight is proposed in this paper. In GLSMA, the global exploration and local exploitation capabilities of the original algorithm are improved by introducing the Gaussian mutation and Levy flight mechanisms. During the optimization iterations, the original positions of slime moulds in the population are modified by Gaussian mutation to enhance the diversity of the population and improve the global exploration ability of the algorithm, so that the algorithm can achieve a balance between global exploration and local exploitation. After that, Levy flight is used to improve the randomness of SMA and help it jump out of local optima. Benchmark function tests show that the improved GLSMA has better global search and local exploitation ability than other advanced algorithms. The discrete version based on GLSMA also shows an ideal effect on feature selection.

The contributions and highlights of this paper are summarized below:
(1) An improved slime mould algorithm (GLSMA) based on Gaussian mutation and Levy flight is proposed to solve continuous optimization problems and high-dimensional gene feature selection problems
(2) The superiority of GLSMA is proved by comparison with several well-known algorithms on public datasets, with good results
(3) A binary GLSMA is proposed to solve high-dimensional gene feature selection problems
(4) The developed GLSMA has faster exploration and convergence speed in global optimization tasks
(5) Binary GLSMA achieves the highest classification accuracy with the fewest selected features in high-dimensional gene feature selection tasks

The remainder of this paper is organized as follows: the second part introduces the original SMA. The third part introduces the Gaussian mutation mechanism and Levy flight in detail, as well as the improved SMA based on the two mechanisms. The fourth part presents a series of comparison experiments between GLSMA and other algorithms, including experiments on continuous and discrete problems. The fifth part reviews and discusses the proposed work. The sixth part summarizes the conclusions of this paper and gives several directions for future work.

2. Slime Mould Algorithm

Li et al. [8] established a mathematical model based on the oscillation behavior of slime moulds and thus proposed a metaheuristic slime mould algorithm (SMA).

The mathematical formula of slime moulds is shown in Equation (1):

$$\vec{X}(t+1)=\begin{cases}\mathrm{rand}\cdot(UB-LB)+LB, & \mathrm{rand}<z\\ \vec{X_b}(t)+\vec{vb}\cdot\left(\vec{W}\cdot\vec{X_A}(t)-\vec{X_B}(t)\right), & r<p\\ \vec{vc}\cdot\vec{X}(t), & r\geq p\end{cases}$$

where $\vec{vb}$ represents a random value in the interval $[-a,a]$ with $a=\operatorname{arctanh}\left(1-t/\mathrm{max\_t}\right)$. Parameter $\vec{vc}$ ranges in the interval $[-b,b]$ with $b=1-t/\mathrm{max\_t}$, which decreases with the number of iterations. Through the cooperative interaction between $\vec{vb}$ and $\vec{vc}$, the selection behavior of slime moulds can be simulated and the optimal solution can be selected. $\mathrm{max\_t}$ indicates the maximum number of iterations. $UB$ and $LB$ represent the upper and lower boundaries of the search space, respectively. $\vec{X_b}$ represents the position vector of the current highest fitness (highest concentration) individual. $\vec{X_A}$ and $\vec{X_B}$ represent the position vectors of random individuals selected from the slime moulds in the $t$-th iteration. $r$ and $\mathrm{rand}$ are random values between 0 and 1. Parameter $z$ is set to 0.03. $\vec{X}(t)$ and $\vec{X}(t+1)$ represent the position vectors of slime moulds at the $t$-th and $(t+1)$-th iterations, respectively.

In addition, the decision parameter $p$ can be calculated as follows:

$$p=\tanh\left|S(i)-DF\right|$$

where $S(i)$ indicates the fitness of the $i$th individual in the slime mould, $i\in 1,2,\ldots,n$; $n$ denotes the size of the population; and $DF$ represents the best fitness attained during all of the iterations. The weight is computed as

$$\vec{W}\left(\mathrm{SmellIndex}(i)\right)=\begin{cases}1+r\cdot\log\left(\dfrac{bF-S(i)}{bF-wF}+1\right), & \text{condition}\\ 1-r\cdot\log\left(\dfrac{bF-S(i)}{bF-wF}+1\right), & \text{others}\end{cases}$$

$$\mathrm{SmellIndex}=\mathrm{sort}(S)$$

where $\vec{W}$ is the weight vector of slime moulds and $bF$ and $wF$ are the best and worst fitness obtained in the current iteration, respectively. $\mathrm{SmellIndex}$ and $S$ represent the fitness ordering (minimum problems are sorted in ascending order) and the corresponding fitness values, respectively. "condition" represents the first half of the population.
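For illustration, the update rules above can be condensed into the following minimal Python/NumPy sketch. It is our own paraphrase rather than the authors' MATLAB implementation; the epsilon guards and the base-10 logarithm are implementation choices of this sketch.

import numpy as np

def sma_step(pop, fitness, X_b, DF, t, T, lb, ub, z=0.03):
    """One SMA position update following Equation (1); minimization is assumed."""
    N, D = pop.shape
    a = np.arctanh(np.clip(1.0 - t / T, 1e-12, 1.0 - 1e-12))  # vb drawn from [-a, a]
    b = 1.0 - t / T                                            # vc drawn from [-b, b]
    order = np.argsort(fitness)                                # ascending: best first
    bF, wF = fitness[order[0]], fitness[order[-1]]
    W = np.ones((N, D))
    for rank, i in enumerate(order):                           # weight vector W
        step = np.random.rand(D) * np.log10((bF - fitness[i]) / (bF - wF + 1e-12) + 1)
        W[i] = 1 + step if rank < N // 2 else 1 - step         # first half: 1 + step
    new_pop = np.empty_like(pop)
    for i in range(N):
        if np.random.rand() < z:                               # random restart branch
            new_pop[i] = lb + np.random.rand(D) * (ub - lb)
            continue
        p = np.tanh(abs(fitness[i] - DF))                      # decision parameter
        A, B = np.random.randint(N, size=2)                    # two random individuals
        if np.random.rand() < p:
            vb = np.random.uniform(-a, a, D)
            new_pop[i] = X_b + vb * (W[i] * pop[A] - pop[B])
        else:
            vc = np.random.uniform(-b, b, D)
            new_pop[i] = vc * pop[i]
    return np.clip(new_pop, lb, ub)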

3. Description of the GLSMA

3.1. Gaussian Mutation

The Gaussian mutation (GM) operator is derived from the Gaussian (normal) distribution, which is distinguished from the Cauchy distribution: in the vertical direction the Gaussian density is higher than the Cauchy density, and in the horizontal direction it is narrower. Because of its narrow tail, Gaussian mutation is more likely to produce new offspring close to the original individual; accordingly, the search takes smaller steps and explores every corner of the search space in a finer way. The Gaussian density function can be described as

$$f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where $\sigma^{2}$ is the variance of each member of the population. By setting the mean $\mu$ to 0 and the standard deviation $\sigma$ to 1, this function is further simplified to generate a $D$-dimensional random variable. The generated random variables are applied to the exploration stage of slime moulds, as shown below:

$$X_{i}'=X_{i}\cdot\left(1+G(0,1)\right)$$

where $G(0,1)$ is a random number drawn from the Gaussian distribution, $X_{i}$ is a position in SMA during the current iteration, and $X_{i}'$ is the position corresponding to $X_{i}$ after Gaussian mutation. The introduction of the Gaussian mutation mechanism enhances the diversity of the population and improves the quality of SMA solutions.
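A minimal sketch of this operator follows, assuming the multiplicative mutation form given above and the greedy survivor selection described in Section 3.3; the function and variable names are ours.

import numpy as np

def gaussian_mutation(x, fitness_fn, fx):
    """Gaussian mutation of one position with greedy survivor selection."""
    g = np.random.standard_normal(x.shape)     # D-dimensional sample from N(0, 1)
    x_mut = x * (1.0 + g)                      # multiplicative Gaussian perturbation
    f_mut = fitness_fn(x_mut)
    # keep the mutant only if it improves the fitness (minimization)
    return (x_mut, f_mut) if f_mut < fx else (x, fx)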

3.2. Levy Flight

Levy flight (LF) was first proposed by the French mathematician Paul Levy in 1937, after which researchers used Levy statistics to describe various natural phenomena. The Levy flight operator improves the slime mould search capability by helping all search agents advance to better, more promising positions. A simple description of the Levy distribution is as follows:

$$\mathrm{Levy}(x)\sim\left|x\right|^{-1-\beta},\quad 0<\beta\leq 2$$

where $\beta$ is an important index regulating the stability of the distribution. Levy random numbers can be generated by the following formula:

$$\mathrm{Levy}(x)=\frac{\mu\times\sigma}{\left|v\right|^{1/\beta}}$$

where $\mu$ and $v$ are drawn from standard normal distributions, $\Gamma$ is the standard gamma function, $\beta$ is set to 1.5, and $\sigma$ is defined as follows:

$$\sigma=\left(\frac{\Gamma(1+\beta)\times\sin\left(\pi\beta/2\right)}{\Gamma\left((1+\beta)/2\right)\times\beta\times 2^{(\beta-1)/2}}\right)^{1/\beta}$$

In the exploration phase of the slime mould algorithm, the Levy strategy is used to update the locations of search agents, so as to better balance exploration and exploitation capabilities. The update formula is as follows:

$$X_{i}(t+1)=X_{i}(t)+\mathrm{rand}\otimes\mathrm{Levy}(\beta)$$

In the formula, $\mathrm{Levy}(\beta)$ is taken from the Levy distribution, $\mathrm{rand}$ is a random number, and $\otimes$ denotes entry-wise multiplication. $X_{i}(t+1)$ is the new location of the $i$-th search agent after the update. The introduction of Levy flight can help all individuals jump out of local optima and improve the quality of the population.
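The Levy step and the position update can be sketched as follows, using Mantegna's algorithm for step generation (a standard implementation; the names are ours).

import math
import numpy as np

def levy_step(D, beta=1.5):
    """D-dimensional Levy step generated by Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    mu = np.random.standard_normal(D) * sigma   # mu ~ N(0, sigma^2)
    v = np.random.standard_normal(D)            # v ~ N(0, 1)
    return mu / np.abs(v) ** (1 / beta)

def levy_update(x):
    """Relocate one search agent by a random Levy step (update formula above)."""
    return x + np.random.rand(x.size) * levy_step(x.size)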

3.3. Framework of Proposed GLSMA

In this section, we describe GLSMA, which is based on the Gaussian mutation mechanism and the Levy flight strategy, in detail. In algorithm improvement, adding a single mechanism can generally improve the algorithm in only one aspect and cannot improve the global exploration and local exploitation abilities at the same time. The Gaussian mutation mechanism generates new candidate values close to the current solution, but this mainly improves the local exploitation ability and may still fall into local optima. The Levy flight mechanism can expand the search range of solutions, increase the possibility of obtaining the optimal solution, and avoid falling into local optima. As a result, the two strategies (GM and LF) were introduced into the original SMA to facilitate the coordination of global exploration and local exploitation, forming a new SMA variant.

In the process of iterative optimization, Gaussian mutation is applied to the slime mould individuals after the initial position update. The individuals obtained after mutation are compared with the unmutated ones; if the fitness of a mutated individual is not improved, the original individual is retained and the mutant is discarded, to ensure the quality of the population. Considering that the algorithm easily falls into local optima, the Levy flight strategy is introduced to improve the randomness of SMA and its ability to jump out of local optima. The flowchart of GLSMA is shown in Figure 1. Experimental results show that, compared with other swarm intelligence algorithms, GLSMA not only has stronger global exploration ability but also improves the quality of solutions and speeds up convergence. The structure of the proposed GLSMA optimizer is shown in Algorithm 1.

Begin
 Initialize the parameters: population size N, maximum number of iterations max_t, z = 0.03
 Initialize the positions of the slime mould population X_i (i = 1, 2, ..., N)
While (t <= max_t)
  Calculate the fitness for each individual in slime mould
  Update X_b and the best fitness DF
  Calculate the weight W, a, and b according to Equations (2), (3), (5)
  For each search agent
   Update p using Equation (4)
   Update vb and vc based on a and b, respectively
   Update the positions by Equation (1)
  End for
  Utilize the best individual in the population to perform GM operations
  If (meet the condition)
  Then use Levy flight to avoid falling into local optimality
  Iteration = iteration + 1
End while
 Return the best fitness DF and X_b as the best solution
End
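Read alongside Algorithm 1, the overall loop can be paraphrased in Python as below, reusing the helpers sketched in Sections 3.1 and 3.2. Since the paper only states "meet the condition", the stagnation counter used here to trigger the Levy flight is an assumption, as is applying GM to every individual rather than only the best one.

import numpy as np

def glsma(fitness_fn, D, lb, ub, N=30, T=1000, z=0.03, stall_limit=10):
    """GLSMA main loop; uses sma_step, gaussian_mutation, and levy_update from above."""
    pop = lb + np.random.rand(N, D) * (ub - lb)
    fitness = np.array([fitness_fn(x) for x in pop])
    best = int(np.argmin(fitness))
    X_b, DF = pop[best].copy(), fitness[best]
    stall = 0                                   # iterations without improving DF
    for t in range(1, T + 1):
        pop = sma_step(pop, fitness, X_b, DF, t, T, lb, ub, z)
        fitness = np.array([fitness_fn(x) for x in pop])
        for i in range(N):                      # GM with greedy survivor selection
            pop[i], fitness[i] = gaussian_mutation(pop[i], fitness_fn, fitness[i])
        best = int(np.argmin(fitness))
        if fitness[best] < DF:
            X_b, DF, stall = pop[best].copy(), fitness[best], 0
        else:
            stall += 1
        if stall >= stall_limit:                # assumed stagnation trigger for LF
            pop = np.clip(np.array([levy_update(x) for x in pop]), lb, ub)
            fitness = np.array([fitness_fn(x) for x in pop])
            stall = 0
    return X_b, DF

# example usage on the sphere function:
# best_x, best_f = glsma(lambda x: float(np.sum(x**2)), D=30, lb=-100.0, ub=100.0)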
3.4. Computational Complexity Analysis

According to the structure of GLSMA, it mainly includes initialization, fitness evaluation, fitness ranking, weight updating, position updating based on the SMA strategy, position adjustment based on the Gaussian mutation mechanism, and position updating based on the Levy flight strategy, where N is the number of slime moulds, D is the function's dimension, and T is the maximum number of iterations. The calculation is as follows:

The time complexity of initialization is O(N x D). Evaluating and sorting the fitness costs O(N + N log N) per iteration. The computational complexity of updating the weights is O(N x D). The computational complexity of the position updating process based on SMA is O(N x D). Similarly, the computational complexity of the position adjustment based on the Gaussian mutation mechanism is O(N x D), and that of the position update based on Levy flight is O(N x D). Therefore, the total computational complexity of GLSMA is O(N x D + T x N x (log N + D)).

4. Experiments and Results

In the experiments, to evaluate the continuous and discrete versions of GLSMA, the proposed algorithm is compared with other optimizers on continuous functions and feature selection problems, respectively. The effectiveness and competitiveness of the proposed algorithm are verified in two parts. In the first part, the strategies added to SMA are tested on 23 benchmark functions (including 7 unimodal functions, 6 multimodal functions, and 10 fixed-dimension multimodal functions) and 10 classic CEC2014 benchmark functions (including 2 hybrid functions and 8 composition functions) to see whether the mechanisms have a positive effect on the algorithm. Then, in the same test environment, GLSMA is compared with some original algorithms and advanced metaheuristic algorithms. In the second part, we compare the proposed binary GLSMA (BGLSMA) with other classifiers on feature selection problems.

All GLSMA experiments were implemented in MATLAB R2014a and run on the Windows 10 (64-bit) operating system. The computer hardware is an Intel(R) Xeon(R) Silver 4110 CPU (2.40 GHz, dual processors) with 32 GB RAM.

In Section 4.1, we will test the influence of different mechanisms on the algorithm. In Section 4.2, GLSMA is compared with seven metaheuristic algorithms to prove its effectiveness. In Section 4.3, GLSMA is compared with eight advanced algorithms to verify its ability on exploration and exploitation. In Section 4.4, we use binary GLSMA (BGLSMA) to deal with feature selection in 14 UCI datasets.

4.1. The Influence of Gaussian Mutation and Levy Flight

As mentioned above, GLSMA consists of two main improved strategies: Gaussian mutation mechanism and Levy flight strategy. The purpose of this section is to validate the effectiveness of the combination of the two strategies. To this end, we compare GLSMA, SMA, and their variants GSMA and LSMA on 33 benchmark functions. GSMA only uses Gaussian mutation strategy, LSMA only uses Levy flight strategy, and SMA is the original algorithm.

All algorithm tests were carried out under identical conditions to eliminate the influence of irrelevant factors and ensure the fairness of the comparison. The population size was set to 30, and the maximum number of function evaluations was uniformly set to 300,000. To weaken the influence of algorithmic randomness, we conducted 30 independent runs for each test case. In this paper, the average of the optimal function values (Avg) and the standard deviation (Std) of each algorithm's results are compared: Avg evaluates the global exploration ability and solution quality, and Std evaluates the robustness of the algorithm. To show the best results more clearly, all the best results are italicized.

In addition, the nonparametric Wilcoxon signed-rank test was used to measure whether the improvement is statistically significant; the significance level was set at 0.05. The symbols "+/=/-" in the results indicate that the proposed GLSMA is superior to, equal to, or inferior to the competing method, respectively. For a comprehensive statistical comparison, the Friedman test [53] was used to evaluate the average behavior of all the different algorithms, and the average rank value (ARV) of the Friedman test is reported to assess the average performance of the compared methods.
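Both tests are available in standard statistical libraries; for instance, a minimal SciPy sketch is given below (the arrays here are random placeholders, not our experimental data).

import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare, rankdata

rng = np.random.default_rng(0)
runs = rng.random((30, 2))        # placeholder: 30 runs, GLSMA vs. one competitor
means = rng.random((33, 4))       # placeholder: mean results, 33 functions x 4 algorithms

stat, p = wilcoxon(runs[:, 0], runs[:, 1])     # signed-rank test at alpha = 0.05
if p < 0.05:
    label = "+" if runs[:, 0].mean() < runs[:, 1].mean() else "-"
else:
    label = "="                                # no significant difference

fstat, fp = friedmanchisquare(*means.T)        # Friedman test across algorithms
arv = rankdata(means, axis=1).mean(axis=0)     # average rank value (lower is better)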

Tables 1-4 list the 23 benchmark functions and the 10 CEC2014 test functions. The selected 33 test functions cover several different problem types: unimodal functions, multimodal functions, fixed-dimension multimodal functions, hybrid functions, and composition functions. These test functions can be used to assess the algorithm's global exploration and local exploitation capabilities, as well as the balance between them.

As can be seen from the results in Tables 5 and 6, GLSMA is significantly superior to the other mechanism combinations and to the original SMA. On closer analysis, the Avg and Std values in Table 5 show the superiority of GLSMA on the F1-F7, F9-F14, F17-F18, and F22-F33 functions. On the test functions F1-F4, F9-F11, F26-F28, and F30-F33, GLSMA's Std value is 0, indicating strong robustness. This is because the combination of the Gaussian mutation and Levy flight mechanisms improves the performance of the original SMA and can successfully find global optimal solutions for various complex problems. According to the p values in Table 6, many p values in the SMA column are less than 0.05, indicating that GLSMA clearly improves on the original SMA. As can be seen from the Friedman ARV in Table 7, among the four compared algorithms GLSMA ranks first and is significantly superior to the others. Moreover, the Gaussian mutation mechanism or the Levy flight mechanism alone improves the original SMA little, and can even degrade it, but their combination achieves a good balance between exploration and exploitation and thus a good overall effect. In summary, the results show that adding the Gaussian mutation and Levy flight mechanisms benefits both the exploration and exploitation abilities of GLSMA and the balance between them, has a positive effect on the algorithm, and improves the robustness of the original algorithm; the improvement is significant.

Compared with tables, figures can reflect the optimization results of GLSMA more intuitively and clearly. Figure 2 shows the convergence curves of the four compared methods on nine functions. It is obvious that GLSMA, using both mechanisms, achieves better results than its variants. The combination of Gaussian mutation and Levy flight enables GLSMA to escape from local traps faster and obtain the global optimal solution. Meanwhile, it can be seen that GLSMA has the fastest rate of convergence and reaches the optimal value first. The results show that this combination of mechanisms can quicken the convergence of the algorithm while jumping out of local optima. In general, the combination of GM and LF improves the overall performance of the original SMA.

4.2. Comparison with Well-Known Algorithms

In this experiment, 23 classical functions and 10 of the CEC2014 benchmark functions were selected to evaluate the performance of GLSMA. The 33 benchmark test functions used in all continuous optimization experiments can be divided into four categories: unimodal functions, multimodal functions, hybrid functions, and composition functions. Each unimodal function (F1-F7) has only one solution and can be used to test the exploitation ability of the algorithm. The multimodal functions (F8-F23) have several local optima and are suitable for verifying the exploration ability of the algorithm. The hybrid and composition functions (F24-F33) selected from CEC2014 are used to verify the balance between exploration and exploitation. These functions are often used to assess the overall capability of algorithms. In this experiment, the performance of the improved GLSMA was compared with PSO [4], WOA [54], GWO [55], SCA [56], FOA [57], DE [3], and SSA [58].

Tables 8-10 record the comparison results of GLSMA with seven well-known algorithms. As shown in Table 10, among GLSMA and the seven well-known algorithms, the average Friedman test result of GLSMA is 2.328283, ranking first, and that of DE is 2.730303, ranking second; the Friedman results of GLSMA and DE are clearly better than those of the other algorithms. The average value (Avg) and standard deviation (Std) of the optimal solutions of GLSMA and the other well-known algorithms are shown in Table 8, where GLSMA has a significant advantage. Moreover, among all the compared algorithms, GLSMA attains an Std of 0 on more test functions, which proves its stronger stability. In addition, GLSMA shows obvious advantages and stability on almost all of the composition functions (F26-F28 and F30-F33). Table 9 shows the Wilcoxon signed-rank test results between GLSMA and the other well-known algorithms. It can be seen that the p value of GLSMA is less than 0.05 on almost all benchmark functions, which proves that GLSMA is significantly better than the other algorithms, especially FOA, on all functions. Therefore, compared with these basic metaheuristic algorithms, the advantage of GLSMA is statistically significant.

From the convergence curves of the 8 algorithms on 9 functions shown in Figure 3, it can be seen that GLSMA improves the global search ability under the dual mechanism, quickly escapes from local optimal traps, and finds the global optimum faster.

In conclusion, compared with other well-known algorithms, GLSMA shows good overall superiority and stability. The strategy combination of Gaussian mutation and Levy flight enables the proposed GLSMA to obtain higher quality solutions in the optimization process, thus achieving a balance between exploration and exploitation.

4.3. Comparison with Advanced Algorithms

In this experiment, the proposed GLSMA is compared with 8 advanced algorithms in order to fully demonstrate its global search ability and its capacity to avoid local optima: MPEDE [59], LSHADE [60], ALCPSO [61], CLPSO [62], CMAES [63], BMWOA [64], CESCA [65], and IGWO [66]. These include two classic DE variants, two superior PSO variants, and variants of the WOA and GWO algorithms.

The comparison results of GLSMA with the eight advanced algorithms are shown in Tables 11-13. Table 11 shows the average value and standard deviation of the optimal solutions obtained by GLSMA and the advanced algorithms. As can be seen, compared with the other algorithms, GLSMA shows good superiority and stability on the F1-F5, F7-F11, F13, F15, F21, F26-F28, and F30-F33 functions. Table 12 shows the p values of the Wilcoxon signed-rank test between GLSMA and the eight advanced algorithms. From the table, it can be seen that GLSMA outperforms the other algorithms on most benchmark functions and is superior to CESCA on all functions. Therefore, GLSMA is clearly competitive with other excellent algorithms. Table 13 shows that, among GLSMA and the 8 advanced algorithms, the average Friedman test result of GLSMA ranks first, at 3.629293.

The convergence curves of all nine algorithms on nine functions, shown in Figure 4, indicate that GLSMA converges faster than the other advanced algorithms, jumps out of local optima faster, and avoids falling into local optima better than the other algorithms.

In summary, the introduction of Gaussian mutation mechanism and Levy flight mechanism gives GLSMA an advantage over competitive advanced algorithms, showing superior performance in different types of functions. GLSMA not only has stronger global search ability but also can avoid falling into local optimum.

4.4. The Experiments for Feature Selection

In this section, we transform the proposed GLSMA into a discrete version, namely, BGLSMA, and apply it to feature selection problems on high-dimensional gene data, giving the proposed algorithm more practical relevance.

The purpose of feature selection is to remove redundant and irrelevant features from the samples, so as to reduce the complexity of the problem, reduce the subsequent computational cost, and obtain higher classification accuracy. In the feature selection process, it is necessary to determine which features should be selected. We therefore transform the continuous GLSMA into a discrete version, BGLSMA. The proposed method increases population diversity, strengthens the local exploitation ability, and helps select favorable features in the search space, so as to obtain better feature subsets and improve classification accuracy.

4.4.1. Binary GLSMA

In the feature selection algorithm based on GLSMA, each individual $X$ represents a subset of features. In BGLSMA, if $X^{d}=1$, the $d$-th feature is selected; conversely, if $X^{d}=0$, this feature is discarded. In order to solve discretization problems, GLSMA needs to be discretized. The individual with a binary position vector is initialized by a random threshold, and the discretization of the position can be expressed as

$$X_{new}^{d}=\begin{cases}1, & \mathrm{rand}<T\!\left(X^{d}\right)\\ 0, & \text{otherwise}\end{cases}$$

In the formula above, $X_{new}^{d}$ indicates the value of the $d$-th dimension of the agent's position in the discrete space, and $\mathrm{rand}$ means a random number within the range of $[0,1]$. $T(\cdot)$ converts the value of the $d$-th dimension of $X$ in the continuous search space to $(0,1)$, so that the comparison with $\mathrm{rand}$ yields 0 or 1, thus realizing the discretization of the continuous space. This transformation does not change the structure of the algorithm.
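A minimal sketch of this discretization follows, assuming a sigmoid transfer function for T(.), which is a common but here unverified choice.

import numpy as np

def binarize(x):
    """Map a continuous position to a 0/1 feature mask via a random threshold."""
    s = 1.0 / (1.0 + np.exp(-x))               # assumed sigmoid transfer T(.)
    return (np.random.rand(x.size) < s).astype(int)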

As described above, feature selection is the process of using the smallest gene subset to obtain the optimal classification accuracy, that is, improving the classification accuracy while reducing the number of features. This is a combinatorial optimization problem. In order to satisfy both objectives, a linear combination of the feature number and the error rate is used to define the fitness function, which evaluates the candidate solutions comprehensively:

$$\mathrm{Fitness}=\alpha\cdot\left(1-P\right)+\beta\cdot\frac{|R|}{|N|}$$

In the above formula, $P$ is the classification accuracy rate of the KNN classifier, the length of the selected feature subset is represented by $|R|$, and the total number of features in the dataset is represented by $|N|$. $\alpha$ and $\beta$ are the weights of the classification error rate and the feature reduction, respectively. Since more attention is paid to accuracy than to feature reduction, we set $\alpha$ to 0.95 and $\beta$ to 0.05.
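The fitness function can be sketched as follows; for self-containment, this sketch scores subsets with a leave-one-out 1-NN error rather than the 10-fold KNN protocol used in the experiments (Section 4.4.2).

import numpy as np

def one_nn_error(X, y):
    """Leave-one-out error rate of a 1-NN classifier with Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(d, np.inf)                 # a sample cannot be its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] != y))

def fs_fitness(mask, X, y, alpha=0.95, beta=0.05):
    """Fitness = alpha * error rate + beta * |R|/|N|, as in the formula above."""
    if mask.sum() == 0:
        return 1.0                              # penalize empty feature subsets
    err = one_nn_error(X[:, mask.astype(bool)], y)
    return alpha * err + beta * mask.sum() / mask.size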

4.4.2. Simulation Experiments

In this experiment, the resulting BGLSMA is compared with other excellent metaheuristic optimizers on 14 UCI feature selection datasets. Table 14 shows the details of these datasets, including the numbers of samples, features, and categories: the sample numbers range from 50 to 308, the feature numbers from 2000 to 15010, and the numbers of categories from 2 to 11. These datasets contain several different types of data. These high-dimensional gene datasets share the characteristic that the number of samples is small while the number of features reaches into the thousands, which makes dimension reduction challenging.

In order to select fewer features while maintaining classification accuracy, the K-nearest neighbor (KNN) [67] algorithm is used for data classification. K-nearest neighbor is a nonparametric statistical method widely applied to classification problems. The steps of the KNN algorithm are as follows: first, the original data is preprocessed and divided into a training set and a test set; second, the parameter K is set to 1. Then, an initial group of K training samples is selected and the distances between these samples and the test sample are calculated, using the Euclidean distance shown in Equation (15):

$$d(x,y)=\sqrt{\sum_{i=1}^{D}\left(x_{i}-y_{i}\right)^{2}}$$

At the same time, the distance d between each remaining training sample and the test sample is calculated and compared with the largest distance D in the current group; if d is less than D, the corresponding sample replaces the farthest member of the group. These steps are repeated until all training samples have been examined, and the test sample is then assigned the label of its K nearest neighbors.

The metaheuristic optimizers used for comparison include bGWO [68], BBA [69], BGSA [70], BPSO [71], bALO [72], BSSA [73], and BSMA, the binary form of the original SMA. These optimizers are used for feature selection; the feature subsets found by each algorithm during the search are evaluated with the KNN classifier, and the corresponding results are output for comparison. In order to reduce the influence of random factors, 10-fold cross validation was adopted, and the average of the repeated experimental results was taken to evaluate each algorithm's accuracy.
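For reference, this evaluation protocol corresponds to the following scikit-learn sketch (the data arrays are random placeholders standing in for a dataset restricted to the selected features).

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_sel = rng.random((60, 20))     # placeholder: 60 samples, 20 selected features
y = rng.integers(0, 2, 60)       # placeholder binary labels

scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X_sel, y, cv=10)
print(scores.mean())             # average accuracy over the 10 folds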

Tables 15-18 list the statistical results for the average number of selected features, the average error rate, the average fitness, and the average computation time. According to the average numbers of selected features in Table 15, the proposed BGLSMA selects the fewest features on all datasets except Tumors_11 and Tumors_14. On the Colon and Leukemia datasets, BGLSMA obtains a small number of selected features with a standard deviation of less than 1. This proves that BGLSMA can obtain fewer features with higher classification accuracy. As can be seen from the tables, BGLSMA achieves higher classification accuracy on some complex high-dimensional data and can find smaller feature subsets, reducing the data scale. In terms of the ARV index, BGLSMA ranks first, which suggests that BGLSMA obtains very competitive results in terms of the number of selected features.

Table 16 shows the comparison results of eight algorithms in terms of average error rate. It can be seen from the ranking that BGLSMA has the lowest average error rate, which proves that the proposed algorithm not only has better performance in global optimization problems but also has good classification ability in feature selection optimization. It can be seen that the average error rate of BGLSMA is significantly lower than that of BSMA. The Gaussian mutation mechanism enables the population to search a larger space. As the number of iterations increases, the most representative features in each dataset are gradually selected, and the classification accuracy is also improved.

It is clear from the key measurements listed in Table 17, namely, the weighted number of features and the weighted error rate, that BGLSMA outperformed the other competitors on 78.6% of the datasets. In addition, both the detailed data and the final ARV value show that BGLSMA improves greatly on BSMA, owing to the introduction of the Levy flight mechanism, which increases the diversity and randomness of the population and selects features from a wider range, thus achieving higher classification accuracy.

Table 18 shows the average computation time of the compared algorithms. The computational cost of the proposed BGLSMA optimizer is higher than that of BBA, BGSA, and other optimizers, and the time cost of BSMA and bGWO, which perform better, is also higher than that of the other optimizers, as shown in Tables 15-18. The introduction of the GM and LF strategies improves the performance of BGLSMA but also increases the computation time. Meanwhile, the time cost of the original SMA is higher than that of other algorithms, which contributes to the high time cost of BGLSMA to a certain extent.

To sum up, BGLSMA is found to be the best optimizer in the overall comparison with other optimizers. Although the time cost is relatively high, BGLSMA can select the optimal feature subset on the vast majority of high-dimensional gene datasets without losing meaningful features and achieve the best fitness and classification error rate at the same time. The experimental results show that the combined strategy of Gaussian mutation and Levy flight guarantees the good results of GLSMA in global exploration.

5. Discussions

In this part, we discuss the proposed GLSMA algorithm, its advantages, and the points that can be improved. In the original SMA, the slime mould may not find the optimal solution in the search space and will fall into local optima on some problems, which limits the use of the algorithm. In this paper, Gaussian mutation and Levy flight are introduced to update the population, which enhances the global exploration ability and prevents the algorithm from falling into local optima. Experimental results show that the dual mechanism optimizes better than either single mechanism, and that GLSMA outperforms some advanced optimization algorithms.

We monitor the population after it updates its optimal fitness value to determine whether it has fallen into a local optimum. If so, the Levy flight mechanism is invoked to help the algorithm enlarge the search space and jump out of the local trap. The combination of the two mechanisms is significantly better than either one alone. However, as can be seen from Table 18, the time cost of BGLSMA is relatively high, partly because of the high time cost of BSMA and partly because the added mechanisms increase the computation time. In return, the mechanisms greatly improve the performance of the algorithm, allowing it to be applied to more domains, such as human activity recognition [74], microgrid planning [75], medical image augmentation [76], autism spectrum disorder classification [77], disease prediction [78, 79], named entity recognition [80], information retrieval services [81-83], and recommender systems [84-87].

6. Conclusions and Future Directions

In this paper, an improved SMA (GLSMA) based on Gaussian mutation and Levy flight is proposed. Experimental results show that the two mechanisms play an important role in further enhancing the global search of SMA and alleviating its tendency to fall into local optima. Firstly, the effectiveness of GLSMA is verified by comparison with DE, PSO, GWO, and other well-known algorithms. Secondly, compared with other advanced swarm intelligence algorithms, such as MPEDE, LSHADE, ALCPSO, and CLPSO, GLSMA is able to find the optimal solution faster. Finally, to prove the performance of GLSMA in practical applications, BGLSMA is obtained by mapping GLSMA into binary space through a transfer function and is applied to feature selection problems on 14 commonly used UCI high-dimensional gene datasets. Compared with excellent metaheuristic optimizers in four respects, namely, the average number of selected features, the average error rate, the average fitness, and the computation cost, GLSMA still shows good global search ability in feature selection applications and is able to select fewer features while achieving higher classification accuracy. Therefore, the above conclusions indicate that GLSMA is a promising method not only for function optimization problems but also for practical feature selection problems.

There are still many aspects to explore in our research. We can consider applying GLSMA to other feature selection datasets and study the effectiveness of BGLSMA on them. Further improvements to SMA can be attempted to better balance global exploration and local exploitation. Finally, it is an interesting topic to apply SMA to more fields, such as photovoltaic parameter optimization and image segmentation.

Data Availability

The data involved in this study are all public data, which can be downloaded through public channels.

Conflicts of Interest

The authors declare that they have no conflicts of interest.