#### Abstract

Artificial neural networks have achieved great success in simulating the information processing mechanisms of neurons for supervised learning tasks such as classification. However, traditional artificial neurons still suffer from problems such as slow and difficult training. This paper proposes a new dendritic neuron model (DNM) that effectively combines a metaheuristic algorithm with the dendritic neuron model. Eight learning algorithms are used to train the dendritic neuron model: traditional backpropagation; the classic evolutionary algorithms biogeography-based optimization, particle swarm optimization, genetic algorithm, population-based incremental learning, competitive swarm optimization, and differential evolution; and the state-of-the-art jSO algorithm. The optimal combination of the user-defined parameters of the model is systematically investigated, and four classification datasets are studied using the proposed DNM. Compared with common machine learning methods such as decision tree, support vector machine, *k*-nearest neighbor, and artificial neural networks, the dendritic neuron model trained by biogeography-based optimization has significant advantages. It has a simple structure and low cost and can be used as a neuron model to solve practical problems with high precision.

#### 1. Introduction

In the human brain, tens of billions of interconnected neurons transmit signals through synapses, forming a complex neural network that guides human behavior. Neurons are composed of a cell body with branched dendritic structures, a cell membrane, and an axon responsible for transmitting nerve signals. The first truly dominant neural network concept has only one neuron unit, known as the binary McCulloch-Pitts neuron, which was proposed by McCulloch and Pitts in 1943 [1]. However, it does not consider the nonlinear transmission of cellular signals in dendritic neuron networks. Moreover, as the single-layer McCulloch-Pitts neuron model cannot solve basic nonlinear problems [2], it is also criticized as too simple.

The traditional view of neural networks holds that the connections between neurons are very complex and give the brain its strong computing and thinking ability. Under this view, a single neuron does not need strong computing ability and only performs a simple linear summation or nonlinear threshold operation during signal transmission. As a result, the computational potential of individual neurons and their dendrites has been neglected for a long time.

Some researchers have proposed that dendrites perform more complex nonlinear operations during signal transmission, which can improve the computing power of a single neuron [3, 4]. Koch et al. [5] hypothesized that the synaptic interaction at a branch point can be realized by Boolean logic operations: each dendritic branch performs a logical AND operation on its synaptic inputs, and the branch point sums the current signals from the dendritic branches, so that its output is the logical OR of its inputs. However, as small differences in individual neuron morphology can lead to great changes in function, the Koch model has difficulty distinguishing different synaptic and dendritic morphologies to solve specific complex problems. Therefore, the structure of synapses and dendrites requires a plasticity mechanism. Subsequent studies have found the phenomenon of neuronal plasticity, and important progress on neuronal cell structure has been achieved by proposing a neuronal pruning method [6, 7].

Compared with the widely used mainstream neural network models, the single dendritic neuron model can carry out more complex nonlinear operations and obtain more accurate results with the same number of neurons. At present, the dendritic neural network model has been applied to many fields and achieved good results, such as liver disorders [8], financial time series prediction [9, 10], and breast cancer classification [11–24].

In recent years, combining metaheuristic algorithms with neural networks has become more and more widely used. The basic idea is to use a metaheuristic algorithm to continuously adjust the parameters of the neural network, output the parameters once the optimal value is obtained, and then substitute the obtained parameters into the neural network for classification and prediction [25, 26]. The traditional dendritic neural network is optimized by the backpropagation algorithm. Backpropagation is based on the chain rule of differentiation and suffers from problems such as falling into local minima and vanishing gradients [27–29]. This paper proposes a new dendritic neuron model (DNM) that effectively combines a metaheuristic algorithm with the dendritic neuron model.

Seven different metaheuristic algorithms are compared in the experiment, each of which has its own characteristics. Genetic algorithm (GA) [30–32] searches for the optimal solution by simulating the natural selection and genetic mechanisms of biological evolution; its main characteristic is that it operates directly on the structure object and is free from requirements of differentiability and function continuity. Biogeography-based optimization (BBO) [33–35], which simulates ecological concepts, is a representative metaheuristic known for its high precision and strong stability. Particle swarm optimization (PSO) [36, 37] has been applied to train neural networks instead of BP; its whole searching and updating process follows the current optimal solution. Unlike in the genetic algorithm, all particles may converge to the optimal solution faster in most cases, and, as an evolutionary computation method, PSO can handle problems with nondifferentiable node transfer functions or without gradient information. Competitive swarm optimizer (CSO) [38, 39] is a simplified metaheuristic method that, as a variant of PSO, is suitable not only for multipoint search but also for local search. It applies a competition mechanism, so it is not necessary to update the individual and global best positions. Thus, CSO can balance the avoidance of local minima and the convergence rate. Population-based incremental learning (PBIL) [40–42] selects the individuals with the highest fitness in each generation to modify the learning probability and guide the generation of new individuals. Differential evolution (DE) [43, 44] is a stochastic model simulating biological evolution. Like other evolutionary algorithms, DE is a population-based global search strategy; in addition, it simplifies the genetic operations by using real encoding, a simple mutation operator, and a competitive selection mechanism. The jSO algorithm is a new variant of DE and a state-of-the-art algorithm for single-objective real-parameter optimization [45]; we introduce it to train the DNM for the first time.

The learning algorithm using BP is called DNM + BP, and DNM + BBO, DNM + PSO, DNM + GA, DNM + PBIL, DNM + CSO, DNM + DE, and DNM + jSO are named similarly. The effectiveness, accuracy, and convergence of each algorithm on classification problems are explored and demonstrated using four datasets. The experimental results show that DNM + BBO has fast convergence and high accuracy. In addition, the classification accuracies of decision tree, *k*-nearest neighbor (KNN), support vector machine (SVM), multilayer perceptron (MLP), and DNM + BBO are compared, and the results show that DNM + BBO has the highest classification accuracy. This paper effectively combines a metaheuristic algorithm with the dendritic neuron model to establish DNM + BBO, which provides an effective method to solve classification problems.

#### 2. Model and Learning Algorithms

##### 2.1. Dendritic Neuron Model

The DNM is composed of four layers based on the dendrite structure. First, in the synaptic layer, the inputs *x*_{1}, *x*_{2}, …, *x*_{n} of each dendrite are transformed using a sigmoid function. Second, in the dendrite layer, the outputs of the first layer are passed to a multiplication function. Third, the membrane layer processes the inputs received from the dendrite layer. Finally, the signal from the membrane layer is transformed using another sigmoid function to complete the whole process [46]. Figure 1 shows the complete structure of the DNM, and the details of the model are given below.

###### 2.1.1. Synaptic Layer

A synapse is the connection between neurons. Information flows from one neuron to another through the synapse in a feedforward pattern. The synapse has four connection states, namely, the excitatory connection, the inhibitory connection, the constant 0 connection, and the constant 1 connection. The state depends on changes in the potential of the receiving neuron arising from ionotropic phenomena. The connecting function from the *i*th (*i* = 1, 2, …, *n*) synaptic input to the *j*th (*j* = 1, 2, …, *m*) synaptic layer is described as follows:

$$Y_{ij} = \frac{1}{1 + e^{-k\left(w_{ij} x_{i} - \theta_{ij}\right)}} \tag{1}$$

Equation (1) expresses the transfer function *Y*_{ij} of the *i*th synaptic input *x*_{i}, which varies from 0 to 1. *k* is a constant, and *w*_{ij} and *θ*_{ij} denote the weight and threshold of the synapse, respectively.

Depending on the values of *w*_{ij} and *θ*_{ij}, there are four different connection states, illustrated in Figure 2. The *X*-axis indicates the input of the DNM, and the *Y*-axis indicates the output of the synaptic layer. Since the input *x* ranges from 0 to 1, only the blank part of each illustration needs to be considered. The four connection states are as follows. Figure 2(a) presents the excitatory connection: when 0 < *θ*_{ij} < *w*_{ij}, the output is proportional to the input. In contrast, Figure 2(b) depicts the inhibitory connection: when *w*_{ij} < *θ*_{ij} < 0, the output is inversely proportional to the input. Figures 2(c) and 2(d) present the constant 1 connection: when *θ*_{ij} < 0 < *w*_{ij} or *θ*_{ij} < *w*_{ij} < 0, the output is always 1 regardless of the value of the input *x* between 0 and 1. Figures 2(e) and 2(f) present the constant 0 connection: when *w*_{ij} < 0 < *θ*_{ij} or 0 < *w*_{ij} < *θ*_{ij}, the output is always 0 regardless of the value of the input *x* between 0 and 1.
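The case analysis above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes the synaptic sigmoid has the form 1 / (1 + exp(-k(w·x − θ))), and the function names `synapse` and `connection_state` as well as the default `k = 10` are hypothetical.

```python
import math

def synapse(x, w, theta, k=10.0):
    """Assumed synaptic sigmoid: 1 / (1 + exp(-k * (w * x - theta)))."""
    return 1.0 / (1.0 + math.exp(-k * (w * x - theta)))

def connection_state(w, theta):
    """Classify a synapse by the ordering of w, theta, and 0 (six cases of Figure 2)."""
    if 0 < theta < w:
        return "excitatory"          # Figure 2(a)
    if w < theta < 0:
        return "inhibitory"          # Figure 2(b)
    if theta < 0 < w or theta < w < 0:
        return "constant 1"          # Figures 2(c) and 2(d)
    if w < 0 < theta or 0 < w < theta:
        return "constant 0"          # Figures 2(e) and 2(f)
    return "boundary case"           # equalities are not covered by the text
```

For an excitatory synapse such as `w = 1.0`, `theta = 0.5`, `synapse(0.0, 1.0, 0.5)` is close to 0 and `synapse(1.0, 1.0, 0.5)` is close to 1, which matches the description of Figure 2(a).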

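**Figure 2:** Connection cases of the synaptic layer: (a) excitatory connection; (b) inhibitory connection; (c), (d) constant 1 connection; (e), (f) constant 0 connection.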

###### 2.1.2. Dendrite Layer

In this layer, the outputs from the synapses are multiplied together. Multiplication is the first choice for describing nonlinearity because of its simplicity. In addition, when the synapses are in the constant 0 or constant 1 connection state, this function is equivalent to the logical AND operator, since it yields similar output values. The output formula for the *j*th dendrite is as follows:

$$Z_{j} = \prod_{i=1}^{n} Y_{ij} \tag{2}$$

###### 2.1.3. Membrane Layer

This layer represents the summation of the signals coming from each dendritic branch. The input of the next layer is obtained by a sum function, which resembles a logical OR operator. The processed signal is then transmitted to the soma. Thus, the output of the membrane layer is as follows:

$$V = \sum_{j=1}^{m} Z_{j} \tag{3}$$

###### 2.1.4. Soma Layer

Finally, the signal received in the soma layer is taken as the input of another sigmoid function. The detailed formula for this layer is as follows:

$$O = \frac{1}{1 + e^{-k_{s}\left(V - \theta_{s}\right)}} \tag{4}$$

where *k*_{s} is a positive constant, and the range of the threshold *θ*_{s} is [0, 1].
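The four layers can be put together in a short forward pass. The sketch below is an illustration under stated assumptions rather than the authors' implementation: both sigmoids are assumed to have the standard logistic form, the weights are stored as one row per dendrite (so `w[j][i]` corresponds to *w*_{ij}), and the name `dnm_forward` and the default parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dnm_forward(x, w, theta, k=5.0, k_s=5.0, theta_s=0.5):
    """Forward pass of a single dendritic neuron.

    x      : list of n inputs, each in [0, 1]
    w      : m rows of n synaptic weights (w[j][i] is the weight from input i to dendrite j)
    theta  : m rows of n synaptic thresholds, same layout as w
    """
    v = 0.0                                   # membrane layer: sum over dendrites
    for w_j, theta_j in zip(w, theta):
        z_j = 1.0                             # dendrite layer: product over synapses
        for x_i, w_ij, th_ij in zip(x, w_j, theta_j):
            z_j *= sigmoid(k * (w_ij * x_i - th_ij))   # synaptic layer
        v += z_j
    return sigmoid(k_s * (v - theta_s))       # soma layer
```

With a single dendrite and two excitatory synapses (`w = [[1.0, 1.0]]`, `theta = [[0.5, 0.5]]`, `k = k_s = 10`), the output is high only when both inputs are 1, which illustrates the AND-like behavior of the dendrite layer.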

##### 2.2. Backpropagation

Backpropagation (BP) is a gradient descent method [47, 48]. It reduces the error between the target output and the actual output through neural network training. The error can be expressed as follows:

where *T* is the target output vector, and *O* is the actual output vector. The error can be decreased by modifying the parameters *w*_{ij} and *θ*_{ij} of the DNM during learning. The update expressions are set as follows:

where *E*_{p} is the mean square error. After computing these two increments, the values of *w*_{ij} and *θ*_{ij} at the next moment are obtained through the following:

where *η* denotes the user-defined learning rate, and *t* is the learning iteration. Furthermore, the partial derivatives of *E*_{p} with respect to *w*_{ij} and *θ*_{ij} are calculated as follows:

The detailed results of the above partial differentials in DNM are given as follows:

When computing ∆*w*_{ij}(*t*) and ∆*θ*_{ij}(*t*), the chain rule is applied, and a layer-by-layer calculation is performed throughout the DNM.
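As a worked illustration of this layer-by-layer calculation, the chain rule can be written out under the following assumptions, which are not taken verbatim from the source: the per-sample error is *E* = ½(*T* − *O*)², the synaptic and soma functions are logistic sigmoids, the dendrite output *Z*_{j} is the product of the *Y*_{ij}, and the membrane output *V* is the sum of the *Z*_{j}.

$$
\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial O}\cdot\frac{\partial O}{\partial V}\cdot\frac{\partial V}{\partial Z_{j}}\cdot\frac{\partial Z_{j}}{\partial Y_{ij}}\cdot\frac{\partial Y_{ij}}{\partial w_{ij}},
\qquad
\frac{\partial E}{\partial \theta_{ij}} = \frac{\partial E}{\partial O}\cdot\frac{\partial O}{\partial V}\cdot\frac{\partial V}{\partial Z_{j}}\cdot\frac{\partial Z_{j}}{\partial Y_{ij}}\cdot\frac{\partial Y_{ij}}{\partial \theta_{ij}}
$$

Under these assumptions, the individual factors are

$$
\frac{\partial E}{\partial O} = O - T,\quad
\frac{\partial O}{\partial V} = k_{s}\,O\,(1 - O),\quad
\frac{\partial V}{\partial Z_{j}} = 1,\quad
\frac{\partial Z_{j}}{\partial Y_{ij}} = \prod_{l \neq i} Y_{lj},
$$

$$
\frac{\partial Y_{ij}}{\partial w_{ij}} = k\,x_{i}\,Y_{ij}\,(1 - Y_{ij}),\quad
\frac{\partial Y_{ij}}{\partial \theta_{ij}} = -k\,Y_{ij}\,(1 - Y_{ij}).
$$

The product term in ∂*Z*_{j}/∂*Y*_{ij} shows one reason why gradients can vanish: a single synapse whose output is close to 0 suppresses the gradient of every other synapse on the same dendrite.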

##### 2.3. Biogeography-Based Optimization

Biogeography-based optimization derives from biogeography, which investigates speciation, extinction, and the geographical distribution of species in nature. Each habitat is regarded as a solution, and the principal task is to obtain the best one. For convenience, the model is described mathematically, using the habitat suitability index (HSI) as the degree of fitness. The suitability index variables (SIV) describe the various aspects that determine the HSI [49]. The BBO procedure is implemented as follows:

(1) Initialize the integer sequence SIV based on the current habitat *H*_{i} (*i* = 1, 2, …, *n*).

(2) Calculate the HSI of each habitat according to the following formula, where *P* represents the total number of training samples, *T*_{p} is the target vector of the *p*th sample, and *O*_{p} is the actual output vector determined by *H*_{i}.

(3) Select SIVs at random and migrate them among habitats according to the emigration rate *μ*_{i} and the immigration rate *λ*_{i}, where *E* is the maximum emigration rate, *I* is the maximum immigration rate, and *m* is the rank of the habitat. These two parameters are set as *E* = *I* = 1 in this research, and *λ* and *μ* are constrained as follows:

(4) For each habitat *H*_{i}, update the immigrated HSI and the probability *Ps*_{i} that the habitat contains *S* species. If *t* is small enough to be considered as 0, the following equation can be approximated:

(5) Mutate the nonelite habitats according to the mutation rate *Pm*_{i}, where *Ps*_{max} is the maximum value of *Ps*_{i}, and *Pm*_{max} is a user-defined parameter.

(6) Return to step 2 and perform the next iteration. The procedure ends when the termination criterion is met.
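The migration step can be sketched as follows. Because the rate formulas are not reproduced here, the sketch assumes the commonly used linear migration model, in which better habitats host more species, emigrate more, and immigrate less; the function names, the best-first ordering of `habitats`, and the roulette-wheel choice of the source habitat are all assumptions made for illustration.

```python
import random

def migration_rates(n, E=1.0, I=1.0):
    """Linear migration model (assumed): returns (lambda, mu) per habitat, best habitat first."""
    rates = []
    for rank in range(n):                 # rank 0 is the habitat with the highest HSI
        species = n - rank                # better habitats are assumed to host more species
        mu = E * species / n              # emigration rate
        lam = I * (1.0 - species / n)     # immigration rate
        rates.append((lam, mu))
    return rates

def migrate(habitats, rates, rng):
    """One migration sweep over habitats sorted best-first; returns new habitats."""
    mus = [mu for _, mu in rates]
    new = [h[:] for h in habitats]
    for i, (lam, _) in enumerate(rates):
        for d in range(len(habitats[i])):
            if rng.random() < lam:        # habitat i immigrates this SIV
                # pick the emigrating habitat with probability proportional to mu
                src = rng.choices(range(len(habitats)), weights=mus)[0]
                new[i][d] = habitats[src][d]
    return new
```

With *E* = *I* = 1, every habitat satisfies λ + μ = 1 under this model, and the best habitat has λ = 0, so it is never overwritten by migration.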

##### 2.4. Particle Swarm Optimization

Particle swarm optimization mimics the search behavior of a flock of birds and consists of particles, each of which represents a possible solution [50]. Each particle has two attributes: velocity and position. The former indicates how fast and in which direction the particle moves, and the latter indicates its current location in the search space. Each particle searches for the optimal value separately and memorizes the best value it has found itself (*pbest*). It then shares this individual optimum with the other particles to obtain the global best value (*gbest*). All particles update their two attributes according to these two extreme values. PSO is widely adopted because of its simple operating process, for instance, for training MLPs [51]. The use of PSO to search for the optimal values of the weights and thresholds in the synaptic layer of the DNM can be described as follows:

where *X*_{i} (*i* = 1, 2, …, *Q*) indicates the *i*th individual in the swarm, and *Q* denotes the number of particles. Besides, we use the mean square error MSE(*X*_{i}) to calculate the error of the last layer for the individual *X*_{i} as follows:
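A minimal PSO loop of this kind can be sketched as follows. The sketch uses the textbook velocity update with an inertia weight `w` and acceleration coefficients `c1` and `c2`; these parameter values, the initialization range, and the function name `pso_minimize` are assumptions and are not the settings of Table 3. For DNM training, `fitness` would be the MSE of the model decoded from a flat vector of weights and thresholds (length 2·*n*·*m* under a hypothetical flattening).

```python
import random

def pso_minimize(fitness, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over real vectors of length `dim` with a basic PSO."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                       # best position of each particle
    pbest_f = [fitness(x) for x in X]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]        # best position of the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            f = fitness(X[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = X[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = X[i][:], f
    return gbest, gbest_f
```

Because the optimizer only calls `fitness`, the same loop works for nondifferentiable objectives, which is the property the introduction highlights.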

##### 2.5. Genetic Algorithm

The genetic algorithm is inspired by natural selection. According to a previous study [30], a set of candidate solutions is regarded as a population of individuals. Good individuals tend to reproduce at a relatively high rate, while poor individuals have a relatively low reproductive rate. As the population evolves, individuals develop in a fitter direction. However, as GA is a random search algorithm, worse individuals may be generated from fitter ones. Thus, we adopt the elite strategy to retain the best individuals. GA can also be used to train the DNM. Similar to PSO, a chromosome in GA for training the DNM can be expressed using equation (16), where *X*_{i} (*i* = 1, 2, …, *Q*) indicates the *i*th chromosome in the population, and *Q* represents the population size. We use single-point crossover to update individuals. Moreover, the fitness function is the same as equation (17).
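Single-point crossover with an elite strategy can be sketched as below. Only those two ingredients come from the text; the binary tournament selection, the Gaussian mutation, and the parameter values are assumptions added to make the sketch complete, and the function names are hypothetical.

```python
import random

def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    point = rng.randrange(1, len(a))          # cut strictly inside the chromosome
    return a[:point] + b[point:], b[:point] + a[point:]

def next_generation(pop, fitness, rng, n_elite=1, p_mut=0.05):
    """Build the next population; lower fitness (e.g. MSE) is better."""
    ranked = sorted(pop, key=fitness)
    new = [ind[:] for ind in ranked[:n_elite]]            # elite strategy
    while len(new) < len(pop):
        p1 = min(rng.sample(pop, 2), key=fitness)         # binary tournament (assumed)
        p2 = min(rng.sample(pop, 2), key=fitness)
        for child in single_point_crossover(p1, p2, rng):
            # Gaussian mutation (assumed) applied gene by gene
            child = [g + rng.gauss(0, 0.1) if rng.random() < p_mut else g
                     for g in child]
            if len(new) < len(pop):
                new.append(child)
    return new
```

Since the elite individuals are copied unchanged, the best fitness in the population can never get worse from one generation to the next.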

##### 2.6. Population-Based Incremental Learning

The estimation of distribution algorithm (EDA) is a population-based strategy that tracks statistical information about the population of candidate solutions for optimization [30, 41, 52]. In each generation, it discards at least part of the population and generates new individuals by sampling according to the statistics of the high-fitness individuals in the current population. This process is repeated from generation to generation.

Similar to EDA, PBIL is an extension of the univariate marginal distribution algorithm. When it is used to optimize an *m*-dimensional binary problem, PBIL maintains an *m*-dimensional probability vector *p*, whose *k*th element describes how likely the *k*th bit is to be equal to one. In each generation, a random population is sampled according to the probability vector *p*. Then, the fitness of each candidate solution is calculated, and the probability vector is adjusted so that the next generation is more likely to resemble the fittest individual. The new probability vector *p* is then used to sample another population of candidate solutions, and this process continues until the termination requirement is met.
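The update of the probability vector can be sketched as follows for a binary problem in which higher fitness is better. The learning rate `lr`, the population size, and the function name `pbil` are assumptions; applying PBIL to the real-valued DNM parameters additionally requires an encoding step that is not shown here.

```python
import random

def pbil(fitness, m, pop_size=20, iters=100, lr=0.1, seed=0):
    """Basic PBIL for an m-dimensional binary problem (maximization)."""
    rng = random.Random(seed)
    p = [0.5] * m                                   # probability that each bit equals one
    best, best_f = None, float("-inf")
    for _ in range(iters):
        # sample a population from the current probability vector
        pop = [[1 if rng.random() < pk else 0 for pk in p] for _ in range(pop_size)]
        leader = max(pop, key=fitness)              # fittest individual of this generation
        f = fitness(leader)
        if f > best_f:
            best, best_f = leader, f
        # shift the probability vector toward the leader
        p = [(1 - lr) * pk + lr * bit for pk, bit in zip(p, leader)]
    return best, p
```

On the OneMax problem (`fitness = sum`), the probability vector drifts toward all ones, so later generations increasingly resemble the fittest individual.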

##### 2.7. Competitive Swarm Optimization

Competitive swarm optimization is a population-based algorithm used to solve large-scale classification problems. As in PSO, each individual retains a velocity. CSO introduces a competitive strategy that compares two selected particles according to their evaluated results. Only the losing particles learn and are updated in each iteration, which reduces the number of updated particles to *N*/2 [53], and it is not necessary to store the best solutions found during the search. This makes CSO applicable to the effective solution of large-scale classification problems. The operation steps are as follows:

(1) *N* denotes the number of solutions at the beginning. The position *x*_{i} (*i* = 1, 2, …, *N*) and velocity *v*_{i} (*i* = 1, 2, …, *N*) of each generated particle are initialized.

(2) All solutions are evaluated.

(3) The *k*th (*k* = 1, 2, …, *N*/2) competition for generation *t* occurs as follows:

(a) Two distinct particles *N*_{k1} and *N*_{k2} are selected randomly from the particles that have not yet competed.

(b) The positions of the selected particles *N*_{k1} and *N*_{k2} are evaluated and compared to determine the winning particle and the losing particle.

(c) The position *x*_{l,k} of the losing particle is updated by applying its velocity *v*_{l,k}, where *R*_{1}*(k, t)*, *R*_{2}*(k, t)*, and *R*_{3}*(k, t)* are vectors with elements varying from 0 to 1, *x̄*_{k}(*t*) represents the mean position of all particles, and *φ* controls the degree to which the mean position takes effect, which has been advised as the following equation in existing research:

(d) Steps (a) to (c) are repeated until all particles have competed.

(4) The next iteration, starting from step 2, is performed until the generation counter *t* reaches its maximum.
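The steps above can be sketched as one generation of pairwise competitions. The update rule used here is the commonly cited CSO form, in which the loser's velocity combines its old velocity, the difference to the winner, and the difference to the swarm mean scaled by *φ*; treat it as an assumption, since the equations are not reproduced here. The function name `cso_generation` is hypothetical, and `phi` is left as a required argument because the recommended setting is not given.

```python
import random

def cso_generation(X, V, fitness, phi, rng):
    """Run one CSO generation in place; lower fitness is better."""
    n, dim = len(X), len(X[0])
    mean = [sum(x[d] for x in X) / n for d in range(dim)]   # mean position of the swarm
    order = list(range(n))
    rng.shuffle(order)                                      # random, non-repeating pairing
    for a, b in zip(order[0::2], order[1::2]):              # N/2 competitions
        win, lose = (a, b) if fitness(X[a]) <= fitness(X[b]) else (b, a)
        for d in range(dim):
            r1, r2, r3 = rng.random(), rng.random(), rng.random()
            V[lose][d] = (r1 * V[lose][d]
                          + r2 * (X[win][d] - X[lose][d])
                          + phi * r3 * (mean[d] - X[lose][d]))
            X[lose][d] += V[lose][d]                        # only the loser moves
```

Winners pass to the next generation unchanged, so the best solution in the swarm is preserved without any *pbest* or *gbest* memory.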

##### 2.8. Differential Evolution

Differential evolution is a random search method inspired by biological evolution, in which highly fit individuals are preserved through the iterations. As a variant of the genetic algorithm, it is a population-based global search strategy that adopts real coding, a simple difference-based mutation, and one-to-one competition to simplify the genetic operations. Furthermore, its memory ability enables DE to track the current state of the search and modify the search strategy dynamically. Owing to its prominent global convergence ability and stability, DE is well suited to complicated optimization problems that are difficult to solve using traditional mathematical programming methods. At present, DE has been used in artificial neural networks, signal processing, bioinformatics, and other fields.
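One generation of DE can be sketched as below. The sketch uses the classic DE/rand/1/bin scheme with scale factor `F` and crossover rate `CR`; which DE variant and parameter values the experiments use is given in Table 3 and is not assumed here, and the function name `de_generation` is hypothetical.

```python
import random

def de_generation(pop, fitness, rng, F=0.5, CR=0.9):
    """One DE/rand/1/bin generation; lower fitness is better. Needs len(pop) >= 4."""
    n, dim = len(pop), len(pop[0])
    new = []
    for i, x in enumerate(pop):
        # three distinct individuals, all different from the target x
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        j_rand = rng.randrange(dim)           # guarantees at least one mutated gene
        trial = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                 if (rng.random() < CR or d == j_rand) else x[d]
                 for d in range(dim)]
        # one-to-one competition between the trial vector and its target
        new.append(trial if fitness(trial) <= fitness(x) else x[:])
    return new
```

The one-to-one competition means that no individual is ever replaced by a worse one, which is how highly fit individuals are preserved through the iterations.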

##### 2.9. jSO

jSO is a recent improved variant of differential evolution based on the iL-SHADE algorithm [54]. It retains the history-memory-based parameter adaptation and the linear population size reduction strategy of iL-SHADE. jSO adopts an adaptive strategy to improve the mutation coefficient *M*_{F} and the crossover probability *M*_{CR}, which are the key parameters of the differential evolution algorithm, and the effect is obvious. The mutation strategy of jSO is as follows: where *nfes* is the current number of function evaluations, and *max_nfes* is the maximum number of function evaluations.
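The dependence of the mutation on *nfes* can be sketched as follows. The sketch follows the weighted current-to-pbest mutation as it is usually described for jSO, with a weighted factor of 0.7·*F*, 0.8·*F*, or 1.2·*F* depending on the fraction of the evaluation budget already used; these constants and thresholds are stated from the general literature as an assumption and should be checked against [54]. The function names are hypothetical.

```python
def jso_weighted_factor(F, nfes, max_nfes):
    """Weighted scale factor F_w as a function of search progress (assumed constants)."""
    progress = nfes / max_nfes
    if progress < 0.2:
        return 0.7 * F
    if progress < 0.4:
        return 0.8 * F
    return 1.2 * F

def current_to_pbest_w(x, x_pbest, x_r1, x_r2, F, nfes, max_nfes):
    """Mutant vector: x + F_w * (x_pbest - x) + F * (x_r1 - x_r2)."""
    F_w = jso_weighted_factor(F, nfes, max_nfes)
    return [xi + F_w * (pb - xi) + F * (a - b)
            for xi, pb, a, b in zip(x, x_pbest, x_r1, x_r2)]
```

Under these constants, the pull toward the best individuals is weaker early in the run and stronger later, which shifts the search from exploration toward exploitation.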

#### 3. Experiment

In this experiment, four classification problems are used to verify the performance of the DNM with the eight learning algorithms mentioned above. Table 1 shows the number of attributes, training samples, test samples, and classes of each dataset. The classification datasets are taken from the open datasets of the UCI Machine Learning Repository and cover various application areas [55].

For each learning algorithm, the maximum number of generations is set to 1000. Each dataset is divided into two parts, with training data accounting for 70% and testing data accounting for 30%. In addition, the features of every classification problem are numeric, contain no erroneous values, and may include negative and decimal numbers. As the input *x* of the DNM varies from 0 to 1, each feature is normalized into this range for the experiment.
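The preprocessing can be sketched as a min-max normalization followed by a 70/30 split. The text does not state which normalization is used or whether the split is random, so the min-max formula, the shuffling, the choice to compute the minimum and maximum over the whole dataset, and the function names are all assumptions.

```python
import random

def min_max_normalize(rows):
    """Scale every feature (column) to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

def split_70_30(rows, labels, seed=0):
    """Shuffle and split into 70% training and 30% testing data."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(rows))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])
```

Negative and decimal feature values are handled uniformly, since only the per-feature minimum and maximum matter.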

All the experimental results are averaged over 30 independent runs, and the accuracy is calculated by comparing the classification results with the expected outputs. Equation (22) is used to calculate the accuracy from the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Besides, the MSE in equation (17) is used as the evaluation function for each learning algorithm.
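The accuracy computation can be sketched with the standard definition, accuracy = (TP + TN) / (TP + FP + TN + FN), which is assumed to be what equation (22) expresses. The decision threshold of 0.5 on the DNM output and the function names are assumptions made for illustration.

```python
def confusion_counts(targets, outputs, threshold=0.5):
    """Count TP, FP, TN, FN for binary targets (0 or 1) and outputs in [0, 1]."""
    tp = fp = tn = fn = 0
    for t, o in zip(targets, outputs):
        predicted = 1 if o >= threshold else 0      # assumed decision rule
        if predicted == 1 and t == 1:
            tp += 1
        elif predicted == 1 and t == 0:
            fp += 1
        elif predicted == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + fp + tn + fn)
```

For example, 40 true positives, 5 false positives, 45 true negatives, and 10 false negatives give an accuracy of 85/100 = 0.85.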

The experimental environment is shown in Table 2. The experimental design adopts a statistical strategy based on orthogonal arrays derived from Latin squares for the efficient analysis of large numbers of combinations. The eight learning algorithms are tested under the above conditions, and owing to the orthogonal arrays, the number of experiments required to study the relationship between the factors and levels is greatly reduced.

Previous research shows that the performance of a learning algorithm can be significantly enhanced by carefully selecting parameter values in preliminary experiments [56–58]. Table 3 shows the parameter settings for each algorithm. The population size is set to 50 and the maximum number of generations to 1000 for all algorithms, while the other parameters are set empirically according to the characteristics of each algorithm.

For obtaining the optimal performance of DNM, user-defined parameters are well worth investigating. There are four key parameters in DNM, that is, the number of dendrites in the model (*M*), the synaptic parameter in the connecting sigmoid function (*k*), and two soma parameters (*k*_{s} and *θ*_{s}) in the output sigmoid function.

A reasonable combination of the four DNM parameters is obtained by the Taguchi method [59–61]. It scans a portion of the possible combinations of factors rather than all of them, resulting in minimal experimental runs and an optimal estimation of the factors [62, 63]. Based on previous studies and research experience, the levels of the four factors are set as follows: five levels for *M* ∈ {3, 5, 10, 15, 20}; five levels for *k* ∈ {1, 5, 10, 15, 20}; five levels for *k*_{s} ∈ {1, 5, 10, 15, 25}; and five levels for *θ*_{s} ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. In contrast to a full factorial analysis, which requires 5^{4} = 625 trials, the orthogonal array clearly reduces the number of experiments and the time cost. Hence, the orthogonal array L_{25}(5^{4}), which involves only 25 experiments, is used in this study.
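An orthogonal array with 25 runs, four factors, and five levels can be constructed from Latin squares modulo 5, as sketched below. This is one valid construction and may differ in row order and column assignment from the array actually used in the experiments; the level values are copied from the text, and the function names are hypothetical.

```python
LEVELS = {
    "M": [3, 5, 10, 15, 20],
    "k": [1, 5, 10, 15, 20],
    "k_s": [1, 5, 10, 15, 25],
    "theta_s": [0.1, 0.3, 0.5, 0.7, 0.9],
}

def l25_array():
    """25 rows of level indices; columns 3 and 4 are Latin squares modulo 5."""
    return [(a, b, (a + b) % 5, (a + 2 * b) % 5)
            for a in range(5) for b in range(5)]

def taguchi_runs(levels):
    """Map level indices to concrete parameter settings, one dict per run."""
    names = list(levels)
    return [{name: levels[name][idx] for name, idx in zip(names, row)}
            for row in l25_array()]
```

In this array, every pair of columns contains each of the 25 possible level combinations exactly once, which is the balance property that lets 25 runs stand in for the 625 runs of the full factorial design.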

The supplementary material summarizes the experimental results, where MSE represents the mean square error values of the eight learning algorithms (i.e., BP, BBO, PSO, GA, PBIL, CSO, DE, and jSO) for the four datasets. According to the experimental results for each dataset, we obtained acceptable user-defined parameter settings, as shown in Table 4.

#### 4. Result

The average accuracy and standard deviation of the learning algorithms on each dataset are summarized in Table 5. It is obvious that, for every dataset, DNM + BBO achieves the highest accuracy among all the compared methods, in some cases reaching 100%, while DNM + BP achieves the lowest. The accuracy of DNM + CSO is also high: it is inferior to that of DNM + BBO but higher than that of DNM + PSO. The accuracies of DNM + GA, DNM + jSO, DNM + PBIL, and DNM + DE differ little from one another and are moderate.

Furthermore, the upper limit of *m* is set to 20 in this experiment, but the optimum *m* of both DNM + BBO and DNM + CSO is 15, three quarters of the upper limit, with a high classification accuracy as shown in Table 5. As a result, for classification problems with a small number of features, the DNM trained by metaheuristics may not require an extremely large *m*.

Table 6 compares the accuracy of DNM + BBO with that of other common machine learning classification methods on the four datasets, and DNM + BBO has obvious advantages. Decision tree, SVM, and KNN are implemented using the Machine Learning Toolbox of MATLAB. Table 7 shows the average running time (sec) of each algorithm on the four datasets. BP runs the fastest but has the worst classification performance. Among the metaheuristic algorithms, CSO is the fastest and BBO is the second fastest, while DE and jSO are the slowest.

Figure 3 presents the average convergence graphs of each learning algorithm for the banknote authentication, breast cancer, car evaluation, and diabetic retinopathy datasets, respectively. It is evident that every learning algorithm has converged by the final iteration, but BBO converges the fastest in all cases. As a multipoint search method, BBO maintains an elite habitat while changing the solutions in each iteration, which produces a larger number of new candidate optimal solutions than any other learning algorithm.

By deriving new solutions from the candidate optimal solution at a given time, convergence to a high-quality solution is accelerated. As a result, BBO has essentially converged to the minimum value of MSE within 200 iterations for banknote authentication and car evaluation.

In contrast, the MSE of BP barely decreases on all datasets, indicating that BP is easily trapped in local minima. Compared with the multipoint search methods, BP has the disadvantages of insufficient exploration of the problem and difficulty in escaping from local solutions. Therefore, we believe that, unlike the other algorithms, BBO obtains a smaller MSE in fewer iterations, which emphasizes the advantages of the DNM trained by BBO over the state-of-the-art methods.

Moreover, Figure 4 depicts the distribution of the solutions over 30 independent runs of each learning algorithm for the four datasets. BBO has the best stability and consistently finds stable solutions, whereas the results of GA, DE, jSO, and BP usually vary widely across runs. In addition, for each dataset, the minimum MSE is achieved by BBO and the maximum MSE by BP.

Based on the supplementary material, the average accuracy, standard deviation, average MSE, and corresponding optimal parameters of each learning algorithm on the different datasets are summarized in Table 8.

Table 9 shows the overall statistical results of the eight learning algorithms on the four problems, obtained with the Friedman test at the level of *α* = 0.05 followed by the Bonferroni-Dunn procedure. The Friedman test indicates whether there is a difference in the average ranks of several ordinal variables. The post hoc test is the Bonferroni-Dunn procedure, whose adjusted value is *p*_{Bonf}. The results show that BBO has excellent robustness on each problem and performs better than the other methods in terms of average accuracy, standard deviation, and average MSE. This indicates that BBO has higher stability and is not easily affected by the problem. Furthermore, CSO is the algorithm with the best results after BBO.

In contrast, the results of BP are the worst and vary greatly across datasets, indicating that its accuracy, reliability, and stability are poor and easily affected by the problem. Consequently, although the accuracy of any learning algorithm depends on how well it suits the problem and on the combination of parameters, BBO is reliable and has obvious advantages in terms of the convergence and stability of the MSE, especially the convergence speed.

#### 5. Discussion

The experimental results show that DNM + BBO has significant advantages over the other seven learning algorithms (BP, PSO, GA, PBIL, CSO, DE, and jSO) and over other machine learning algorithms (decision tree, SVM, KNN, and MLP). This is due to the mechanism of the BBO algorithm: through interspecies migration and intraspecies mutation, the feature information of different habitats changes and the dominant features of habitats are shared, which avoids a large number of local solutions in DNM training. Although DNM + BBO shows great potential in classification problems, there are many challenges in practical applications, such as redundant data and irrelevant information, which reduce the performance and increase the computational complexity of machine learning classification algorithms. In future research, we will continue to address the problems of the DNM on high-dimensional heterogeneous data.

#### 6. Conclusion

With the advent of the era of big data, research on high-precision models with simple structure and low cost for solving complicated problems is developing rapidly. In this paper, considering synaptic nonlinearity, a new learning algorithm based on the DNM is put forward to solve complex classification problems. Adopting factor allocation and an orthogonal array, eight learning algorithms (i.e., BP, BBO, PSO, GA, PBIL, CSO, DE, and jSO) are used to train the DNM systematically on four datasets. The effectiveness, stability, classification accuracy, and convergence speed of these algorithms are compared and demonstrated for such problems.

The experimental results show that BP cannot find the globally optimal weights and thresholds because of its inherent local minimum problem, and its effectiveness and accuracy are extremely limited. The performance of BBO is the most competitive. No matter which dataset is used, the stability, accuracy, and convergence speed of BBO are the best and obviously superior to those of the other algorithms. Moreover, the overall performance of CSO is the best after BBO, and the performance of PSO is second only to that of CSO. GA, PBIL, DE, and the state-of-the-art jSO are slightly better than BP, but their results vary greatly across datasets. In addition, compared with common machine learning methods such as decision tree, support vector machine, *k*-nearest neighbor, and artificial neural networks, the dendritic neuron model trained by biogeography-based optimization has significant advantages.

Therefore, this paper effectively combines a metaheuristic algorithm with the dendritic neuron model and establishes DNM + BBO as a powerful algorithm for solving classification and optimization problems. In the future, we will try to expand the output range of the DNM, apply it to a wider range of fields to solve other problems, and continue to improve its performance [64].

#### Data Availability

The datasets of this paper are obtained from the following website: https://archive.ics.uci.edu/ml/index.php.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### Acknowledgments

This research was supported by the National Natural Science Foundation of China (NSFC) under Grant nos. 12105120, 21805106, 62102169, and 72174079, the Natural Science Foundation of Jiangsu Province under Grant BK20181073, the MOE Key Laboratory of TianQin Project, Sun Yat-sen University, and the Open Fund Project of Jiangsu Institute of Marine Resources Development under Grant no. JSIMR202018.

#### Supplementary Materials

Experimental results of eight learning algorithms (i.e., BP, BBO, PSO, GA, PBIL, CSO, DE, and jSO) over 30 independent runs for four datasets. Tables S1–S4 show the average accuracy of each learning algorithm for the full parameter combination in each dataset. Furthermore, Tables S5–S8 depict the parameter sensitivity results based on factor assignment and orthogonal array of each learning algorithm for each dataset.