Abstract
In this paper, an evolutionary dendritic neuron model (EDNM) is proposed to solve classification problems. It utilizes synapses and dendritic branches to implement nonlinear computation. Unlike the classical dendritic neuron model (CDNM), which is trained by the backpropagation (BP) algorithm, the proposed EDNM is trained by the metaheuristic cuckoo search (CS) algorithm, which is regarded as a global search algorithm. The CS algorithm enables EDNM to avoid several disadvantages of BP, such as slow convergence, trapping in local minima, and sensitivity to initial values. To evaluate the performance of EDNM, we compare it with a multilayer perceptron (MLP) and CDNM on two benchmark classification problems. The experimental results demonstrate that EDNM is superior to MLP and CDNM in terms of accuracy rate, receiver operating characteristic (ROC) curve, and convergence speed. In addition, the neural structure of EDNM can be completely replaced by a logical circuit, which can be implemented in hardware easily. The corresponding experimental results also verify the effectiveness of the logical circuit classifier.
1. Introduction
Classification is a machine learning task that assigns each object in a collection to one of a set of classes. Many problems in science, business, and medicine can be treated as classification problems, for example, medical diagnosis, quality control, bankruptcy prediction, credit scoring, handwritten character recognition, and speech recognition [1]. Various machine learning techniques have been proposed to solve classification problems, namely, nearest neighbor [2], decision tree [3], artificial neural networks (ANNs) [4], rule-based classifier [5], naive Bayes classifiers [6], linear discriminant analysis [7], and support vector machine [8].
Among them, ANNs are considered one of the most comprehensive classifiers [9]. They are computational models inspired by the biological nervous system that mimic the way neurons in the human brain process information [10]. The first mathematical model of ANNs, which applies a nonlinear threshold unit, was proposed by McCulloch and Pitts in 1943 [11]. In McCulloch and Pitts’s model, the neuron receives input signals from other neurons and assigns to each input a weight that represents the connection strength between nerve cells. The neuron then determines whether it is activated or remains inactive based on the weighted result [12]. Research has revealed that it is difficult for this model to solve nonlinearly separable problems because of its oversimplified structure [13].
Although the learning ability of MLP, which utilizes McCulloch and Pitts’s model as its fundamental computational unit, makes it a powerful tool for various applications [14], further biological studies have inferred that a single neuron could possess powerful computational capacity when the synaptic nonlinearities of a dendritic tree are taken into account, which is significantly different from McCulloch and Pitts’s model [15, 16]. Similar issues also occur in spiking neural networks (SNNs), in which integrate-and-fire neurons communicate information via discrete spike events based on the spike-timing-dependent plasticity (STDP) rule [17]. It has been shown that although SNNs are biologically more plausible, they ignore the information computation of dendritic structures and bear little resemblance to the biological neural model [18].
Different neurons have distinct dendritic structures in vivo; even a small variation in the dendritic morphology can produce a great change in neuron function [19, 20]. By analyzing the interaction of the excitatory and inhibitory synaptic inputs in neural cells, Koch et al. proposed a neural model with a dendritic structure [21, 22]. Because the dendritic structure of this model remains unchanged for all given tasks, it cannot realize the plasticity of the dendritic morphology. To be specific, the model lacks an effective mechanism to determine whether a synapse is excitatory or inhibitory, as well as which branches of the dendritic tree are redundant and need to be eliminated, which means that Koch’s model is unrealistic when compared with biological neuron models [23].
Recently, Legenstein and Maass proposed a single neuron model with a dynamic dendritic structure based on STDP and branch-strength potentiation (BSP) [24]. The neuron model is used to solve a simple feature-binding problem in a self-organized manner. However, it has been shown that the neuron model is not capable of solving nonlinearly separable problems, such as the EXOR benchmark problem [25]. In addition, Ritter et al. proposed a mathematical model of a morphological neuron based on lattice algebra, named lattice neural networks (LNNs) [26]. LNNs make use of the lattice operators ∨ and ∧ for the construction of the computational algorithms and replace the multiplication operator of the real numbers with the addition operator. They are closely related to Lattice Computing, which is regarded as a collection of computational intelligence tools [27]. Although LNNs incorporate dendritic computation in the neural model, they do not further interpret or implement the plasticity mechanisms of the dendritic structure.
In our previous work [28, 29], we proposed a single neuron model with nonlinear interaction among synapses in the dendrites, named CDNM. Experimental studies show that CDNM can effectively handle practical tasks including cancer diagnosis [30, 31], financial time series forecasting [32], and credit risk assessment [33]. Besides, an unsupervised learnable neuron model based on CDNM was proposed, and it has been shown to be able to solve the two-dimensional multidirectional selectivity problem [34]. Moreover, studies show that the plasticity mechanisms of CDNM can be implemented via a neuron-pruning mechanism, which consists of synaptic pruning and dendritic pruning. Neuron pruning occurs along with the training process of the model. The pruned neural model can be replaced with logical circuits that contain only comparators and logical AND, OR, and NOT gates [35, 36].
Although CDNM has been used effectively in various applications, the original BP algorithm largely limits its computational capability. BP is a gradient-based training algorithm; it requires the neuron transfer function to be differentiable. The gradient information is highly sensitive to the initial conditions, which makes BP prone to trapping in local minima [37]. In addition, BP and its variations have several drawbacks, such as slow convergence speed and overfitting [38]. Therefore, to avoid these disadvantages of the BP algorithm, the proposed EDNM employs the nature-inspired CS algorithm [39], which is acknowledged as a global search algorithm, as its learning algorithm. The CS algorithm combines a global random walk with a local random walk; it mimics the brood parasitic behavior of cuckoo species and the Lévy flight [40] behavior of some birds and fruit flies. This optimization ability makes CS an effective training algorithm, and because the update of a solution does not depend on explicit gradient information, EDNM is less likely to be trapped in local minima. The performance of EDNM is evaluated and compared on two benchmark classification datasets in our experiments. In addition, we verify the effectiveness of neural pruning and logical circuit replacement.
The rest of this paper is organized as follows: Section 2 describes the details of the proposed model. Section 3 introduces CS algorithm. Simulations related to the descriptions of two benchmark datasets, evaluation metrics, experimental setup, and performance comparison are provided and discussed in Section 4. Finally, concluding remarks are presented in Section 5.
2. Model Description
The proposed EDNM mimics the mechanism of signal interactions in the biological neural model. The signal processing of EDNM proceeds as follows: First, the synaptic layer receives the input signals and processes them through one of four defined connection cases. Then, the outputs of the synapses are transferred to the dendritic branches, where they interact on each branch. The membrane layer sums the dendritic activations and transfers the result to the cell body. The structural morphology of EDNM is presented in Figure 1.
2.1. Synaptic Layer
In the synaptic layer, each synapse connects to one feature attribute to receive the input signals of the training samples. A sigmoid function is adopted to describe the process; it can be expressed by
$$Y_{ij} = \frac{1}{1 + e^{-k\,(w_{ij} x_i - q_{ij})}},$$
where $x_i$ represents the $i$th input signal and $Y_{ij}$ represents the output of the $i$th synapse on the $j$th dendritic branch. $k$ is a user-defined parameter and remains constant in the calculation process. The parameters $w_{ij}$ and $q_{ij}$ are initialized randomly in the range [−2, 2]; then they are trained by the learning algorithm. Based on the values of $w_{ij}$ and $q_{ij}$, the threshold of the synaptic layer can be calculated as follows:
$$\theta_{ij} = \frac{q_{ij}}{w_{ij}}.$$
In addition, according to different values of $w_{ij}$ and $q_{ij}$, the connection cases of the synaptic layer can be divided into four types, namely, the direct connection (•), the inverse connection (■), the constant-0 connection (⓪), and the constant-1 connection (①). The graphic symbols of the synapses in the four connection cases are provided in Figure 2.
(i) Type 1: Direct Connection. Case (a): $0 < q_{ij} < w_{ij}$; for example, $w_{ij} = 1.0$ and $q_{ij} = 0.5$. As shown in Figure 3(a), direct connection means that if the input is greater than the threshold, the synapse will output “1.” Otherwise, it will output “0.”
(ii) Type 2: Inverse Connection. Case (b): $w_{ij} < q_{ij} < 0$; e.g., $w_{ij} = -1.0$ and $q_{ij} = -0.5$. The sigmoid function of the inverse connection is illustrated in Figure 3(b). Contrary to direct connection, if the input is greater than the threshold, the synapse will output “0.” Otherwise, it will output “1.”
(iii) Type 3: Constant-1 Connection. Case (c1): $q_{ij} < 0 < w_{ij}$; e.g., $w_{ij} = 1.0$ and $q_{ij} = -0.5$. Case (c2): $q_{ij} < w_{ij} < 0$; for example, $w_{ij} = -1.0$ and $q_{ij} = -1.5$. In Figures 3(c) and 3(d), it can be observed that, no matter what the input value is, the output of the synapse will remain “1.” Constant-1 connection plays a key role in synaptic pruning.
(iv) Type 4: Constant-0 Connection. Case (d1): $0 < w_{ij} < q_{ij}$; e.g., $w_{ij} = 1.0$ and $q_{ij} = 1.5$. Case (d2): $w_{ij} < 0 < q_{ij}$; for example, $w_{ij} = -1.0$ and $q_{ij} = 0.5$. Similarly, constant-0 connection implies that the output of the synapse remains “0” regardless of the input. It also contributes to dendritic pruning, which will be introduced in Section 2.5.
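The four connection types can be checked numerically with a short sketch. It assumes the usual dendritic-neuron synapse $1/(1 + e^{-k(wx - q)})$ with inputs normalized to [0, 1]; the function names `synapse_output` and `connection_case` and the default `k` are illustrative choices, not part of the original model description.

```python
import math


def synapse_output(x, w, q, k=5.0):
    """Sigmoid synapse output for input x with parameters w, q and steepness k."""
    return 1.0 / (1.0 + math.exp(-k * (w * x - q)))


def connection_case(w, q):
    """Classify a synapse from the ordering of 0, w and q (inputs assumed in [0, 1])."""
    if 0 < q < w:
        return "direct"        # threshold q / w lies in (0, 1), rising sigmoid
    if w < q < 0:
        return "inverse"       # threshold q / w lies in (0, 1), falling sigmoid
    if q < 0 < w or q < w < 0:
        return "constant-1"    # w * x - q > 0 for every x in [0, 1]
    if 0 < w < q or w < 0 < q:
        return "constant-0"    # w * x - q < 0 for every x in [0, 1]
    # w == 0, q == 0 or w == q are boundary values outside the four cases
    raise ValueError("boundary parameter values are not covered")
```

With a large `k` the sigmoid approaches a step function, so a direct synapse with `w = 1.0` and `q = 0.5` behaves like a comparator against the threshold 0.5.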
2.2. Dendritic Layer
Dendritic structure plays an important role in neural computation. Different neurons have distinct dendritic structures; even a small variation in the dendritic morphology can cause a great change in neural function. Thus, to realize the plasticity of the dendritic morphology, the simplest nonlinear operation, multiplication, is adopted in EDNM. Combined with the four connection cases of the synaptic layer, it implements the neural pruning function and builds a unique dendritic structure for each specific problem. The mathematical formula can be expressed as
$$Z_j = \prod_{i=1}^{I} Y_{ij},$$
where $Y_{ij}$ is the output of the $i$th synapse on the $j$th dendritic branch, $Z_j$ is the output of that branch, and $I$ is the number of input attributes.
2.3. Membrane Layer
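The membrane layer receives the signals from each branch of dendrites and completes a sublinear summation operation. Then, it transfers the results to the cell body. Its equation is defined as follows:
$$V = \sum_{j=1}^{M} Z_j,$$
where $Z_j$ is the output of the $j$th dendritic branch and $M$ is the number of dendritic branches.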
2.4. Cell Body (Soma)
The output signal from the membrane layer is processed by a nonlinear sigmoid function in the cell body. It is the core part of the computation of the single neural model. The signal will be compared with the threshold of the soma; if it is larger, the neuron will fire; otherwise, it will not. The function of the cell body is expressed as follows:
$$O = \frac{1}{1 + e^{-k_s (V - \theta_s)}},$$
where $V$ is the output of the membrane layer and $k_s$ denotes the positive constant parameter of the cell body. $\theta_s$ represents the threshold of the cell body and its range is [0, 1].
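The four layers described in Sections 2.1 to 2.4 can be chained into a single forward pass, sketched below. The sketch assumes sigmoid synapses of the form $1/(1 + e^{-k(wx - q)})$, a product over the synapses of each branch, a plain sum over branches, and a sigmoid soma; the names `ednm_forward`, `k_s`, and `theta_s` and the default values are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ednm_forward(x, w, q, k=5.0, k_s=5.0, theta_s=0.5):
    """Forward pass of a single dendritic neuron.

    x: list of I inputs in [0, 1].
    w, q: lists of M branches, each a list of I synaptic parameters.
    """
    branch_outputs = []
    for w_j, q_j in zip(w, q):
        z = 1.0
        for x_i, w_ij, q_ij in zip(x, w_j, q_j):
            z *= sigmoid(k * (w_ij * x_i - q_ij))  # synaptic layer
        branch_outputs.append(z)                   # dendritic layer: product
    v = sum(branch_outputs)                        # membrane layer: summation
    return sigmoid(k_s * (v - theta_s))            # soma: fires if v exceeds theta_s
```

For example, a single branch with two direct synapses (`w = [[1.0, 1.0]]`, `q = [[0.5, 0.5]]`) behaves like a soft AND of the two inputs.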
2.5. Neuron-Pruning Function
EDNM adopts the neuron-pruning function to realize the plasticity of the dendritic structure. Specifically, the neuron-pruning function prunes unnecessary synapses and dendritic branches during the training process. It then builds a unique structural morphology of EDNM for each specific problem. In EDNM, the pruning mechanism contains two parts, namely, synaptic pruning and dendritic pruning.
2.5.1. Synaptic Pruning
As introduced above, if a synapse is in the constant-1 connection case, its output is fixed to 1 no matter what its input is. The fundamental operation of the dendritic layer is multiplication, and any value multiplied by 1 is equal to itself. This implies that the output of such a synapse has no influence on the result of its dendritic branch. Thus, the synapse and the feature attribute it connects to can be ignored, and synapses of this kind are discarded in EDNM.
2.5.2. Dendritic Pruning
Similarly, if a synapse is in the constant-0 connection case, its output remains 0 whatever the input is. Because any value multiplied by 0 is equal to 0, the output of the whole dendritic branch is fixed to 0, and the branch makes no contribution to the output of the soma. Therefore, dendritic branches of this kind are eliminated, together with all the synapses on them and the feature attributes they connect to.
In order to further demonstrate the mechanism of neuron pruning, an example of the pruning process in EDNM is illustrated in Figure 4. It can be observed in Figure 4(a) that, before pruning, the neural structure has two dendritic branches and each branch has four synapses. Since one synapse on Dendrite2 is in the constant-0 connection case, according to the mechanism of dendritic pruning, Dendrite2 and all the synapses on it need to be pruned simultaneously. The pruned parts of the neural structure are drawn in dotted lines in Figure 4(b). Besides, because one synapse on Dendrite1 is in the constant-1 connection case, this synapse should be removed according to synaptic pruning. Finally, the simplified neural structure is presented in Figure 4(c).
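The two pruning rules can be expressed as a short routine. The sketch below assumes the connection cases are decided by the ordering of 0, $w$, and $q$ for inputs in [0, 1], and it reports each surviving synapse with its threshold $q/w$; the function names and the returned data layout are illustrative assumptions rather than the authors' implementation.

```python
def connection_case(w, q):
    """Connection case of one synapse, assuming inputs normalized to [0, 1]."""
    if 0 < q < w:
        return "direct"
    if w < q < 0:
        return "inverse"
    if q < 0 < w or q < w < 0:
        return "constant-1"
    if 0 < w < q or w < 0 < q:
        return "constant-0"
    raise ValueError("boundary parameter values are not covered")


def prune(w, q):
    """Return {branch_index: [(input_index, case, threshold), ...]} after pruning.

    w, q: lists of M branches, each a list of I synaptic parameters.
    """
    kept = {}
    for j, (w_j, q_j) in enumerate(zip(w, q)):
        cases = [connection_case(a, b) for a, b in zip(w_j, q_j)]
        if "constant-0" in cases:
            continue  # dendritic pruning: the branch output is always 0
        # synaptic pruning: constant-1 synapses do not change the product
        kept[j] = [(i, c, q_j[i] / w_j[i])
                   for i, c in enumerate(cases) if c != "constant-1"]
    return kept
```

A branch whose synapses are all constant-1 survives with an empty list; under these assumptions it always outputs 1 and would need separate handling in a hardware mapping.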
2.6. Logical Circuit
Through synaptic pruning and dendritic pruning, only the direct connections and inverse connections are retained, and a unique simplified neural structure is formed for each problem. Furthermore, the simplified structure can be transformed into a logical circuit built from comparators and logical AND, OR, and NOT gates. As shown in Figure 5, in the synaptic layer, the direct connection can be implemented by a comparator, and a comparator followed by a logical NOT gate can replace the inverse connection. In the dendritic layer, the synapses on a branch are connected by a logical AND gate. All the dendritic branches are aggregated in the membrane layer, which is equivalent to a logical OR gate. In the cell body, a simple nonlinear mapping operation is implemented, and it can be replaced by a single wire. A unique logical circuit is obtained through these steps, and since there is no floating-point calculation in the logical circuit, the classification speed can be greatly improved without sacrificing accuracy. In the era of big data, the logical circuit classifier might be a promising technology owing to its simplicity.
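The gate-level mapping described above can be emulated in software as a sanity check. In the sketch below, each retained synapse is a comparator (optionally followed by NOT), each branch is an AND, and the membrane layer is an OR; the function name `circuit_classify` and the tuple layout are hypothetical choices made for this illustration.

```python
def circuit_classify(x, structure):
    """Evaluate a pruned neuron as a logic circuit.

    x: list of normalized inputs.
    structure: list of branches; each branch is a list of
        (input_index, threshold, inverted) tuples, where inverted=True
        means the comparator output passes through a NOT gate.
    """
    def comparator(value, threshold):
        return value > threshold

    return any(                                          # membrane layer: OR
        all(comparator(x[i], th) != inv                  # synapse: comparator (+ NOT)
            for i, th, inv in branch)                    # dendritic branch: AND
        for branch in structure
    )
```

The expression `comparator(...) != inv` flips the comparator output exactly when the synapse is an inverse connection, which is the Boolean form of the comparator plus NOT gate pair.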
3. CS Algorithm
The CS algorithm is inspired by the special lifestyle and aggressive reproduction strategy of cuckoo species. Cuckoos never hatch their eggs themselves; instead, they lay their eggs in the nests of other bird species and let the hosts hatch them. Some cuckoo species (e.g., ani and guira) not only lay their eggs in communal host nests but also throw the hosts’ eggs away to increase the hatching probability of their own eggs [41]. Sometimes the hosts discover the alien eggs and respond by throwing them away or by abandoning the nest and building a new one. Studies have found that, in addition to simple parasitic behavior, a cuckoo called Tapera mimics the color and pattern of the eggs of the selected host [41]. This behavior helps to increase the number of eggs that are successfully hatched.
The CS algorithm was first proposed by Yang and Deb in 2009 [39]. For simplicity in describing CS, the following three idealized rules are utilized:
(i) Each cuckoo lays one egg at a time and places it in a randomly selected nest.
(ii) The nests with the highest-quality eggs are carried over to the next generations.
(iii) The number of available host nests is constant, and an egg is discovered by the host bird with a probability of $p_a \in [0, 1]$. The latter assumption can be approximated by replacing a fraction $p_a$ of the nests with new ones (new random solutions).
With these three rules, the basic steps of CS are summarized as the pseudocode shown in Algorithm 1. In the CS algorithm, a global random walk combined with a local random walk is adopted. First, the equation of the local random walk can be expressed as follows:
$$x_i^{t+1} = x_i^{t} + \alpha s \otimes H(p_a - \epsilon) \otimes \left(x_j^{t} - x_k^{t}\right),$$
where $x_j^{t}$ and $x_k^{t}$ are two distinct random solutions in the current population, $s$ represents the step size, and $\alpha$ denotes its scaling factor. $H(\cdot)$ represents a unit step function. $p_a$ is a switching parameter that controls the balance between a local random walk and a global random walk, and $\epsilon$ is a random value drawn from a uniform distribution. The symbol $\otimes$ represents entrywise multiplication. Then, the global random walk that applies Lévy flights can be described as follows:
$$x_i^{t+1} = x_i^{t} + \alpha L(s, \lambda),$$
where $\alpha > 0$ denotes the scaling factor of the step size and the function $L(s, \lambda)$ can be calculated by
$$L(s, \lambda) = \frac{\lambda \Gamma(\lambda) \sin(\pi \lambda / 2)}{\pi} \cdot \frac{1}{s^{1+\lambda}}, \quad s \gg s_0 > 0,$$
where $\lambda$ is the Lévy exponent, the gamma function $\Gamma(\lambda)$ is a constant for a given $\lambda$, and $s_0$ sets the scale of the smallest step size. It is widely regarded that Lévy flights can maximize the efficiency of resource searches, and they have been observed in the foraging patterns of albatrosses, fruit flies, and spider monkeys [42–44]. In addition, CS has been reported to outperform particle swarm optimization (PSO) and genetic algorithms on the benchmark functions studied in [39]. Therefore, the CS algorithm is employed as the training algorithm in our experiments (Algorithm 1).
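The two random walks can be combined into a compact optimizer, sketched below under several assumptions that are not taken from the paper: Lévy steps are drawn with Mantegna's algorithm (a common way to sample heavy-tailed steps, valid for $0 < \lambda \le 2$), a candidate replaces a nest only if it improves the objective, and the search space is the box [`lo`, `hi`]. The names `levy_step` and `cuckoo_search` and all default values are illustrative.

```python
import math
import random


def levy_step(lam, rng):
    """Draw one heavy-tailed step using Mantegna's algorithm (assumed sampler)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / lam)


def cuckoo_search(f, dim, n=25, pa=0.25, alpha=0.1, lam=1.5,
                  iters=200, lo=-2.0, hi=2.0, seed=0):
    """Minimize f over [lo, hi]^dim; returns (best_solution, best_value)."""
    rng = random.Random(seed)

    def clip(v):
        return max(lo, min(hi, v))

    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in nests]

    def try_replace(i, cand):
        c = f(cand)
        if c < fit[i]:  # greedy acceptance (simplifying assumption)
            nests[i], fit[i] = cand, c

    for _ in range(iters):
        for i in range(n):
            # global random walk: x + alpha * L(s, lambda), one draw per dimension
            try_replace(i, [clip(x + alpha * levy_step(lam, rng)) for x in nests[i]])
        for i in range(n):
            # local random walk: x + s * H(pa - eps) * (x_j - x_k), entrywise
            j, k = rng.sample(range(n), 2)
            s = rng.random()
            cand = [clip(x + s * (rng.random() < pa) * (a - b))
                    for x, a, b in zip(nests[i], nests[j], nests[k])]
            try_replace(i, cand)

    best = min(range(n), key=fit.__getitem__)
    return nests[best], fit[best]
```

To train a dendritic neuron with such a routine, `f` would be the training error of the neuron as a function of the flattened $w$ and $q$ parameters, with `lo = -2.0` and `hi = 2.0` matching the initialization range given in Section 2.1.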

4. Simulation
4.1. Dataset Description
In this section, to compare the classification performances of the EDNM, CDNM, and MLP, we conduct the simulations on two benchmark datasets, namely, the Glass Identification Dataset (GID) and Congressional Voting Records Dataset (CVRD), which are chosen from the UCI machine learning repository. The details of these datasets are shown in Table 1.
4.1.1. GID
GID is obtained by measuring the chemical constitution of glass, fabricated by two different processes [45]. The dataset contains 163 samples of window glass and 51 samples of nonwindow glass. Each record has 9 attributes, which include its refractive index and the proportion of its eight chemical components (Na, Mg, Al, Si, K, Ca, Ba, and Fe). These analytical results are recorded as 9 numerical continuous values.
4.1.2. CVRD
CVRD records the voting results of the 98th Congress. It contains 435 samples that record the votes of each member of the U.S. House of Representatives on the 16 key votes (attributes) identified by the CQA. Its classification task is to find the correct political party affiliation of each congressman [46]. Since some samples of CVRD contain missing attribute values, the samples with missing values need to be deleted. Finally, 232 complete samples are left, which include 124 “Democrat” samples and 108 “Republican” samples. The attributes of CVRD are categorical; thus, the values “Yes” and “No” of the 16 categorical attributes are converted to numerical “1” and “0,” and the two categorical classes “Republican” and “Democrat” are changed to numerical “1” and “0,” respectively [47].
4.2. Evaluation Metrics
In our experiments, to measure the performance of each model, we adopt four performance evaluation criteria, namely, accuracy rate, receiver operating characteristic (ROC) curve, convergence speed, and nonparametric statistical test.
(a) Accuracy Rate. The most important evaluation metric is the accuracy rate, which can be expressed as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%,$$
where $TP$, $TN$, $FP$, and $FN$ indicate true positive, true negative, false positive, and false negative, respectively. To understand the equation better, the confusion matrix constructed by $TP$, $TN$, $FP$, and $FN$ is shown in Table 2. Although accuracy is the simplest, most intuitive, and most commonly used performance comparison method, it is not enough for a complete performance evaluation [48].
where and indicate the actual output and the predicted output, respectively. is the sample number of the training dataset.
(d) Nonparametric Statistical Test. Since a nonparametric statistical test is not restricted to a particular population distribution and relies on relatively few assumptions, it is more robust and has wider applicability than a parametric statistical test. Compared with the t-test, which relies on the assumption of a normal distribution, a nonparametric test is more reliable when that assumption cannot be verified [53]. In our experiments, Wilcoxon’s rank-sum test [54] is adopted as the nonparametric statistical test.
4.3. Simulation Setup
In our experiment, each dataset is split into a training set and a testing set. Each set contains 50% of the samples [55], as shown in Table 3. Before training, the dataset is normalized to the range [0, 1]. The normalization follows the min-max method, which can be expressed as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the attribute, respectively.
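A minimal sketch of min-max normalization for one attribute column is given below. The function name `min_max_normalize` and the handling of a constant column (mapped to 0.0) are assumptions made for this illustration; the paper does not state how constant attributes are treated.

```python
def min_max_normalize(column):
    """Rescale a list of numeric attribute values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # constant attribute: avoid division by zero (assumed convention)
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

In practice the minimum and maximum would be applied per attribute, for example to each of the 9 attributes of GID.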
In order to keep the comparison fair, the number of parameters in each model should be set to be the same or as close as possible. The model structure of MLP is different from that of CDNM and EDNM; the number of weights and thresholds in an MLP with a single output neuron can be calculated as follows:
$$N_{\text{MLP}} = I \times H + H + H + 1, \tag{13}$$
where $I$ and $H$ represent the numbers of neurons in the input layer and hidden layer, respectively. In the neural structure of EDNM and CDNM, since each synapse has two parameters $w_{ij}$ and $q_{ij}$, once the number of dendritic branches $M$ is determined, the total number of parameters in the EDNM and CDNM structure is expressed by the following equation:
$$N_{\text{DNM}} = 2 \times I \times M. \tag{14}$$
In our experiment, once a benchmark dataset is chosen, the number of input attributes $I$ is determined. Based on equations (13) and (14), setting suitable values of the number of hidden neurons $H$ and the number of dendritic branches $M$ makes the parameter counts of MLP and of the dendritic models approximately equal to each other. Table 4 summarizes the model structures of MLP, CDNM, and EDNM on the two benchmark datasets. It is easy to observe that all three methods have nearly the same number of parameters for both datasets. In addition, the transfer functions of MLP in the hidden layer and output layer are both set to “log-sigmoid.” The learning rate of CDNM and MLP is 0.01. The population size of EDNM is set to 50. The number of iterations of all three methods is set to 1000; each method is run 30 times independently in our experiments.
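The parameter matching can be illustrated with a small calculation. The sketch assumes a single-output MLP with one threshold per hidden and output neuron, and two parameters per synapse in the dendritic models; the function names are hypothetical, and the value of `H` found here only illustrates the counting rule and is not taken from Table 4.

```python
def mlp_param_count(i, h):
    """Weights and thresholds of an I-H-1 MLP (assumed single output neuron)."""
    return i * h + h + h + 1  # input-hidden weights, hidden thresholds, hidden-output weights, output threshold


def dnm_param_count(i, m):
    """Two parameters (w and q) per synapse on each of M branches."""
    return 2 * i * m


def closest_hidden_size(i, m, max_h=200):
    """Hidden-layer size whose MLP parameter count is closest to the DNM count."""
    target = dnm_param_count(i, m)
    return min(range(1, max_h + 1), key=lambda h: abs(mlp_param_count(i, h) - target))
```

For the GID structure described in Section 4.6 (9 attributes and 12 branches), `dnm_param_count(9, 12)` gives 216 parameters, and under the assumed MLP counting rule a hidden layer of 20 neurons gives 221.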
4.4. Optimal Parameter Setting
In EDNM, there are three parameters, namely, $k$, $\theta_s$, and $M$, which need to be defined by users. $k$ is a constant value in the sigmoid function of the synaptic layer, $\theta_s$ denotes the threshold of the soma, and $M$ represents the number of dendritic branches. In order to find an optimal combination of these parameters, the Taguchi method is adopted in our experiment, which can reduce the number of experimental runs while preserving the dependability of the experiment [56, 57]. With three parameters at four levels each, a full factorial design would require 64 experiments; according to the Taguchi method, only 16 of them are run, which is enough to find the optimal parameter setting quickly and efficiently. Table 5 shows the four levels of interest for the two benchmark datasets. The orthogonal arrays are shown in Tables 6 and 7, respectively. For each dataset, the combination of $k$, $\theta_s$, and $M$ that achieves the highest testing accuracy in the corresponding table is selected. Through the above experiments, the optimal parameter settings of the two benchmark datasets can be determined.
4.5. Performance Comparison
In order to verify the classification performance of EDNM, we compare it with MLP and the original CDNM on two benchmark datasets. Table 8 presents the experimental results. EDNM obtains higher accuracy than MLP and CDNM on both problems. To detect significant differences between EDNM and the other models, Wilcoxon’s rank-sum test is utilized in our experiment. Its significance level is set to 0.05. If the p-value is less than 0.05, the null hypothesis that there is no significant difference between the two compared methods can be rejected. The statistical results are also shown in Table 8. They indicate that EDNM performs significantly better than both MLP and CDNM on the two benchmark problems.
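For readers who want to reproduce this kind of comparison, a rank-sum test can be sketched with the standard library only. The version below uses the normal approximation of the rank-sum statistic without a tie correction, which is an assumption of this illustration; the paper does not state which implementation was used, and the function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    a, b: lists of per-run scores of two methods. Returns (z, p_value).
    """
    pooled = sorted(a + b)
    # assign each distinct value the average of the 1-based ranks it occupies
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    n1, n2 = len(a), len(b)
    w = sum(rank_of[v] for v in a)              # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided p-value
    return z, p
```

Here `a` and `b` would hold the 30 per-run testing accuracies of two methods; a returned `p` below 0.05 corresponds to rejecting the null hypothesis at the significance level used above.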
In addition, for a comprehensive evaluation of model performance, the convergence curves of the three models on the two benchmark problems are illustrated in Figure 6. As shown in Figure 6, EDNM converges faster than MLP and CDNM. In Table 8, the statistical results demonstrate that the AUC value of EDNM is significantly larger than those of the other models on both problems. The corresponding ROC curves are compared and presented in Figure 7.
Based on the above experimental results, it can be concluded that EDNM is capable of providing more powerful classification performances to solve GID and CVRD problems compared to MLP and CDNM. Higher convergence speed indicates that EDNM is a more efficient classifier, which will save computation time in the practical applications.
4.6. Neuron-Pruning Analysis
The neuron-pruning function of EDNM has been introduced in Section 2.5. During the training process, it prunes the unnecessary synapses and dendritic branches and then produces a specific dendritic structure for each problem. The high plasticity of EDNM’s structure brings about the following benefits: First, neuron pruning realizes feature selection for EDNM; only the feature attributes that contribute to the final results are retained in the structure. Second, it simplifies the neural structure, which reduces the computational cost and increases the computational speed (Figure 8).
Firstly, we present the evolution of the structural morphology on the GID problem in Figure 8. It can be observed that there are 9 feature attributes and 12 dendritic branches in the structure before learning; the connection cases of all 108 synapses are randomly set in Figure 8(a). After training by the CS algorithm, the structural morphology of EDNM is presented in Figure 8(b). According to the rules of dendritic pruning, 10 dendritic branches are deleted and only two branches are left. Figure 8(c) illustrates the simplified structure of EDNM after dendritic pruning. Then, based on the rule of synaptic pruning, 14 of the 18 remaining synapses are removed as unnecessary and only 4 synapses are retained. Finally, the mature structural morphology of EDNM on the GID problem is provided in Figure 8(d). Similarly, Figure 9 illustrates the evolution of the structural morphology on the CVRD problem. The pruning results of the two benchmark problems are summarized in Table 9. The pruned neural structures are much simpler than the original ones, which shows that the neuron-pruning function can significantly simplify the structural morphology of EDNM.
4.7. Logical Circuit Analysis
As mentioned above, the simplified structures of EDNM can be completely substituted by logical circuits. In this section, we attempt to verify the effectiveness of the logical circuit classifiers. According to the final neural structures in Figures 8 and 9, the logical circuit classifiers of the two benchmark problems are presented in Figure 10. As shown in Figure 10, the logical circuit classifiers consist of comparators and logical AND, OR, and NOT gates, where the comparators compare the input signals with their thresholds. If an input exceeds its threshold, the output is 1; otherwise, it is 0. It is noteworthy that since the final neural structure of CVRD has only one synapse left, the corresponding logical circuit classifier contains no logical AND, OR, or NOT gates, only a comparator.
Besides, we compare the classification performances of the logical circuit classifiers and the normal EDNM in Table 10. As illustrated in the table, the logical circuits do not sacrifice accuracy on either benchmark problem. In addition, once the logical circuit classifiers are implemented in hardware, the classification speed is expected to be much higher than that of the other classifiers in the literature. Given these characteristics, the logical circuit classifier is considered a satisfactory and efficient classifier for real-world classification tasks.
5. Conclusion
In this study, an EDNM is proposed to solve classification problems. It consists of four layers, namely, the synaptic layer, the dendritic layer, the membrane layer, and the soma. This structure allows EDNM to implement the neural pruning mechanism, which can rule out unnecessary synapses and dendritic branches. Compared with the original BP algorithm of CDNM, the CS algorithm achieves higher convergence speed and higher classification accuracy on the two benchmark problems, where the statistical results demonstrate that EDNM performs significantly better than MLP and CDNM. Besides, we also present the logical circuit classifiers produced by EDNM and verify their accuracy rate. The experimental results show that the logical circuits maintain satisfactory classification performance. To the best of our knowledge, when the logical circuit classifiers run on hardware, the classification speed is expected to be higher than that of the other classifiers in the literature. In our future research, we will attempt to adopt multiobjective optimization algorithms to train the structure and weights of EDNM simultaneously, which may produce a simpler and more accurate logical circuit for each classification problem.
Data Availability
The benchmark classification datasets could be downloaded freely at https://archive.ics.uci.edu/ml/index.php.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the Guangdong Basic and Applied Basic Research Fund Project (No. 2019A1515111139) and JSPS KAKENHI (Grant no. 19K12136).