Research Article | Open Access
Beatriz A. Garro, Roberto A. Vázquez, "Designing Artificial Neural Networks Using Particle Swarm Optimization Algorithms", Computational Intelligence and Neuroscience, vol. 2015, Article ID 369298, 20 pages, 2015. https://doi.org/10.1155/2015/369298
Designing Artificial Neural Networks Using Particle Swarm Optimization Algorithms
Artificial Neural Network (ANN) design is a complex task because its performance depends on the architecture, the selected transfer function, and the learning algorithm used to train the set of synaptic weights. In this paper we present a methodology that automatically designs an ANN using particle swarm optimization algorithms such as Basic Particle Swarm Optimization (PSO), Second Generation of Particle Swarm Optimization (SGPSO), and a New Model of PSO called NMPSO. The aim of these algorithms is to evolve, at the same time, the three principal components of an ANN: the set of synaptic weights, the connections or architecture, and the transfer functions for each neuron. Eight different fitness functions were proposed to evaluate the fitness of each solution and find the best design. These functions are based on the mean square error (MSE) and the classification error (CER) and implement a strategy to avoid overtraining and to reduce the number of connections in the ANN. In addition, the ANN designed with the proposed methodology is compared with those designed manually using the well-known Back-Propagation and Levenberg-Marquardt Learning Algorithms. Finally, the accuracy of the method is tested with different nonlinear pattern classification problems.
Artificial Neural Networks (ANNs) are systems composed of neurons organized in input, output, and hidden layers. The neurons are connected to each other by a set of synaptic weights. An ANN is a powerful tool that has been applied to a broad range of problems such as pattern recognition, forecasting, and regression. During the learning process, the ANN continuously changes its synaptic values until the acquired knowledge is sufficient (until a specific number of iterations is reached or until a goal error value is achieved). When the learning process or training stage has finished, it is mandatory to evaluate the generalization capabilities of the ANN using samples of the problem different from those used during the training stage. Finally, it is expected that the ANN can classify the patterns of a particular problem with acceptable accuracy during both the training and testing stages.
Several classic algorithms to train an ANN have been proposed and developed in the last years. However, many of them can stay trapped in nondesirable solutions; that is, they will be far from the optimum or the best solution. Moreover, most of these algorithms cannot explore multimodal and noncontinuous surfaces. Therefore, other kinds of techniques, such as bioinspired algorithms (BIAs), are necessary for training an ANN.
BIAs are well accepted by the Artificial Intelligence community because they are powerful optimization tools that can solve very complex optimization problems. For a given problem, BIAs can explore large multimodal and noncontinuous search spaces and can find good solutions near the optimum value. BIAs are based on behavior observed in nature and described as swarm intelligence. This concept is defined in  as a property of systems composed of unintelligent agents with limited individual capabilities but with an intelligent collective behavior.
There are several works that use evolutionary and bioinspired algorithms to train ANN as another fundamental form of learning . Metaheuristic methods for training neural networks are based on local search, population methods, and others such as cooperative coevolutionary models .
An excellent work in which the authors present an extensive literature review of evolutionary algorithms used to evolve ANNs is . However, most of the reported research focuses only on the evolution of the synaptic weights, parameters , or involves the evolution of the number of neurons in the hidden layers, while the number of hidden layers is fixed in advance by the designer. Moreover, this research does not involve the evolution of transfer functions, which are an important element of an ANN because they determine the output of each neuron.
For example, in , the authors proposed a method that combines Ant Colony Optimization (ACO) to find a particular architecture (the connections) for an ANN and Particle Swarm Optimization (PSO) to adjust the synaptic weights. Other researches like  implemented a modification of PSO mixed with Simulated Annealing (SA) to obtain a set of synaptic weights and ANN thresholds. In , the authors use Evolutionary Programming to get the architecture and the set of weights with the aim to solve classification and prediction problems. Another example is  where Genetic Programming is used to obtain graphs that represent different topologies. In , the Differential Evolution (DE) algorithm was applied to design an ANN to solve a weather forecasting problem. In , the authors use a PSO algorithm to adjust the synaptic weights to model the daily rainfall-runoff relationship in Malaysia. In , the authors compare the back-propagation method versus basic PSO to adjust only the synaptic weights of an ANN for solving classification problems. In , the set of weights are evolved using the Differential Evolution and basic PSO.
In other works like , the three principal elements of an ANN are evolved at the same time: architecture, transfer functions, and synaptic weights. The authors proposed a New Model of PSO (NMPSO) algorithm, while, in , the authors solve the same problem by means of a Differential Evolution (DE) algorithm. Another example is , where the authors used an Artificial Bee Colony (ABC) algorithm to evolve the design of an ANN with two different fitness functions.
This research makes significant contributions in comparison with these last three works. First of all, eight fitness functions are proposed to deal with three common problems that emerge during the design of the ANN: accuracy, overfitting, and reduction of the ANN. To handle these problems better, the fitness functions take into account the classification error, mean square error, validation error, reduction of architectures, and combinations of them. Furthermore, this research explores the behavior of three bioinspired algorithms using different values for their parameters. During the experimentation phase, the best parameter values for these algorithms are determined to obtain the best results. In addition, the best configuration is used to generate a set of statistically valid experiments for each selected classification problem. Moreover, the results obtained with the proposed methodology in terms of the number of connections, the number of neurons, and the transfer functions selected for each ANN are presented and discussed. Another contribution of this research is a new metric that allows an efficient comparison of the results provided by an ANN generated with the proposed methodology. This metric takes into account the recognition rate obtained during the training and testing stages, with testing accuracy weighted more heavily than training accuracy. Finally, the results achieved by the three bioinspired algorithms are compared against those achieved with two classic learning algorithms. The three bioinspired algorithms were selected because NMPSO is a relatively new algorithm (proposed in 2009) based on the metaphor of the basic PSO technique, so it is important to compare its performance with others inspired by the same phenomenon.
In general, the problem to be solved can be defined as follows: given a set of input patterns $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$, $\mathbf{x} \in \mathbb{R}^n$, and a set of desired patterns $D = \{\mathbf{d}_1, \ldots, \mathbf{d}_p\}$, $\mathbf{d} \in \mathbb{R}^m$, find the ANN represented by $W \in \mathbb{R}^{q \times (q+3)}$ such that a function $F(D, X, W)$ is minimized, where $q$ is the maximum number of neurons. It is important to remark that the search space involves three different domains (architecture, synaptic weights, and transfer functions).
This research provides a complete study of how an ANN can be automatically designed by applying bioinspired algorithms, particularly the Basic Particle Swarm Optimization (PSO), Second Generation PSO (SGPSO), and New Model of PSO (NMPSO). The proposed methodology evolves at the same time the architecture, the synaptic weights, and the kind of transfer functions in order to design the ANNs that provide the best accuracy for a particular problem. Moreover, a comparison of the performance of the Particle Swarm algorithms versus classic learning methods (back-propagation and Levenberg-Marquardt) is presented. In addition, a new way to select the maximum number of neurons (MNN) is presented. The accuracy of the proposed methodology is tested by solving several real and synthetic pattern recognition problems. In this paper, we show the results obtained with ten classification problems of different complexities.
The basic concepts concerning the three PSO algorithms and ANN are presented in Sections 2 and 3, respectively. In Section 4 the methodology and the strategy used to design the ANN automatically are described. In Section 5 the eight fitness functions used in this research are described. In Section 6, the experimental results about tuning the parameters for PSO algorithms are described. Moreover, the experimental results are outlined in Section 7. Finally, in Sections 8 and 9 the general discussion and conclusions of this research are given.
2. Particle Swarm Optimization Algorithms
In this section, three different algorithms based on PSO metaphor are described. The first one is the original PSO algorithm. Then, two algorithms which improve the original PSO are shown: the Second Generation of PSO and a New Model of PSO.
2.1. Original Particle Swarm Optimization Algorithm
The Particle Swarm Optimization (PSO) algorithm is a method for the optimization of continuous nonlinear functions proposed by Eberhart et al. . This algorithm is inspired by observations of the social and collective behavior of bird flocks moving in search of food or survival, as well as of fish schooling. In PSO, each particle is guided by the movements of the best member of the population and, at the same time, by its own experience. The metaphor indicates that a set of solutions moves through a search space with the aim of reaching the best position or solution.
The population is considered as a cumulus of particles, where each particle $i$ represents a position $\mathbf{x}_i \in \mathbb{R}^D$ in a multidimensional space. These particles are evaluated with a particular optimization function to obtain their fitness value and save the best solution. All the particles change their position in the search space according to a velocity function which takes into account the best position found in the population (i.e., the social component) as well as each particle's own best position (i.e., the cognitive component). The particles move in each iteration to a different position until they reach an optimum position. At each time $t$, the velocity of particle $i$ is updated using
$$\mathbf{v}_i(t+1) = \omega \mathbf{v}_i(t) + c_1 r_1 \left(\mathbf{p}_i(t) - \mathbf{x}_i(t)\right) + c_2 r_2 \left(\mathbf{p}_g(t) - \mathbf{x}_i(t)\right), \tag{1}$$
where $\omega$ is the inertia weight, typically set to vary linearly from an initial value to a final value during the course of a run; $c_1$ and $c_2$ are acceleration coefficients; and $r_1$ and $r_2$ are uniformly distributed random numbers between $(0, 1)$. The velocity $\mathbf{v}_i$ is limited to the range $[v_{\min}, v_{\max}]$. Updating the velocity in this way enables particle $i$ to search around its best individual position $\mathbf{p}_i(t)$ and the best global position $\mathbf{p}_g(t)$. The new position of each particle is computed as in
$$\mathbf{x}_i(t+1) = \mathbf{x}_i(t) + \mathbf{v}_i(t+1). \tag{2}$$
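The velocity and position updates described above can be sketched in a few lines. This is a minimal illustration of the standard inertia-weight PSO step, not the authors' implementation; the function name `pso_step`, the default coefficient values, and the symmetric clamp `[-v_max, v_max]` are assumptions made for the example.

```python
import random


def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, v_max=1.0, rng=random):
    """Apply one PSO velocity and position update to every particle, in place.

    All default parameter values are illustrative assumptions.
    """
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            # Cognitive component: pull toward the particle's own best position.
            cognitive = c1 * r1 * (personal_best[i][d] - x[d])
            # Social component: pull toward the best position of the population.
            social = c2 * r2 * (global_best[d] - x[d])
            new_v = w * v[d] + cognitive + social
            # Limit the velocity to the allowed range.
            v[d] = max(-v_max, min(v_max, new_v))
            # Move the particle with the updated velocity.
            x[d] += v[d]
    return positions, velocities
```

A particle that already sits on both attractors with zero velocity does not move, which is a quick sanity check on the signs of the two components.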
2.2. Second Generation of PSO Algorithm
The SGPSO algorithm  is an improvement of the original PSO algorithm that considers three aspects: the local optimum solution of each particle, the global best solution, and a new concept, the geometric center of the optimum swarm. The authors explain that birds keep a certain distance from the swarm center (food). On the other hand, no bird accurately calculates the position of the swarm center every time. A flock always stays in the same area for a specified time, during which the swarm center is kept fixed in every bird's eyes. Afterward, the swarm moves to a new area, and all birds must then keep a certain distance from the new swarm center. This fact is the basis of the SGPSO.
The position of the geometric centre of the optimum swarm is updated according towhere is the number of particles in the swarm, CI is the current iteration number, and is the geometric centre updating time of optimum swarm with a value between .
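In SGPSO the velocity is updated by (4) and the position of each particle by (5):
$$\mathbf{v}_i(t+1) = \omega \mathbf{v}_i(t) + c_1 r_1 \left(\mathbf{p}_i - \mathbf{x}_i(t)\right) + c_2 r_2 \left(\mathbf{p}_g - \mathbf{x}_i(t)\right) + c_3 r_3 \left(\overline{P} - \mathbf{x}_i(t)\right), \tag{4}$$
$$\mathbf{x}_i(t+1) = \mathbf{x}_i(t) + \mathbf{v}_i(t+1), \tag{5}$$
where $c_1$, $c_2$, and $c_3$ are constants called acceleration coefficients, $r_1$, $r_2$, and $r_3$ are random numbers in the range $[0, 1]$, $\omega$ is the velocity inertia, $\mathbf{p}_i$ and $\mathbf{p}_g$ are the best position of particle $i$ and the global best position, and $\overline{P}$ is the geometric centre of the optimum swarm.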
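As a rough sketch of the SGPSO idea, the fragment below adds a third attraction term toward the geometric centre of the swarm. It assumes that the centre is the arithmetic mean of the particles' best positions and that the caller refreshes it only every few iterations; both points, together with the names `geometric_centre` and `sgpso_velocity` and the default coefficients, are assumptions for illustration rather than details confirmed by the text.

```python
import random


def geometric_centre(personal_best):
    """Mean of the particles' best positions (assumed definition of the centre)."""
    n = len(personal_best)
    dims = len(personal_best[0])
    return [sum(p[d] for p in personal_best) / n for d in range(dims)]


def sgpso_velocity(x, v, p_best, g_best, centre,
                   w=0.7, c1=1.0, c2=1.0, c3=1.0, rng=random):
    """Return the new velocity of one particle with three attraction terms."""
    new_v = []
    for d in range(len(x)):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        new_v.append(
            w * v[d]
            + c1 * r1 * (p_best[d] - x[d])    # particle's own best position
            + c2 * r2 * (g_best[d] - x[d])    # global best position
            + c3 * r3 * (centre[d] - x[d])    # geometric centre of the swarm
        )
    return new_v
```

Keeping the centre as an explicit argument lets the caller hold it fixed for several iterations, which mirrors the description of birds treating the swarm centre as fixed for a period of time.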
2.3. New Model of Particle Swarm Optimization
Shi and Eberhart  proposed a linearly varying inertia weight over the course of generations, which significantly improves the performance of Basic PSO. The following equation shows how to compute the inertia:
$$\omega = \omega_{\max} - \frac{\omega_{\max} - \omega_{\min}}{iter_{\max}} \cdot iter, \tag{6}$$
where $\omega_{\max}$ and $\omega_{\min}$ are the initial and final values of the inertia weight, respectively, $iter$ is the current iteration number, and $iter_{\max}$ is the maximum number of allowable iterations. The empirical studies in  indicated that the optimal solution could be improved by varying the value of $\omega$ from 0.9 at the beginning of the evolutionary process to 0.4 at the end of the evolutionary process.
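The linear schedule can be written as a one-line function. The sketch below uses the 0.9 and 0.4 end points mentioned in the text as defaults; the function and parameter names are illustrative.

```python
def inertia_weight(iteration, max_iterations, w_start=0.9, w_end=0.4):
    """Linearly interpolate the inertia weight from w_start to w_end."""
    return w_start - (w_start - w_end) * iteration / max_iterations
```

Early iterations therefore keep a large inertia (favouring exploration), while late iterations use a small one (favouring local refinement).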
Yu et al.  developed a strategy in which, when the global best position does not improve as the number of generations increases, each particle is selected with a predefined probability from the population, and a random perturbation is added to each dimension of the velocity vector of the selected particle $i$. The velocity resetting is computed as in
$$\mathbf{v}_i = \mathbf{v}_i + (2r - 1) \cdot v_{\max}, \tag{7}$$
where $r$ is a uniformly distributed random number in the range $(0, 1)$ and $v_{\max}$ is the maximum random perturbation magnitude applied to each dimension of the selected particle.
Based on some evolutionary schemes of Genetic Algorithms (GA), several effective mutation and crossover operators have been proposed for PSO. Løvberg et al.  proposed a crossover operator in terms of a certain crossover rate, defined in
$$ch(\mathbf{x}_i) = r \cdot par_1(\mathbf{x}_i) + (1 - r) \cdot par_2(\mathbf{x}_i), \tag{8}$$
where $r$ is a uniformly distributed random number in the range $(0, 1)$, $ch$ is the offspring, and $par_1$ and $par_2$ are the two parents randomly selected from the population.
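The offspring velocity is calculated in the following equation as the sum of the velocity vectors of the two parents, normalized to the original length of the parent velocity vector:
$$ch(\mathbf{v}_i) = \frac{par_1(\mathbf{v}_i) + par_2(\mathbf{v}_i)}{\left| par_1(\mathbf{v}_i) + par_2(\mathbf{v}_i) \right|} \left| par_1(\mathbf{v}_i) \right|. \tag{9}$$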
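The crossover of positions and velocities described above might be sketched as follows. The position is an arithmetic blend of the two parents, and the velocity is the sum of the parents' velocities rescaled to the length of the first parent's velocity. Rescaling to the first parent in particular, the zero-vector guard, and the function names are assumptions of this sketch.

```python
import math
import random


def crossover_position(parent1, parent2, rng=random):
    """Arithmetic crossover: a random convex combination of two positions."""
    r = rng.random()
    return [r * a + (1.0 - r) * b for a, b in zip(parent1, parent2)]


def crossover_velocity(v1, v2):
    """Sum of both velocities, rescaled to the length of v1 (assumed choice)."""
    total = [a + b for a, b in zip(v1, v2)]
    norm = math.sqrt(sum(t * t for t in total))
    if norm == 0.0:
        # The two velocities cancel out, so there is no direction to rescale.
        return [0.0] * len(total)
    scale = math.sqrt(sum(a * a for a in v1)) / norm
    return [t * scale for t in total]
```

The rescaling keeps the offspring from moving faster than its parent, so crossover changes the direction of search without inflating the step size.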
Higashi and Iba  proposed a Gaussian mutation operator to improve the performance of PSO in terms of a certain mutation rate defined in where is the offspring, is the parent randomly selected from the population, is the current iteration number and is the maximum number of allowable iterations, and is a Gaussian distribution. Utilization of these operators in PSO has the potential to achieve faster convergence and find better solutions.
In the NMPSO, the use of dynamic random neighborhoods that change in terms of certain rates is proposed. First of all, a maximum number of neighborhoods is defined in terms of population size divided by 4. With this condition at least each neighborhood , , will have 4 members. Then, the members of each neighborhood are randomly selected, and the best particle is computed. Finally, the velocity of each particle is updated as infor all , .
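One possible way to build the random neighborhoods described above is shown below: the number of neighborhoods is the population size divided by 4 (integer division), and shuffled particle indices are dealt out so that every neighborhood receives at least 4 members whenever the population has at least 4 particles. The dealing scheme and the name `random_neighborhoods` are assumptions; the text only states that members are selected at random.

```python
import random


def random_neighborhoods(n_particles, rng=random):
    """Randomly partition particle indices into n_particles // 4 neighborhoods."""
    k = max(1, n_particles // 4)
    indices = list(range(n_particles))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin so group sizes differ by at most one.
    return [indices[i::k] for i in range(k)]
```

Calling the function again at the chosen update rate yields a fresh random partition, which is what makes the neighborhoods dynamic.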
The NMPSO combines the varying schemes of inertia weight $\omega$ and acceleration coefficients $c_1$ and $c_2$, velocity resetting, crossover and mutation operators, and dynamic random neighbourhoods . The NMPSO algorithm is described in Algorithm 1.
3. Artificial Neural Networks
An ANN is a system that performs a mapping between input and output patterns that represent a problem . ANNs learn information during the training process over several iterations. When the learning process finishes, the ANN is ready to classify new information, predict new behaviours, or estimate nonlinear functions. Its structure consists of a set of neurons (represented by functions) connected to each other and organized in layers. The patterns that codify the real problem are sent through the layers, and the information is transformed with the corresponding synaptic weights (values between 0 and 1). Then, neurons in the following layers perform a summation of this information depending on whether a connection exists between them. In addition, this summation considers another input called bias, whose input value is 1. This bias is a threshold that represents the minimum level that a neuron needs for activating and is represented by $\theta$. The summation function is presented in
$$a = \sum_{i=1}^{n} w_i x_i + \theta, \tag{12}$$
where $x_i$ is the $i$th input of the neuron and $w_i$ is the synaptic weight of the corresponding connection.
After that, the result of the summation is evaluated in transfer functions activated by the neuron input. The result is the output neuron, and this information is sent to the other connected neurons until they reach the last layer. Finally, the output of the ANN is obtained.
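The computation performed by a single neuron, as described above, can be illustrated with a short function: a weighted sum of the inputs plus the bias, passed through a transfer function. The name `neuron_output` and the argument layout are illustrative assumptions.

```python
def neuron_output(inputs, weights, bias, transfer):
    """Weighted sum of the inputs plus the bias, passed through a transfer function."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(activation)
```

For example, `neuron_output([1.0, 2.0], [0.5, 0.25], 0.5, lambda a: a)` evaluates a neuron with a linear transfer function.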
The learning process consists of adapting the synaptic weights until the ANN reaches the desired behaviour. The output is evaluated to measure the performance of the ANN; if the output is not as desired, the synaptic weights have to be changed or adjusted in terms of the input patterns . There are two ways to verify whether the ANN has learned: first, the ANN computes the degree of similarity between input patterns and information that it knew before (nonsupervised learning). Second, the ANN output is compared with the desired patterns (supervised learning). In our case, supervised learning is applied, where the objective is to produce an output that approximates the desired patterns of a set of $p$ input-output samples (see the following equation):
$$T = \left\{ \left( \mathbf{x}^{\xi} \in \mathbb{R}^{n}, \mathbf{d}^{\xi} \in \mathbb{R}^{m} \right) \right\}, \quad \xi = 1, \ldots, p, \tag{13}$$
where $\mathbf{x}$ is the input pattern and $\mathbf{d}$ the desired response.
Given the training sample $T$, the requirement is to design and compute the neural network free parameters so that the actual output $\mathbf{y}^{\xi}$ of the neural network due to $\mathbf{x}^{\xi}$ is close enough to $\mathbf{d}^{\xi}$ for all $\xi$ in a statistical sense . We may use the mean square error (MSE) given by (14) as the first objective function to be minimized. There are algorithms that adjust the synaptic weights to obtain a minimum error, such as the classic back-propagation (BP) algorithm [23, 24]. This algorithm, like others, is based on the gradient descent technique, which can stay trapped in a local minimum. Furthermore, a BP algorithm cannot solve noncontinuous problems. For this reason, other techniques that can solve noncontinuous and nonlinear problems are needed to obtain a better performance of the ANN and to solve really complex problems. The MSE is given by
$$e = \frac{1}{p \cdot m} \sum_{\xi=1}^{p} \sum_{i=1}^{m} \left( d_i^{\xi} - y_i^{\xi} \right)^2. \tag{14}$$
4. Proposed Methodology
The most important elements to design and improve the accuracy of an ANN are the architecture (or topology), the set of transfer functions (TF), and the set of synaptic weights and bias. These elements should be codified into the individual that represents the solution of our problem. The solutions generated by the bioinspired algorithms will be measured by the fitness function with the aim to select the best individual which represents the best ANN. The three bioinspired algorithms (basic PSO, SGPSO, and NMPSO) are going to lead the evolutionary learning process until finding the best ANN by using one of the eight fitness functions proposed in this paper. It is important to remark that only pattern classification problems will be solved by the proposed methodology.
The methodology is evaluated with three particle swarm algorithms and eight fitness functions. Therefore, this involves an extensive behavioral study for each algorithm. Another point to review is the maximum number of neurons (MNN) used by the methodology to generate the ANN, which is directly related to the dimension of the individual. Because the information needed to determine the size of the individuals for a specific problem depends only on the input and output patterns (since supervised learning is applied), it was necessary to propose an equation that allows us to obtain the MNN used to design the ANN. This equation is explained in the section on the individual.
In Figure 1, a diagram of the proposed methodology is shown. During the training stage, it is necessary to define the individual and the fitness functions to evaluate each individual. The size of the individual depends on the size of the input patterns as well as the desired patterns. The individual is evolved for a certain time to obtain the best solution (with a minimum error). At the end of the learning process, it is expected that the ANN provides an acceptable accuracy during the training and testing stages.
When solving an optimization problem, the problem has to be described as a feasible model. After the model is defined, the next step is focused on designing the individual that codifies the solution for the problem. Equation (15) shows an individual represented with a matrix that codifies the ANN design. This codification was previously described in [13–15]. As it is necessary to evolve the three ANN elements at the same time, a matrix is composed by three principal parts with the following information: first, the topology (), second the synaptic weights and bias (), and third the transfer functions (), where is the maximum number of neurons (MNN) defined by , is the input patterns vector dimension, and is the desired patterns vector dimension:
The matrix that represents the individual codifies three different types of information (topology, synaptic weights, and transfer function). In that sense, it is necessary to determine the exploring range of each type of information in its corresponding search space. For the case of the topology, the range is set between due to the integer number of this part being codified into a binary vector composed of elements that indicates if there is a connection between neuron and neuron .
The synaptic weights and bias have a range between and and for the transfer functions the range is , where is the total number of transfer functions.
4.2. Architecture and Synaptic Weights
Once the individuals or possible solutions are obtained, it is necessary to decode the matrix information into an ANN for its evaluation. The first element to decode is the topology in terms of the synaptic weights and transfer functions that are stored in the matrix.
This research is limited to feed-forward ANNs. For this reason, some rules were proposed to guarantee that no recurrent connections appear in the ANN (the only restriction on the ANN). In future work, we will include recurrent connections and study the behavior of this type of ANN.
The architectures generated by the proposed methodology are composed of only three layers: input, hidden, and output. To generate valid architectures, the following three rules must be satisfied.
Let be the set of neurons composing the input layer, the set of neurons composing the hidden layer, and the set of neurons composing the output layer.(1)For the input layer neurons (ILN), the , , neuron only can send information to and .(2)For the hidden layer neurons (HLN), the , , neuron only can send information to and with one restriction for the last. For there is a connection only with .(3)For the output layer neuron (OLN), the , neuron only can send information to other neurons of their layer but with a restriction, for there is a connection only with .
To decode the architecture taking into account these rules, the information in with and (which is in decimal base) is codified based on the binary square matrix . This matrix will represent a graph where each component indicates the links between neuron and neuron when . For example, suppose that has an integer number “57.” It is necessary to transform it into a binary code “0111001.” The binary code is interpreted as the connections of a th neuron to seven neurons (number of bits). In this case, only neurons two, three, four, and seven (from left to right) links to neuron are observed.
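The decoding step in the example above (the integer 57 read as a 7-bit binary code) can be reproduced with a small helper. The function names are hypothetical, and the sketch covers only the integer-to-bits conversion, not the layer rules that restrict which connections are valid.

```python
def decode_connections(value, n_bits):
    """Convert a non-negative integer into a fixed-width list of connection bits."""
    if value < 0 or value >= 2 ** n_bits:
        raise ValueError("value does not fit in n_bits bits")
    return [int(bit) for bit in format(value, "0{}b".format(n_bits))]


def connected_neurons(value, n_bits):
    """Return the 1-based positions (left to right) whose bit is set."""
    bits = decode_connections(value, n_bits)
    return [position for position, bit in enumerate(bits, start=1) if bit]
```

With these helpers, `decode_connections(57, 7)` gives the bits of "0111001" and `connected_neurons(57, 7)` gives neurons two, three, four, and seven, matching the example in the text.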
Then, the architecture is now evaluated with the corresponding synaptic weights of the component with and . Finally, each neuron computes its output with its corresponding transfer function shown in the same array. In the case of bias, it is encoded in the component with and .
4.3. Transfer Functions
The TF are represented in the component with and . The transfer functions are in the range of representing one of the six transfer functions selected in this work.
Although there are several transfer functions that can be used in the ANN context, in this work the most popular and useful transfer functions in several kinds of problems are selected. The transfer functions in this research with their labels to identify them are Sigmoid function (LS), hyperbolic tangent function (HT), sinusoidal function (SN), Gaussian function (GS), linear function (LN), and hard limit function (HL).
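A lookup table is one simple way to map an evolved integer label to a transfer function. In the sketch below the two-letter labels come from the text, while the exact functional forms (for instance the unscaled Gaussian and the hard limit with threshold 0) and the ordering of the integer indices are assumptions made for illustration.

```python
import math


def logistic(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


# Labels follow the text; the formulas are common textbook forms (assumed).
TRANSFER_FUNCTIONS = {
    "LS": logistic,                          # sigmoid
    "HT": math.tanh,                         # hyperbolic tangent
    "SN": math.sin,                          # sinusoidal
    "GS": lambda a: math.exp(-a * a),        # Gaussian
    "LN": lambda a: a,                       # linear
    "HL": lambda a: 1.0 if a >= 0 else 0.0,  # hard limit
}

LABELS = ["LS", "HT", "SN", "GS", "LN", "HL"]


def select_transfer_function(index):
    """Map an integer in [1, 6] to a transfer function (index order is assumed)."""
    return TRANSFER_FUNCTIONS[LABELS[index - 1]]
```

Because the particle stores real values, a decoder would typically round or truncate the stored value to an integer in this range before the lookup.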
4.4. ANN Output
Once the information from the individual has been decoded, its efficiency must be evaluated with one of the fitness functions. To do this, it is necessary to calculate the output of the ANN designed during the training stage and generalization stage. This output is calculated using Algorithm 2, where is the output of the neuron , is the input pattern that feeds the ANN, is the dimensionality of the input pattern, is the dimensionality of the desired pattern, and is the output of the ANN.
5. Proposed Fitness Functions
Each individual must be selected based on their fitness, and the best solution is taken depending on the evaluation (performance) of each individual. In this work, we propose eight different fitness functions to design an ANN. It is important to remark that fitness functions only are used during the training stage to evaluate each solution. After designing the ANN, we use a new metric that allows us to compare efficiently the results provided by the ANN generated with the proposed methodology.
5.1. Mean Square Error
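The mean square error (MSE) represents the error between the ANN output and the desired patterns. In this case, the best individual is the one which generates the minimum MSE (see the following equation):
$$MSE = \frac{1}{p \cdot m} \sum_{\xi=1}^{p} \sum_{i=1}^{m} \left( d_i^{\xi} - y_i^{\xi} \right)^2, \tag{16}$$
where $y_i^{\xi}$ is the output of the ANN, $d_i^{\xi}$ is the desired output, $p$ is the number of patterns, and $m$ is the dimension of the desired patterns.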
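As a concrete illustration, the MSE over a set of patterns can be computed as below. Dividing by the number of patterns times the output dimension is an assumption of this sketch, and the function name is illustrative.

```python
def mean_square_error(desired, outputs):
    """Average squared difference over all patterns and all output components."""
    n_patterns = len(desired)
    n_outputs = len(desired[0])
    total = sum(
        (d - y) ** 2
        for d_pattern, y_pattern in zip(desired, outputs)
        for d, y in zip(d_pattern, y_pattern)
    )
    return total / (n_patterns * n_outputs)
```

A perfect match yields 0, and a single pattern with one of two components off by 1 yields 0.5.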
5.2. Classification Error
The classification error (CER) is calculated as follows: the output of the ANN is transformed into a binary codification by means of the winner-take-all technique. The binary chain must contain a single 1, and the rest is composed of 0s. The position of the 1 indicates the class to which the input pattern belongs. This binary chain is compared against the desired pattern; if they are equal, the classification was done correctly.
In this case, the best ANN is the one which generates the fewest misclassified patterns. The CER is represented by
$$CER = 1 - \frac{npwc}{tpc}, \tag{17}$$
where $npwc$ represents the number of patterns well classified and $tpc$ is the total number of patterns to classify.
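The winner-take-all step and the resulting classification error can be sketched as follows. Breaking ties in favour of the first maximum is an assumption of this example, as are the function names.

```python
def winner_take_all(output):
    """Binary chain with a single 1 at the position of the largest output."""
    winner = max(range(len(output)), key=lambda i: output[i])
    return [1 if i == winner else 0 for i in range(len(output))]


def classification_error(desired, outputs):
    """One minus the fraction of patterns whose binary chain matches the desired one."""
    correct = sum(
        1 for d, y in zip(desired, outputs) if winner_take_all(y) == list(d)
    )
    return 1.0 - correct / len(desired)
```

Unlike the MSE, this measure only changes when a pattern crosses a decision boundary, which is why the two errors can rank candidate networks differently.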
5.3. Validation Error
When the ANN is trained during a long period, the ANN could get a maximum learning in which the ANN becomes adept (overfitting). However, this has a disadvantage because if the input data during the testing stage are contaminated with a negligible amount of noise, the ANN will not be able to recognize new patterns.
For that reason, we need to include a validation phase to prevent overfitting and thus guarantee an adequate generalization. Therefore, we designed a fitness function that integrates the assessment of both the training and validation stages.
Based on this idea, two fitness functions were generated: the first evaluates the mean square error (MSE) on the training set and the MSE on the validation set ; see (18). The second function takes into account both the classification error (CER) on the training set and the classification error on the validation set ; see (19):
In order to evaluate the fitness of each solution using (18) and (19), it is necessary to first compute the training error ($MSE_T$ or $CER_T$) using the training set; after that, the validation error ($MSE_V$ or $CER_V$) is computed using the validation set. It is important to notice that the error achieved with the validation set is weighted more heavily than the error obtained with the training set.
5.4. Reduction of the Architecture
In order to generate a smaller ANN in terms of the number of connections, it is necessary to design a fitness function that takes into account the performances of the ANN in terms of the MSE or CER as well as a factor related to the number of connections used in the ANN.
In that sense, we proposed the following equation for computing the factor that allows us to measure the size of the ANN in terms of the number of connections:where represents the number of connections when the proposed methodology is applied and represents the maximum number of connection that an ANN can generate which is computed as inwhere is the maximum number of neurons.
It is important to mention that fewer or more connections do not necessarily produce a better performance; however, by using this factor, it is possible to weight other metrics that measure the performance of the ANN and to find the ANN with fewer connections and an acceptable performance.
In that sense, we proposed two new fitness functions in terms of the MSE function equation (22) and in terms of the CER function equation (23). These fitness functions tend to the global minimum when the factor and the performance are small; however, when one of these terms tends to increase, the fitness function tends to move away from the global minimum:
5.5. Architecture Reduction and Validation Error with MSE and CER Errors
At last, two fitness functions and were generated: the first reduces simultaneously the architecture, the validation error, and the MSE; see (24). The second function reduces the architecture, the validation error, and the CER equation (25):
6. Tuning the Parameters for PSO Algorithms
Ten classification problems of different complexity were selected to evaluate the accuracy of the methodology: Iris plant, wine, breast cancer, diabetes, and liver disorder datasets which were taken from the UCI machine learning benchmark repository . The object recognition problem was taken from , and the spiral, synthetic 1, and synthetic 2 datasets were developed in our laboratory. The pattern dispersions of these datasets are shown in Figure 2.
Figure 2: Pattern dispersion of the datasets. (b) Synthetic 1. (c) Synthetic 2.
Table 1 shows the description for each classification problem.
Each dataset was randomly divided into three sets for training, testing, and validating the ANN as follows: 33% of the total patterns for the training stage, 33% for validation stage, and 34% for testing stage.
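The random 33% / 33% / 34% partition can be reproduced with a shuffle followed by slicing. The sketch below uses integer arithmetic for the first two set sizes and gives the remainder to the testing set; the function name, the `seed` argument, and the return order (training, validation, testing) are illustrative choices.

```python
import random


def split_dataset(patterns, seed=None):
    """Shuffle the patterns and split them into training, validation, and testing sets."""
    rng = random.Random(seed)
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 33 // 100
    n_valid = len(shuffled) * 33 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remaining patterns, about 34%
    return train, valid, test
```

Fixing `seed` makes a particular partition repeatable, which helps when comparing algorithms on the same split.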
After that, the best parameter values for each algorithm were found to obtain the best performance for each classification problem. Then, the best configuration for each algorithm was used to validate statistically the accuracy of the ANN.
To determine which parameters generate the best ANN in terms of accuracy, it is necessary to analyze both training and testing performance. Although the accuracy of the ANN should be measured in terms of the testing performance, it is also important to consider the performance that the ANN achieves during the training stage, in order to find the parameters that produce the best results during both stages. Instead of analyzing the training and testing performances separately, we propose a new metric that considers the accuracy of the ANN during both stages. This metric allows us to give more weight to the testing performance to validate the accuracy of the proposal and, at the same time, to be confident that the training stage was completed with acceptable accuracy. The metric computes a weighted recognition rate ($wrr$) and is described in
$$wrr = 0.4 \times Tr_{rr} + 0.6 \times Te_{rr}, \tag{26}$$
where $Tr_{rr}$ represents the recognition rate obtained during the training stage and $Te_{rr}$ represents the recognition rate obtained during the testing stage.
From (26), we can observe that the testing and training stages are weighted by factors of 0.6 and 0.4, respectively. Using these factors, we prevent a high value of the metric from being obtained through a high training recognition rate paired with a low testing recognition rate.
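A short numerical check makes the effect of the 0.4 and 0.6 factors concrete. The function name below is illustrative, and recognition rates are assumed to be expressed as fractions between 0 and 1.

```python
def weighted_recognition_rate(train_rate, test_rate):
    """Weight the testing recognition rate (0.6) above the training rate (0.4)."""
    return 0.4 * train_rate + 0.6 * test_rate
```

For instance, a network with perfect training accuracy but 50% testing accuracy scores about 0.7, while the reverse case scores about 0.8, so strong generalization is rewarded over memorizing the training set.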
The analysis to select the best values for each algorithm was performed taking into account the ten classification problems described above. The parameters of each algorithm were varied over different ranges to evaluate the performance of the algorithms on different pattern recognition problems. To find the best configuration for each algorithm, several experiments were carried out assigning different values to each parameter in the three bioinspired algorithms (original PSO, SGPSO, and NMPSO).
The parameters were divided into two types. The common parameters are shared by all algorithms: the number of generations, the number of individuals, the range of variables, and the fitness function. The specific parameters are unique to each algorithm. For the basic PSO algorithm, the parameters that change are the inertia weight and the two acceleration coefficients. The SGPSO algorithm takes two specific parameters: an acceleration coefficient and the geometric center. Finally, the NMPSO algorithm has the crossover operator, the mutation operator, and the neighborhood updating rate, which determines when each neighborhood should be updated.
For each parameter configuration and each problem, 5 experiments with 2000 generations were performed. Once the ANNs were designed with the proposed methodology, the average weighted recognition rate was obtained.
The values taken by each parameter to obtain the best configuration for each bioinspired algorithm are described next.
The common parameters for the three algorithms are represented as follows: for the population size, in the variable the first element corresponds to 50 individuals and the second corresponds to 100 individuals. In the case of the search space size the first element indicates that the range is set to and the second item indicates that the range is between . The type of fitness function used with the bioinspired algorithm is represented by the variable and can take one of the eight elements, .
All possible combinations of the different parameter values were tested. The eight fitness functions were tested on all the classification problems proposed in this research to see which one provides the best accuracy.
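The exhaustive combination of the common parameters can be sketched with a Cartesian product. The population sizes (50 and 100) and the eight fitness functions come from the text; the two search-range entries are placeholder labels, since this sketch does not assume any particular numeric bounds, and all variable names are hypothetical.

```python
from itertools import product

population_sizes = [50, 100]               # stated in the text
search_ranges = ["range_1", "range_2"]     # placeholder labels, not real bounds
fitness_functions = list(range(1, 9))      # indices of the eight fitness functions

# Every combination of the common parameters is one candidate configuration.
common_grid = list(product(population_sizes, search_ranges, fitness_functions))

print(len(common_grid))
print(common_grid[0])
```

The algorithm-specific parameters multiply this grid further, so the number of configurations evaluated per algorithm is larger than the common grid alone.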
The configuration to determine the value for each parameter for original PSO is determined by the following sequence: .
The basic PSO algorithm has three unique parameters: the inertia weight represented by which can take the following values and the two acceleration coefficients and represented by and , respectively, with the values . Once we finished the set of experiments to test the performance of the original (basic) PSO algorithm with all the previous values combinations, we found that the best parameter configuration was .
SGPSO algorithm has two unique parameters: the acceleration coefficient represented by the variable whose values are and the geometric center represented by the variable with values . In the case of the acceleration coefficients and , it took the best values found for the basic PSO algorithms: , , and . After several experiments, the best parameter configuration for SGPSO was .
The NMPSO algorithm has three unique parameters: the updating neighborhood rate which takes the values , the crossover factor , and the mutation factor , which are represented by the variables and ; both take the values . The best parameter configuration found for NMPSO was .
7. Experimental Results
Once we determined the best configuration for each algorithm, we performed 30 runs for each pattern classification problem. The accuracy of the ANN generated by the methodology was measured in terms of the weighted recognition rate defined in (26). The following subsections describe the results obtained for each database and each bioinspired algorithm. These experiments show the evolution of the fitness function during 5000 generations, the weighted recognition rate, and some examples of the architectures generated with the methodology.
7.1. Results for Basic PSO Algorithm
Figure 3 shows some of the ANNs generated using the PSO algorithm that provide the best results for the recognition problems.
Figure 4(a) shows the evolution of the fitness function, where the tendency for each classification problem can be seen. These results were obtained with the best configuration of basic PSO.
Figure 4: (a) Evolution of fitness function (average); (b) Accuracy of the ANN (average).
The evolution of the fitness function represents the average of the 30 experiments for each problem. The value of the fitness function for the glass, spiral, liver disorders, diabetes, and synthetic 2 problems decreases only slightly as the number of generations grows. Smaller values of the fitness function were achieved with the Iris plant, breast cancer, and synthetic 1 problems. With the object recognition and wine problems, the value of the fitness function decreased when approaching the limit of generations. The average weighted recognition rate for each problem is presented in Figure 4(b). For the glass problem, the ANN achieved the smallest average weighted recognition rate (52.67%), followed by the spiral (53.39%), liver disorders (68.74%), diabetes (76.90%), object recognition (80.22%), synthetic 2 (82.96%), and wine (86.49%) problems. The highest average weighted recognition rates were achieved for the synthetic 1 (95.03%), Iris plant (96.35%), and breast cancer (96.99%) problems.
Table 2 presents the frequency at which the six different transfer functions were selected for the ANN during the training stage. Applying the PSO algorithm, we see that there is a small range of selected functions. For example, the sinusoidal function was selected more often for the spiral, synthetic 1, and synthetic 2 problems. The Gaussian transfer function was selected more often for Iris plant, breast cancer, diabetes, liver disorders, object recognition, wine, and glass problems.
Table 3 shows the maximum, minimum, standard deviation, and average number of connections used by the ANN. On average, the number of connections is low for the spiral, synthetic 1, and synthetic 2 problems. For the glass and wine problems, 97.43 and 91.1 connections were used on average, respectively.
Table 4 shows the maximum, minimum, standard deviation, and average number of neurons used in the ANN generated with the proposed method. In this table, we can see that the number of neurons in the ANN for the ten classification problems was no more than 13.
7.2. Results for SGPSO Algorithm
Figure 5 shows some of the best ANNs generated with the SGPSO algorithm. It also includes an example of an ANN with an input neuron without any connection; see Figure 5(c). The lack of connection indicates that the corresponding input feature was not necessary to solve the problem. In other words, the proposed methodology also performed a dimensionality reduction of the input pattern.
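One way to detect such unused inputs, assuming the designed ANN is available as a set of directed connections, is sketched below. The representation (pairs of neuron indices with input neurons numbered first) and the function name are assumptions made for illustration and are not the encoding used by the methodology.

```python
def unused_inputs(connections, n_inputs):
    """Return the input neurons that have no outgoing connection.

    connections: iterable of (source, target) neuron index pairs.
    Input neurons are assumed to be numbered 0 .. n_inputs - 1.
    """
    sources = {src for src, _ in connections}
    return [i for i in range(n_inputs) if i not in sources]

# Toy network: 3 inputs (0, 1, 2), one hidden neuron (3), one output (4).
toy_connections = {(0, 3), (1, 3), (1, 4), (3, 4)}
dropped = unused_inputs(toy_connections, n_inputs=3)
print(dropped)
```

In the toy network, input 2 feeds no neuron, so the corresponding feature could be removed from the input pattern without changing the output.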
Figure 6(a) shows the evolution of the fitness function, where the tendency for each classification problem can be seen. These results were obtained with the best parameter configuration for the SGPSO algorithm. In general, the problems whose fitness values come closest to the optimal solution are breast cancer, Iris plant, and synthetic 1, whereas the liver disorders, glass, and spiral problems rank last, with high errors.
Figure 6: (a) Evolution of fitness function (average); (b) Accuracy of the ANN (average).
The average weighted recognition rate for each problem is presented in Figure 6(b). It was observed that for the glass problem the proposed methodology achieved the smallest weighted recognition rate (54.31%), followed by the spiral (55.60%), liver disorders (69.19%), diabetes (76.09%), object recognition (80.45%), synthetic 2 (81.39%), wine (82.47%), and synthetic 1 (93.61%). The second highest weighted recognition rate was achieved for the Iris plant (96.45%). The highest weighted recognition rate was achieved for the breast cancer problem (97.03%).
Table 5 presents the number of times that each transfer function was selected using the SGPSO algorithm. The sinusoidal function was the most frequently selected for 9 of the 10 classification problems: spiral, synthetic 1, synthetic 2, Iris plant, diabetes, liver disorders, object recognition, wine, and glass. For the breast cancer problem, the sinusoidal function was selected at almost the same rate as the Gaussian function.
Furthermore, Table 6 shows the maximum, minimum, standard deviation, and average number of connections used by the ANN designed with the proposed methodology. In this case, SGPSO generates more connections between neurons of the ANN for the ten classification problems than those generated with the basic PSO algorithm.
Table 7 shows the maximum, minimum, standard deviation, and average number of neurons required for the ANN using SGPSO algorithm.
7.3. Results for NMPSO Algorithm
Figure 7 shows some of the best ANNs generated with the NMPSO algorithm. The fitness function used with the NMPSO algorithm was function.
The evolution of the fitness function for the 10 classification problems is shown in Figure 8(a), where the minimum values are reached with the synthetic 1, breast cancer, and Iris plant problems. For the wine problem, the value of the fitness function improves as the number of generations increases. The worst case was observed for the glass problem.
Figure 8: (a) Evolution of fitness function (average); (b) Accuracy of the ANN (average).
The weighted recognition rate for each problem is shown in Figure 8(b). From this graph, we observed that the average weighted recognition rate was 54.06% for the glass problem, 62.97% for the spiral problem, 70.01% for liver disorders, 76.89% for diabetes, 85.73% for object recognition, and 86.30% for synthetic 2. The best recognition rates were achieved with the wine (88.62%), Iris plant (96.60%), breast cancer (97.11%), and synthetic 1 (97.42%) problems.
The number of times that the transfer functions were selected using the NMPSO algorithm is described in Table 8. Using the sinusoidal function, the ANNs provide better results for the spiral, synthetic 1, synthetic 2, and object recognition problems. For the Iris plant, breast cancer, diabetes, liver disorders, wine, and glass problems, the Gaussian function was the most frequently selected.
In general, the transfer function most often selected using the NMPSO algorithm was the Gaussian, followed by the sinusoidal function, the hyperbolic tangent, and the linear function, with the sigmoid and hard limit functions in the last places. Table 9 shows the maximum, minimum, standard deviation, and average number of connections.
Table 10 shows the maximum, minimum, standard deviation, and average number of neurons used by the ANN generated with the NMPSO algorithm.
8. General Discussion
In general, Table 11 shows a summary of results taking into account the average weighted recognition rate obtained with the three bioinspired algorithms.
For the spiral, synthetic 1, synthetic 2, Iris plant, breast cancer, liver disorders, object recognition, and wine problems, the NMPSO algorithm provided the best results. For the glass problem, the best accuracy was achieved with the SGPSO algorithm, and for diabetes the best performance was achieved using the basic PSO algorithm.
From Table 11, it is possible to see that the best algorithm, in terms of the weighted recognition rate, was NMPSO (81.57%), the second best algorithm was basic PSO (78.97%), and the last was SGPSO algorithm (78.65%) for the ten classification problems.
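The summary figures above can be cross-checked from the per-problem average weighted recognition rates quoted in Sections 7.1 to 7.3. The sketch below simply recomputes the per-algorithm means and the best algorithm per problem; the variable names are hypothetical and the numbers are copied from the text, not from Table 11 itself.

```python
# Average weighted recognition rates (%) as (basic PSO, SGPSO, NMPSO).
rates = {
    "glass": (52.67, 54.31, 54.06),
    "spiral": (53.39, 55.60, 62.97),
    "liver disorders": (68.74, 69.19, 70.01),
    "diabetes": (76.90, 76.09, 76.89),
    "object recognition": (80.22, 80.45, 85.73),
    "synthetic 2": (82.96, 81.39, 86.30),
    "wine": (86.49, 82.47, 88.62),
    "synthetic 1": (95.03, 93.61, 97.42),
    "Iris plant": (96.35, 96.45, 96.60),
    "breast cancer": (96.99, 97.03, 97.11),
}
algorithms = ("basic PSO", "SGPSO", "NMPSO")

# Mean over the ten problems for each algorithm.
averages = {
    alg: sum(r[i] for r in rates.values()) / len(rates)
    for i, alg in enumerate(algorithms)
}

# Algorithm with the highest rate on each problem.
best = {
    problem: algorithms[max(range(len(algorithms)), key=lambda i: r[i])]
    for problem, r in rates.items()
}

for alg in algorithms:
    print(alg, round(averages[alg], 2))
```

The recomputed means agree with the reported 78.97% and 81.57% for basic PSO and NMPSO, and the SGPSO mean matches the reported 78.65% to within rounding of the last digit.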
Moreover, these results were compared with results obtained from classic algorithms such as gradient descent and Levenberg-Marquardt. Because the classic techniques need a specific architecture, two kinds of ANN were designed manually. The first consists of one hidden layer and the second consists of two hidden layers.
To determine the maximum number of neurons used to generate the ANN, we follow the same rule proposed in the methodology. For the ANN with two hidden layers, a pyramidal distribution was used, in which the first hidden layer has 60% of the total number of hidden neurons and the second hidden layer has 40%.
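The pyramidal distribution can be illustrated with a short sketch. It reads the 60% and 40% shares as applying to the total number of hidden neurons, which is an interpretation of the text; the rounding rule and the function name are likewise assumptions.

```python
def pyramidal_split(total_hidden, first_share=0.6):
    """Split hidden neurons between two hidden layers (60% / 40% by default)."""
    first = round(first_share * total_hidden)   # rounding rule is an assumption
    second = total_hidden - first               # remainder goes to the second layer
    return first, second

layers_10 = pyramidal_split(10)
layers_5 = pyramidal_split(5)
print(layers_10, layers_5)
```

Assigning the remainder to the second layer keeps the total number of hidden neurons unchanged even when 60% of the total is not an integer.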
Two stop criteria were established for the gradient descent and Levenberg-Marquardt algorithms: training ends when the algorithm reaches 5000 epochs or when it reaches an error of 0.000001. The classification problems were divided into three subsets: 40% of the overall patterns were used for training, 50% for generalization, and 10% for validation. The learning rate was set to 0.1.
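The two stop criteria can be expressed as a generic training loop. The sketch below is independent of the learning algorithm: `step` stands for one epoch of any trainer and returns the current error. The helper names and the toy error sequence are hypothetical.

```python
def train_until(step, max_epochs=5000, goal_error=1e-6):
    """Run step() once per epoch until the goal error or the epoch limit is reached.

    Returns the number of epochs executed and the last error observed.
    """
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = step()
        if error <= goal_error:      # first criterion: goal error reached
            return epoch, error
    return max_epochs, error         # second criterion: epoch limit reached

def make_halving_step(initial_error=1.0):
    """Toy stand-in for a trainer: the error halves at every epoch."""
    state = {"error": initial_error}
    def step():
        state["error"] *= 0.5
        return state["error"]
    return step

epochs_used, final_error = train_until(make_halving_step())
print(epochs_used, final_error)
```

With the toy trainer the goal error is reached long before the epoch limit; a trainer that stalls above the goal error would instead run for all 5000 epochs.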
Table 12 shows the average weighted recognition rate using the classic training algorithms: one based on gradient descent (the backpropagation algorithm) and the other based on the Levenberg-Marquardt algorithm. From this set of experiments, we observed that the best algorithm was Levenberg-Marquardt with a single hidden layer. This algorithm solved eight of the ten problems with the best performance (spiral, synthetic 1, synthetic 2, Iris plant, breast cancer, diabetes, liver disorders, and object recognition). For the wine problem, the best algorithm was gradient descent with a single hidden layer. The glass problem was solved better using Levenberg-Marquardt with two hidden layers.
Considering Tables 11 and 12, the best techniques to design ANNs were the NMPSO algorithm followed by Levenberg-Marquardt with one hidden layer. On the other hand, the basic PSO and SGPSO algorithms, as well as gradient descent and Levenberg-Marquardt with two hidden layers, did not provide good performance.
Although Levenberg-Marquardt obtained better results than the basic PSO and SGPSO algorithms, there are some important points to consider. First, the ANN designed with the proposed methodology includes the selection of the architecture, synaptic weights, bias, and transfer functions. With the classic techniques, the architectures must be carefully and manually designed by an expert in order to obtain the best results, which can be a time-consuming task. In contrast, the proposed methodology automatically designs the ANN from the input and desired patterns that encode the problem to be solved.
In this paper, we proposed three connection rules for generating feed-forward ANNs and guiding the connections between neurons. These rules allow connections among neurons from the input layer to the output layer. They also allow lateral connections among neurons in the same layer.
We also observed that some ANNs designed by the proposed methodology have input neurons without any connection. This means that the feature associated with such a neuron was not relevant to compute the output of the ANN. This is known as dimensionality reduction of the input pattern.
Eight fitness functions, which involve combinations of the MSE, the CER, the validation error, and architecture reduction (of connections and neurons), were implemented to evaluate each individual. From these experiments, we observed that the fitness functions that generated the ANNs with the best weighted recognition rate were those that used the classification error CER. The three bioinspired algorithms based on PSO were compared in terms of the average weighted recognition rate.
In this comparison, the NMPSO algorithm achieved the best performance, followed by the basic PSO and SGPSO algorithms.
To validate statistically the accuracy of the proposed methodology, first of all, the parameters for the three bioinspired algorithms were selected. For the case of basic PSO the best fitness function selected was with a variable range between . After tuning the parameters of each algorithm and choosing the best configuration, we observe that the parameters were different from those proposed in the literature; these values for the parameters were set to , , and . For the SGPSO algorithm, the best fitness function selected was with a variable range between . The values for the parameters were set to and the geometric centre . For the NMPSO algorithm, the best fitness function was with a variable range between . The parameters for the best configuration were set to , crossover rate , and mutation rate .
After tuning the parameters of the three algorithms, 30 runs were performed for each of the ten classification problems. In general, the problems that achieved a weighted recognition rate of 100% were the synthetic 1, Iris plant, and object recognition problems, whereas a lower performance was obtained with the glass and spiral problems.
The transfer functions most often selected by each algorithm were the Gaussian function for the basic PSO algorithm, the sinusoidal function for the SGPSO algorithm, and the Gaussian function for the NMPSO algorithm.
In general, the results for the ANNs designed with the proposed methodology were very promising. The proposed methodology automatically designs the ANN by determining the set of connections, the number of neurons in the hidden layers, the adjustment of the synaptic weights, the selection of the bias, and the transfer function for each neuron.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors thank Universidad La Salle for the economic support under Grant number I-61/12. Beatriz Garro thanks CONACYT and UNAM for the postdoctoral scholarship.
- G. Beni and J. Wang, “Swarm intelligence in cellular robotic systems,” in Robots and Biological Systems: Towards a New Bionics? vol. 102 of NATO ASI Series, pp. 703–712, Springer, Berlin, Germany, 1993.
- X. Yao, “Evolving artificial neural networks,” Proceedings of the IEEE, vol. 87, no. 9, pp. 1423–1447, 1999.
- E. Alba and R. Martí, Metaheuristic Procedures for Training Neural Networks, Operations Research/Computer Science Interfaces Series, Springer, New York, NY, USA, 2006.
- J. Yu, L. Xi, and S. Wang, “An improved particle swarm optimization for evolving feedforward artificial neural networks,” Neural Processing Letters, vol. 26, no. 3, pp. 217–231, 2007.
- M. Conforth and Y. Meng, “Toward evolving neural networks using bio-inspired algorithms,” in IC-AI, H. R. Arabnia and Y. Mun, Eds., pp. 413–419, CSREA Press, 2008.
- Y. Da and G. Xiurun, “An improved PSO-based ANN with simulated annealing technique,” Neurocomputing, vol. 63, pp. 527–533, 2005.
- X. Yao and Y. Liu, “A new evolutionary system for evolving artificial neural networks,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694–713, 1997.
- D. Rivero and D. Periscal, “Evolving graphs for ann development and simplification,” in Encyclopedia of Artificial Intelligence, J. R. Rabuñal, J. Dorado, and A. Pazos, Eds., pp. 618–624, IGI Global, 2009.
- H. M. Abdul-Kader, “Neural networks training based on differential evolution algorithm compared with other architectures for weather forecasting,” International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 92–99, 2009.
- K. K. Kuok, S. Harun, and S. M. Shamsuddin, “Particle swarm optimization feedforward neural network for modeling runoff,” International Journal of Environmental Science and Technology, vol. 7, no. 1, pp. 67–78, 2010.
- B. A. Garro, H. Sossa, and R. A. Vázquez, “Back-propagation vs particle swarm optimization algorithm: which algorithm is better to adjust the synaptic weights of a feed-forward ANN?” International Journal of Artificial Intelligence, vol. 7, no. 11, pp. 208–218, 2011.
- B. Garro, H. Sossa, and R. Vazquez, “Evolving neural networks: a comparison between differential evolution and particle swarm optimization,” in Advances in Swarm Intelligence, Y. Tan, Y. Shi, Y. Chai, and G. Wang, Eds., vol. 6728 of Lecture Notes in Computer Science, pp. 447–454, Springer, Berlin, Germany, 2011.
- B. A. Garro, H. Sossa, and R. A. Vazquez, “Design of artificial neural networks using a modified particle swarm optimization algorithm,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '09), pp. 938–945, IEEE, Atlanta, Ga, USA, June 2009.
- B. Garro, H. Sossa, and R. Vazquez, “Design of artificial neural networks using differential evolution algorithm,” in Neural Information Processing. Models and Applications, K. Wong, B. Mendis, and A. Bouzerdoum, Eds., vol. 6444 of Lecture Notes in Computer Science, pp. 201–208, Springer, Berlin, Germany, 2010.
- B. A. Garro, H. Sossa, and R. A. Vazquez, “Artificial neural network synthesis by means of artificial bee colony (abc) algorithm,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '11), pp. 331–338, IEEE, New Orleans, La, USA, June 2011.
- R. C. Eberhart, Y. Shi, and J. Kennedy, Swarm Intelligence, The Morgan Kaufmann Series in Evolutionary Computation, Morgan Kaufmann, Boston, Mass, USA, 1st edition, 2001.
- M. Chen, “Second generation particle swarm optimization,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '08), pp. 90–96, Hong Kong, June 2008.
- Y. Shi and R. C. Eberhart, “Empirical study of particle swarm optimization,” in Proceedings of the Congress on Evolutionary Computation (CEC '99), vol. 3, IEEE, Washington, DC, USA, July 1999.
- M. Løvberg, T. K. Rasmussen, and T. Krink, “Hybrid particle swarm optimizer with breeding and subpopulations,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '01), pp. 469–476, Morgan Kaufmann, San Francisco, Calif, USA, July 2001.
- N. Higashi and H. Iba, “Particle swarm optimization with Gaussian mutation,” in Proceedings of the IEEE Swarm Intelligence Symposium (SIS '03), pp. 72–79, IEEE, April 2003.
- A. S. Mohais, R. Mohais, C. Ward, and C. Posthoff, “Earthquake classifying neural networks trained with random dynamic neighborhood PSOs,” in Proceedings of the 9th Annual Genetic and Evolutionary Computation Conference (GECCO '07), pp. 110–117, ACM, New York, NY, USA, July 2007.
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. E. Rumelhart, J. L. McClelland, and PDP Research Group, Eds., pp. 318–362, MIT Press, Cambridge, Mass, USA, 1986.
- J. A. Anderson, An Introduction to Neural Networks, The MIT Press, 1995.
- P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
- D. N. A. Asuncion, UCI machine learning repository, 2007.
- R. A. V. E. De Los Monteros and J. H. Sossa Azuela, “A new associative model with dynamical synapses,” Neural Processing Letters, vol. 28, no. 3, pp. 189–207, 2008.
Copyright © 2015 Beatriz A. Garro and Roberto A. Vázquez. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.