Abstract

A Multilayer Perceptron (MLP) is a feedforward neural network model consisting of one or more hidden layers between the input and output layers. MLPs have been successfully applied to a wide range of problems in the fields of neuroscience, computational linguistics, and parallel distributed processing. While MLPs are highly successful in solving problems that are not linearly separable, two of the biggest challenges in their development and application are the local-minima problem and slow convergence when processing big data. To tackle these problems, this study proposes a Hybrid Chaotic Biogeography-Based Optimization (HCBBO) algorithm for training MLPs for big data analysis and processing. Four benchmark datasets are employed to investigate the effectiveness of HCBBO in training MLPs. The accuracy of the results and the convergence of HCBBO are compared to three well-known heuristic algorithms: (a) Biogeography-Based Optimization (BBO), (b) Particle Swarm Optimization (PSO), and (c) Genetic Algorithms (GA). The experimental results show that training MLPs with HCBBO outperforms the other three heuristic learning approaches for big data processing.

1. Introduction

The term big data [1–3] was coined to describe the phenomenon of increasingly massive datasets in scientific experiments, financial trading, and networks. Since big data typically has large volume, varied types, and high update velocity [4], there is an urgent need for tools that can extract meaningful information from it. Neural networks (NNs) [5, 6] are among the most popular machine learning approaches; they are composed of simple, interconnected processing elements and loosely model the neuronal structures of the human brain. A neural network can be represented as a highly complex nonlinear dynamic system [5] with some unique characteristics: (a) high dimensionality, (b) extensive interconnectivity, (c) adaptability, and (d) the ability to self-organize.

In the last decade, feedforward neural networks (FNNs) [6] have gained popularity in various areas of machine learning [7] and big data mining [1] to solve classification and regression problems. While the two-layered FNN is the most popular neural network used in practical applications, it is not suitable for solving nonlinear problems [7, 8]. The Multilayer Perceptron (MLP) [9, 10], a feedforward neural network with one or more hidden layers between the input and the output layers, is more successful in dealing with nonlinear problems such as pattern classification, big data prediction, and function approximation. Previous research [11] shows that MLPs with one hidden layer are able to approximate any continuous or discontinuous function. Therefore, the study of MLPs with one hidden layer has gained a lot of attention from the research community.

Theoretically, the goal of the learning process of MLPs is to find the best combination of weights and biases of the connections in order to achieve minimum error for the given training and test data. However, one of the most common problems in training an MLP is the tendency of the algorithm to converge on a local minimum. Since the error surface of an MLP can contain multiple local minima, the training process can easily become trapped in one of them rather than converging on the global minimum. This is a common problem in most gradient-based learning approaches such as backpropagation (BP) based NNs [12]. According to Mirjalili’s research [13], the initial values of the learning rate and the momentum can also affect the convergence of BP based NNs, with unsuitable values for these variables resulting in divergence. Thus, many studies focus on using novel heuristic optimization methods or evolutionary algorithms to resolve the problems of MLP learning algorithms [14]. Classical applied approaches are Particle Swarm Optimization (PSO) algorithms [15, 16], Ant Colony Optimization (ACO) [17], and Artificial Bee Colony (ABC) [18]. However, the No Free Lunch (NFL) theorem [19, 20] states that no heuristic algorithm is best suited for solving all optimization problems. Most of them have their own side effects, and overall there has been no significant improvement [13] using these approaches. For example, Genetic Algorithms (GA) may reduce the probability of getting trapped in a local minimum, but they still suffer from slow convergence rates.

Recently, a novel optimization method called Biogeography-Based Optimization (BBO) [21] has been proposed. It is based on the observation that the geographical distribution of biological organisms can be described by mathematical equations. It is a distributed paradigm that simulates the collective behavior of unsophisticated individuals interacting locally with their environment to efficiently identify optimal solutions in complex search spaces. Many related studies [22–25] show that the BBO algorithm is a type of evolutionary algorithm which offers a specific evolutionary mechanism for each individual in a population. This mechanism makes the BBO algorithm more successful and robust in nonuniform training procedures than gradient-based algorithms. Moreover, compared with PSO or ACO, the mutation operator of the BBO algorithm enhances its exploitation capability, which allows BBO to outperform PSO in training MLPs. This has led to a great interest in applying the efficiency of BBO to training MLPs. In 2010, Ovreiu and Simon [24] trained a neuro-fuzzy network with BBO for classifying P-wave features for the diagnosis of cardiomyopathy. The research in [13] used 11 standard datasets to provide a comprehensive test bed for investigating the abilities of the BBO algorithm in training MLPs. In this paper, we propose a hybrid BBO with chaotic maps (HCBBO) as a trainer for MLPs. Our approach employs chaos theory to improve the performance of BBO with very little additional computational burden. In our algorithm, the migration and mutation mechanisms are combined to enhance the exploration and exploitation abilities of BBO, and a novel migration operator is proposed to improve BBO’s performance in training MLPs.

The rest of this paper is organized as follows. In Section 2, a brief review of the MLP notation and a simple first-order training method are provided. In Sections 3 and 4, the HCBBO framework is introduced and analyzed. In Section 5, the computational results to demonstrate the effectiveness of the proposed improved hybrid algorithm are provided. Finally, Section 6 provides concluding remarks and suggests some directions for future research.

2. Review of the MLP Notation

The notation used in the rest of the paper represents a fully connected feedforward MLP network with a single hidden layer (as shown in Figure 1). This MLP consists of an input layer, an output layer, and a single hidden layer, and it is trained using a backpropagation (BP) learning algorithm. Let $n$ denote the number of input nodes, $h$ denote the number of hidden nodes, and $m$ denote the number of output nodes. The input weights connect the input nodes to the hidden nodes, and the output weights connect the hidden nodes to the output nodes. The weighted sum of inputs at each hidden node is first calculated by the following equation:
$$s_j = \sum_{i=1}^{n} w_{ij}\,x_i - \theta_j, \qquad j = 1, 2, \ldots, h,$$
where $n$ is the number of input nodes, $w_{ij}$ is the connection weight from the $i$th node in the input layer to the $j$th node in the hidden layer, $x_i$ indicates the $i$th input, and $\theta_j$ is the threshold of the $j$th hidden node.

The output of each hidden node is then obtained by applying an activation function to this weighted sum; with the commonly used sigmoid activation,
$$S_j = \frac{1}{1 + \exp(-s_j)}, \qquad j = 1, 2, \ldots, h.$$

After calculating the outputs of the hidden nodes, the final output of each output node is defined as follows:
$$o_k = \sum_{j=1}^{h} w_{jk}\,S_j - \theta'_k, \qquad k = 1, 2, \ldots, m,$$
where $w_{jk}$ is the connection weight from the $j$th hidden node to the $k$th output node and $\theta'_k$ is the bias of the $k$th output node.

The learning error (fitness function) is calculated as follows:
$$E = \frac{1}{q}\sum_{k=1}^{q}\sum_{i=1}^{m}\left(o_i^k - d_i^k\right)^2,$$
where $q$ is the number of training samples, $m$ is the number of outputs, $d_i^k$ is the desired output of the $i$th output unit when the $k$th training sample is used, and $o_i^k$ is the actual output of the $i$th output unit when the $k$th training sample is used.

From the above equations, it can be observed that the final value of the output in MLPs depends upon the parameters of the connecting weights and biases. Thus, training an MLP can be defined as the process of finding the optimal values of the weights and biases of the connections in order to achieve the desirable outputs from certain given inputs.
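To make the notation concrete, the following Python sketch (our illustration, not the authors' Matlab code) computes the forward pass and the averaged squared error of a single-hidden-layer MLP. The sigmoid activation and all variable names (W1, theta1, W2, theta2) are assumptions chosen to mirror the equations above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, theta1, W2, theta2):
    """Forward pass: x is (n,), W1 is (h, n), theta1 is (h,), W2 is (m, h), theta2 is (m,)."""
    s = W1 @ x - theta1      # weighted sums at the hidden nodes
    S = sigmoid(s)           # hidden-node outputs
    o = W2 @ S - theta2      # outputs of the output nodes
    return o

def mean_squared_error(X, D, W1, theta1, W2, theta2):
    """Average squared error over q training samples (the MLP learning error)."""
    q = len(X)
    total = 0.0
    for x, d in zip(X, D):
        o = mlp_forward(x, W1, theta1, W2, theta2)
        total += np.sum((o - d) ** 2)
    return total / q
```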

3. The Proposed Hybrid BBO for Training an MLP

Biogeography-Based Optimization (BBO) is a population-based optimization algorithm inspired by biogeography: the geographical distribution and migration of species and the equilibrium of ecosystems. Experiments show that the results obtained using BBO are at least competitive with those of other population-based algorithms, and it has been shown to outperform well-known heuristic algorithms such as PSO, GA, and ACO on some real-world problems and benchmark functions [21].

The steps of the BBO algorithm can be described as follows. In the beginning, the BBO algorithm generates a set of random search agents called habitats, which are represented as vectors of the problem variables (analogous to chromosomes in GA). Next, each agent is assigned emigration, immigration, and mutation rates, which simulate the characteristics of different ecosystems. In addition, a variable called the habitat suitability index (HSI) is defined to measure the fitness of each habitat. A higher value of HSI indicates that the habitat is more suitable for the residence of biological species. In other words, a BBO solution with a high HSI value represents a superior result, while a solution with a low HSI value represents an inferior one.

During the course of iterations, a set of solutions is maintained from one iteration to the next, and each habitat sends and receives habitants to and from different habitats based on their immigration and emigration rates which are probabilistically adapted. In each iteration, a random number of habitants are also occasionally mutated. That makes each solution adapt itself by learning from its neighbors as the algorithm progresses. Here, each solution parameter is denoted as a suitability index variable (SIV).

The process of BBO is composed of two phases: migration and mutation. During the migration phase, the immigration and emigration rates of each habitat follow the model depicted in Figure 2: a high number of habitants in a habitat increases the probability of emigration and decreases the probability of immigration. During the mutation phase, the mutation factor in BBO keeps the distribution of habitants in a habitat as diverse as possible. In contrast to the mutation factor in GA, the mutation factor of BBO is not set randomly; it depends on the probability of the number of species in each habitat.

The immigration and emigration rates can be written mathematically as follows:
$$\lambda_k = I\left(1 - \frac{k}{n}\right), \qquad \mu_k = E\,\frac{k}{n},$$
where $I$ is the maximum immigration rate, $E$ is the maximum emigration rate, $n$ is the maximum number of habitants, and $k$ is the habitant count of the $k$th habitat.

The mutation rate of each habitat, which improves the exploration ability of BBO, is defined as follows:
$$m(k) = m_{\max}\left(1 - \frac{P_k}{P_{\max}}\right),$$
where $m_{\max}$ is the maximum mutation value defined by the user, $P_{\max}$ is the greatest habitat probability among all the habitats, and $P_k$ is the probability of the $k$th habitat, which can be obtained from the species-count model of [21]:
$$\dot{P}_k = -(\lambda_k + \mu_k)P_k + \lambda_{k-1}P_{k-1} + \mu_{k+1}P_{k+1},$$
with the terms involving $P_{k-1}$ and $P_{k+1}$ omitted at the boundaries $k = 0$ and $k = n$, respectively. The complete process of the BBO algorithm is described in Algorithm 1: an initialization function creates an ecosystem of habitats and computes each corresponding HSI, and a transition function modifies the ecosystem from one optimization iteration to the next. The elements of the 6-tuple defining this transition are the number of habitats, the number of SIVs per habitat, the immigration rate $\lambda$, the emigration rate $\mu$, the migration operator, and the mutation operator.
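As a small illustration, the following Python snippet evaluates the linear migration model and the mutation rate described above; the function names and the default values of I, E, and m_max are illustrative assumptions rather than the paper's settings.

```python
def migration_rates(k, n, I=1.0, E=1.0):
    """Immigration and emigration rates for a habitat holding k of at most n habitants."""
    lam = I * (1.0 - k / n)   # immigration decreases as the habitat fills up
    mu = E * (k / n)          # emigration increases as the habitat fills up
    return lam, mu

def mutation_rate(P_k, P_max, m_max=0.01):
    """Mutation probability of a habitat whose species-count probability is P_k."""
    return m_max * (1.0 - P_k / P_max)
```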

4. The Proposed Hybrid CBBO Algorithm for Training an MLP

There are three different approaches for using heuristic algorithms for training MLPs. In the first approach, heuristic algorithms are employed to find a combination of weights and biases to provide the minimum error for an MLP. In the second approach, heuristic algorithms are utilized to find the proper architecture for an MLP to be applied to a particular problem. In the third approach, heuristic algorithms can be used to tune the parameters of a gradient-based learning algorithm.
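As a concrete illustration of the first approach, the hypothetical helpers below show how all the weights and biases of an MLP can be flattened into a single habitat vector of SIVs and recovered again; the function names and the ordering of the segments are assumptions.

```python
import numpy as np

def encode(W1, theta1, W2, theta2):
    """Flatten all weights and biases into one candidate-solution (habitat) vector."""
    return np.concatenate([W1.ravel(), theta1, W2.ravel(), theta2])

def decode(vec, n, h, m):
    """Rebuild the weight matrices and bias vectors from a habitat vector."""
    i = 0
    W1 = vec[i:i + h * n].reshape(h, n); i += h * n
    theta1 = vec[i:i + h]; i += h
    W2 = vec[i:i + m * h].reshape(m, h); i += m * h
    theta2 = vec[i:i + m]
    return W1, theta1, W2, theta2
```

With this encoding, the dimension of each habitat vector is h*n + h + m*h + m, which is the size of the search space the optimizer explores.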

Mirjalili et al. [13] employed the basic BBO algorithm to train an MLP using the first approach, and the results demonstrate that BBO is significantly better at avoiding local minima compared to PSO, GA, and ACO algorithms. However, the basic BBO algorithm still has some drawbacks, such as (a) the large number of iterations needed to reach the global optimal solution and (b) the tendency to converge to solutions which may be locally the best. Many methods have been proposed to improve the capabilities for the exploration and exploitation of the BBO algorithm.

4.1. Chaotic Systems

Chaos theory [26] refers to the study of chaotic dynamical systems, which is embodied by the so-called “butterfly effect.” As nonlinear dynamical systems, chaotic systems are highly sensitive to their initial conditions, and tiny changes to their initial conditions may result in significant changes in the final outcomes of these systems.

In this paper, chaotic systems are applied to BBO in place of random values [25–27] for its initialization. This means that chaotic maps substitute for random values in order to provide chaotic behavior to the heuristic algorithm. During the processing of the BBO algorithm, the most important random values are those used to choose a habitat from which new habitants emigrate during the migration phase. Whenever a random value is needed, we instead draw a value from the interval $(0, 1)$ using a chaotic map based on the logistic model in (8):
$$x_{n+1} = \mu x_n (1 - x_n),$$
where $x_n \in (0, 1)$ is the chaotic variable and $\mu$ is the logistic parameter. When $\mu$ equals 4, the iterations produce values which follow a pseudorandom distribution. This means that a tiny difference in the initial value $x_0$ gives rise to a large difference in the long-term behavior of the sequence. We employ this feature to help the BBO algorithm avoid local convergence.
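The following minimal sketch shows the logistic map used as a chaotic substitute for uniform random numbers; the seed values in the usage example are arbitrary and only illustrate the sensitivity to initial conditions.

```python
def logistic_map_sequence(x0, length, mu=4.0):
    """Generate `length` chaotic values in (0, 1) starting from x0 in (0, 1)."""
    seq = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)   # x_{n+1} = mu * x_n * (1 - x_n)
        seq.append(x)
    return seq

# Two nearby seeds diverge after a few iterations, which is the property
# exploited to keep the migration choices from settling into a fixed pattern.
a = logistic_map_sequence(0.31, 20)
b = logistic_map_sequence(0.3100001, 20)
```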

4.2. Habitat Suitability Index (Fitness Function)

During the training phase of an MLP, each training data sample should be involved in calculating the HSI of each candidate solution. In this work, the Mean Square Error (MSE) over all training samples is used. The MSE is defined as follows:
$$\mathrm{MSE} = \frac{1}{q}\sum_{k=1}^{q}\sum_{i=1}^{m}\left(o_i^k - d_i^k\right)^2,$$
where $q$ is the number of training samples, $m$ is the number of outputs, $d_i^k$ is the desired output of the $i$th output unit when the $k$th training sample is used, and $o_i^k$ is the actual output of the $i$th output unit when the $k$th training sample is used. The HSI value of a candidate solution is derived from its MSE, with a lower MSE corresponding to a higher HSI.

4.3. Opposition-Based Learning

To improve the convergence of the BBO algorithm during the mutation phase, a method named opposition-based learning (OBL) has been used, as in [22]. The main idea of opposition-based learning is to consider an estimate and its opposite at the same time in order to achieve a better approximation of the current candidate solution.

Assuming that $P = (x_1, x_2, \ldots, x_n)$ represents a vector of the weights and biases in the MLP, with $x_i \in [a_i, b_i]$ for $i = 1, 2, \ldots, n$, the opposite vector is defined as $\breve{P} = (\breve{x}_1, \breve{x}_2, \ldots, \breve{x}_n)$ with elements $\breve{x}_i = a_i + b_i - x_i$. The algorithm for the OBL method can be described as follows:

(1) Generate a vector $P$ and its opposite $\breve{P}$ in an $n$-dimensional search space.

(2) Evaluate the fitness of both points, $f(P)$ and $f(\breve{P})$.

(3) If $f(\breve{P})$ is better than $f(P)$ (i.e., $\breve{P}$ yields a lower MSE), then replace $P$ with $\breve{P}$; otherwise, continue with $P$.

Thus, the vector $P$ and its opposite vector $\breve{P}$ are evaluated simultaneously in order to retain the fitter one.
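A brief sketch of the OBL step, assuming the weights and biases are bounded per dimension by vectors a and b and that mse is a user-supplied fitness function such as the learning error of Section 2; the helper names are ours.

```python
import numpy as np

def opposite(P, a, b):
    """Opposite vector of P under per-dimension bounds a and b: x_i -> a_i + b_i - x_i."""
    return a + b - P

def obl_select(P, a, b, mse):
    """Keep whichever of P and its opposite yields the lower MSE (higher HSI)."""
    P_opp = opposite(P, a, b)
    return P if mse(P) <= mse(P_opp) else P_opp
```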

4.4. Outline of HCBBO for MLP

In this section, the main procedure of HCBBO is described. To guarantee an initial population with a certain quality and diversity, the initial population is generated using a combination of the chaotic system and the OBL approach. By fusing local search strategies with the migration and mutation phases of the BBO algorithm, the exploration and exploitation capabilities of HCBBO can be well balanced. The main procedure of our proposed HCBBO for training an MLP is described in Algorithm 2 below.

Input: habitat size $N$, the maximum emigration and immigration rates $E$ and $I$, and the maximum mutation rate $m_{\max}$.
(1) Initialize the set of MLPs (habitats) using chaotic maps according to Eq. (8).
(2) For each habitat, calculate its Mean Square Error (MSE) based on Eq. (9); the basic rule of the fitness function is that better performance corresponds to a smaller MSE. Then identify the elite habitats according to their HSI values.
(3) Combine MLPs according to the immigration and emigration rates based on Eq. (6).
(4) Probabilistically use immigration and emigration to modify each non-elite habitat based on Eq. (7).
(5) Select a number of MLPs and recompute (mutate) some of their weights or biases using chaotic maps.
(6) Save some of the MLPs with low MSE.
(7) Terminate the loop if a predefined number of generations is reached or an acceptable solution has been found; otherwise, go to step (3) for the next iteration.
Output: the MLP with the minimum MSE (maximum HSI).
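For orientation, the high-level Python sketch below mirrors the structure of Algorithm 2 under simplifying assumptions: mse is a user-supplied fitness function over a flat weight/bias vector (such as the encode/decode representation sketched above), and the chaotic initialization and OBL steps are replaced by plain uniform sampling for brevity, so this is not the authors' implementation.

```python
import numpy as np

def hcbbo_train(mse, dim, pop_size=50, generations=200, elites=2, m_max=0.01,
                lower=-1.0, upper=1.0, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lower, upper, size=(pop_size, dim))    # habitats (candidate MLPs)
    for _ in range(generations):
        fitness = np.array([mse(h) for h in pop])            # lower MSE = higher HSI
        pop = pop[np.argsort(fitness)]                        # best habitats first
        mu = 1.0 - (np.arange(pop_size) + 1) / pop_size       # emigration: high for good habitats
        lam = 1.0 - mu                                        # immigration: high for poor habitats
        new_pop = pop.copy()
        for i in range(elites, pop_size):                     # elites are carried over unchanged
            for d in range(dim):
                if rng.random() < lam[i]:                     # migrate this SIV from another habitat
                    j = rng.choice(pop_size, p=mu / mu.sum())
                    new_pop[i, d] = pop[j, d]
                elif rng.random() < m_max:                    # otherwise, occasionally mutate it
                    new_pop[i, d] = rng.uniform(lower, upper)
        pop = new_pop
    fitness = np.array([mse(h) for h in pop])
    return pop[np.argmin(fitness)]                            # the MLP with the minimum MSE
```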

5. Experimental Analysis

This study focuses on finding an efficient training method for MLPs. To evaluate the performance of the proposed HCBBO algorithm, a series of experiments was conducted in the Matlab software environment (V2009). The system configuration is as follows: (a) CPU: Intel i7; (b) RAM: 4 GB; (c) operating system: Windows 8. Based on the works described in [13, 28, 29], we chose four publicly available classification datasets to benchmark our system: (1) balloon, (2) iris, (3) heart, and (4) vehicle. All of these datasets are freely available from the University of California at Irvine (UCI) Machine Learning Repository [30], thus ensuring replicability. The characteristics of these datasets are listed in Table 1.

In this paper, we compare the performance of four algorithms, BBO, PSO, GA, and HCBBO, on the benchmark datasets described in Table 1. Since manually choosing appropriate parameters for each of these algorithms is time-consuming, the initial parameters and property structures for both the classical BBO algorithm and the HCBBO algorithm (adjusted as shown in Table 2) were chosen as in [13].

In order to increase the accuracy of the experiment, each algorithm was run 20 times, and different MLP structures were used for the different datasets, as listed in Table 3.

The running time (RT) and convergence curves of each algorithm are shown in Figures 3–7. From Figure 3, it can be observed that the average computational time of HCBBO is 8 to 13% lower than the best time obtained for BBO; it is also lower than the computational time of all the other algorithms compared in this experiment. This decrease in running time can be attributed to the fact that HCBBO’s search ability is enhanced by OBL.

The convergence curves in Figures 4–7 show that, among all the algorithms, HCBBO has the fastest convergence behavior on all the datasets. In Figure 4, under the same experimental conditions, HCBBO reached the optimal values of its parameters after 150 generations, while BBO could not converge to an optimal value even after 200 generations. The same pattern of faster convergence for HCBBO was observed for the other classification problems (Figures 5–7).

The mean classification rates are provided in Table 4. Statistically speaking, HCBBO achieves the best results on all of the classification datasets because it avoids local minima more effectively than the other algorithms.

6. Discussion and Conclusions

In this paper, an HCBBO algorithm was presented for training an MLP. Four benchmark big datasets (balloon, iris, heart, and vehicle) were employed to investigate the effectiveness of HCBBO in training MLPs. The performance results were statistically compared with three state-of-the-art algorithms: BBO, PSO, and GA. The main contributions and innovations of this work are summarized as follows: (a) this is the first research work combining a hybrid chaotic system with the BBO algorithm to train MLPs; (b) the OBL method was used in the mutation operator of HCBBO to improve the convergence of the algorithm; and (c) the results demonstrate that HCBBO has better convergence capabilities than BBO, PSO, and GA. In the future, we will apply the trained neural networks to the analysis of big medical data and integrate more novel data mining algorithms [29, 31–35] into HCBBO.

Conflicts of Interest

The authors declare that they have no conflicts of interest.