Scientific Programming

Volume 2018, Article ID 2943290, 7 pages

https://doi.org/10.1155/2018/2943290

## Developing a Novel Hybrid Biogeography-Based Optimization Algorithm for Multilayer Perceptron Training under Big Data Challenge

Correspondence should be addressed to Le Zhang; zhangle06@scu.edu.cn

Received 24 August 2017; Revised 8 December 2017; Accepted 18 January 2018; Published 1 March 2018

Academic Editor: Anfeng Liu

Copyright © 2018 Xun Pu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A Multilayer Perceptron (MLP) is a feedforward neural network model consisting of one or more hidden layers between the input and output layers. MLPs have been successfully applied to a wide range of problems in neuroscience, computational linguistics, and parallel distributed processing. While MLPs are highly successful in solving problems that are not linearly separable, two of the biggest challenges in their development and application are the local-minima problem and slow convergence when training on big data. To tackle these problems, this study proposes a Hybrid Chaotic Biogeography-Based Optimization (HCBBO) algorithm for training MLPs for big data analysis and processing. Four benchmark datasets are employed to investigate the effectiveness of HCBBO in training MLPs. The accuracy of the results and the convergence of HCBBO are compared with those of three well-known heuristic algorithms: (a) Biogeography-Based Optimization (BBO), (b) Particle Swarm Optimization (PSO), and (c) Genetic Algorithms (GA). The experimental results show that MLPs trained with HCBBO outperform those trained with the other three heuristic learning approaches for big data processing.

#### 1. Introduction

The term big data [1–3] was coined to describe the increasing size of massive datasets in scientific experiments, financial trading, and networks. Since big data is characterized by large volume, multiple varied types, and fast update velocity [4], there is an urgent need for tools that can extract meaningful information from it. Neural networks (NNs) [5, 6] are one of the popular machine learning approaches. They are composed of several simple, interconnected processing elements and loosely model the neuronal structures of the human brain. A neural network can be represented as a highly complex nonlinear dynamic system [5], which has some unique characteristics: (a) high dimensionality, (b) extensive interconnectivity, (c) adaptability, and (d) ability to self-organize.

In the last decade, feedforward neural networks (FNNs) [6] have gained popularity in various areas of machine learning [7] and big data mining [1] to solve classification and regression problems. While the two-layered FNN is the most popular neural network used in practical applications, it is not suitable for solving nonlinear problems [7, 8]. The Multilayer Perceptron (MLP) [9, 10], a feedforward neural network with one or more hidden layers between the input and the output layers, is more successful in dealing with nonlinear problems such as pattern classification, big data prediction, and function approximation. Previous research [11] shows that MLPs with one hidden layer are able to approximate any continuous or discontinuous function. Therefore, the study of MLPs with one hidden layer has gained a lot of attention from the research community.

Theoretically, the goal of the learning process of MLPs is to find the combination of connection weights and biases that achieves the minimum error for the given training and test data. However, one of the most common problems in training an MLP is the tendency of the algorithm to converge on a local minimum. Since the error surface of an MLP can contain multiple local minima, training is easily trapped in one of them rather than converging on the global minimum. This is a common problem in most gradient-based learning approaches such as backpropagation (BP) based NNs [12]. According to Mirjalili’s research [13], the initial values of the learning rate and the momentum can also affect the convergence of BP-based NNs, with unsuitable values for these variables resulting in divergence. Thus, many studies focus on using novel heuristic optimization methods or evolutionary algorithms to resolve the problems of MLP learning algorithms [14]. Commonly applied approaches include Particle Swarm Optimization (PSO) algorithms [15, 16], Ant Colony Optimization (ACO) [17], and Artificial Bee Colony (ABC) [18]. However, the No Free Lunch (NFL) theorem [19, 20] states that no heuristic algorithm is best suited for solving all optimization problems. Most of these algorithms have their own drawbacks, and overall there has been no significant improvement [13] using these approaches. For example, Genetic Algorithms (GA) may reduce the probability of getting trapped in a local minimum, but they still suffer from slow convergence rates.

Recently, a novel optimization method called Biogeography-Based Optimization (BBO) [21] has been proposed. It is motivated by the idea that the geographical distribution of biological organisms can be represented by mathematical equations. It is a distributed paradigm, which seeks to simulate the collective behavior of unsophisticated individuals interacting locally with their environment to efficiently identify optimum solutions in complex search spaces. Several related studies [22–25] show that the BBO algorithm is a type of evolutionary algorithm that can offer a specific evolutionary mechanism for each individual in a population. This mechanism makes the BBO algorithm more successful and robust on nonuniform training procedures than gradient-based algorithms. Moreover, compared with PSO or ACO, the mutation operator of the BBO algorithm can enhance its exploitation capability. This allows the BBO algorithm to outperform PSO in training MLPs and has led to great interest in applying BBO to MLP training. In 2010, Ovreiu and Simon [24] trained a neuro-fuzzy network with BBO for classifying P-wave features for the diagnosis of cardiomyopathy. The study in [13] used 11 standard datasets to provide a comprehensive test bed for investigating the abilities of the BBO algorithm in training MLPs. In this paper, we propose a hybrid BBO with chaotic maps trainer (HCBBO) for MLPs. Our approach employs chaos theory to improve the performance of BBO with very little computational burden. In our algorithm, the migration and mutation mechanisms are combined to enhance the exploration and exploitation abilities of BBO, and a novel migration operator is proposed to improve BBO’s performance in training MLPs.

The rest of this paper is organized as follows. Section 2 provides a brief review of the MLP notation and a simple first-order training method. Sections 3 and 4 introduce and analyze the HCBBO framework. Section 5 presents computational results that demonstrate the effectiveness of the proposed hybrid algorithm. Finally, Section 6 provides concluding remarks and suggests some directions for future research.

#### 2. Review of the MLP Notation

The notation used in the rest of the paper describes a fully connected feedforward MLP network with a single hidden layer (as shown in Figure 1). This MLP consists of an input layer, an output layer, and a single hidden layer. The MLP is trained using a backpropagation (BP) learning algorithm. Let $n$ denote the number of input nodes, $h$ denote the number of hidden nodes, and $m$ denote the number of output nodes. Let the input weights $W_{ij}$ connect the $i$th input to the $j$th hidden unit and the output weights $W_{jk}$ connect the $j$th hidden unit to the $k$th output. The weighted sums of inputs are first calculated by the following equation:

$$s_j = \sum_{i=1}^{n} W_{ij} X_i - \theta_j, \quad j = 1, 2, \ldots, h,$$

where $n$ is the number of the input nodes, $W_{ij}$ is the connection weight from the $i$th node in the input layer to the $j$th node in the hidden layer, $X_i$ indicates the $i$th input, and $\theta_j$ means the threshold of the $j$th hidden node.
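
The weighted-sum step can be illustrated with a minimal sketch in plain Python. It assumes the usual form in which each hidden node sums its weighted inputs and subtracts its threshold, that is, s_j = Σ_i W_ij X_i − θ_j. The function name `hidden_weighted_sums`, the variable names, and the example values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List


def hidden_weighted_sums(
    inputs: List[float],
    weights: List[List[float]],
    thresholds: List[float],
) -> List[float]:
    """Compute s_j = sum_i weights[i][j] * inputs[i] - thresholds[j].

    Assumed layout (illustrative, not from the paper):
      inputs     has n entries, one per input node
      weights    is an n x h table, weights[i][j] links input i to hidden node j
      thresholds has h entries, one per hidden node
    """
    n = len(inputs)
    h = len(thresholds)
    if len(weights) != n or any(len(row) != h for row in weights):
        raise ValueError("weights must have shape n x h")
    return [
        sum(weights[i][j] * inputs[i] for i in range(n)) - thresholds[j]
        for j in range(h)
    ]


# Hypothetical example: n = 2 inputs, h = 3 hidden nodes.
example_inputs = [1.0, 2.0]
example_weights = [
    [0.5, -1.0, 0.0],   # weights leaving input node 0
    [0.25, 0.5, 1.0],   # weights leaving input node 1
]
example_thresholds = [0.5, 0.0, 1.0]

example_sums = hidden_weighted_sums(
    example_inputs, example_weights, example_thresholds
)
print(example_sums)  # [0.5, 0.0, 1.0]
```

Each entry of the result is the pre-activation value of one hidden node; an activation function would then be applied to these sums to obtain the hidden-layer outputs.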