Computational Intelligence and Neuroscience

Volume 2018, Article ID 4058403, 14 pages

https://doi.org/10.1155/2018/4058403

## SGB-ELM: An Advanced Stochastic Gradient Boosting-Based Ensemble Scheme for Extreme Learning Machine

^{1}School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, China
^{2}College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China

Correspondence should be addressed to Jikui Wang; wjkweb@szu.edu.cn

Received 11 December 2017; Revised 10 May 2018; Accepted 4 June 2018; Published 26 June 2018

Academic Editor: Pedro Antonio Gutierrez

Copyright © 2018 Hua Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A novel ensemble scheme for extreme learning machine (ELM), named Stochastic Gradient Boosting-based Extreme Learning Machine (SGB-ELM), is proposed in this paper. Rather than incorporating the stochastic gradient boosting method into the ELM ensemble procedure naively, SGB-ELM constructs a sequence of weak ELMs in which each individual ELM is trained additively by optimizing a regularized objective. Specifically, we design an objective function based on the boosting mechanism and simultaneously introduce a regularization term to alleviate overfitting. The update formula for solving the output-layer weights of each weak ELM is then determined using a second-order approximation of the objective. Because this formula is difficult to solve analytically and the regularized objective favors simple functions, we take the output-layer weights learned from the current pseudo residuals as an initial heuristic and obtain the optimal output-layer weights by applying the update formula to the heuristic iteratively. In comparison with several typical ELM ensemble methods, SGB-ELM achieves better generalization performance and prediction robustness, which demonstrates its feasibility and effectiveness.

#### 1. Introduction

Extreme learning machine (ELM) was proposed by Huang [1–3] as a promising learning algorithm for single-hidden-layer feedforward neural networks (SLFN); it randomly chooses the weights and biases for hidden nodes and analytically determines the output-layer weights by using the Moore-Penrose (MP) generalized inverse [4]. Because it avoids iterative parameter adjustment and time-consuming weight updating, ELM achieves an extremely fast learning speed and has thus attracted a lot of attention. However, the random initialization of input-layer weights and hidden biases might generate some suboptimal parameters, which has a negative impact on its generalization performance and prediction robustness.

To alleviate this weakness, many works have been proposed to further improve the generalization capability and stability of ELM, among which ELM ensemble algorithms are representative. Three representative ELM ensemble algorithms are summarized as follows. The earliest ensemble-based ELM (EN-ELM) method was presented by Liu and Wang in [5]. EN-ELM introduced a cross-validation scheme into its training phase: the original training dataset was partitioned into $K$ subsets, from which $K$ pairs of training and validation sets were obtained so that each training set consists of $K-1$ subsets. With updated input weights and hidden biases, individual ELMs were then trained on each pair of training and validation sets, so that in total $K$ ELMs were constructed for decision-making in the EN-ELM algorithm. Cao et al. [6] proposed a voting-based ELM (V-ELM) ensemble algorithm, which makes the final decision based on a majority-voting mechanism in classification applications. All the individual ELMs in V-ELM are trained on the same training dataset, and the learning parameters of each basic ELM are randomly initialized independently. Moreover, a genetic ensemble of ELM (GE-ELM) method was designed by Xue et al. in [7], which uses a genetic algorithm to produce optimal input weights and hidden biases for the individual ELMs and selects from the candidate networks those ELMs with not only higher fitness values but also a smaller norm of output weights. In GE-ELM, the fitness value of each individual ELM is evaluated on a validation set randomly selected from the entire training dataset. Several other types of ELM ensemble algorithms can be found in the literature [8–13].
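The majority-voting decision rule used by V-ELM can be sketched as follows. This is a minimal illustration that assumes the class labels predicted by the individual ELMs are already available; the helper name `majority_vote` is our own, not code from [6]:

```python
import numpy as np

def majority_vote(predictions):
    """Combine class labels from M independent ELMs (V-ELM-style).

    predictions: (M, N) integer array; predictions[m, i] is the label
    the m-th ELM assigns to the i-th test instance.
    Returns the most frequent label per instance (ties broken toward
    the smallest label, since argmax takes the first maximum).
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # Count votes per class for every test instance.
    votes = np.stack([np.bincount(col, minlength=n_classes)
                      for col in predictions.T])      # shape (N, n_classes)
    return votes.argmax(axis=1)
```

For example, if three ELMs predict labels `[0, 1]`, `[0, 2]`, and `[1, 1]` for two test instances, the ensemble outputs `[0, 1]`.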

As for ensembles of traditional neural networks, the most prevailing approaches are Bagging and Boosting. The Bagging scheme [14] generates several training datasets from the original training dataset and then trains a component neural network on each of them. The Boosting mechanism [15] generates a series of component neural networks whose training datasets are determined by the performance of the former ones. There are also many other approaches for training the component neural networks. Hampshire [16] utilizes different objective functions to train distinct component neural networks. Xu et al. [17] introduce the stochastic gradient boosting ensemble scheme to bioinformatics applications. Yao et al. [18] regard all the individuals in an evolved population of neural networks as component networks.

In this paper, a new ELM ensemble scheme called Stochastic Gradient Boosting-based Extreme Learning Machine (SGB-ELM), which makes use of the mechanism of stochastic gradient boosting [19, 20], is proposed. SGB-ELM constructs an ensemble model by training a sequence of ELMs, where the output weights of each individual ELM are learned by optimizing a regularized objective in an additive manner. More specifically, we design an objective based on the training mechanism of the boosting method. To alleviate overfitting, we concurrently introduce into the objective function a regularization term that controls the complexity of the ensemble model. The update formula for solving the output weights of the newly added ELM is then determined by optimizing the objective using a second-order approximation. As the output weights of the newly added ELM at each iteration are difficult to calculate analytically from this formula, we take the output weights learned from the pseudo-residuals-based training dataset as an initial heuristic and obtain the optimal output weights by applying the update formula to the heuristic iteratively. Because the regularized objective favors functions that are not only predictive but also simple, and meanwhile a randomly selected subset rather than the whole training set is used to minimize the training residuals at each iteration, SGB-ELM can continually improve the generalization capability of ELM while effectively avoiding overfitting. Experimental results in comparison with Bagging ELM, Boosting ELM, EN-ELM, and V-ELM show that SGB-ELM obtains better classification and regression performance, which demonstrates the feasibility and effectiveness of the SGB-ELM algorithm.

The rest of this paper is organized as follows. In Section 2, we briefly summarize the basic ELM model as well as the stochastic gradient boosting method. Section 3 introduces our proposed SGB-ELM algorithm. Experimental results are presented in Section 4. Finally, we conclude this paper and make some discussions in Section 5.

#### 2. Preliminaries

In this section, we briefly review the principles of basic ELM model and the stochastic gradient boosting method to provide necessary backgrounds for the development of SGB-ELM algorithm in Section 3.

##### 2.1. Extreme Learning Machine

ELM is a special learning algorithm for SLFN, which randomly selects the input weights (linking the input layer to the hidden layer) and biases for hidden nodes and analytically determines the output weights (linking the hidden layer to the output layer) by using the MP generalized inverse. Suppose we have a training dataset with $N$ instances $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{n}$ and $\mathbf{t}_i \in \mathbb{R}^{m}$. It is known that $m = 1$ for regression and $m$ equals the number of classes for classification. In ELM, the input weights and hidden biases can be randomly chosen according to *any continuous probability distribution* [2]. Namely, we randomly select the learning parameters within the range of $[-1, 1]$ as
$$\mathbf{W} = \left[\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_L\right]$$
and
$$\mathbf{b} = \left[b_1, b_2, \ldots, b_L\right],$$
where $L$ is the number of hidden-layer nodes in the SLFN. According to the theory proved in [2], the output-layer weights $\boldsymbol{\beta}$ in the ELM model can be analytically calculated by
$$\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}.$$
Here, $\mathbf{H}^{\dagger}$ is the MP generalized inverse of the hidden-layer output matrix
$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L},$$
where $\mathbf{w}_j \in \mathbb{R}^{n}$, $b_j \in \mathbb{R}$, and $g(\cdot)$ is the sigmoid activation function, and
$$\mathbf{T} = \begin{bmatrix} \mathbf{t}_1^{T} \\ \vdots \\ \mathbf{t}_N^{T} \end{bmatrix}_{N \times m}$$
is the target matrix. Generally, for an unseen instance $\mathbf{x}$, ELM predicts its output as follows:
$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta},$$
where $\mathbf{h}(\mathbf{x}) = \left[g(\mathbf{w}_1 \cdot \mathbf{x} + b_1), \ldots, g(\mathbf{w}_L \cdot \mathbf{x} + b_L)\right]$ is the hidden-layer output vector of $\mathbf{x}$.
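The closed-form training procedure above can be sketched in NumPy as follows. This is a minimal illustration with the sigmoid activation; the function names `elm_train` and `elm_predict` are our own:

```python
import numpy as np

def elm_train(X, T, L, rng=np.random.default_rng(0)):
    """Train a basic ELM: random input weights/biases, closed-form output weights.

    X: (N, n) inputs; T: (N, m) targets; L: number of hidden nodes.
    Returns (W, b, beta) so that predictions are sigmoid(X @ W + b) @ beta.
    """
    N, n = X.shape
    W = rng.uniform(-1.0, 1.0, size=(n, L))    # input weights drawn from [-1, 1]
    b = rng.uniform(-1.0, 1.0, size=L)         # hidden biases drawn from [-1, 1]
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # hidden-layer output matrix (N x L)
    beta = np.linalg.pinv(H) @ T               # beta = H^+ T (Moore-Penrose inverse)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict outputs for new inputs with the trained parameters."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Note that only `beta` is learned from data; `W` and `b` stay fixed after their random initialization, which is what makes the training a single least-squares solve.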

Because it avoids iteratively adjusting the input-layer weights and hidden biases, ELM's training speed can be thousands of times faster than that of traditional gradient-based learning algorithms [2]. Meanwhile, ELM also produces good generalization performance. It has been verified that ELM can achieve generalization performance comparable to that of the typical Support Vector Machine algorithm [3].

##### 2.2. Stochastic Gradient Boosting

Stochastic gradient boosting was proposed by Friedman in [20] as a variant of the gradient boosting method presented in [19]. Given a training set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, the goal is to learn a hypothesis $F$ that maps $\mathbf{x}$ to $y$ and minimizes the training loss as follows:
$$\min_{F} \sum_{i=1}^{N} L\left(y_i, F(\mathbf{x}_i)\right),$$
where $L$ is the loss function which evaluates the difference between the predicted value $F(\mathbf{x}_i)$ and the target $y_i$, and $K$ denotes the number of iterations. In the boosting mechanism, $K$ additive individual learners are trained sequentially by
$$F_k(\mathbf{x}) = F_{k-1}(\mathbf{x}) + f_k(\mathbf{x})$$
and
$$f_k = \arg\min_{f} \sum_{i=1}^{N} L\left(y_i, F_{k-1}(\mathbf{x}_i) + f(\mathbf{x}_i)\right),$$
where $k = 1, 2, \ldots, K$. It is shown that this optimization problem depends heavily on the loss function $L$ and becomes unsolvable when $L$ is complex. Creatively, gradient boosting constructs the weak individuals based on the pseudo residuals, which are the negative gradients of the loss function with respect to the values predicted by the model at the current learning step. For instance, let $r_{ik}$ be the pseudo residual of the $i$th sample at the $k$th iteration, written as
$$r_{ik} = -\left[\frac{\partial L\left(y_i, F(\mathbf{x}_i)\right)}{\partial F(\mathbf{x}_i)}\right]_{F = F_{k-1}},$$
and thus the $k$th weak learner is trained by
$$f_k = \arg\min_{f} \sum_{i=1}^{N} \left(r_{ik} - f(\mathbf{x}_i)\right)^{2}.$$
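For the squared loss $L(y, F) = \frac{1}{2}(y - F)^2$, the pseudo residuals reduce to the ordinary residuals $y_i - F_{k-1}(\mathbf{x}_i)$, and the boosting loop above can be sketched as follows. This is a minimal illustration using a regression stump as the weak learner; the helper names `fit_stump` and `gradient_boost` are hypothetical:

```python
import numpy as np

def fit_stump(X, r):
    """Least-squares regression stump on 1-D inputs: one threshold, two constants."""
    best = None
    for thr in np.unique(X):
        left, right = r[X <= thr], r[X > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(X <= thr, left.mean(), right.mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, cl, cr = best
    return lambda x: np.where(x <= thr, cl, cr)

def gradient_boost(X, y, K=50, lr=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    f0 = y.mean()                       # base learner: a constant prediction
    learners = []
    F = np.full_like(y, f0, dtype=float)
    for _ in range(K):
        r = y - F                       # pseudo residuals = -dL/dF for squared loss
        h = fit_stump(X, r)
        learners.append(h)
        F += lr * h(X)                  # additive model update
    return lambda x: f0 + lr * sum(h(x) for h in learners)
```

The learning rate `lr` shrinks each update, which is a common practical refinement; setting `lr=1.0` recovers the plain additive scheme in the equations above.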

As gradient boosting constructs the additive ensemble model by sequentially fitting a weak individual learner to the current pseudo residuals of the whole training dataset at each iteration, it costs much training time and may suffer from overfitting. In view of that, a minor modification named stochastic gradient boosting was proposed to incorporate some randomization into the procedure. Specifically, at each iteration a randomly selected subset instead of the full training dataset is used to fit the individual learner and compute the model update for the current iteration. Namely, let $\{\pi(1), \pi(2), \ldots, \pi(N)\}$ be a random permutation of the integers $\{1, 2, \ldots, N\}$; then a subset of size $\widetilde{N} < N$ of the entire training dataset can be given by $\{(\mathbf{x}_{\pi(i)}, y_{\pi(i)})\}_{i=1}^{\widetilde{N}}$. Furthermore, the $k$th weak learner using the stochastic gradient boosting ensemble scheme is trained by solving the following optimization problem:
$$f_k = \arg\min_{f} \sum_{i=1}^{\widetilde{N}} \left(r_{\pi(i)k} - f(\mathbf{x}_{\pi(i)})\right)^{2}.$$
Given the base learner $f_0$ which is trained on the initial training dataset, the final ensemble model constructed by the stochastic gradient boosting scheme predicts an unknown testing instance $\mathbf{x}$ as follows:
$$F(\mathbf{x}) = f_0(\mathbf{x}) + \sum_{k=1}^{K} f_k(\mathbf{x}).$$
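A single stochastic iteration, i.e., fitting the weak learner on a random subset of the pseudo residuals, can be sketched as follows. Squared loss is assumed, and `fit_weak` stands in for any weak-learner training routine; both names are hypothetical:

```python
import numpy as np

def stochastic_round(X, y, F, fit_weak, subsample=0.5,
                     rng=np.random.default_rng(0)):
    """One stochastic-gradient-boosting iteration under squared loss.

    X, y: full training data; F: current ensemble predictions on X;
    fit_weak(Xs, rs) -> callable: trains a weak learner on a subset.
    """
    N = len(y)
    n_sub = max(1, int(subsample * N))
    idx = rng.permutation(N)[:n_sub]   # first N~ entries of a random permutation
    r = y - F                          # pseudo residuals for squared loss
    return fit_weak(X[idx], r[idx])    # the weak learner sees only the subset
```

Compared with the deterministic loop, only the index selection changes: each weak learner is fit on a fresh random fraction `subsample` of the data, which both speeds up each iteration and decorrelates the individual learners.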

Stochastic gradient boosting can also be viewed as a special line-search optimization algorithm, which makes the newly added individual learner fit the steepest descent direction of the partial training loss at each learning step.

#### 3. Stochastic Gradient Boosting-Based Extreme Learning Machine (SGB-ELM)

SGB-ELM is a novel hybrid learning algorithm which introduces the stochastic gradient boosting method into the ELM ensemble procedure. As the boosting mechanism focuses on gradually reducing the training residuals at each iteration and ELM is a special multiparameter network (particularly for classification tasks), instead of combining ELM and stochastic gradient boosting naively, we design an enhanced training scheme to alleviate possible overfitting in our proposed SGB-ELM algorithm. The detailed implementation of SGB-ELM is presented in Algorithm 2, and the determination of the optimal output weights for each individual ELM learner is illustrated in Algorithm 1 accordingly.