Abstract

A novel particle swarm optimization based selective ensemble (PSOSEN) of online sequential extreme learning machine (OS-ELM) is proposed. It builds on the original OS-ELM with an adaptive selective ensemble framework. Two novel insights are presented in this paper. First, a novel selective ensemble algorithm, referred to as particle swarm optimization selective ensemble (PSOSEN), is proposed; note that PSOSEN is a general selective ensemble method applicable to any learning algorithm, including batch learning and online learning. Second, an adaptive selective ensemble framework for online learning is designed to balance the accuracy and speed of the algorithm. Experiments on both regression and classification problems with UCI data sets are carried out. Comparisons between OS-ELM, simple ensemble OS-ELM (EOS-ELM), genetic algorithm based selective ensemble (GASEN) of OS-ELM, and the proposed particle swarm optimization based selective ensemble of OS-ELM empirically show that the proposed algorithm achieves good generalization performance and fast learning speed.

1. Introduction

The feedforward neural network has been one of the most prevailing neural networks for data processing over the past decades [1, 2]. However, its slow learning speed limits its applications. Recently, an original algorithm designed for single hidden layer feedforward neural networks (SLFNs), named extreme learning machine (ELM), was proposed by Huang et al. [3]. ELM is a tuning-free algorithm: it randomly selects the input weights and biases of the hidden nodes instead of learning these parameters, and the output weights of the network are then analytically determined. ELM proves to be a few orders of magnitude faster than traditional learning algorithms and obtains better generalization performance as well. It makes fast and accurate data analytics possible and has been applied to many fields [4–6].

However, the algorithms mentioned above need all the training data to be available before building the model, which is referred to as batch learning. In many industrial applications, it is very common that the training data can only be obtained one by one or chunk by chunk. If a batch learning algorithm is rerun each time new training data becomes available, the learning process will be very time consuming. Hence online learning is necessary for many real world applications.

An online sequential extreme learning machine (OS-ELM) was then proposed by Liang et al. [7]. OS-ELM can learn sequential training observations online at arbitrary length (one by one or chunk by chunk). Newly arrived training observations are learned to update the model of the SLFNs. As soon as the learning procedure for the arrived observations is completed, the data is discarded. Moreover, OS-ELM requires no prior knowledge about the number of observations that will be presented. Therefore, OS-ELM is an elegant online learning algorithm which can handle both RBF and additive nodes in the same framework and can be applied to both classification and function regression problems. OS-ELM proves to be a very fast and accurate online sequential learning algorithm [8–10], which provides better generalization performance at faster speed compared with other online learning algorithms such as GAP-RBF, GGAP-RBF, SGBP, RAN, RANEKF, and MRAN.

However, due to the random generation of the parameters of the hidden nodes, the generalization performance of OS-ELM, like that of ELM, sometimes cannot be guaranteed. Some ensemble based methods have been applied to ELM to improve its accuracy [11–13]. Ensemble learning is a learning scheme where a collection of a finite number of learners is trained for the same task [14, 15]. It has been demonstrated that the generalization ability of a learner can be significantly improved by ensembling a set of learners. In [16] a simple ensemble of OS-ELM, that is, EOS-ELM, was investigated. However, Zhou et al. [17] proved that selective ensemble is a better choice. We apply this idea to OS-ELM. First, a novel selective ensemble algorithm, termed PSOSEN, is proposed. PSOSEN adopts particle swarm optimization [18] to select the individual OS-ELMs that form the ensemble. Benefiting from the fast speed of PSO, PSOSEN is designed to be an accurate and fast selective ensemble algorithm. It should be noted that PSOSEN is a general selective ensemble algorithm suitable for any learning algorithm.

Different from batch learning, online learning algorithms need to perform learning continually, so the complexity of the learning algorithm must be taken into account. Obviously, performing selective ensemble learning at every step is not a good choice for online learning. Thus we design an adaptive selective ensemble framework for OS-ELM. A set of OS-ELMs are trained online, and the root mean square error (RMSE) is calculated at each step. The error is compared with a preset threshold $\epsilon$. If the RMSE is bigger than the threshold, the model is not accurate; PSOSEN is then performed and a selective ensemble is obtained. Otherwise, the model is relatively accurate and the ensemble is not reselected. The output of the system is calculated as the average of the individuals in the ensemble set, and each individual OS-ELM is updated recursively.

UCI data sets [19], which contain both regression and classification data, are used to verify the feasibility of the proposed algorithm. Comparisons of three aspects, namely, RMSE, standard deviation, and running time, among OS-ELM, EOS-ELM, and selective ensembles of OS-ELM (SEOS-ELM) with both GASEN and PSOSEN are presented. The results convincingly show that PSOSEN achieves better generalization accuracy and fast learning speed.

The rest of the paper is organized as follows. In Section 2, previous work including ELM and OS-ELM is reviewed. The novel selective ensemble based on particle swarm optimization is presented in Section 3. An adaptive selective ensemble framework is designed for OS-ELM in Section 4. Experiments are carried out in Section 5 and the comparison results are also presented. In Section 6, further discussion about PSOSEN is provided. We draw the conclusion of the paper in Section 7.

2. Review of ELM and OS-ELM

In this section, both the basic ELM algorithm and its online version, OS-ELM, are briefly reviewed as background for our work.

2.1. Extreme Learning Machine (ELM)

ELM algorithm is derived from single hidden layer feedforward neural networks (SLFNs). Unlike traditional SLFNs, ELM assigns the parameters of the hidden nodes randomly without any iterative tuning. Besides, all the parameters of the hidden nodes in ELM are independent of each other. Hence ELM can be seen as generalized SLFNs.

Given $N$ training samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i \in \mathbb{R}^n$ is an input vector of $n$ dimensions and $\mathbf{t}_i \in \mathbb{R}^m$ is a target vector of $m$ dimensions, SLFNs with $L$ hidden nodes, each with output function $G(\mathbf{a}_i, b_i, \mathbf{x})$, are mathematically modeled as

$$\sum_{i=1}^{L} \beta_i G(\mathbf{a}_i, b_i, \mathbf{x}_j) = \mathbf{t}_j, \quad j = 1, \ldots, N, \tag{1}$$

where $(\mathbf{a}_i, b_i)$ are the parameters of the $i$th hidden node and $\beta_i$ is the weight vector connecting the $i$th hidden node and the output nodes. To simplify, (1) can be written equivalently as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{2}$$

where $\mathbf{H}$ is called the hidden layer output matrix of the neural network; the $i$th column of $\mathbf{H}$ is the output of the $i$th hidden node with respect to the inputs $\mathbf{x}_1, \ldots, \mathbf{x}_N$.

In ELM, $\mathbf{H}$ can be easily obtained once the training set is available and the parameters $(\mathbf{a}_i, b_i)$ are randomly assigned. ELM then reduces to a linear system, and the output weights are calculated as

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}, \tag{3}$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $\mathbf{H}$.

The ELM algorithm can be summarized in three steps as shown in Algorithm 1.

Input:
A training set $\{(\mathbf{x}_i, \mathbf{t}_i) \mid \mathbf{x}_i \in \mathbb{R}^n, \mathbf{t}_i \in \mathbb{R}^m, i = 1, \ldots, N\}$, hidden node output function
$G(\mathbf{a}, b, \mathbf{x})$, and the number of hidden nodes $L$.
Steps:
(1) Assign parameters of hidden nodes $(\mathbf{a}_i, b_i)$ randomly, $i = 1, \ldots, L$.
(2) Calculate the hidden layer output matrix $\mathbf{H}$.
(3) Calculate the output weight $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized
inverse of the hidden layer output matrix $\mathbf{H}$.
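
For concreteness, the following minimal sketch (in Python with NumPy; ours, not part of the paper, with illustrative names) implements Algorithm 1 for sigmoid additive nodes:

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Batch ELM sketch. X: (N, n) inputs, T: (N, m) targets, L: hidden nodes."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))  # random input weights a_i
    b = rng.uniform(-1.0, 1.0, size=L)                # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))            # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                      # output weights, equation (3)
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```

Note that the only training cost is a single pseudoinverse, which is what makes ELM orders of magnitude faster than iterative tuning.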

2.2. OS-ELM

In many industrial applications, it is impossible to have all the training data available before the learning process. It is common that the training observations are sequentially inputted to the learning algorithm; that is, the observations arrive one-by-one or chunk-by-chunk. In this case, the batch ELM algorithm is no longer applicable. Hence, a fast and accurate online sequential extreme learning machine was proposed to deal with online learning.

The output weight $\hat{\boldsymbol{\beta}}$ obtained from (3) is actually a least-squares solution of (2). Given rank$(\mathbf{H}) = L$, the number of hidden nodes, $\mathbf{H}^{\dagger}$ can be presented as

$$\mathbf{H}^{\dagger} = \left(\mathbf{H}^{T}\mathbf{H}\right)^{-1}\mathbf{H}^{T}. \tag{4}$$

This is also called the left pseudoinverse of $\mathbf{H}$, for it satisfies the equation $\mathbf{H}^{\dagger}\mathbf{H} = \mathbf{I}_L$. If $\mathbf{H}^{T}\mathbf{H}$ tends to be singular, a smaller network size $L$ and a larger number of initial training data should be chosen in the initialization step of OS-ELM. Substituting (4) into (3), we get

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{H}^{T}\mathbf{H}\right)^{-1}\mathbf{H}^{T}\mathbf{T}, \tag{5}$$

which is the least-squares solution to (2). The OS-ELM algorithm can then be deduced by recursive implementation of the least-squares solution (5).

There are two main steps in OS-ELM: the initialization step and the update step. In the initialization step, the number of training data required should be equal to or larger than the network size $L$. In the update step, the learning model is updated with the method of recursive least squares (RLS). Only the newly arrived single or chunk of training observations are learned, and they are discarded as soon as the learning step is completed.

The two steps of the OS-ELM algorithm in general are as follows.

(a) Initialization step: batch ELM is used to initialize the learning system with a small chunk of initial training data $\mathcal{N}_0 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N_0}$ from the given training set.
(1) Assign random input weights $\mathbf{a}_i$ and biases $b_i$ (for additive hidden nodes) or centers $\mathbf{a}_i$ and impact factors $b_i$ (for RBF hidden nodes), $i = 1, \ldots, L$.
(2) Calculate the initial hidden layer output matrix $\mathbf{H}_0$.
(3) Calculate the initial output weight $\boldsymbol{\beta}^{(0)} = \mathbf{P}_0\mathbf{H}_0^{T}\mathbf{T}_0$, where $\mathbf{P}_0 = (\mathbf{H}_0^{T}\mathbf{H}_0)^{-1}$ and $\mathbf{T}_0 = [\mathbf{t}_1, \ldots, \mathbf{t}_{N_0}]^{T}$.
(4) Set $k = 0$. Initialization is finished.

(b) Sequential learning step: the $(k+1)$th chunk of new observations can be expressed as

$$\mathcal{N}_{k+1} = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=\left(\sum_{j=0}^{k} N_j\right)+1}^{\sum_{j=0}^{k+1} N_j},$$

where $N_{k+1}$ represents the number of observations in the newly arrived $(k+1)$th chunk.
(1) Compute the partial hidden layer output matrix $\mathbf{H}_{k+1}$ for the $(k+1)$th chunk.
(2) Set $\mathbf{T}_{k+1} = [\mathbf{t}_{(\sum_{j=0}^{k} N_j)+1}, \ldots, \mathbf{t}_{\sum_{j=0}^{k+1} N_j}]^{T}$, and we have

$$\mathbf{P}_{k+1}^{-1} = \mathbf{P}_{k}^{-1} + \mathbf{H}_{k+1}^{T}\mathbf{H}_{k+1}.$$

To avoid calculating an inverse in the iterative procedure, $\mathbf{P}_{k+1}$ is factored as follows according to the Woodbury formula:

$$\mathbf{P}_{k+1} = \mathbf{P}_{k} - \mathbf{P}_{k}\mathbf{H}_{k+1}^{T}\left(\mathbf{I} + \mathbf{H}_{k+1}\mathbf{P}_{k}\mathbf{H}_{k+1}^{T}\right)^{-1}\mathbf{H}_{k+1}\mathbf{P}_{k}.$$

(3) Calculate the output weight $\boldsymbol{\beta}^{(k+1)}$ according to the updating equation

$$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \mathbf{P}_{k+1}\mathbf{H}_{k+1}^{T}\left(\mathbf{T}_{k+1} - \mathbf{H}_{k+1}\boldsymbol{\beta}^{(k)}\right).$$

(4) Set $k = k + 1$. Go to step (b).
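The following sketch (ours, with illustrative interfaces) shows how the initialization and the recursive update translate into code, given hidden layer output matrices computed as in Section 2.1:

```python
import numpy as np

def oselm_init(H0, T0):
    """Initialization step: batch ELM on the first N0 >= L samples."""
    P = np.linalg.inv(H0.T @ H0)       # P_0 = (H_0^T H_0)^{-1}
    beta = P @ H0.T @ T0               # beta^(0)
    return P, beta

def oselm_update(P, beta, Hk, Tk):
    """Sequential step for one chunk (Hk, Tk), using the Woodbury formula."""
    I = np.eye(Hk.shape[0])
    K = np.linalg.inv(I + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ K @ Hk @ P                  # P_{k+1}
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)      # beta^{(k+1)}
    return P, beta
```

Each step only inverts a matrix whose size equals the chunk size, so one-by-one learning involves nothing more costly than a scalar division.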

3. Particle Swarm Optimization Selective Ensemble

In this section, a novel selective ensemble method referred to as particle swarm optimization selective ensemble (PSOSEN) is proposed. PSOSEN adopts particle swarm optimization to select the good learners and combine their predictions. Detailed procedures of the PSOSEN algorithm will be introduced in this section.

A remarkable superiority of PSOSEN is its speed over other selective ensemble algorithms. Another popular selective ensemble learning method is based on the genetic algorithm. Compared with GASEN, PSOSEN converges faster to the optimal solution because it omits the crossover and mutation operations used in GASEN. GASEN is actually quite complicated owing to the requirement of encoding, decoding, and other genetic operations. For instance, GASEN only works with binary encoding, while PSOSEN can handle any form of values based on the particles' current positions and velocity vectors in the corresponding hyperspace. PSOSEN needs little parameter adjustment and is thus easy to implement. Although it uses a simple method, PSOSEN is still capable of obtaining high prediction accuracy and reaching the optimum earlier than GASEN. Furthermore, PSO is less influenced by changes in problem dimensionality or modality than GA and proves robust in most situations [20].

As selective ensemble is usually more time consuming than the original algorithm, a faster optimization method is preferable. For this purpose, PSOSEN is an appropriate choice to search for the optimal ensemble of ELM models efficiently.

Zhou et al. [17] have demonstrated that ensembling many of the available learners may be better than ensembling all of those learners in both regression and classification. The detailed proof of this conclusion will not be presented in this paper. However, one important problem for selective ensemble is how to select the good learners in a set of available learners.

The novel selective ensemble algorithm is proposed to select good learners from a set of available learners. PSOSEN is based on the idea of heuristics: it assumes each learner can be assigned a weight that characterizes the fitness of including this learner in the ensemble. The learners with weights bigger than a preset threshold $\lambda$ are then selected to join the ensemble.

We will explain the principle of PSOSEN in the context of regression. We use $w_i$ to denote the weight of the $i$th component learner. The weights should satisfy the following equations:

$$0 \le w_i \le 1, \qquad \sum_{i=1}^{N} w_i = 1. \tag{6}$$

Then the weight vector is $\mathbf{w} = (w_1, w_2, \ldots, w_N)^{T}$.

Suppose the input variable $\mathbf{x}$ is distributed according to $p(\mathbf{x})$, the true output of $\mathbf{x}$ is $d(\mathbf{x})$, and the actual output of the $i$th learner is $f_i(\mathbf{x})$. Then the output of the simple weighted ensemble on $\mathbf{x}$ is

$$\hat{f}(\mathbf{x}) = \sum_{i=1}^{N} w_i f_i(\mathbf{x}). \tag{7}$$

The generalization error $E_i$ of the $i$th learner and the generalization error $\hat{E}$ of the ensemble are calculated on $p(\mathbf{x})$, respectively:

$$E_i = \int p(\mathbf{x})\left(f_i(\mathbf{x}) - d(\mathbf{x})\right)^2 d\mathbf{x}, \tag{8}$$

$$\hat{E} = \int p(\mathbf{x})\left(\hat{f}(\mathbf{x}) - d(\mathbf{x})\right)^2 d\mathbf{x}. \tag{9}$$

We then define the correlation $C_{ij}$ between the $i$th and the $j$th component learners as follows:

$$C_{ij} = \int p(\mathbf{x})\left(f_i(\mathbf{x}) - d(\mathbf{x})\right)\left(f_j(\mathbf{x}) - d(\mathbf{x})\right) d\mathbf{x}. \tag{10}$$

Obviously $C_{ij}$ satisfies the following equations:

$$C_{ii} = E_i, \qquad C_{ij} = C_{ji}. \tag{11}$$

Considering the equations defined above, we can get

$$\hat{E} = \sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j C_{ij}. \tag{12}$$

To minimize the generalization error of the ensemble, according to (12), the optimum weight vector can be obtained as

$$\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \left(\sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j C_{ij}\right). \tag{13}$$

The $k$th component of $\mathbf{w}^{*}$, that is, $w_k^{*}$, can be solved by the Lagrange multiplier method:

$$\frac{\partial\left(\sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j C_{ij} - 2\mu\left(\sum_{i=1}^{N} w_i - 1\right)\right)}{\partial w_k^{*}} = 0, \tag{14}$$

where $\mu$ is the Lagrange multiplier.

The equation can be simplified to

$$\sum_{j=1}^{N} w_j^{*} C_{kj} = \mu. \tag{15}$$

Taking the constraint in (6) into account, we can get

$$w_k^{*} = \frac{\sum_{j=1}^{N} C_{kj}^{-1}}{\sum_{i=1}^{N}\sum_{j=1}^{N} C_{ij}^{-1}}, \tag{16}$$

where $C_{ij}^{-1}$ denotes the $(i, j)$th element of the inverse of the correlation matrix $\mathbf{C}$.

Equation (16) gives the direct solution for $\mathbf{w}^{*}$. But this solution seldom works well in real world applications. Because some learners are quite similar in performance, when a number of learners are available, the correlation matrix $\mathbf{C}$ may be irreversible or ill-conditioned.
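
For illustration, the direct solution (16) takes only a few lines, assuming an estimated correlation matrix C that happens to be invertible (a sketch with illustrative names, not part of the proposed algorithm):

```python
import numpy as np

def optimal_weights(C):
    """Direct solution (16); fails exactly when C is singular or ill-conditioned."""
    Cinv = np.linalg.inv(C)
    return Cinv.sum(axis=1) / Cinv.sum()  # w*_k = sum_j C^-1_kj / sum_ij C^-1_ij
```

In practice `np.linalg.inv` raises an error or returns numerically meaningless weights for near-singular C, which motivates the optimization-based approximation below.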

Although we cannot obtain the optimum weights of the learners directly, we can approximate them. Equation (13) can be viewed as an optimization problem. As particle swarm optimization has proved to be a powerful optimization tool, PSOSEN is proposed. The basic PSO algorithm is shown in Figure 1.
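
For reference (Figure 1 is not reproduced here), the standard PSO update rules from [18], which drive the weight evolution, are

$$v_{id} \leftarrow \omega v_{id} + c_1 r_1\left(p_{id} - x_{id}\right) + c_2 r_2\left(g_{d} - x_{id}\right), \qquad x_{id} \leftarrow x_{id} + v_{id},$$

where $x_{id}$ and $v_{id}$ are the position and velocity of particle $i$ in dimension $d$, $\omega$ is the inertia weight, $c_1$ and $c_2$ are acceleration constants, $r_1$ and $r_2$ are uniform random numbers in $[0, 1]$, $\mathbf{p}_i$ is the best position visited by particle $i$, and $\mathbf{g}$ is the best position visited by the swarm.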

PSOSEN first randomly assigns a weight to each of the available learners. Then it employs the particle swarm optimization algorithm to evolve those weights so that they characterize the fitness of the learners in joining the ensemble. Finally, the learners whose weights are bigger than the preset threshold $\lambda$ are selected to form the ensemble. Note that if all the evolved weights are bigger than the threshold $\lambda$, then all the learners will be selected to join the ensemble.

PSOSEN can be applied to both regression and classification problems because the weight evolving process is only used to select the component learners. In particular, the outputs of the ensemble for regression are combined via simple averaging instead of weighted averaging. The reason is that previous work [17] showed that using the weights both in the selection of the component learners and in the combination of the outputs tends to suffer from overfitting.

In the process of generating the population, the goodness of the individuals is evaluated via a validation set $V$ bootstrap sampled from the training data set. We use $E_V^{\mathbf{w}}$ to denote the generalization error of the ensemble corresponding to individual $\mathbf{w}$ on the validation data $V$. Obviously $E_V^{\mathbf{w}}$ can describe the goodness of $\mathbf{w}$: the smaller $E_V^{\mathbf{w}}$ is, the better $\mathbf{w}$ is. So PSOSEN adopts $1/E_V^{\mathbf{w}}$ as the fitness function.

The PSOSEN algorithm is summarized as follows. $S_1, S_2, \ldots, S_T$ are bootstrap samples generated from the original training data set $S$. A component learner $N_t$ is trained from each $S_t$. A selective ensemble $N^{*}$ is then built from $N_1, N_2, \ldots, N_T$. The output is the average output of the ensemble for regression, or the class label receiving the most votes for classification (see Algorithm 2).

Input: training set $S$, learner $\mathcal{L}$, trial $T$, threshold $\lambda$
Steps:
(1) for $t$ = 1 to $T$ {
   $S_t$ = bootstrap sample from $S$
   $N_t$ = $\mathcal{L}(S_t)$
}
(2) generate a population of weight vectors
(3) evolve the population by PSO, where the fitness of a weight vector $\mathbf{w}$ is defined as
$1/E_V^{\mathbf{w}}$.
(4) $\mathbf{w}^{*}$ = the evolved best weight vector
Output: ensemble $N^{*}$:
$N^{*}(\mathbf{x}) = \operatorname{Ave}_{w_t^{*} > \lambda}\, N_t(\mathbf{x})$ for regression
$N^{*}(\mathbf{x}) = \arg\max_{y} \sum_{t:\, w_t^{*} > \lambda} \mathbf{1}\left(N_t(\mathbf{x}) = y\right)$ for classification
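
A compact sketch of Algorithm 2's selection stage for regression is given below (our illustration; the array layout, the PSO constants, and the default threshold are assumptions, not values prescribed by the paper). `preds_val` holds each trained learner's predictions on the validation set $V$:

```python
import numpy as np

def psosen_select(preds_val, y_val, lam=0.05, n_particles=30, iters=100,
                  omega=0.7, c1=1.5, c2=1.5, seed=0):
    """preds_val: (T, n_val) learner predictions on V; returns selected indices."""
    rng = np.random.default_rng(seed)
    T = preds_val.shape[0]

    def fitness(w):
        w = np.clip(w, 0.0, None)
        w = w / (w.sum() + 1e-12)                  # enforce constraint (6)
        e = np.mean((w @ preds_val - y_val) ** 2)  # E_V^w on the validation data
        return 1.0 / (e + 1e-12)                   # fitness 1 / E_V^w

    x = rng.random((n_particles, T))               # particle positions = weight vectors
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    g = pbest[pbest_fit.argmax()].copy()           # global best position

    for _ in range(iters):                         # PSO evolution of the weights
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = omega * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fit = np.array([fitness(p) for p in x])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        g = pbest[pbest_fit.argmax()].copy()

    w = np.clip(g, 0.0, None)
    w = w / (w.sum() + 1e-12)                      # evolved best weight vector w*
    sel = np.flatnonzero(w > lam)                  # keep learners with w*_t > lambda
    return sel if sel.size else np.arange(T)
```

The selected learners' outputs are then combined by simple averaging (regression) or majority voting (classification), in line with the discussion above.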

4. Particle Swarm Optimization Based Selective Ensemble of Online Sequential Extreme Learning Machine

In this section, PSOSEN is applied to the original OS-ELM to improve its generalization performance. In order to reduce the complexity and employ PSOSEN flexibly, an adaptive framework is designed. The flowchart of the framework is shown in Figure 2.

Online learning is necessary in many industrial applications where training data can only be obtained sequentially. Although OS-ELM is a useful online learning algorithm, its generalization performance may not be good, resulting from the random generation of the input parameters. An ensemble method has been investigated for OS-ELM, that is, the EOS-ELM algorithm [16]. However, it is only a very simple ensemble method, which just averages all the individual OS-ELMs. In this section, selective ensemble, which is superior to simple ensemble, is applied to OS-ELM, adopting the novel selective ensemble method proposed in Section 3. Apparently, performing PSOSEN at every step is time consuming, so we design an adaptive framework to determine whether to perform PSOSEN or simple ensemble. Thus accuracy and complexity can be balanced well. The framework for the new algorithm is as follows.

First, the individual OS-ELMs are initialized. The number of hidden nodes is the same for each OS-ELM, while the input weights and biases of each OS-ELM are randomly generated.

Second, the RMSE is calculated:

$$E_{\mathrm{RMSE}} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left\|\mathbf{t}_j - \hat{\mathbf{t}}_j\right\|^2},$$

where $\mathbf{t}_j$ is the expected output, while $\hat{\mathbf{t}}_j$ is the actual output of the individual OS-ELM.

The RMSE is compared with a preset threshold $\epsilon$. If $E_{\mathrm{RMSE}}$ is bigger than $\epsilon$, which means the simple ensemble is not accurate, PSOSEN is performed and a selective ensemble is obtained. If $E_{\mathrm{RMSE}}$ is smaller than $\epsilon$, which indicates that the simple ensemble is relatively accurate, the ensemble is not reselected.

Third, the output of the system is calculated as the average output of the individuals in the ensemble set:

$$f(\mathbf{x}) = \frac{1}{P}\sum_{p=1}^{P} \mathbf{H}_{k}^{(p)}\boldsymbol{\beta}_{k}^{(p)},$$

where $P$ is the number of OS-ELMs in the ensemble set, $\mathbf{H}_{k}^{(p)}$ is the output matrix of the $p$th OS-ELM, and $\boldsymbol{\beta}_{k}^{(p)}$ is the output weight calculated by the $p$th OS-ELM at step $k$.

At last, each individual OS-ELM is updated recursively according to the update equations presented in Section 2.
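
Put together, one step of the adaptive framework can be sketched as follows (our illustration; the `predict`/`partial_fit` interfaces and the reuse of `psosen_select` from the Section 3 sketch are assumptions):

```python
import numpy as np

def adaptive_step(models, ensemble_idx, X_chunk, T_chunk, eps, X_val, y_val):
    """One online step: check RMSE, reselect the ensemble if needed, update all models."""
    preds = np.stack([m.predict(X_chunk) for m in models])
    rmse = np.sqrt(np.mean((preds.mean(axis=0) - T_chunk) ** 2))

    if rmse > eps:
        # Simple ensemble is not accurate enough: run PSOSEN to reselect.
        preds_val = np.stack([m.predict(X_val) for m in models])
        ensemble_idx = psosen_select(preds_val, y_val)

    # System output: average of the individuals in the current ensemble set.
    output = preds[ensemble_idx].mean(axis=0)

    # Every individual OS-ELM is updated recursively (Section 2.2).
    for m in models:
        m.partial_fit(X_chunk, T_chunk)
    return output, ensemble_idx
```

The threshold eps directly trades accuracy against running time: a smaller eps triggers PSOSEN more often.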

5. Performance Evaluation of PSOSEN Based OS-ELM

In this section, a series of experiments was conducted to evaluate the performance of the proposed algorithm. OS-ELM, EOS-ELM, and GASEN based OS-ELM are compared with the new algorithm. All the experiments were carried out in the MATLAB R2012b environment on a desktop with a 3.40 GHz CPU and 8 GB RAM.

5.1. Model Selection

For OS-ELM, the number of hidden nodes is the only parameter that needs to be determined, and cross-validation is usually used to choose it. Fifty trials of simulations are performed for the regression and classification problems, and the number of hidden nodes is then determined by the validation error.

For EOS-ELM, SEOS-ELM (GASEN), and SEOS-ELM (PSOSEN), there is another parameter to determine, namely, the number of networks in the ensemble. This parameter is varied from 5 to 30 with an interval of 5. The optimal value is selected according to the RMSE for regression, the testing accuracy for classification, and the standard deviation. For a given problem, the number of OS-ELMs is selected based on the lowest standard deviation together with an RMSE or accuracy comparable to OS-ELM. Table 1 is an example of selecting the optimal number of networks for SEOS-ELM (PSOSEN) with RBF hidden nodes on the New-thyroid dataset. As illustrated by Table 1, the lowest standard deviation occurs when the number of OS-ELMs is 20, while the prediction accuracy of SEOS-ELM is better than that of OS-ELM. Hence we set the number of networks to 20 for the New-thyroid dataset. The numbers of OS-ELMs for the other datasets are determined in the same way.

Both the Gaussian radial basis function (RBF) and the sigmoid additive function are adopted as activation functions in OS-ELM, EOS-ELM, SEOS-ELM (GASEN), and SEOS-ELM (PSOSEN).

In the experiments, OS-ELM, EOS-ELM, and SEOS-ELM (GASEN) were compared with SEOS-ELM (PSOSEN). Some general information of the benchmark datasets used in our evaluations is listed in Table 2. Both regression and classification problems are included.

For OS-ELM, the input weights and biases with the additive activation function, or the centers with the RBF activation function, were all generated from the range $[-1, 1]$. For regression problems, all the inputs and outputs were normalized into the range $[0, 1]$, while for classification problems the inputs were normalized into the range $[-1, 1]$.

The benchmark datasets studied in the experiments are from the UCI Machine Learning Repository, except the California Housing dataset, which is from the StatLib Repository. In addition, the Mackey-Glass time-series problem was also adopted to test our algorithms.

5.2. Algorithm Evaluation

To verify the superiority of the proposed algorithm, the RMSE for regression problems and the testing accuracy for classification problems are computed. The initial size of the dataset is very small, equal to the number of hidden nodes, to guarantee that the model works. All the data is then sent to the model in a one-by-one learning mode. The evaluation results are presented in Tables 3, 4, 5, and 6, corresponding to the models with sigmoid hidden nodes and RBF hidden nodes for both regression and classification problems. Each result is an average of 50 trials, and in every trial of one problem, the training and testing samples were randomly drawn from the dataset currently addressed.

From the comparison results of the four tables, we can easily find that EOS-ELM, SEOS-ELM (GASEN), and SEOS-ELM (PSOSEN) are more time consuming than OS-ELM, but they still keep a relatively fast speed most of the time. It should be noted that the complexity of SEOS-ELM is adjustable, as it depends on the threshold $\epsilon$.

More importantly, EOS-ELM, SEOS-ELM (GASEN), and SEOS-ELM (PSOSEN) all attain lower testing deviation and more accurate regression or classification results than OS-ELM, which shows the advantage of ensemble learning. In addition, both SEOS-ELM (GASEN) and SEOS-ELM (PSOSEN) are more accurate than EOS-ELM. This verifies that selective ensemble is better than the simple ensemble method.

In terms of the comparison between SEOS-ELM (GASEN) and SEOS-ELM (PSOSEN), it can be observed that the two selective ensemble algorithms achieve comparable accuracy. However, the advantage of the new algorithm is that it is more computationally efficient. This verifies that PSOSEN is a fast and accurate selective ensemble algorithm.

As an online learning algorithm, online learning ability is another important evaluation criterion. To illustrate the online learning ability of the proposed algorithm, a simulated regression dataset is adopted. The dataset was generated from a fixed nonlinear function, chosen arbitrarily just to simulate a regression problem, and comprises 4500 training data and 1000 testing data. Figures 3 and 4 depict the variability of the training accuracy of SEOS-ELM (PSOSEN), EOS-ELM, and OS-ELM with respect to the number of training data in the process of learning. It can be observed that with an increasing number of training samples, the RMSE values of the three methods decline significantly. As the online learning progresses, the training models are continuously updated and corrected. We can then conclude that the more training data the system learns, the more precise the model is. Whether the hidden nodes are sigmoid or RBF, SEOS-ELM always obtains a smaller RMSE than EOS-ELM and OS-ELM, which indicates that the performance of SEOS-ELM is considerably more accurate than that of the other methods. Moreover, the smaller testing deviation of SEOS-ELM in Tables 3 to 6 also confirms the stability of SEOS-ELM.

6. Discussion

In the experiments, PSOSEN showed higher accuracy than the original OS-ELM and the simple ensemble of OS-ELM, which verified the feasibility of the selective ensemble method. In addition, compared with GASEN, PSOSEN showed comparable accuracy with much faster learning speed. Taking complexity and accuracy into consideration, PSOSEN is a good choice for selective ensemble. Experiments on the online version of ELM have demonstrated these advantages. However, it should be noted that, as a general selective ensemble method, PSOSEN is applicable to any learning algorithm, both batch learning and online learning. Applying PSOSEN to other learning algorithms is therefore of interest in the future.

The experiments also showed that although ensemble learning, both simple ensemble and selective ensemble, attains higher accuracy, it is more time consuming than the original learning algorithm. In addition, selective ensemble is slower than simple ensemble. As a selective ensemble method, PSOSEN is likewise slower than the original learning algorithm and the simple ensemble. Selective ensemble is thus a trade-off between complexity and accuracy. In the future, new selective ensemble methods should be designed to further improve the speed of the algorithm.

7. Conclusion

In this paper, PSOSEN is proposed as a novel selective ensemble algorithm. Benefiting from the fast speed of PSO, PSOSEN proves to be faster than other selective ensemble algorithms. It is a general selective ensemble algorithm applicable to any learning algorithm. To improve the generalization performance of the online learning algorithm, we apply PSOSEN to OS-ELM. And for the purpose of balancing complexity and accuracy, an adaptive selective ensemble framework for OS-ELM is designed. Experiments were carried out on UCI data sets. The results convincingly show that the new algorithm improves the generalization performance of OS-ELM while keeping the complexity balanced.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the Natural Science Foundation of China (41176076, 31202036, 51379198, and 51075377).