Special Issue: Programming Foundations for Scientific Big Data Analytics
An Incremental Optimal Weight Learning Machine of Single-Layer Neural Networks
An optimal weight learning machine with growth of hidden nodes and incremental learning (OWLM-GHNIL) is proposed, in which random hidden nodes are added to single hidden layer feedforward networks (SLFNs) one by one or group by group. As the network grows, the input weights and output weights are updated incrementally, so that the conventional optimal weight learning machine (OWLM) can be implemented efficiently. Simulation results and statistical tests demonstrate that the OWLM-GHNIL has better generalization performance than other incremental algorithms.
1. Introduction
Feedforward neural networks (FNNs) have been used extensively in classification and regression applications. As a specific type of FNN, single hidden layer feedforward networks (SLFNs) with additive models can approximate any continuous target function. Owing to their excellent learning capabilities and fault tolerance, SLFNs play an important role in practical applications and have been investigated extensively in both theory and application [3–7].
Perhaps the most popular training methods for SLFN classifiers in recent years have been gradient-based back-propagation (BP) algorithms [8–12]. BP algorithms can easily be implemented recursively in real time; however, their slow convergence is a bottleneck wherever fast training is essential. A novel learning machine, the extreme learning machine (ELM), has been proposed for training SLFNs: the learning parameters of the hidden nodes, including input weights and biases, are randomly assigned and need not be tuned, while the output weights are obtained by a simple generalized inverse computation. It has been proven that, even without updating the parameters of the hidden layer, SLFNs with randomly generated hidden neurons and tunable output weights maintain their universal approximation capability and excellent generalization performance. It has also been shown that ELM is faster than most state-of-the-art training algorithms for SLFNs, and it has been applied widely to practical problems such as classification, regression, clustering, recognition, and relevance ranking [15, 16].
Since the input weights and hidden layer biases of SLFNs trained with ELM are randomly assigned, minor changes in the input vectors may cause large changes in the hidden layer output matrix of the SLFNs. This in turn leads to large changes in the output weight matrix. According to statistical learning theory [17–21], large changes in the output weight matrix greatly increase both the structural and the empirical risk of the SLFNs, which decreases their robustness to input disturbances. In fact, simulations have shown that SLFNs trained with the ELM sometimes exhibit poor generalization performance and poor robustness to input disturbances. In view of this situation, the OWLM was proposed, in which both the input weights and the output weights of the SLFNs are globally optimized by batch least squares. All feature vectors of the classifier can then be placed at prescribed positions in feature space, so that the separability of nonlinearly separable patterns is maximized and better generalization performance is achieved than with conventional ELM.
However, one major issue remains in OWLM: it has a higher computational cost than ELM, since the input weights of SLFNs trained with OWLM are not randomly selected. With the advent of the big data age, data sets are becoming larger and more complex [23, 24], which further reduces the learning efficiency of OWLM. To implement OWLM efficiently, this paper proposes an incremental learning machine referred to as the optimal weight learning machine with growth of hidden nodes and incremental learning (OWLM-GHNIL). Whenever new nodes are added, the input weights and output weights are updated incrementally, which implements the conventional OWLM algorithm efficiently. At the same time, owing to the advantages of OWLM, OWLM-GHNIL has better generalization performance than other incremental algorithms such as EM-ELM (an approach that can automatically determine the number of hidden nodes in generalized single hidden layer feedforward networks) and I-ELM, which adds random hidden nodes to SLFNs only one node at a time.
The rest of this paper is organized as follows. In Section 2, the OWLM is briefly described. In Section 3, we present OWLM-GHNIL in detail and analyze its computational complexity. Simulation results are presented in Section 4, showing that the proposed approach is more efficient and has better generalization performance than some existing methods. Section 5 concludes the paper.
2. Brief of the Optimal Weight Learning Machine
In this section, we briefly describe the OWLM.
For N given input pattern vectors and the N corresponding desired output data vectors, the N linear output equations of the SLFNs in Figure 1 can be obtained; they involve the hidden layer output matrix defined in (2), the input weight matrix, and the output weight matrix.
The feature vectors of the SLFNs are those corresponding to the input data vectors.
Let the reference feature vectors be described by (8).
Generally, as described in , the selection of the desired feature vectors in (8) mainly depends on the characteristics of the input vectors. By optimizing the input weights of the SLFNs in Figure 1, the OWLM can place the feature vectors of the SLFNs at the “desired position” in feature space. The purpose of the assignment is to further maximize the separability of the vectors in the feature space so that the generalization performance and robustness, seen from the output layer of the SLFNs, can be greatly improved, compared with the SLFNs trained with ELM.
The design of the optimal input weights of the SLFNs can be formulated as a regularized least-squares optimization problem with a positive real regularization parameter.
The optimal input weight matrix is then obtained in closed form, as given in (10).
Similarly, to minimize the error between the desired output pattern and the actual output pattern, the design of the optimal output weights of the SLFNs can be formulated as a second regularized least-squares optimization problem, again with a positive real regularization parameter.
The optimal output weight matrix is then obtained in closed form, as given in (12).
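For reference, the standard regularized least-squares forms that match the description above can be sketched as follows. The notation is an assumption and is not taken from the original: X is the N-row matrix of input patterns, G_d the matrix of desired feature vectors, W the input weight matrix, H the hidden layer output matrix, T the matrix of desired outputs, β the output weight matrix, and d and γ the two positive regularization parameters.

```latex
\begin{align}
\min_{W}\ \tfrac{1}{2}\,\lVert G_d - XW \rVert_F^2 + \tfrac{d}{2}\,\lVert W \rVert_F^2
  \quad &\Longrightarrow \quad
  W = \left(X^{T}X + dI\right)^{-1} X^{T} G_d, \\
\min_{\beta}\ \tfrac{1}{2}\,\lVert T - H\beta \rVert_F^2 + \tfrac{\gamma}{2}\,\lVert \beta \rVert_F^2
  \quad &\Longrightarrow \quad
  \beta = \left(H^{T}H + \gamma I\right)^{-1} H^{T} T .
\end{align}
```

Each closed form follows from setting the gradient of the corresponding objective to zero; for example, the first gives (X^T X + dI) W = X^T G_d. Because d and γ are positive, both matrices being inverted are positive definite.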
The optimal weight learning machine can be summarized as follows.
Algorithm OWLM. Given a training set and the number of hidden nodes, perform the following steps.
Step 1. Randomly assign the hidden node parameters (input weights and biases).
Step 2. Calculate the hidden layer output matrix by (2).
Step 3. Calculate the input weight matrix by (10).
Step 4. Recalculate the hidden layer output matrix using the input weight matrix obtained in Step 3.
Step 5. Calculate the output weight matrix by (12).
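The five steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a sigmoid activation for the random hidden layer, the use of that layer's outputs as the desired feature vectors, linear hidden nodes when the hidden layer output is recalculated in Step 4, and the names `owlm_train`, `owlm_predict`, `d`, and `gamma`.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation (an assumed choice for the random hidden layer)."""
    return 1.0 / (1.0 + np.exp(-z))


def owlm_train(X, T, n_hidden, d=1e-2, gamma=1e-2, rng=None):
    """Hypothetical sketch of Steps 1-5; returns input and output weights."""
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]

    # Step 1: randomly assign hidden node parameters (input weights, biases).
    W_rand = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b_rand = rng.uniform(-1.0, 1.0, size=n_hidden)

    # Step 2: hidden layer output of the random network, used here as the
    # matrix of desired feature vectors (assumption).
    G_d = sigmoid(X @ W_rand + b_rand)

    # Step 3: input weights by regularized least squares.
    A = X.T @ X + d * np.eye(n_features)
    W = np.linalg.solve(A, X.T @ G_d)

    # Step 4: recalculate the hidden layer output with the optimized input
    # weights (linear hidden nodes are assumed).
    H = X @ W

    # Step 5: output weights by regularized least squares.
    B = H.T @ H + gamma * np.eye(n_hidden)
    beta = np.linalg.solve(B, H.T @ T)
    return W, beta


def owlm_predict(X, W, beta):
    """Network output for new inputs under the same linear-node assumption."""
    return (X @ W) @ beta
```

Solving the two linear systems with `np.linalg.solve` avoids forming explicit inverses; the extra system in Step 3 is the additional cost relative to ELM.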
Obviously, the OWLM needs more training time compared with ELM, since it needs additional computational cost for computing the input weight matrix.
3. Growing Hidden Nodes and Incrementally Updating Weights
Given SLFNs with an initial number of hidden nodes and a training set, let N be the number of input patterns, and consider also the length of the input patterns and the length of the output patterns.
We have (13), in which the desired feature vectors are collected in a matrix.
Consider the network output error of the current SLFNs. If it is less than the target error, no new hidden nodes need to be added and the learning procedure is complete. Otherwise, new hidden nodes are added to the existing SLFNs, together with a matrix consisting of the desired feature vectors for the new nodes.
With a suitable choice of parameter, the relevant Schur complement is invertible. Using the standard result on the inversion of block matrices, the inverse required for the enlarged network can be expressed in terms of quantities already computed for the existing network, which yields the update in (19). To save computational cost, the terms in (19) should be computed in a fixed sequence so that intermediate products are reused.
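The block-matrix step can be illustrated with a short NumPy sketch. It assumes that the output weights solve a regularized least-squares problem with parameter `gamma` and that adding hidden nodes appends columns `dH` to the hidden layer output matrix `H`; the function names and this exact form are assumptions, not the paper's equations. Under these assumptions the enlarged matrix is positive definite for any `gamma > 0`, so its Schur complement is invertible.

```python
import numpy as np


def grow_regularized_inverse(A_inv, H, dH, gamma):
    """Inverse of [H, dH]^T [H, dH] + gamma*I from A_inv = (H^T H + gamma*I)^-1.

    Uses the block-matrix inversion formula with the Schur complement of the
    existing block, so only a small matrix (one row per new node) is inverted.
    """
    B = H.T @ dH                                   # coupling between old and new nodes
    D = dH.T @ dH + gamma * np.eye(dH.shape[1])    # block for the new nodes
    AB = A_inv @ B                                 # intermediate product, reused below
    S = D - B.T @ AB                               # Schur complement
    S_inv = np.linalg.inv(S)
    top_right = -AB @ S_inv
    top_left = A_inv - top_right @ AB.T            # equals A_inv + AB S_inv AB^T
    return np.block([[top_left, top_right], [top_right.T, S_inv]])


def grow_output_weights(A_inv, H, dH, T, gamma):
    """Output weights of the enlarged network, plus the inverse for the next growth step."""
    M_inv = grow_regularized_inverse(A_inv, H, dH, gamma)
    H_new = np.hstack([H, dH])
    beta_new = M_inv @ (H_new.T @ T)
    return beta_new, M_inv
```

Computing `AB` once and reusing it in the Schur complement and in both off-diagonal blocks is one example of ordering the computation so that shared products are not recomputed.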
Given the initial number of hidden nodes, the maximum number of hidden nodes, and the expected output error, the OWLM-GHNIL for SLFNs with the mechanism of growing hidden nodes can be summarized in two steps.
Unlike the conventional OWLM, which must recalculate the input weight matrix and the output weight matrix whenever the network architecture changes, the OWLM-GHNIL only updates the two matrices incrementally each time, which is why it reduces the computational complexity significantly. Moreover, the convergence of the OWLM-GHNIL is guaranteed by the Convergence Theorem.
We now analyze the computational complexity of the update.
The computational complexity considered here is the total number of required scalar multiplications. Some matrix computations, including the inversion of the matrix in (13), need not be repeated, since they have already been obtained in an earlier step. The total computational complexity for the updated input and output weights is the sum of the multiplication counts of the individual terms of the update.
In most applications, the relevant dimensions are much smaller than the number of training samples N, and h and l are often small in practice. Under these conditions, the complexity of the OWLM-GHNIL update can be compared with that of retraining by the conventional OWLM as the network grows.
It can be seen that the OWLM-GHNIL is much more efficient than the conventional OWLM in such cases.
Similarly, the computational complexity of the EM-ELM and of the ELM can be obtained.
Comparing the two pairs shows that the difference in computational complexity between OWLM-GHNIL and EM-ELM is much smaller than the difference between OWLM and ELM.
4. Simulation Results
In our experiments, all the algorithms are run in the following computing environment: (1) operating system: Windows 7 Enterprise; (2) CPU: Intel i5-3570, 3.8 GHz; (3) memory: 8 GB; (4) simulation software: MATLAB R2013a.
The performance of the OWLM-GHNIL has been compared with other growing algorithms including the EM-ELM, I-ELM, and the conventional OWLM.
In order to investigate the performance of the proposed OWLM-GHNIL, some benchmark problems are presented in this section.
The OWLM-GHNIL, EM-ELM, and OWLM are first run to approximate the artificial “SinC” function, a popular benchmark for illustrating neural networks.
A training set and a testing set, each with 5000 samples, are generated from a fixed interval. Random noise is added to the training data, while the testing data remain noise-free. The performance of each algorithm is shown in Figures 2 and 3. In this case, the initial SLFNs are given five hidden nodes, and one new hidden node is added at each step until 30 hidden nodes are reached.
It can be seen from Figure 2 that the OWLM and the OWLM-GHNIL obtain similar testing root mean square error (RMSE), which is lower than that of the EM-ELM in most cases. Figure 3 compares the training time of the three methods in the SinC case. With the growth of hidden nodes, the OWLM-GHNIL requires training time similar to that of the EM-ELM but much less than that of the OWLM for the same number of nodes.
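The data generation and the error measure used in this comparison can be sketched as follows. The sketch assumes the definition of SinC common in the ELM literature (sin(x)/x, with value 1 at x = 0), uniform sampling of the inputs, and uniformly distributed noise; the interval endpoints and the noise amplitude are left as caller-supplied parameters, and all function names are hypothetical.

```python
import numpy as np


def sinc(x):
    """SinC target: sin(x)/x for x != 0 and 1 at x == 0 (assumed definition)."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)      # avoid division by zero
    return np.where(x == 0.0, 1.0, np.sin(safe) / safe)


def make_sinc_data(n_samples, low, high, noise_amp, rng=None):
    """Sample inputs uniformly in [low, high] and add uniform noise to the targets.

    Use noise_amp=0.0 for noise-free testing data.
    """
    rng = np.random.default_rng(rng)
    x = rng.uniform(low, high, size=(n_samples, 1))
    noise = rng.uniform(-noise_amp, noise_amp, size=(n_samples, 1))
    return x, sinc(x) + noise


def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

A training set would be generated with a positive `noise_amp` and a testing set with `noise_amp=0.0`, each with 5000 samples as stated above; the testing RMSE is then `rmse` applied to the noise-free targets and the network predictions.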
In the following, nine real benchmark problems including five regression applications and four classification applications are used for further comparison; all of them are available on the Web. For each case, the training data set and testing data set are randomly generated from its whole data set before each trial of simulation, and average results are obtained over 30 trials for all cases. The features of the benchmark data sets are summarized in Table 1.
The generalization performance of the OWLM-GHNIL is compared with that of two other popular incremental ELM-type algorithms, EM-ELM and I-ELM, on the regression and classification cases in Tables 2 and 3. In the implementation of the EM-ELM and OWLM-GHNIL, the initial SLFNs are given 50 hidden nodes, and 25 new hidden nodes are added at each step until 150 hidden nodes are reached. In the case of the I-ELM, the initial SLFNs are given 1 hidden node, and hidden nodes are added one by one until 150 hidden nodes are reached. Judging from the average RMSE and accuracy in Tables 2 and 3, the OWLM-GHNIL appears to obtain better generalization performance than the EM-ELM and I-ELM. To obtain an objective statistical measure, we apply a Student’s t-test to each data set to check whether the differences between the OWLM-GHNIL and the other two algorithms are statistically significant (significance level of 0.05, i.e., 95% confidence). Table 2 shows that, in four of the regression data sets (Delta Ailerons, Delta Elevators, California Housing, and Bank domains) and three of the classification data sets (COIL20, USPST(B), and Satimage), the t-test gave a significant difference between OWLM-GHNIL and EM-ELM, with superior generalization performance of the OWLM-GHNIL, whereas no significant difference was found in the two other data sets (computer activity and G50C). In Table 3, the t-test results show a significant difference between OWLM-GHNIL and I-ELM, with superior generalization performance of the OWLM-GHNIL, in all data sets except the computer activity data set.
5. Conclusions
In this paper, we have developed an efficient method, OWLM-GHNIL, which can grow hidden nodes one by one or group by group in SLFNs. The analysis of computational complexity and the simulation results on an artificial problem show that OWLM-GHNIL can significantly reduce the computational complexity of OWLM. The simulation results on nine real benchmark problems, including five regression applications and four classification applications, also show that OWLM-GHNIL has better generalization performance than the two other incremental algorithms, EM-ELM and I-ELM. The t-test further confirms that the differences are significant in most cases, with superior generalization performance of the OWLM-GHNIL.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant no. LY18F030003, Foundation of High-Level Talents in Lishui City under Grant no. 2017RC01, Scientific Research Foundation of Zhejiang Provincial Education Department under Grant no. Y201432787, and the National Natural Science Foundation of China under Grant no. 61373057.
References
C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, NY, USA, 1995.
S. Haykin, Neural Networks and Learning Machines, Pearson, Prentice-Hall, New Jersey, USA, 3rd edition, 2009.
S. Kumar, Neural Networks, McGraw-Hill Companies Inc., Columbus, OH, USA, 2006.
J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company, Boston, Mass, USA, 1991.
M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, Cambridge, UK, 1999.
V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.
R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2001.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth, Belmont, Mass, USA, 1984.
G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1996.