Research Article  Open Access
An Incremental Optimal Weight Learning Machine of Single-Layer Neural Networks
Abstract
An optimal weight learning machine with growth of hidden nodes and incremental learning (OWLMGHNIL) is proposed, in which random hidden nodes are added to single hidden layer feedforward networks (SLFNs) one by one or group by group. As the network grows, the input weights and output weights are updated incrementally, which implements the conventional optimal weight learning machine (OWLM) efficiently. Simulation results and statistical tests also demonstrate that the OWLMGHNIL has better generalization performance than other incremental algorithms.
1. Introduction
Feedforward neural networks (FNNs) have been extensively used in classification and regression applications [1]. As a specific type of FNN, single hidden layer feedforward networks (SLFNs) with additive models can approximate any target continuous function [2]. Owing to their excellent learning capabilities and fault-tolerant abilities, SLFNs play an important role in practical applications and have been investigated extensively in both theory and application [3–7].
Perhaps the most popular training methods for SLFN classifiers in recent years have been gradient-based backpropagation (BP) algorithms [8–12]. BP algorithms are easily implemented recursively in real time; however, their slow convergence is a bottleneck in applications where fast training is essential. In [13], a novel learning machine named the extreme learning machine (ELM) was proposed for training SLFNs, in which the learning parameters of the hidden nodes, including input weights and biases, are randomly assigned and need not be tuned, while the output weights are obtained by a simple generalized inverse computation. It has been proven that, even without updating the parameters of the hidden layer, SLFNs with randomly generated hidden neurons and tunable output weights maintain their universal approximation capability and excellent generalization performance [14]. It has been shown that ELM is faster than most state-of-the-art training algorithms for SLFNs, and it has been applied widely to practical problems such as classification, regression, clustering, recognition, and relevance ranking [15, 16].
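The ELM training idea described above can be illustrated with a minimal sketch. It assumes sigmoid hidden nodes, hidden parameters drawn uniformly from [-1, 1], and a toy data set; the function names train_elm and predict_elm are hypothetical and are not taken from [13].

```python
import numpy as np

def train_elm(X, T, n_hidden, rng):
    """Basic ELM sketch: random hidden layer, least-squares output weights."""
    n_features = X.shape[1]
    # Hidden-node parameters are drawn at random and never tuned (assumed range).
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden-layer output matrix
    # Output weights via the Moore-Penrose generalized inverse of H.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage on synthetic data (illustrative only, not a benchmark from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
T = X[:, :1] + 2.0 * X[:, 1:2] - X[:, 2:3]    # target of shape (200, 1)
W, b, beta = train_elm(X, T, n_hidden=20, rng=rng)
rmse = float(np.sqrt(np.mean((predict_elm(X, W, b, beta) - T) ** 2)))
```

Only the output weights are fitted; the hidden layer is fixed after the random draw, which is what makes the training step a single linear least-squares solve.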
Since the input weights and hidden layer biases of SLFNs trained with ELM are randomly assigned, minor changes in the input vectors may cause large changes in the hidden layer output matrix of the SLFNs. This in turn leads to large changes in the output weight matrix. According to statistical learning theory [17–21], large changes in the output weight matrix greatly increase both the structural and empirical risks of the SLFNs, which in turn decreases their robustness to input disturbances. In fact, it has been noted from simulations that SLFNs trained with ELM sometimes exhibit poor generalization performance and poor robustness to input disturbances. In view of this situation, the OWLM [22] was proposed, in which both the input weights and the output weights of the SLFNs are globally optimized with batch least squares learning. All feature vectors of the classifier can then be placed at prescribed positions in the feature space, in the sense that the separability of nonlinearly separable patterns is maximized, and better generalization performance can be achieved than with conventional ELM.
However, one major issue remains in OWLM: it incurs a higher computational cost than ELM, since the input weights of SLFNs trained with OWLM are not randomly selected. With the advent of the big data age, data sets have become larger and more complex [23, 24], which further reduces the learning efficiency of OWLM. To implement OWLM efficiently, this paper proposes an incremental learning machine referred to as the optimal weight learning machine with growth of hidden nodes and incremental learning (OWLMGHNIL). Whenever new nodes are added, the input weights and output weights are updated incrementally, which implements the conventional OWLM algorithm efficiently. At the same time, owing to the advantages of OWLM, OWLMGHNIL has better generalization performance than other incremental algorithms such as EMELM [25] (an approach that can automatically determine the number of hidden nodes in generalized single hidden layer feedforward networks) and IELM [14], which adds random hidden nodes to SLFNs one node at a time.
The rest of this paper is organized as follows. In Section 2, the OWLM is briefly described. In Section 3, we present OWLMGHNIL in detail and analyze its computational complexity. Simulation results are presented in Section 4, showing that the proposed approach is more efficient than the conventional OWLM and has better generalization performance than some existing incremental methods. Section 5 concludes the paper.
2. Brief Description of the Optimal Weight Learning Machine
In this section, we briefly describe the OWLM.
For given input pattern vectors , as well as corresponding desired output data vectors , respectively, the N linear output equations of the SLFNs in Figure 1 can be obtained as where with and input weight matrix and output weight matrix
Let be feature vectors, corresponding to the input data vectors . Then we have or
Let the reference feature vectors be described by
Generally, as described in [22], the selection of the desired feature vectors in (8) mainly depends on the characteristics of the input vectors. By optimizing the input weights of the SLFNs in Figure 1, the OWLM can place the feature vectors of the SLFNs at the “desired position” in feature space. The purpose of the assignment is to further maximize the separability of the vectors in the feature space so that the generalization performance and robustness, seen from the output layer of the SLFNs, can be greatly improved, compared with the SLFNs trained with ELM.
The design of the optimal input weight of the SLFNs can be formulated as the following optimization problem: where is a positive real regularization parameter.
The optimal input weight matrix was derived as follows:
Similarly, to minimize the error between the desired output pattern and the actual output pattern , the design of the optimal output weight of the SLFNs can be formulated as the following optimization problem: where is a positive real regularization parameter.
The optimal output weight matrix was derived as follows:
The optimal weight learning machine [22] can be summarized as follows.
Algorithm OWLM. Given a training set , as well as hidden node number , do the following steps.
Step 1. Randomly assign hidden node parameters (), .
Step 2. Calculate the hidden layer output matrix by (2).
Step 3. Calculate the input weight matrix by (10).
Step 4. Recalculate the hidden layer output matrix by .
Step 5. Calculate the output weight matrix by (12).
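The five steps above can be sketched in code. Because equations (2), (10), and (12) are not reproduced in this text, the sketch rests on several assumptions: both weight matrices are taken to be standard regularized (ridge) least-squares solutions, the reference feature vectors are taken to be the sigmoid outputs of the random network from Steps 1 and 2, and the recalculated hidden layer in Step 4 is taken to be linear. The names ridge_solve and train_owlm_sketch, as well as the parameters lam_in and lam_out, are hypothetical.

```python
import numpy as np

def ridge_solve(A, B, lam):
    """Return argmin_W ||A W - B||_F^2 + lam * ||W||_F^2 (assumed form of (10), (12))."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ B)

def train_owlm_sketch(X, T, n_hidden, lam_in, lam_out, rng):
    # Step 1: randomly assign hidden-node parameters (assumed uniform range).
    W0 = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b0 = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden-layer outputs of the random network, used here as the
    # reference (desired) feature vectors. This choice is an assumption.
    H_ref = 1.0 / (1.0 + np.exp(-(X @ W0 + b0)))
    # Step 3: input weights by regularized least squares toward H_ref.
    W = ridge_solve(X, H_ref, lam_in)
    # Step 4: recompute the hidden-layer output with the optimized input
    # weights (linear hidden nodes are an assumption of this sketch).
    H = X @ W
    # Step 5: output weights by regularized least squares toward the targets.
    beta = ridge_solve(H, T, lam_out)
    return W, beta

# Toy usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 4))
T = np.sin(X.sum(axis=1, keepdims=True))
W, beta = train_owlm_sketch(X, T, n_hidden=10, lam_in=1e-3, lam_out=1e-3, rng=rng)
```

Compared with the ELM, the sketch performs one extra regularized solve for the input weights, which is where the additional training cost comes from.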
Obviously, the OWLM needs more training time compared with ELM, since it needs additional computational cost for computing the input weight matrix.
3. Growing Hidden Nodes and Incrementally Updating Weights
Given SLFNs with initial hidden nodes and a training set let be the number of input patterns, let be the length of the input patterns, and let be the length of the output patterns.
We have where is an matrix consisting of desired feature vectors.
Let be the network output error; if is less than the target error , then no new hidden nodes need to be added and the learning procedure is complete. Otherwise, we add new nodes to the existing SLFNs; then where is an matrix consisting of desired feature vectors. Then,
The Schur complement of is invertible for a suitable choice of . Then, using the result on the inversion of block matrices [26], we can get then To save computational cost, in (19) should be computed in the following sequence:
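The block matrix inversion result used here can be illustrated generically. The sketch below is not the paper's equations (16)–(19), which are not reproduced in this text; it only shows the standard identity for extending a known inverse of a symmetric matrix A to the inverse of the bordered matrix [[A, B], [B^T, D]] through the Schur complement S = D - B^T A^{-1} B. The function name grow_inverse and the regularized Gram matrices in the usage are assumptions chosen for illustration.

```python
import numpy as np

def grow_inverse(A_inv, B, D):
    """Inverse of [[A, B], [B.T, D]] given A_inv (A symmetric), via the Schur complement."""
    AB = A_inv @ B                      # A^{-1} B, reused below
    S = D - B.T @ AB                    # Schur complement of A
    S_inv = np.linalg.inv(S)            # small matrix: one row/column per new node
    top_left = A_inv + AB @ S_inv @ AB.T
    top_right = -AB @ S_inv
    return np.block([[top_left, top_right], [top_right.T, S_inv]])

# Usage: extend the inverse of a regularized Gram matrix when columns are added.
rng = np.random.default_rng(0)
H_old = rng.standard_normal((50, 4))    # existing hidden-layer outputs (synthetic)
H_new = rng.standard_normal((50, 2))    # outputs of newly added nodes (synthetic)
lam = 1e-2                              # regularization keeps S invertible
A_inv = np.linalg.inv(H_old.T @ H_old + lam * np.eye(4))
B = H_old.T @ H_new
D = H_new.T @ H_new + lam * np.eye(2)
M_inv = grow_inverse(A_inv, B, D)

# Direct inversion of the full matrix, for comparison only.
H_all = np.hstack([H_old, H_new])
M_direct = np.linalg.inv(H_all.T @ H_all + lam * np.eye(6))
```

Only the Schur complement, whose size equals the number of added nodes, is inverted explicitly; the previously computed inverse is reused, which is the source of the savings when a few nodes are added to a large network.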
Given the number of initial hidden nodes , the maximum number of hidden nodes , and the expected output error , the OWLMGHNIL for the SLFNs with the mechanism of growing hidden nodes can be summarized as the following two steps.
Algorithm OWLMGHNIL
Step 1 (initialization step).
(1) Compute by (13)–(15).
(2) Compute the corresponding output error .
Step 2 (recursively incremental step). Let , and while and ,
(1) ;
(2) randomly add ( need not be kept constant) hidden nodes to the existing network; then can be calculated by (16)–(19).
End
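The control flow of the two steps above (initialize, then grow until the error target or the node limit is reached) can be sketched as follows. This is a simplified illustration and not the OWLMGHNIL itself: for brevity it uses random sigmoid features instead of optimized input weights and re-solves a regularized least-squares problem at every growth step instead of applying the incremental update of (16)–(19). All names and parameter values are hypothetical.

```python
import numpy as np

def grow_until_converged(X, T, n_init, n_max, n_add, target_rmse, lam, rng):
    def random_features(n_nodes):
        # Random sigmoid hidden nodes (stand-in for the optimized input weights).
        W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_nodes))
        b = rng.uniform(-1.0, 1.0, size=n_nodes)
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))

    def fit(H):
        # Full regularized solve (stand-in for the incremental update).
        beta = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ T)
        rmse = float(np.sqrt(np.mean((H @ beta - T) ** 2)))
        return beta, rmse

    # Step 1: initialize the network and compute its output error.
    H = random_features(n_init)
    beta, rmse = fit(H)
    history = [(H.shape[1], rmse)]
    # Step 2: grow while the error is too large and the size limit allows it.
    while rmse > target_rmse and H.shape[1] < n_max:
        n_new = min(n_add, n_max - H.shape[1])   # group size may vary per step
        H = np.hstack([H, random_features(n_new)])
        beta, rmse = fit(H)
        history.append((H.shape[1], rmse))
    return beta, history

# Toy usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
T = np.sin(3.0 * X[:, :1]) + X[:, 1:2] ** 2
beta, history = grow_until_converged(
    X, T, n_init=5, n_max=30, n_add=5, target_rmse=1e-6, lam=1e-3, rng=rng
)
```

The returned history lists the network size and training error after each growth step, so the stopping behavior can be inspected directly.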
Unlike the conventional OWLM, which must recalculate the input weight matrix and output weight matrix from scratch whenever the network architecture changes, the OWLMGHNIL only updates the two matrices incrementally at each step, which reduces the computational complexity significantly. Moreover, the convergence of the OWLMGHNIL is guaranteed by the Convergence Theorem in [25].
We now analyze the computational complexity of the incremental update.
The computational complexity, which we consider, expresses the total number of required scalar multiplications. Some matrix computations need not be done repeatedly including the inversion of matrix in (13), since they have been obtained in the process of computing . Then it requires , and multiplications for , , and , respectively. Thus, the total computational complexity for the weights and is
If we compute and by (10) and (12) directly, it will cost multiplications.
Since in most applications and can be much smaller than the number of training samples N: and h and l are often small numbers in practical applications, then, with the growth of , n, when , when ,
It can be seen that the OWLMGHNIL is much more efficient than the conventional OWLM in such cases.
Similarly, we can get the computational complexity of the EMELM and ELM, respectively.
Then, we have
Obviously, the difference in computational complexity between OWLMGHNIL and EMELM is much smaller than that between OWLM and ELM.
4. Simulation Results
In our experiments, all the algorithms are run in the following computing environment: (1) operating system: Windows 7 Enterprise; (2) CPU: Intel i5-3570, 3.8 GHz; (3) memory: 8 GB; (4) simulation software: Matlab R2013a.
The performance of the OWLMGHNIL has been compared with other growing algorithms including the EMELM, IELM, and the conventional OWLM.
In order to investigate the performance of the proposed OWLMGHNIL, some benchmark problems are presented in this section.
The OWLMGHNIL, EMELM, and OWLM are first run to approximate the artificial “SinC” function, which is a popular choice for illustrating neural network behavior.
A training set and a testing set, each with 5000 samples, are generated from the interval , with random noise distributed in added to the training samples, while the testing data remain noise-free. The performance of each algorithm is shown in Figures 2 and 3. In this case, the initial SLFNs are given five hidden nodes, and one new hidden node is then added at each step until 30 hidden nodes are reached.
It can be seen from Figure 2 that the OWLM and the OWLMGHNIL obtain similar testing root mean square errors (RMSEs), which are lower than that of the EMELM in most cases. Figure 3 shows the training time comparison of the three methods in the SinC case. We can see that, as the number of hidden nodes grows, the OWLMGHNIL requires training time similar to that of the EMELM but much less than that of the OWLM for the same number of nodes.
In the following, nine real benchmark problems including five regression applications and four classification applications are used for further comparison; all of them are available on the Web. For each case, the training data set and testing data set are randomly generated from its whole data set before each trial of simulation, and average results are obtained over 30 trials for all cases. The features of the benchmark data sets are summarized in Table 1.

The generalization performance of the OWLMGHNIL is compared with that of two other popular incremental ELM-type algorithms, EMELM and IELM, on the regression and classification cases in Tables 2 and 3. In the implementation of the EMELM and OWLMGHNIL, the initial SLFNs are given 50 hidden nodes, and 25 new hidden nodes are then added at each step until 150 hidden nodes are reached. In the case of the IELM, the initial SLFNs are given 1 hidden node, and hidden nodes are then added one by one until 150 hidden nodes are reached. As observed from the average RMSE and accuracy results in Tables 2 and 3, the OWLMGHNIL appears to obtain better generalization performance than the EMELM and IELM. In order to obtain an objective statistical measure, we apply a Student’s t-test to each data set to check whether the differences between the OWLMGHNIL and the other two algorithms are statistically significant (significance level of 0.05, i.e., confidence of 95%). Table 2 shows that, in four of the regression data sets (Delta Ailerons, Delta Elevators, California Housing, and Bank domains) and three of the classification data sets (COIL20, USPST(B), and Satimage), the test gave a significant difference between OWLMGHNIL and EMELM, with superior generalization performance of the OWLMGHNIL, whereas no significant difference was found in the two other data sets (Computer Activity and G50C). In Table 3, the test results show a significant difference between OWLMGHNIL and IELM, with superior generalization performance of the OWLMGHNIL, in all data sets except the Computer Activity data set.
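For readers unfamiliar with the test, a two-sample Student's t statistic can be computed as in the following sketch. The text does not state which variant of the t-test was used (paired or unpaired, pooled or unpooled variance), so the pooled-variance two-sample form shown here is an assumption, and the numbers in the usage are toy values, not results from the paper. The function name student_t is hypothetical.

```python
import math
import statistics

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)   # sample variances
    df = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Toy usage: each list would hold one error value per trial for one algorithm.
t, df = student_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
```

The difference is declared significant at the chosen level when the absolute value of t exceeds the two-sided critical value of the t distribution with df degrees of freedom.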


5. Conclusion
In this paper, we have developed an efficient method, OWLMGHNIL, which can grow hidden nodes one by one or group by group in SLFNs. The analysis of computational complexity and the simulation results on an artificial problem show that OWLMGHNIL can significantly reduce the computational complexity of OWLM. The simulation results on nine real benchmark problems, including five regression applications and four classification applications, also show that OWLMGHNIL has better generalization performance than the two other incremental algorithms, EMELM and IELM. The t-test further confirmed a significant difference, with superior generalization performance of the OWLMGHNIL.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant no. LY18F030003, Foundation of High-Level Talents in Lishui City under Grant no. 2017RC01, Scientific Research Foundation of Zhejiang Provincial Education Department under Grant no. Y201432787, and the National Natural Science Foundation of China under Grant no. 61373057.
References
[1] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, NY, USA, 1995.
[2] G.B. Huang and H. A. Babri, “Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions,” IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224–229, 1998.
[3] X.F. Hu, Z. Zhao, S. Wang, F.L. Wang, D.K. He, and S.K. Wu, “Multistage extreme learning machine for fault diagnosis on hydraulic tube tester,” Neural Computing and Applications, vol. 17, no. 4, pp. 399–403, 2008.
[4] T.Y. Kwok and D.Y. Yeung, “Objective functions for training new hidden units in constructive neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 8, no. 5, pp. 1131–1148, 1997.
[5] E. J. Teoh, K. C. Tan, and C. Xiang, “Estimating the number of hidden neurons in a feedforward network using the singular value decomposition,” IEEE Transactions on Neural Networks and Learning Systems, vol. 17, no. 6, pp. 1623–1629, 2006.
[6] X. Luo, J. Deng, W. Wang, J.H. Wang, and W. Zhao, “A quantized kernel learning algorithm using a minimum kernel risk-sensitive loss criterion and bilateral gradient technique,” Entropy, vol. 19, no. 7, article no. 365, 2017.
[7] Y. Xu, X. Luo, W. Wang, and W. Zhao, “Efficient DV-HOP localization for wireless cyber-physical social sensing system: A correntropy-based neural network learning scheme,” Sensors, vol. 17, no. 1, article no. 135, 2017.
[8] S. Haykin, Neural Networks and Learning Machines, Pearson, Prentice-Hall, New Jersey, USA, 3rd edition, 2009.
[9] S. Kumar, Neural Networks, McGraw-Hill Companies Inc., Columbus, OH, USA, 2006.
[10] X. Yao, “Evolving artificial neural networks,” Proceedings of the IEEE, vol. 87, no. 9, pp. 1423–1447, 1999.
[11] G. P. Zhang, “Neural networks for classification: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451–462, 2000.
[12] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company, Boston, Mass, USA, 1991.
[13] G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[14] G. Huang, L. Chen, and C. Siew, “Universal approximation using incremental constructive feedforward networks with random hidden nodes,” IEEE Transactions on Neural Networks and Learning Systems, vol. 17, no. 4, pp. 879–892, 2006.
[15] S.F. Ding, X.Z. Xu, and R. Nie, “Extreme learning machine and its applications,” Neural Computing and Applications, vol. 25, no. 3, pp. 549–556, 2014.
[16] X. Luo, Y. Xu, W. Wang et al., “Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy,” Journal of The Franklin Institute, 2017.
[17] M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, Cambridge, UK, 1999.
[18] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley Interscience, New York, NY, USA, 1998.
[19] L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, vol. 31 of Stochastic Modelling and Applied Probability, Springer-Verlag New York, Berlin, Germany, 1996.
[20] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2001.
[21] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth, Belmont, Mass, USA, 1984.
[22] Z. Man, K. Lee, D. Wang, Z. Cao, and S. Khoo, “An optimal weight learning machine for handwritten digit image recognition,” Signal Processing, vol. 93, no. 6, pp. 1624–1638, 2013.
[23] X. Luo, J. Deng, J. Liu, W. Wang, X. Ban, and J. Wang, “A quantized kernel least mean square scheme with entropy-guided learning for intelligent data analysis,” China Communications, vol. 14, no. 7, pp. 127–136, 2017.
[24] W. Zhao, R. Lun, C. Gordon et al., “A human-centered activity tracking system: toward a healthier workplace,” IEEE Transactions on Human-Machine Systems, vol. 47, no. 3, pp. 343–355, 2017.
[25] G. Feng, G.B. Huang, Q. Lin, and R. Gay, “Error minimized extreme learning machine with growth of hidden nodes and incremental learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 20, no. 8, pp. 1352–1357, 2009.
[26] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1996.
Copyright
Copyright © 2018 HaiFeng Ke et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.