Research Article  Open Access
Deep Network Based on Stacked Orthogonal Convex Incremental ELM Autoencoders
Abstract
Extreme learning machine (ELM), an emerging technology, has recently attracted many researchers’ interest due to its fast learning speed and state-of-the-art generalization ability. Meanwhile, the incremental extreme learning machine (IELM), based on an incremental learning algorithm, was proposed and outperforms many popular learning algorithms. However, the incremental ELM algorithms do not recalculate the output weights of all existing nodes when a new node is added, and therefore cannot obtain the least-squares solution of the output weight vectors. In this paper, we propose the orthogonal convex incremental extreme learning machine (OCIELM), which combines the Gram-Schmidt orthogonalization method with Barron’s convex optimization learning method to solve the nonconvex optimization problem and the least-squares solution problem, and we give rigorous theoretical proofs. Moreover, we propose a deep architecture based on stacked OCIELM autoencoders, following the stacked generalization philosophy, for solving large and complex data problems. Experimental results on both UCI datasets and large datasets demonstrate that the deep network based on stacked OCIELM autoencoders (DOCIELMAEs) outperforms the other methods considered in the paper on regression and classification problems.
1. Introduction
Extreme learning machine (ELM), proposed by Huang et al. [1, 2], is a specific type of single-hidden-layer feedforward network (SLFN) with randomly generated additive or RBF hidden nodes and hidden node parameters. It has recently been studied extensively in various areas of scientific research and engineering owing to its excellent approximation capability. Wang et al. presented the ASLGEMELM algorithm, which provides useful guidelines for improving the generalization ability of SLFNs trained with ELM [3]. With deepening research on its theory and applications, ELM has become one of the leading trends for fast learning [4–7]. Huang et al. [8] proposed an algorithm called incremental extreme learning machine (IELM), which randomly adds nodes to the hidden layer one by one and freezes the output weights of the existing hidden nodes when a new hidden node is added [9–12]. Huang et al. [13] also showed its universal approximation capability for the case of fully complex hidden nodes. IELM is fully automatic, and in theory the learning process requires no intervention from users. However, some issues remain to be tackled [14]:
(1) Redundant nodes can be generated in IELM. They have only a minor effect on the outputs of the network, yet their presence eventually increases the complexity of the network.
(2) The convergence rate of IELM is slower than that of ELM, and the number of hidden nodes in IELM is sometimes larger than the dimension of the training samples.
In this paper, we propose a method called orthogonal convex incremental extreme learning machine (OCIELM) to address the aforementioned problems of IELM. By incorporating the Gram-Schmidt orthogonalization method into CIELM [15], we can, with rigorous theoretical proofs, obtain the least-squares solution of the linear system $H\beta = T$ (hidden layer output matrix $H$, output weights $\beta$, targets $T$) and a faster convergence rate. Simulations on real-world datasets show that the proposed OCIELM algorithm achieves faster convergence rates, a more compact neural network, and better generalization performance than both IELM and the improved IELM algorithms while keeping the simplicity and efficiency of ELM.
Recently, deep learning has attracted much research interest owing to its remarkable success in many applications [16–18]. Deep learning refers to artificial neural network learning algorithms built on multilayer perceptrons. It can approximate complex functions and has alleviated the optimization difficulty associated with deep models [19–21]. Motivated by this success [22, 23], we propose a new stacked architecture for large and complex data problems that uses the OCIELM autoencoder as the training algorithm in each layer, combining the excellent performance of OCIELM with the complex function approximation ability of deep architectures. We implement an OCIELM autoencoder in each iteration of the deep orthogonal convex incremental extreme learning machine (DOCIELM) to reconstruct the input data and estimate the errors of the prediction functions layer by layer. Both supervised and unsupervised data can serve as the pretraining input of the proposed deep network. Moreover, the OCIELM autoencoder-based deep network (DOCIELMAEs) improves generalization performance.
To show the effectiveness of DOCIELMAEs, we apply it to both ordinary real-world UCI datasets and large datasets, namely, MNIST, OCR Letters, NORB, and USPS. The simulations show that the proposed deep model achieves better testing accuracy and a more compact network architecture than the aforementioned improved IELM algorithms and other deep models, without incurring the out-of-memory problem.
This paper is organized as follows. Section 2 reviews the preliminary knowledge of incremental extreme learning machine (IELM). Section 3 describes the OCIELM algorithm, the proposed model, which incorporates the Gram-Schmidt orthogonalization method into convex IELM (CIELM). Section 4 compares OCIELM with other algorithms. Section 5 presents the details of the DOCIELMAEs algorithm and compares its performance with that of other deep architecture models. Section 6 applies the DOCIELMAEs algorithm to the elongation prediction of strips. Finally, Section 7 concludes this paper.
2. Related Works
In this section, the main concepts and theory of the IELM [8] algorithm are briefly reviewed. For the sake of generality, we assume that the network has only one linear output node; the analysis can be easily extended to the case of multiple nonlinear output nodes. Consider a training dataset $\{(x_i, t_i)\}_{i=1}^{N} \subset \mathbb{R}^{d} \times \mathbb{R}$; the SLFN with $n$ additive hidden nodes and activation function $g$ can be represented by
$$f_n(x) = \sum_{i=1}^{n} \beta_i \, g(a_i \cdot x + b_i),$$
where $a_i$ is the weight vector connecting the input layer to the $i$th hidden node, $\beta_i$ is the weight connecting the $i$th hidden node to the output node, $b_i$ is the threshold of the $i$th hidden node, and $g$ is the hidden node activation function.
The IELM proposed by Huang et al. is different from the conventional ELM algorithm; IELM is an automatic algorithm which can randomly add hidden nodes to the network one by one and freeze all the weights of the existing hidden nodes when a new hidden node is added, until the expected learning accuracy is obtained or the maximum number of hidden nodes is reached. Thus, IELM algorithm can be summarized in Algorithm 1.
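Algorithm 1 (incremental extreme learning machine (IELM)). Given a training dataset $\{(x_i, t_i)\}_{i=1}^{N}$, activation function $g$, number of hidden nodes $L$, expected learning accuracy $\epsilon$, and maximum number of hidden nodes $L_{\max}$, one has the following.
Step 1 (initialization). Let $L = 0$ and residual error $E = t$, where $t = [t_1, \ldots, t_N]^{T}$.
Step 2 (learning step). While $L < L_{\max}$ and $\|E\| > \epsilon$,
(a) increase the number of hidden nodes by one: $L = L + 1$;
(b) assign random input weight $a_L$ and bias $b_L$ for the new hidden node;
(c) calculate the output vector $H_L$ of the new hidden node on the training samples;
(d) calculate the output weight for the new hidden node: $\beta_L = \dfrac{E \cdot H_L^{T}}{H_L \cdot H_L^{T}}$;
(e) calculate the residual error after adding the new hidden node: $E = E - \beta_L H_L$;
Endwhile.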
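The learning step of Algorithm 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it uses a sigmoid activation, draws weights and biases uniformly from $[-1, 1]$, and handles a single output node; the function names `ielm_fit` and `ielm_predict` and the toy sine data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ielm_fit(X, t, max_nodes=50, eps=1e-3, seed=0):
    """Incremental ELM for one linear output node (illustrative sketch).

    Returns the hidden nodes as (a, b, beta) tuples and the history of
    residual norms ||E|| recorded before training and after each new node.
    """
    rng = random.Random(seed)
    d = len(X[0])
    E = list(t)                                # Step 1: residual starts at the targets
    nodes, history = [], [math.sqrt(dot(E, E))]
    while len(nodes) < max_nodes and history[-1] > eps:
        # (a)-(b): add one node with random input weights and bias
        a = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        b = rng.uniform(-1.0, 1.0)
        h = [sigmoid(dot(a, x) + b) for x in X]
        # (d): output weight of the new node only; earlier weights stay frozen
        beta = dot(E, h) / dot(h, h)
        # (e): update the residual error
        E = [e - beta * hi for e, hi in zip(E, h)]
        nodes.append((a, b, beta))
        history.append(math.sqrt(dot(E, E)))
    return nodes, history


def ielm_predict(nodes, x):
    return sum(beta * sigmoid(dot(a, x) + b) for a, b, beta in nodes)


# Toy 1-D regression problem (hypothetical data, not from the paper)
X = [[i / 20.0] for i in range(21)]
t = [math.sin(2.0 * math.pi * x[0]) for x in X]
nodes, history = ielm_fit(X, t, max_nodes=30)
```

Because each new output weight is the one-dimensional least-squares fit of the current residual, the residual norm in `history` never increases, but the weights of earlier nodes are never revisited.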
3. The Proposed Orthogonal Convex Incremental Extreme Learning Machine (OCIELM)
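The motivation for the work in this section comes from the following important properties of basic ELM, where $H$ is the hidden layer output matrix, $T$ is the target matrix, and $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$:
(1) The special solution $\hat{\beta} = H^{\dagger}T$ is one of the least-squares solutions of a general linear system $H\beta = T$, meaning that the smallest training error can be reached by this special solution: $\|H\hat{\beta} - T\| = \|HH^{\dagger}T - T\| = \min_{\beta}\|H\beta - T\|$.
(2) The smallest norm of weights: the special solution $\hat{\beta} = H^{\dagger}T$ has the smallest norm among all the least-squares solutions of $H\beta = T$:
$$\|\hat{\beta}\| = \|H^{\dagger}T\| \le \|\beta\| \quad \text{for every least-squares solution } \beta \text{ of } H\beta = T.$$
(3) The minimum norm least-squares solution of $H\beta = T$ is unique, which is $\hat{\beta} = H^{\dagger}T$.
In this section, we propose an improved IELM algorithm (OCIELM) based on the Gram-Schmidt orthogonalization method combined with Barron’s convex optimization learning method, and we prove theoretically that the OCIELM algorithm can obtain the least-squares solution of the linear system $H\beta = T$. Meanwhile, OCIELM can achieve a more compact network architecture, a faster convergence rate, and better generalization performance than other improved IELM algorithms while retaining IELM’s simplicity and efficiency.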
Theorem 2. The Gram-Schmidt orthogonalization process converts linearly independent vectors into orthogonal vectors [24]. Given a linearly independent vector set in the inner product space , the vector set for the Gram-Schmidt orthogonalization process is as follows [25]:where is the set of standardized vectors and form an orthogonal set with the same linear span. For each index , .
Theorem 3. Given an orthogonal vector set in the inner product space , if vector can be expressed as a linear representation of , one has
Proof. Given the vector set and vector , suppose there exist scalars ; then the linear combination of those vectors with those scalars as coefficients isSubstituting (4) into (5), we have :
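Theorems 2 and 3 can be checked numerically with a short sketch. The code below implements the classical Gram-Schmidt process and the expansion of a vector in the resulting orthogonal basis, where each coefficient is the inner product of the vector with a basis vector divided by that basis vector's squared norm. The function names and the example vectors are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: returns mutually orthogonal vectors with the
    same span as the (linearly independent) input vectors."""
    ortho = []
    for u in vectors:
        v = list(u)
        for w in ortho:
            coeff = dot(u, w) / dot(w, w)      # projection coefficient of u on w
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        ortho.append(v)
    return ortho


def expansion_coefficients(y, ortho):
    """Coefficients of y in an orthogonal basis: <y, v_i> / <v_i, v_i>."""
    return [dot(y, v) / dot(v, v) for v in ortho]


U = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
V = gram_schmidt(U)
y = [2.0, -1.0, 3.0]
k = expansion_coefficients(y, V)
# Rebuild y from its coefficients to confirm the linear representation
y_rebuilt = [sum(ki * v[j] for ki, v in zip(k, V)) for j in range(3)]
```

Since the three input vectors span the whole space, `y_rebuilt` agrees with `y` up to rounding, which is the statement of Theorem 3 for this example.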
CIELM, originally proposed by Huang and Chen [15], incorporates Barron’s convex optimization learning method into IELM. By recalculating the output weights of the existing, randomly generated hidden nodes after a new node is added, CIELM obtains better performance than IELM. Combining Gram-Schmidt orthogonalization with Barron’s convex optimization learning method, the OCIELM algorithm can be described as in Algorithm 4.
Algorithm 4 (orthogonal convex incremental extreme learning machine (OCIELM)). Given a training dataset , where and , and given activation function , maximum number of iterations , and expected learning accuracy , one has the following.
Step 1 (initialization). Let the number of initial hidden nodes , the number of iterations , and residual error , where .
Step 2. This step consists of two steps as follows.
Orthogonalization Step. In this step, the following is carried out:
Increase the number of hidden nodes and by one, respectively: and .
Randomly assign hidden node parameters for new hidden node and calculate the output , and the hidden layer output matrix ,
Learning Step. While , ,
calculate the output weight for the newly added hidden node:
recalculate the output weight vectors of all existing hidden nodes if :
calculate the residual error after adding the new hidden node :
Endwhile.
A rigorous proof that OCIELM can obtain the least-squares solution of the linear system $H\beta = T$ is given below.
Theorem 5. Given a training dataset and number of hidden nodes , where and , the hidden layer output matrix is , and the matrix of the output weights from the hidden nodes to the output nodes is . Let denote the residual error function, and , holds with probability one if and for all .
Proof. The proof consists of two steps:(a)Firstly, we prove .(b)And then, we further prove .(a) According to the condition given above, we have the following:(1)Here,(2)When the output weight , we have(3)When the output weight , we also have(4)When the output weight , suppose that, for all , we have(5)When the output weight , suppose that, for all , we haveSo, ; that is, . Therefore,(b) According to (17), we have , where is arbitrary; then, we haveAnd holds only if . Therefore, .
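The role of orthogonalization in reaching the least-squares solution can be illustrated numerically. The sketch below is not the OCIELM update itself (it omits Barron's convex step and the recalculation of the original output weights); it only assumes that each new hidden-node output vector is orthogonalized against the earlier ones, here with a modified Gram-Schmidt sweep, before its coefficient is fitted to the residual. All names, node parameters, and data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def orthogonal_incremental_residual(columns, t):
    """Add hidden-node output columns one at a time, orthogonalizing each
    against the earlier ones before fitting it to the current residual."""
    E = list(t)
    ortho = []
    for h in columns:
        v = list(h)
        for w in ortho:                        # modified Gram-Schmidt sweep
            c = dot(v, w) / dot(w, w)
            v = [vi - c * wi for vi, wi in zip(v, w)]
        k = dot(E, v) / dot(v, v)              # coefficient in the orthogonal basis
        E = [e - k * vi for e, vi in zip(E, v)]
        ortho.append(v)
    return E


def frozen_incremental_residual(columns, t):
    """Plain incremental update: earlier coefficients are never revisited."""
    E = list(t)
    for h in columns:
        beta = dot(E, h) / dot(h, h)
        E = [e - beta * hi for e, hi in zip(E, h)]
    return E


# Hypothetical data: four sigmoid hidden nodes evaluated on 40 samples
N = 40
xs = [i / (N - 1) for i in range(N)]
t = [math.sin(3.0 * x) + 0.5 * x for x in xs]
params = [(10.0, -2.0), (10.0, -5.0), (10.0, -8.0), (-3.0, 1.0)]
columns = [[1.0 / (1.0 + math.exp(-(a * x + b))) for x in xs] for a, b in params]

E_orth = orthogonal_incremental_residual(columns, t)
E_frozen = frozen_incremental_residual(columns, t)
```

A residual that is orthogonal to every column of the hidden layer output matrix is exactly the least-squares residual, so `E_orth` is the smallest residual achievable with these four nodes, while `E_frozen` is in general larger.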
4. Experiments and Analysis
In this section, we compare the generalization performance of the proposed OCIELM with that of other similar learning algorithms on ten UCI real-world datasets, comprising five regression and five classification problems, as shown in Table 1. The simulations are conducted in the MATLAB 2013a environment on a Windows 7 machine with 32 GB of memory and an i7-990X (3.46 GHz) processor.

The experimental results for OCIELM and the other ELM algorithms on regression and classification problems are given in Tables 2 and 3, where the best results are italicized and shown in boldface. In Section 4.1, we compare the generalization performance of OCIELM with that of six other state-of-the-art algorithms on regression problems. In Section 4.2, we compare the generalization performance of OCIELM with that of the same six algorithms on classification problems. All results in this section are obtained from thirty trials for all cases, and the mean results (mean), root-mean-square errors (RMSE), and standard deviations (Std.) are listed in the corresponding tables. The six comparison algorithms are as follows:
(i) Convex incremental extreme learning machine (CIELM) [15].
(ii) Parallel chaos search based incremental extreme learning machine (PCELM) [26].
(iii) Leave-one-out incremental extreme learning machine (LOOIELM) [27].
(iv) Sparse Bayesian extreme learning machine (SBELM) [28].
(v) Improved incremental regularized extreme learning machine (IIRELM) [11].
(vi) Enhancement incremental regularized extreme learning machine (EIRELM) [12].


4.1. Performance Comparison of Regression Problems
In this section, the datasets Auto MPG, California Housing, Servo, CCS (Concrete Compressive Strength), and Parkinsons are used for the regression problems. Table 2 shows the training and testing RMSE obtained with a fixed number of hidden nodes by OCIELM and the six other algorithms. The number of hidden nodes and the learning time needed to reach the same stop RMSE are also shown in Table 2. For the California Housing dataset, OCIELM provides lower training and testing RMSE (0.1272 and 0.1263) than CIELM (0.1601 and 0.1583), PCELM (0.1389 and 0.1377), LOOIELM (0.1376 and 0.1374), SBELM (0.1363 and 0.1369), IIRELM (0.1341 and 0.1339), and EIRELM (0.1274 and 0.1268) with fixed nodes (). For the stop criterion of RMSE 0.12, OCIELM also exhibits a more compact network architecture, with 127.15 nodes, and a faster speed, 0.9704 s; the node counts and training times of the other algorithms are, respectively, 330.09 and 1.0051; 199.34 and 0.9810; 217.08 and 0.9766; and 192.33 and 0.9713. Because SBELM is a fixed-structure ELM, it is difficult to determine an accurate stop criterion for it. Likewise, for the CCS dataset, the number of hidden nodes of SBELM is an approximate value; OCIELM again shows better generalization performance than the other algorithms compared. Although OCIELM is not the fastest learner on Auto MPG, Servo, and Parkinsons, its average convergence rate over the five regression problems is still the fastest. Moreover, the average convergence rate indicates that OCIELM is more stable than the other algorithms. The proposed algorithm retains the simplicity and efficiency of incremental ELM and obtains the least-squares solution of the linear system $H\beta = T$ by incorporating the Gram-Schmidt orthogonalization method. The optimal solution obtained from means that the hidden node parameter leading to the largest decrease in residual error is added to the existing network.
Therefore, OCIELM can efficiently reduce the network complexity and meanwhile enhance the generalization performance of the algorithm.
4.2. Performance Comparison of Classification Problems
In this section, the datasets Delta Ailerons, Waveform II, Abalone, Breast Cancer, and Energy Efficiency are used for the classification problems. Table 3 compares the classification performance on these five UCI datasets. With the same fixed number of hidden nodes listed in Table 3, the results obtained by OCIELM are better than those of the other algorithms. For the Waveform II dataset, OCIELM achieves a better training accuracy and standard deviation (93.11% and 0.0083) than CIELM (84.47% and 0.0182), PCELM (89.81% and 0.0104), LOOIELM (88.93% and 0.0097), SBELM (80.69% and 0.0181), IIRELM (90.64% and 0.0112), and EIRELM (91.15% and 0.0096). In addition, the number of hidden nodes (29.54) and the average time (3.0864 s) are also lower than those of the other algorithms, which indicates that OCIELM has a more reasonable network structure than CIELM, PCELM, LOOIELM, SBELM, IIRELM, and EIRELM and efficiently reduces the complexity of the network. Although SBELM, as a fixed-structure ELM, has an advantage in training speed, OCIELM generally performs better when accuracy and speed are considered together, which matters for practical problems that demand higher accuracy.
In short, OCIELM can generally achieve better performance on these regression and classification problems in terms of training (and testing) RMSE for regression and testing accuracy for classification. Moreover, the compactness of network and convergence rate also display the good performance of OCIELM algorithm.
5. Deep Network Based on Stacked OCIELM Autoencoders (DOCIELMAEs)
5.1. OCIELM Autoencoder
As an artificial neural network model, the autoencoder is frequently applied in deep architecture approaches. An autoencoder is a kind of unsupervised neural network in which the target output of the network is equal to its input. Kasun et al. [29] proposed an autoencoder based on ELM (ELMAE). According to their ELMAE theory, the ELMAE model is composed of an input layer, a hidden layer, and an output layer. The weights and biases of the hidden nodes are randomly generated and orthogonalized, and the input data is projected to a space of different or equal dimension [30]; the expressions are as follows: where are the orthogonal random weights and are the orthogonal random biases between the input and hidden nodes. There are three approaches to calculating the output weight of ELMAE:
(1) For sparse ELMAE representations, output weights can be calculated as follows:
(2) For compressed ELMAE representations, output weights can be calculated as follows:
(3) For equal dimension ELMAE representations, output weights can be calculated as follows:
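As a rough illustration of the ELMAE idea described above, the following NumPy sketch builds a random orthogonal hidden layer and solves for output weights that reconstruct the input. It assumes the ridge-regularized least-squares form $\beta = (I/C + H^{T}H)^{-1}H^{T}X$ that is commonly used for ELM autoencoders, a sigmoid activation, and a hypothetical regularization parameter `C`; the function names and the random toy data are illustrative and not taken from the paper.

```python
import numpy as np


def elm_ae_fit(X, n_hidden, C=1e3, seed=0):
    """ELM autoencoder sketch: random orthogonal hidden layer + ridge output weights.

    X has shape (n_samples, n_features). Returns (A, b, beta), where beta has
    shape (n_hidden, n_features) and H @ beta approximates X.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    G = rng.standard_normal((d, n_hidden))
    if n_hidden <= d:
        A, _ = np.linalg.qr(G)             # (d, n_hidden), A.T @ A = I
    else:
        Q, _ = np.linalg.qr(G.T)           # (n_hidden, d), Q.T @ Q = I
        A = Q.T                            # (d, n_hidden), A @ A.T = I
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)                 # unit-norm bias vector
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    # Ridge-regularized least squares: (I/C + H^T H) beta = H^T X
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)
    return A, b, beta


def elm_ae_reconstruct(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta


# Hypothetical data: 200 samples with 8 features, compressed to 5 hidden nodes
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 8))
A, b, beta = elm_ae_fit(X, n_hidden=5)
X_hat = elm_ae_reconstruct(X, A, b, beta)
```

In a stacked network of this kind, the transpose of `beta` would typically serve as the weight matrix that maps the input to the learned representation of the next layer.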
In this section, we present the OCIELM autoencoder. OCIELM, which incorporates Barron’s convex optimization learning method and the Gram-Schmidt orthogonalization method into IELM to achieve the optimal least-squares solution, is used as the training algorithm for the autoencoder, in place of the backpropagation (BP) algorithm that conventional autoencoders use to learn the identity function and of the normal ELM used in ELMAE. Because an incremental algorithm is adopted, there is no need to set the number of hidden nodes by experience. Once the maximum number of hidden nodes is initialized, the number of hidden nodes can be increased by more than one node each time until a stop criterion is met, for example, when the residual error reaches the expected learning accuracy or the number of hidden nodes reaches the maximum.
As shown in Figure 1, the model structure of OCIELMAE can randomly control the number of the nodes without the computation accuracy. Given a training dataset , where and , and given activation function and maximum number of hidden nodes in single layer , the input data is reconstructed at the output layer through the following function:The output weight can be obtained with the following:where is the input weight generated randomly and are the input and output of the OCIELMAE.
5.2. Implementation of Stacked OCIELM Autoencoders in Deep Network
In 2006, Hinton et al. [31] presented the concept of deep learning to solve the problems of unsupervised data. Deep belief nets (DBNs) are probabilistic generative models that are first trained only with unlabeled data and then fine-tuned in a supervised mode; the basic building block of a DBN is the Restricted Boltzmann Machine (RBM) [32]. Another kind of deep network based on the RBM, the deep Boltzmann machine (DBM) [33], was then introduced by Salakhutdinov and Larochelle. MLELM was presented by Kasun et al. in 2013 [29]. Like other deep learning models, MLELM performs layer-wise unsupervised learning, with the hidden layer weights initialized by ELMAE; unlike them, MLELM does not need to be fine-tuned. AESELMs was proposed by Zhou et al. in 2014 [34]. The network uses multiple ELMs, each with a small number of hidden nodes per layer, in place of a single ELM with a large number of hidden nodes, and it implements an ELM autoencoder in each iteration of the SELMs algorithm to further improve the testing accuracy, especially for unstructured large data without properly selected features.
Algorithm 6 (deep network based on stacked orthogonal convex incremental ELM autoencoders (DOCIELMAEs)). Given a training dataset , where and , and given activation function , maximum number of hidden nodes in single layer , maximum number of iterations , and expected learning accuracy , one has the following.
Step 1 (initialization). Let the number of initial hidden nodes , the number of iterations , and residual error , where .
Step 2 (orthogonal convex IELM autoencoder on layer 1). This step consists of two steps as follows.
Orthogonalization Step. In this step, the following is carried out:
Increase the number of hidden nodes and by one, respectively: and .
Randomly assign hidden node parameters for new hidden node and calculate the output , and the hidden layer output matrix,
Learning Step. While , ,
calculate the output weight for the newly added hidden node:
recalculate the output weight vectors of all existing hidden nodes if :
calculate the residual error after adding the new hidden node :
Step 3 (orthogonal convex IELM autoencoder on layer ). This step is carried out as follows.
Learning Step. While , ,
(a) calculate the output weight for the newly added hidden node with the hidden layer output matrix :
(b) recalculate the output weight vectors of all existing hidden nodes if :
(c) calculate the residual error after adding the new hidden node :
Endwhile.
The DOCIELMAEs algorithm inherits the advantages of both incremental constructive feedforward network models and deep learning algorithms in capturing higher-level abstractions and characterizing data representations. Using autoencoders for the unsupervised pretraining of data yields excellent performance on regression and classification problems. The improved method uses the OCIELMAE as the base building layer to construct the whole deep architecture. As shown in Figure 2, the data is mapped to the OCIELM feature space in each layer: the OCIELMAE output weights with respect to the input data are the weights of the first layer and, likewise, the OCIELMAE output weights with respect to the hidden layer output are the weights of the subsequent layers of DOCIELMAEs. The detailed algorithm of DOCIELMAEs is shown in Algorithm 6.
5.2.1. Performance Comparison of Regression Problems Based on DOCIELMAEs
In this section, we mainly test the regression performance of the proposed OCIELM and DOCIELMAEs on three UCI real-world datasets, Parkinsons, California Housing, and CCS (Concrete Compressive Strength), and two large datasets, BlogFeedback and Online News Popularity. The simulations are conducted in the MATLAB 2013a environment on a Windows 7 machine with 128 GB of memory and an Intel Xeon E5-2620 v2 (2.1 GHz) processor.
The regression performance of the proposed algorithms OCIELM and DOCIELMAEs is compared with that of the baseline methods SVM [35], single ELM, MLELM, AESELMs, DBN, ErrCor [36], and PCELM in Table 5. The specific analyses of regression capability and effectiveness are as follows:
(1) OCIELM compared with SVM, ELM, ErrCor, and PCELM: we perform regression testing on the datasets described in Table 4. The results are averaged over 50 trials; we can observe from Table 5 that the testing accuracies of OCIELM on both the UCI datasets and the large datasets are better than those of SVM, ELM, ErrCor, and PCELM. For the BlogFeedback dataset, the training accuracy of OCIELM is 91.76, and those of SVM, ELM, ErrCor, and PCELM are 89.75, 90.12, 90.39, and 90.54, respectively. OCIELM also obtains a better testing accuracy, 91.82, than the other algorithms. Although OCIELM is an iterative learning algorithm, its compact neural network makes its convergence faster than that of PCELM and ErrCor and slower only than that of SVM and ELM. Thus, the training time consumed by OCIELM is acceptable.
(2) DOCIELMAEs compared with DBN, MLELM, and AESELMs: the testing accuracy on the UCI datasets shows that DOCIELMAEs outperforms OCIELM. Given the comparisons above between OCIELM and the other algorithms (SVM, ELM, ErrCor, and PCELM), it follows that DOCIELMAEs also achieves better testing accuracy than SVM, ELM, ErrCor, and PCELM; this result can also be seen in Table 5. For the large-scale datasets (BlogFeedback and Online News Popularity), DOCIELMAEs obtained accuracies of 93.16, 93.27 and 93.69, 93.84 for training and testing with the network structures 28110001000200010 and 617007001000026, respectively. The simulations in Table 5 show that DOCIELMAEs produces better results than DBN, MLELM, and AESELMs. Furthermore, DOCIELMAEs has an advantage over DBN and MLELM in training speed.
Thus, with its better regression performance, DOCIELMAEs provides a state-of-the-art method for large-scale unstructured data problems.


5.2.2. Performance Comparison of Classification Problems Based on DOCIELMAEs
The classification performance of the proposed algorithms OCIELM and DOCIELMAEs is compared with that of the baseline methods SVM, single ELM, MLELM, AESELMs, DBN, ErrCor, and PCELM in Table 6. The specific comparisons are as follows:
(1) OCIELM compared with SVM, ELM, ErrCor, and PCELM: the simulation results are averaged over 50 trials on the datasets in Table 4 (from Delta Ailerons to NORB). For BlogFeedback, the training and testing accuracies listed in Table 6 are 91.76 and 91.82, respectively; we can see that OCIELM achieves better classification accuracy than SVM, ELM, ErrCor, and PCELM. Its learning speed is faster than that of the other improved ELM algorithms, although it is behind SVM and single ELM because of its iterative learning process.
(2) DOCIELMAEs compared with DBN, MLELM, and AESELMs: to test these anticipated effects, we used the UCI datasets and the large-scale datasets. The experimental results show that the classification accuracies of DOCIELMAEs are clearly better than those of the other methods. For NORB, the network structure used by DOCIELMAEs is 204880080030005; DOCIELMAEs obtained the best accuracies of 93.16, 93.27 and 93.69, 93.84 for training and testing, respectively, among all algorithms, including SVM, single ELM, MLELM, AESELMs, DBN, ErrCor, PCELM, OCIELM, and DOCIELMAEs. Furthermore, the results on the other datasets also show the outstanding performance of DOCIELMAEs. Thus, with its better accuracy and faster training speed, DOCIELMAEs can be applied to the vast majority of classification problems.

6. Case Study on Elongation Prediction of Strips
In this section, the experimental results for the prediction of strip elongation are presented. Annealing is considered the most important treatment for cold rolled strips. In this process, the cold work hardening and internal stress of the strips are eliminated and their hardness is reduced; moreover, their capacity for plastic deformation, stamping, and mechanical processing is improved. Figure 3 shows the continuous annealing process. In the furnace, the strips pass through five temperature sections, that is, preheating section (PHS), heating section (HS), slow cooling section (SS), rapid cooling section (RCS), and equalising section (ES), and three tension sections, that is, SS tension section, RCS tension section, and HS tension section. The strips therefore extend or shorten with changes of temperature and tension. Meanwhile, the surface friction coefficient and the rotational speed of the tension rolls also affect the elongation of the strips, so that the weld position cannot be tracked accurately, which greatly influences the finished product rate and the safety of the air knife. Thus, the proposed DOCIELMAEs method is applied to the strip annealing process to obtain the position information of welds. The annealing process has 12 continuous process measurements and 10 manipulated variables according to experience and mechanism analysis.
We collect the historical records of the last 16 months that can affect the position of the welding seam, including the temperature data of 5 sections, the tension data of 3 sections, and the speed data of 11 sections. We use the data of the first 10 months for training and the data of the following 6 months for testing. The comparison results for the prediction of strip elongation are shown in Figure 4. From the figure, we can see that the 6-month predictions obtained by the 6 algorithms all approximate the measured values. Although the differences in the experimental results are small, the prediction based on DOCIELMAEs consistently outperforms the other methods in the comparisons.
To further investigate the prediction capabilities of DOCIELMAEs, the performance of the algorithms is evaluated in terms of four criteria, that is, the mean absolute percentage error (MAPE), the mean square error (MSE), the relative root-mean-square error (RRMSE), and the absolute fraction of variance ($R^2$). For the testing of DOCIELMAEs and the other algorithms, they are defined by the following equations: where and are measured value and predicted value, respectively, and is the number of testing data. Smaller MAPE, MSE, and RRMSE and larger $R^2$ indicate better generalization performance of an algorithm.
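The four criteria can be computed as in the sketch below. The definitions are assumptions based on common conventions and may differ in detail from the authors' equations: MAPE is reported in percent, RRMSE is taken as the RMSE divided by the mean measured value, and the absolute fraction of variance is taken as one minus the ratio of the squared error sum to the sum of squared measured values. The sample values are hypothetical.

```python
import math


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (requires nonzero y)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)


def rrmse(y, y_hat):
    """RMSE divided by the mean measured value (one common convention)."""
    return math.sqrt(mse(y, y_hat)) / (sum(y) / len(y))


def abs_fraction_of_variance(y, y_hat):
    """R^2 = 1 - sum((y - y_hat)^2) / sum(y^2) (assumed convention)."""
    sse = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    return 1.0 - sse / sum(a * a for a in y)


measured = [2.0, 4.0, 5.0, 10.0]       # hypothetical measured values
predicted = [2.2, 3.8, 5.0, 9.0]       # hypothetical predicted values

scores = {
    "MAPE": mape(measured, predicted),
    "MSE": mse(measured, predicted),
    "RRMSE": rrmse(measured, predicted),
    "R2": abs_fraction_of_variance(measured, predicted),
}
```

Because MSE and RRMSE square the errors, the single error of 1.0 in the last sample dominates them, whereas it contributes to MAPE no more than the much smaller error in the first sample.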
MAPE in Figure 5(a) evaluates the effect on the measured values of the disparity between the elongation values of the steel strips and the predicted values. MSE and RRMSE in Figures 5(b) and 5(c), respectively, also reflect the dispersion of the models; they are more sensitive to large errors than MAPE because squaring amplifies large errors further. $R^2$ in Figure 5(d) is the distance between errors and predicted values; $R^2$ closer to 1 means that the algorithm performs better. The comparisons show that, for every evaluation criterion on the 6-month testing data, DOCIELMAEs has better generalization performance than the other algorithms in the experiments. Accordingly, the prediction of steel strip elongation using DOCIELMAEs has important practical significance.
To demonstrate the effectiveness of the proposed algorithm in practical engineering, we selected 12 successive months of data from the whole 16-month dataset for the comparisons and obtained the prediction accuracies for every month and for the full year. The comparisons of prediction accuracy in Figure 6 indicate that the proposed algorithm performs best overall, with a mean accuracy of 96.795 over the 12 months, compared with 92.49, 92.71, 94.47, 94.62, 94.43, 93.18, 93.22, and 94.50 obtained by the SVM, ELM, MLELM, AESELMs, DBN, ErrCor, PCELM, and OCIELM methods, respectively. DOCIELMAEs has the best accuracy among the nine methods, which indicates the predictive stability and performance of the method. We therefore conclude that DOCIELMAEs has the best prediction performance in testing and that the proposed algorithm is a very effective method.
7. Conclusions
In this paper, we proposed a stacked architecture built on the OCIELM algorithm and deep representation learning, in which an OCIELM autoencoder is added to each layer; the resulting network is called DOCIELMAEs. The experimental results demonstrate that DOCIELMAEs is suitable for solving regression and classification problems. The simulations showed the following: (1) compared with CIELM, EIELM, ECIELM, PCELM, and OCIELM, DOCIELMAEs achieves the best testing accuracy with the same network size or even fewer hidden nodes, and its learning speed is also faster than that of the other algorithms; moreover, DOCIELMAEs performs better than the OCIELM algorithm; (2) compared with SVM, ELM, MLELM, AESELMs, DBN, ErrCor, PCELM, and OCIELM, DOCIELMAEs also obtains the best testing accuracy on the large datasets, at the cost of more training time within a certain range; (3) compared with SVM, ELM, MLELM, AESELMs, DBN, ErrCor, PCELM, and OCIELM, DOCIELMAEs applied to strip-elongation prediction enhances prediction performance; on the production data, the prediction accuracy of the proposed algorithm exceeds that of the other algorithms. For these reasons, OCIELM and DOCIELMAEs can be further applied in practical engineering and, with further study, have the potential to solve more complicated big data problems.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61102124) and Liaoning Key Industry Programme (JH2/101).
Copyright
Copyright © 2016 Chao Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.