Mathematical Problems in Engineering

Volume 2016, Article ID 1649486, 17 pages

http://dx.doi.org/10.1155/2016/1649486

## Deep Network Based on Stacked Orthogonal Convex Incremental ELM Autoencoders

College of Information Science and Engineering, Northeastern University, Shenyang 110819, China

Received 1 March 2016; Revised 7 May 2016; Accepted 31 May 2016

Academic Editor: Lotfi Senhadji

Copyright © 2016 Chao Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Extreme learning machine (ELM) has recently attracted many researchers' interest as an emerging technology due to its fast learning speed and state-of-the-art generalization ability. Meanwhile, the incremental extreme learning machine (I-ELM), based on an incremental learning algorithm, was proposed and outperforms many popular learning algorithms. However, incremental ELM algorithms do not recalculate the output weights of all the existing nodes when a new node is added and hence cannot obtain the least-squares solution of the output weight vectors. In this paper, we propose the orthogonal convex incremental extreme learning machine (OCI-ELM), which combines the Gram-Schmidt orthogonalization method with Barron's convex optimization learning method to solve the nonconvex optimization and least-squares solution problems, and we give rigorous theoretical proofs. Moreover, we propose a deep architecture based on stacked OCI-ELM autoencoders, following the stacked generalization philosophy, for solving large and complex data problems. Experimental results on both UCI datasets and large datasets demonstrate that the deep network based on stacked OCI-ELM autoencoders (DOC-IELM-AEs) outperforms the other methods discussed in this paper on both regression and classification problems.

#### 1. Introduction

Extreme learning machine (ELM), proposed by Huang et al. [1, 2], is a specific type of single-hidden-layer feedforward network (SLFN) with randomly generated additive or RBF hidden nodes and hidden node parameters, which has recently been extensively studied by many researchers in various areas of science and engineering due to its excellent approximation capability. Wang et al. presented the ASLGEM-ELM algorithm, which provides some useful guidelines for improving the generalization ability of SLFNs trained with ELM [3]. Alongside deep investigation of its theory and applications, ELM has become one of the leading trends in fast learning [4–7]. Recently, Huang et al. [8] proposed an algorithm called incremental extreme learning machine (I-ELM), which randomly adds nodes to the hidden layer one by one and freezes the output weights of the existing hidden nodes when a new hidden node is added [9–12]. Huang et al. [13] also showed its universal approximation capability for the case of fully complex hidden nodes. I-ELM is fully automatic, and in theory no user intervention is required during the learning process. However, some issues remain to be tackled [14]:

(1) Redundant nodes, which have only a minor effect on the outputs of the network, can be generated in I-ELM. Moreover, the existence of redundant nodes eventually increases the complexity of the network.

(2) The convergence rate of I-ELM is slower than that of ELM, and the number of hidden nodes in I-ELM is sometimes larger than the dimension of the training samples.

In this paper, we propose a method called orthogonal convex incremental extreme learning machine (OCI-ELM) to further settle the aforementioned problems of I-ELM. With rigorous theoretical proofs, we show that the least-squares solution of the output weights and a faster convergence rate can be obtained by incorporating the Gram-Schmidt orthogonalization method into CI-ELM [15]. Simulations on real-world datasets show that the proposed OCI-ELM algorithm achieves faster convergence, a more compact neural network, and better generalization performance than both I-ELM and the improved I-ELM algorithms while keeping the simplicity and efficiency of ELM.

Recently, deep learning has attracted much research interest with its remarkable success in many applications [16–18]. Deep learning refers to artificial neural network learning algorithms with multilayer architectures; it achieves approximation of complex functions and alleviates the optimization difficulty associated with deep models [19–21]. Motivated by this remarkable success [22, 23], we propose a new stacked architecture for solving large and complex data problems, using an OCI-ELM autoencoder as the training algorithm in each layer; this combines the excellent performance of OCI-ELM with the complex-function approximation ability of deep architectures. We implement an OCI-ELM autoencoder in each iteration of the deep orthogonal convex incremental extreme learning machine (DOC-IELM) to reconstruct the input data and estimate the errors of the prediction functions in a layer-by-layer scheme. Both supervised and unsupervised data can serve as the pretraining input of the proposed deep network. Moreover, the OCI-ELM autoencoder-based deep network (DOC-IELM-AEs) achieves improved generalization performance.
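The layer-by-layer reconstruction idea can be sketched with a generic ELM-autoencoder layer. This is a simplified illustration, not the paper's exact OCI-ELM autoencoder: the function names, layer sizes, and the tanh activation are all assumptions.

```python
import numpy as np

def elm_ae_layer(X, n_hidden, seed=0):
    """One generic ELM-autoencoder layer: random hidden mapping, then
    output weights solved so the network reconstructs its own input;
    the learned weights are reused to encode X for the next layer."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (X.shape[1], n_hidden))   # random input weights
    b = rng.uniform(-1, 1, n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                           # random feature mapping
    beta = np.linalg.pinv(H) @ X                     # solve H beta ~ X (reconstruction)
    return X @ beta.T                                # encoded features for next layer

def stack_layers(X, layer_sizes):
    """Greedy layer-by-layer stacking: each layer encodes the previous output."""
    for k, size in enumerate(layer_sizes):
        X = elm_ae_layer(X, size, seed=k)
    return X
```

Each layer is trained only on the output of the layer below it, which mirrors the layer-by-layer scheme described above.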

To show the effectiveness of DOC-IELM-AEs, we apply it to both ordinary real-world problems from the UCI repository and large datasets (MNIST, OCR Letters, NORB, and USPS). The simulations show that the proposed deep model achieves better testing accuracy and a more compact network architecture than the aforementioned improved I-ELM variants and other deep models, without incurring out-of-memory problems.

This paper is organized as follows. Section 2 reviews the preliminary knowledge of the incremental extreme learning machine (I-ELM). Section 3 describes the OCI-ELM algorithm, the proposed model which incorporates the Gram-Schmidt orthogonalization method into convex I-ELM (CI-ELM). Section 4 compares OCI-ELM with other algorithms. Section 5 presents the details of the DOC-IELM-AEs algorithm and compares its performance with deep architecture models. Section 6 applies the DOC-IELM-AEs algorithm to the elongation prediction of strips. Finally, Section 7 concludes this paper.

#### 2. Related Works

In this section, the main concepts and theory of the I-ELM [8] algorithm are briefly reviewed. For the sake of generality, we assume that the network has only one linear output node; all the analysis can be easily extended to the case of multiple nonlinear output nodes. Consider a training dataset $\{(x_j, t_j)\}_{j=1}^{N}$ with $x_j \in \mathbf{R}^d$ and $t_j \in \mathbf{R}$; the SLFN with $L$ additive hidden nodes and activation function $g(x)$ can be represented by
$$f_L(x_j) = \sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i), \quad j = 1, \dots, N,$$
where $w_i$ is the weight vector connecting the input layer to the $i$th hidden node, $\beta_i$ is the weight connecting the $i$th hidden node to the output node, $b_i$ is the threshold of the $i$th hidden node, and $g$ is the hidden node activation function.

The I-ELM proposed by Huang et al. differs from the conventional ELM algorithm: I-ELM is an automatic algorithm which randomly adds hidden nodes to the network one by one and freezes all the weights of the existing hidden nodes when a new hidden node is added, until the expected learning accuracy is obtained or the maximum number of hidden nodes is reached. The I-ELM algorithm can be summarized as Algorithm 1.

*Algorithm 1 (incremental extreme learning machine (I-ELM)).* Given a training dataset $\{(x_j, t_j)\}_{j=1}^{N}$, activation function $g$, expected learning accuracy $\epsilon$, and maximum number of hidden nodes $L_{\max}$, one has the following.

*Step 1* (initialization). Let the number of hidden nodes $L = 0$ and the residual error $e = t$, where $t = [t_1, \dots, t_N]^T$.

*Step 2* (learning step). While $L < L_{\max}$ and $\|e\| > \epsilon$:
(a) increase the number of hidden nodes by one: $L = L + 1$;
(b) assign random input weight $w_L$ and bias $b_L$ for hidden node $L$;
(c) calculate the output vector $h_L$ of the new hidden node;
(d) calculate the output weight for the new hidden node: $\beta_L = \langle e, h_L \rangle / \|h_L\|^2$;
(e) calculate the residual error after adding the new hidden node: $e = e - \beta_L h_L$;
Endwhile.
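Algorithm 1 can be sketched as follows. This is a minimal single-output illustration with sigmoid hidden nodes; the function name, defaults, and random-weight ranges are assumptions.

```python
import numpy as np

def i_elm(X, T, max_nodes=200, epsilon=1e-3, seed=0):
    """Minimal I-ELM sketch: add random sigmoid hidden nodes one by one,
    freezing the output weights of all previously added nodes."""
    rng = np.random.default_rng(seed)
    e = np.asarray(T, dtype=float).copy()        # residual error e_0 = t
    nodes = []                                   # (w_L, b_L, beta_L) per node
    for _ in range(max_nodes):
        if np.linalg.norm(e) <= epsilon:         # expected accuracy reached
            break
        w = rng.uniform(-1, 1, X.shape[1])       # random input weights
        b = rng.uniform(-1, 1)                   # random bias
        h = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # new hidden node output h_L
        beta = (e @ h) / (h @ h)                 # beta_L = <e, h_L> / ||h_L||^2
        e = e - beta * h                         # e_L = e_{L-1} - beta_L h_L
        nodes.append((w, b, beta))               # old weights stay frozen
    return nodes, e
```

Because $\beta_L$ minimizes the residual along the new node's output direction, the residual norm never increases as nodes are added.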

#### 3. The Proposed Orthogonal Convex Incremental Extreme Learning Machine (OCI-ELM)

The motivation for the work in this section comes from the following important properties of basic ELM:

(1) The special solution $\hat{\beta} = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, is one of the least-squares solutions of the general linear system $H\beta = T$, meaning that the smallest training error can be reached by this special solution: $\|H\hat{\beta} - T\| = \min_{\beta} \|H\beta - T\|$.

(2) The smallest norm of weights: the special solution $\hat{\beta} = H^{\dagger} T$ has the smallest norm among all the least-squares solutions of $H\beta = T$.

(3) The minimum norm least-squares solution of $H\beta = T$ is unique, and it is $\hat{\beta} = H^{\dagger} T$.
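Properties (1)–(3) can be verified numerically with the pseudoinverse. The sketch below uses an underdetermined system so that many exact solutions exist, and all sizes and seeds are illustrative.

```python
import numpy as np

# The minimum-norm least-squares solution of H beta = T is
# beta_hat = pinv(H) @ T (Moore-Penrose pseudoinverse), which is how
# batch ELM computes its output weights.
rng = np.random.default_rng(0)
H = rng.standard_normal((30, 50))   # hidden layer output matrix (underdetermined)
T = rng.standard_normal(30)         # target vector
beta_hat = np.linalg.pinv(H) @ T    # special solution: least error and least norm

# Any other exact solution differs by a null-space component of H and
# therefore has a strictly larger norm.
null_part = (np.eye(50) - np.linalg.pinv(H) @ H) @ rng.standard_normal(50)
beta_other = beta_hat + null_part
assert np.allclose(H @ beta_hat, T)      # (1) smallest training error (zero here)
assert np.allclose(H @ beta_other, T)    # beta_other solves the system too...
assert np.linalg.norm(beta_hat) < np.linalg.norm(beta_other)   # (2) ...with larger norm
```

The strict norm inequality reflects that $\hat{\beta}$ lies in the row space of $H$, orthogonal to any null-space component, which is also why the minimum-norm solution in (3) is unique.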

In this section, we propose an improved I-ELM algorithm (OCI-ELM) based on the Gram-Schmidt orthogonalization method combined with Barron's convex optimization learning method, and we prove in theory that OCI-ELM can obtain the least-squares solution of the output weights. Meanwhile, OCI-ELM can achieve a more compact network architecture, a faster convergence rate, and better generalization performance than other improved I-ELM algorithms while retaining I-ELM's simplicity and efficiency.

Theorem 2. *The Gram-Schmidt orthogonalization process converts linearly independent vectors into orthogonal vectors [24]. Given a linearly independent vector set $\{a_1, a_2, \dots, a_n\}$ in the inner product space $V$, the vector set $\{v_1, v_2, \dots, v_n\}$ produced by the Gram-Schmidt orthogonalization process is as follows [25]:*
$$v_1 = a_1, \quad v_k = a_k - \sum_{i=1}^{k-1} \frac{\langle a_k, v_i \rangle}{\langle v_i, v_i \rangle} v_i, \quad k = 2, \dots, n,$$
*where $\{v_1/\|v_1\|, \dots, v_n/\|v_n\|\}$ is the set of standardized vectors and $v_1, \dots, v_n$ form an orthogonal set with the same linear span. For each index $k$, $\operatorname{span}\{a_1, \dots, a_k\} = \operatorname{span}\{v_1, \dots, v_k\}$.*
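The process of Theorem 2 can be sketched directly; the function name is illustrative.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: convert linearly independent rows into an
    orthogonal set with the same linear span (Theorem 2)."""
    basis = []
    for a in vectors:
        v = np.asarray(a, dtype=float).copy()
        for q in basis:
            v -= (a @ q) / (q @ q) * q   # subtract the projection of a onto q
        basis.append(v)
    return np.array(basis)
```

Orthogonality can be checked by confirming that the Gram matrix of the result is diagonal, and span preservation by confirming the rank is unchanged.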

Theorem 3. *Given an orthogonal vector set $\{v_1, v_2, \dots, v_n\}$ in the inner product space $V$, if a vector $u \in V$ can be expressed as a linear combination of $v_1, \dots, v_n$, one has*
$$u = \sum_{i=1}^{n} \frac{\langle u, v_i \rangle}{\langle v_i, v_i \rangle} v_i.$$

*Proof.* Given the vector set $\{v_1, \dots, v_n\}$ and the vector $u$, suppose there exist scalars $c_1, \dots, c_n$ such that the linear combination of those vectors with those scalars as coefficients is $u = \sum_{i=1}^{n} c_i v_i$. Taking the inner product of both sides with $v_j$ and using the orthogonality $\langle v_i, v_j \rangle = 0$ for $i \neq j$, we have $\langle u, v_j \rangle = c_j \langle v_j, v_j \rangle$, so $c_j = \langle u, v_j \rangle / \langle v_j, v_j \rangle$ for $j = 1, \dots, n$.

CI-ELM was originally proposed by Huang and Chen [15]; it incorporates Barron's convex optimization learning method into I-ELM. By recalculating the output weights of the existing randomly generated hidden nodes after a new node is added, CI-ELM can obtain better performance than I-ELM. Incorporating Gram-Schmidt orthogonalization together with Barron's convex optimization learning method, the process of the OCI-ELM algorithm can be described as Algorithm 4.

*Algorithm 4 (orthogonal convex incremental extreme learning machine (OCI-ELM)).* Given a training dataset $\{(x_j, t_j)\}_{j=1}^{N}$, where $x_j \in \mathbf{R}^d$ and $t_j \in \mathbf{R}$, and given activation function $g$, maximum number of iterations $L_{\max}$, and expected learning accuracy $\epsilon$, one has the following.

*Step 1* (initialization). Let the number of initial hidden nodes $L = 0$, the number of iterations $k = 0$, and the residual error $e = t$, where $t = [t_1, \dots, t_N]^T$.

*Step 2*. This step consists of two steps as follows.

*Orthogonalization Step.* In this step, the following is carried out. Increase the number of hidden nodes $L$ and the number of iterations $k$ by one, respectively: $L = L + 1$ and $k = k + 1$. Randomly assign hidden node parameters $(w_L, b_L)$ for the new hidden node and calculate its output $h_L = [g(w_L \cdot x_1 + b_L), \dots, g(w_L \cdot x_N + b_L)]^T$; orthogonalize it against the outputs of the existing hidden nodes as in Theorem 2,
$$v_L = h_L - \sum_{i=1}^{L-1} \frac{\langle h_L, v_i \rangle}{\langle v_i, v_i \rangle} v_i,$$
and update the hidden layer output matrix $H_L = [v_1, v_2, \dots, v_L]$.

*Learning Step.* While $L < L_{\max}$ and $\|e\| > \epsilon$:
calculate the output weight for the newly added hidden node,
$$\beta_L = \frac{\langle e, v_L - f_{L-1} \rangle}{\|v_L - f_{L-1}\|^2},$$
where $f_{L-1}$ denotes the output of the existing network; recalculate the output weight vectors of all existing hidden nodes if $L > 1$:
$$\beta_i = (1 - \beta_L)\beta_i, \quad i = 1, \dots, L - 1;$$
calculate the residual error after adding the new hidden node $L$:
$$e = (1 - \beta_L)e + \beta_L (t - v_L);$$
Endwhile.
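The convex learning step above can be sketched as follows. This illustrates only the Barron-style convex update that OCI-ELM inherits from CI-ELM (the new output is a convex combination $f_L = (1 - \beta_L) f_{L-1} + \beta_L g_L$ and existing weights are rescaled); the Gram-Schmidt orthogonalization step is omitted for brevity, and the function name, defaults, and sigmoid activation are assumptions.

```python
import numpy as np

def ci_elm(X, T, max_nodes=200, epsilon=1e-3, seed=0):
    """Sketch of the convex incremental update: when node L is added,
    all existing output weights are rescaled by (1 - beta_L)."""
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float)
    f = np.zeros(X.shape[0])                  # current network output f_0 = 0
    e = T - f                                 # residual error e_0 = t
    betas, nodes = [], []
    for _ in range(max_nodes):
        if np.linalg.norm(e) <= epsilon:
            break
        w = rng.uniform(-1, 1, X.shape[1])    # random input weights
        b = rng.uniform(-1, 1)                # random bias
        g = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        u = g - f                             # direction toward the new node
        beta = (e @ u) / (u @ u)              # beta_L = <e, g_L - f_{L-1}> / ||g_L - f_{L-1}||^2
        betas = [(1.0 - beta) * bi for bi in betas]   # recalculate existing weights
        betas.append(beta)
        nodes.append((w, b))
        f = (1.0 - beta) * f + beta * g       # convex update of the network output
        e = T - f
    return nodes, betas, e
```

Because $\beta_L$ minimizes the residual along the direction $g_L - f_{L-1}$, the residual norm is nonincreasing, which is the behavior the proof below formalizes for OCI-ELM.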

The rigorous proof of the conclusion that OCI-ELM can obtain the least-squares solution of the output weights is discussed in detail below.

Theorem 5. *Given a training dataset $\{(x_j, t_j)\}_{j=1}^{N}$ and number of hidden nodes $L$, where $x_j \in \mathbf{R}^d$ and $t_j \in \mathbf{R}$, the hidden layer output matrix is $H_L$, and the matrix of the output weights from the hidden nodes to the output nodes is $\beta$. Let $e_L$ denote the residual error function; then $\lim_{L \to \infty} \|e_L\| = 0$ holds with probability one if the output weights are recalculated as in Algorithm 4 for all $L$.*

*Proof.* The proof consists of two steps: (a) firstly, we prove that $\lim_{L \to \infty} \|e_L\|$ exists; (b) then, we further prove that $\lim_{L \to \infty} \|e_L\| = 0$.

(a) According to the conditions given above, the argument proceeds by cases on the value of the newly added output weight $\beta_L$; in every case the update of Algorithm 4 chooses $\beta_L$ to minimize the residual along the newly added direction, so $\|e_L\| \le \|e_{L-1}\|$. The sequence $\{\|e_L\|\}$ is therefore monotonically decreasing and bounded below by zero, and its limit exists.

(b) According to (17), we have $\lim_{L \to \infty} \|e_L\| = c$ for some $c \ge 0$. Since the hidden node parameters are generated randomly, whenever $c > 0$ a new direction that further decreases the residual appears with probability one, and the limit can hold only if $c = 0$. Therefore, $\lim_{L \to \infty} \|e_L\| = 0$.

#### 4. Experiments and Analysis

In this section, we test the generalization performance of the proposed OCI-ELM against other similar learning algorithms on ten UCI real-world datasets, including five regression and five classification problems, as shown in Table 1. The simulations are conducted in a MATLAB 2013a environment running on a Windows 7 machine with 32 GB of memory and an i7-990X (3.46 GHz) processor.