Computational Intelligence and Neuroscience

Volume 2016, Article ID 3049632, 10 pages

http://dx.doi.org/10.1155/2016/3049632

## Deep Convolutional Extreme Learning Machine and Its Application in Handwritten Digit Classification

^{1}College of Information and Electrical Engineering, Ludong University, Yantai 264025, China

^{2}Department of Aircraft Engineering, Naval Aeronautical and Astronautical University, Yantai 264001, China

Received 27 April 2016; Revised 7 July 2016; Accepted 19 July 2016

Academic Editor: Stefano Squartini

Copyright © 2016 Shan Pang and Xinyi Yang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In recent years, deep learning methods such as the convolutional neural network (CNN) and the deep belief network (DBN) have been developed and applied to image classification. However, they suffer from problems such as local minima, slow convergence, and intensive human intervention. In this paper, we propose a rapid learning method, namely, the deep convolutional extreme learning machine (DC-ELM), which combines the power of CNN with the fast training of ELM. It uses multiple alternating convolution layers and pooling layers to abstract high level features from input images effectively. The abstracted features are then fed to an ELM classifier, which leads to better generalization performance with faster learning speed. DC-ELM also introduces stochastic pooling in the last hidden layer to greatly reduce the dimensionality of the features, thus saving much training time and computational resources. We systematically evaluated the performance of DC-ELM on two handwritten digit data sets: MNIST and USPS. Experimental results show that our method achieved better testing accuracy with significantly shorter training time in comparison with deep learning methods and other ELM methods.

#### 1. Introduction

Extreme learning machine (ELM) is a novel learning algorithm for general single-hidden-layer neural networks proposed by Huang et al. [1]. In ELM, the input weights and hidden biases are randomly generated, and the output weights are analytically determined by the regularized least squares method, providing a simple deterministic solution. There is no iteration or parameter tuning as in back propagation (BP) based neural networks (NNs). Furthermore, solving the regularized least squares problem in ELM is also faster than solving the quadratic programming problem in the standard support vector machine (SVM) method. Studies have shown that ELM learns much faster, with higher generalization performance, than NNs or SVM [2].

Due to its extremely fast training and good generalization performance, ELM has become a significant research topic in pattern recognition and machine learning. ELM and its variants achieve competitive accuracy with superb efficiency in many pattern recognition applications such as face recognition [3, 4], engine fault diagnosis [5], hyperspectral image classification [6], and human action recognition [7, 8]. However, due to their shallow architectures, feature learning using ELM methods may not be effective for some image classification applications, even with a large number of hidden nodes.

In recent years, deep learning methods have attracted attention for their promising results, significantly outperforming shallow neural networks in the field of image classification [9–12]. Composed of many layers, deep learning methods gradually extract more complicated and invariant features from the raw input images than shallow neural networks [13]. The emergence of many large-scale data sets and more powerful computing environments has made the training of deep neural networks possible, leading to the widespread application of deep learning methods.

Among these methods, the convolutional neural network (CNN) has gained great popularity in many different domains. It is even becoming the default option for difficult tasks on large image data sets. With local receptive fields (LRF) and shared weights, CNN is able to take advantage of the 2D structure of input images and has fewer parameters than fully connected deep networks with the same number of hidden nodes; thus it is easier to train. However, as all the hidden nodes in CNN need to be tuned with the BP learning method, CNN learning faces the problems inherited from the BP algorithm, such as local minima, time-consuming training, and intensive human intervention.

In contrast, ELM does not need parameter tuning and is extremely fast to implement. Therefore Huang et al. combined the concept of LRF with ELM and proposed a local receptive field based extreme learning machine (LRF-ELM) [14] in order to learn local correlations of input images. The input layer and hidden convolution layer in LRF-ELM are locally connected, which allows the network to consider local structures of images. Results on the NORB data set show that it has better performance than standard CNN and DBN.

Since LRF-ELM has only one convolution layer followed by a pooling layer, its performance is restricted by its shallow architecture. Another problem is that many feature maps are required in its convolution and pooling layers to attain good performance. Therefore, LRF-ELM consumes much computer memory in implementation. To solve these problems, in this paper, we propose a deep convolutional extreme learning machine (DC-ELM). It adopts multiple alternating convolution layers and pooling layers to obtain more abstract and meaningful feature representations than LRF-ELM. Unlike in CNN, the local receptive weights are randomly generated without tuning and the output weights are analytically calculated. In order to reduce memory use and training complexity, it adopts stochastic pooling [15] in the last hidden layer to reduce the dimensionality of the feature vector.

To verify the effectiveness of the proposed algorithm, we applied it to handwritten digit classification tasks and compared it with other state-of-the-art methods. Handwritten digit recognition has real world applications, such as postal mail sorting and form data processing [16]. Several methods based on neural networks [17–19], machine learning [20, 21], and other techniques [22, 23] have been studied. Recently some ELM based methods have also been applied to handwritten digit recognition and show good performance on the MNIST data set. The ML ELM proposed in [24] achieved 99.03% correct classification, and a test accuracy of 99.19% was achieved by deep ELM in [25]. In [26] a RF-C-ELM was proposed and attained a test accuracy of 99.43%, very close to the 99.61% obtained by Deep Conv. Net [27].

The rest of the paper is organized as follows. Section 2 gives a brief review of ELM and LRF-ELM. Section 3 describes the proposed DC-ELM. In Sections 4 and 5, our method is applied to MNIST and USPS data sets and compared with other state-of-the-art methods. Section 6 analyzes the effect of stochastic pooling and some other parameters. Finally, Section 7 draws the conclusions and points out the future work.

#### 2. Reviews of ELM and LRF-ELM

##### 2.1. Extreme Learning Machine (ELM)

ELM was proposed for single-hidden-layer feedforward neural networks (SLFNs). It is very different from conventional neural network learning algorithms: it randomly chooses the parameters of the hidden nodes and analytically determines the output weights. Thus the training is extremely fast and is completed efficiently without time-consuming iterations.

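The input data is mapped to an $L$-dimensional ELM random feature space, and the network output is

$$f(\mathbf{x}) = \sum_{i=1}^{L} \beta_i h_i(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta},$$

where $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_L]^T$ is the matrix of output weights and $\mathbf{h}(\mathbf{x}) = [h_1(\mathbf{x}), \ldots, h_L(\mathbf{x})]$ are the hidden node outputs for input $\mathbf{x}$; $h_i(\mathbf{x})$ is the output of the $i$th hidden node. Given $N$ training samples $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$, the ELM can approximate these samples with zero error, which means that

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T},$$

where $\mathbf{H} = [\mathbf{h}(\mathbf{x}_1)^T, \ldots, \mathbf{h}(\mathbf{x}_N)^T]^T$ is the hidden layer output matrix and $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^T$ is the matrix of desired outputs. The output weights can then be calculated using the regularized least squares method as follows:

$$\boldsymbol{\beta} = \mathbf{H}^T \left( \frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^T \right)^{-1} \mathbf{T},$$

where $C$ is the regularization parameter, which is used to obtain better generalization performance.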
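The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `elm_train` and `elm_predict`, the sigmoid activation, the Gaussian initialization, and the default values of `n_hidden` and `C` are all assumptions made for the example. The output weights use the common regularized least squares form with an N x N system, where N is the number of training samples.

```python
import numpy as np

def _hidden(X, W, b):
    # Sigmoid hidden-node outputs h(x); the activation is an assumption.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden=50, C=1e3, seed=0):
    """Train a basic ELM. X: (N, d) inputs, T: (N, m) desired outputs."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn at random and never tuned.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = _hidden(X, W, b)                      # (N, L) hidden output matrix
    N = H.shape[0]
    # Regularized least squares: beta = H^T (I/C + H H^T)^{-1} T
    beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network output f(x) = h(x) beta for each row of X."""
    return _hidden(X, W, b) @ beta
```

For classification, `T` would typically hold one-hot rows and the predicted class is the index of the largest output. Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse.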

##### 2.2. Local Receptive Fields Based Extreme Learning Machine (LRF-ELM)

As the name suggests, LRF-ELM introduces local receptive fields to the input layer, thus obtaining a locally connected ELM. The hidden layers of LRF-ELM consist of a convolution layer and a pooling layer, each composed of several feature maps. The input weights between the input and convolution layers are first randomly generated according to some continuous probability distribution and then orthogonalized in order to obtain a more complete set of features. Square root pooling is used to formulate the combinatorial nodes in the pooling layer. The square and summation operations introduce rectification nonlinearity and translational invariance, respectively, into the network, which are very important for successful image processing tasks. The pooling layer is fully connected to the output layer. The output weights are analytically calculated as in the unified ELM using regularized least squares.
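The feature extraction steps just described (random orthogonalized filters, convolution, square root pooling) can be illustrated with the following sketch. Several details are assumptions rather than statements from this paper: the function names are hypothetical, QR decomposition is used here as one convenient way to orthogonalize the random filters, the pooling window is a square of half-width `e` centred on each node, and positions outside the map are treated as zero so that the pooled map keeps the size of the convolution map.

```python
import numpy as np

def random_orthogonal_filters(K, r, rng):
    """K random r x r filters with mutually orthonormal weights (needs K <= r*r)."""
    A = rng.standard_normal((r * r, K))   # random initial weights
    Q, _ = np.linalg.qr(A)                # orthonormal columns, shape (r*r, K)
    return Q.T.reshape(K, r, r)

def convolve_valid(img, filt):
    """Slide filt over img without padding; output is (d - r + 1) square."""
    d, r = img.shape[0], filt.shape[0]
    out = np.empty((d - r + 1, d - r + 1))
    for i in range(d - r + 1):
        for j in range(d - r + 1):
            out[i, j] = np.sum(img[i:i + r, j:j + r] * filt)
    return out

def sqrt_pool(conv, e):
    """Square root pooling: sqrt of the sum of squares in a (2e+1) window."""
    n = conv.shape[0]
    sq = conv ** 2                        # squaring gives the rectification
    out = np.empty_like(conv)
    for p in range(n):
        for q in range(n):
            # Window is clipped at the borders (out-of-range values count as 0).
            window = sq[max(0, p - e):p + e + 1, max(0, q - e):q + e + 1]
            out[p, q] = np.sqrt(window.sum())
    return out
```

Applying `sqrt_pool(convolve_valid(img, f), e)` for each filter `f` and concatenating the flattened maps gives a feature vector that could be passed to the regularized least squares step.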

LRF-ELM with locally and randomly connected hidden nodes can be regarded as a specific type of ELM. Huang et al. have proved that the universal approximation and classification capabilities of such an LRF-ELM are still preserved.

#### 3. Deep Convolutional Extreme Learning Machine

In this section, we propose a new deep convolutional extreme learning machine designed to solve image classification tasks. DC-ELM combines the feature abstraction capability of the convolutional neural network with the fast training of the extreme learning machine.

As shown in Figure 1, DC-ELM consists of an input layer, an output layer, and several hidden layers arranged alternately, with each convolution layer followed by a pooling layer. Each convolution layer consists of several feature maps, each composed of convolution nodes. The input weights are shared within a feature map but differ between maps. Square root pooling layers are used to introduce translational invariance into the network; each has the same number of feature maps, of the same size, as the preceding convolution layer. A node in any feature map of a convolution layer is connected to all the feature maps in the preceding pooling layer, whereas a node in a feature map of a pooling layer is connected only to the corresponding feature map in the preceding convolution layer. The last pooling layer adopts a stochastic pooling strategy, which reduces the size of its feature maps, and is fully connected to the output layer.
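To make the size reduction in the last pooling layer concrete, the sketch below implements stochastic pooling in its usual general form [15]: within each pooling region one activation is sampled with probability proportional to its value, and that activation becomes the region's output. This is an illustrative sketch only; the function name `stochastic_pool`, the non-overlapping `s` x `s` regions, and the requirement of non-negative activations are assumptions and may differ from the exact configuration used in DC-ELM.

```python
import numpy as np

def stochastic_pool(fmap, s, rng):
    """Stochastic pooling over non-overlapping s x s regions.

    fmap must be a square, non-negative 2D feature map; the output has
    side length fmap.shape[0] // s.
    """
    n = fmap.shape[0] // s
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            region = fmap[i * s:(i + 1) * s, j * s:(j + 1) * s].ravel()
            total = region.sum()
            if total > 0:
                # Sample one location with probability a_k / sum(a).
                idx = rng.choice(region.size, p=region / total)
                out[i, j] = region[idx]
            # An all-zero region keeps the output 0.
    return out
```

With `s = 2`, a 24 x 24 map would shrink to 12 x 12, so the feature vector passed to the output layer is four times shorter per map.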