Mathematical Problems in Engineering

Volume 2015, Article ID 491587, 8 pages

http://dx.doi.org/10.1155/2015/491587

## Two-Dimensional Extreme Learning Machine

College of Command Information System, PLA University of Science and Technology, Nanjing 210007, China

Received 17 August 2014; Revised 5 November 2014; Accepted 6 November 2014

Academic Editor: Amaury Lendasse

Copyright © 2015 Bo Jia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Extreme learning machine (ELM) has attracted wide attention due to its faster learning speed compared with conventional neural network models such as the support vector machine (SVM) and back-propagation (BP) networks. However, like many other methods, ELM was originally proposed to handle vector patterns, while the nonvector patterns that arise in real applications, such as image data, remain to be explored. We propose the two-dimensional extreme learning machine (2DELM), based on the very natural idea of dealing with matrix data directly. Unlike the original ELM, which handles vectors, 2DELM takes matrices as input features without vectorization. Empirical studies on several real image datasets show the efficiency and effectiveness of the algorithm.

#### 1. Introduction

Pattern representation is probably one of the most basic problems in machine learning; almost all learning algorithms aim to build a mapping function from input to output. The output of a learning model is usually straightforward, while different input representations can strongly influence the results. In statistical learning, an input pattern is commonly represented by a vector containing the values of the corresponding features. Even when the original data are not sampled as vectors, there is a standard preprocessing step named vectorization, which transforms the original data into vectors for the convenience of computation. Taking face images as an example, each sample of an $m$-by-$n$ face image is typically transformed into an $mn$-length vector by concatenating all columns or rows, so that the sample can be processed by popular learning algorithms such as support vector machines (SVM) or artificial neural networks. *Input vectors* have almost become another name for input samples, and those with discriminative ability that define the margin of largest separation are called support vectors in SVM [1].
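The vectorization step described above can be sketched in a few lines. This is a minimal illustration using NumPy; the array contents and dimensions are made up for demonstration:

```python
import numpy as np

# An m-by-n "image" is flattened into an (m*n)-length vector by
# concatenating rows (C order) or columns (Fortran order), as described above.
m, n = 4, 3
image = np.arange(m * n).reshape(m, n)

by_rows = image.flatten(order="C")   # concatenate all rows
by_cols = image.flatten(order="F")   # concatenate all columns

assert by_rows.shape == (m * n,)
assert by_cols.shape == (m * n,)
```

Note that pixels adjacent in the image (e.g., vertically neighboring entries of one column) end up far apart in the row-concatenated vector, which is the structural-information loss discussed below.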

On the one hand, vectorization helps the input data fit into mature models and accelerates computation using popular linear algebra libraries. On the other hand, the drawbacks of vectorizing image data are obvious from at least two aspects [2, 3]. (1) Structural or contextual information may be lost during the transformation due to the changes in the relative positions of the pixels, and the reason is quite intuitive. (2) Vectorization needs more parameters and thus leads to the curse of dimensionality. For example, in order to classify $1024 \times 1024$ images with a neural network with 1000 hidden nodes, one needs $1024 \times 1024 \times 1000 \approx 10^{9}$ parameters in the first layer, so the feedforward computation can be slow.

Now look at the general class of mapping functions adopted by many discriminative models, which take the sample vector as input and a classification label or regression value as output:
$$f(\mathbf{x}) = F\left(\sum_{i=1}^{L} h_i(\mathbf{x})\,\boldsymbol{\beta}_i\right), \tag{1}$$
where $\mathbf{x}$ is the input vector and $h_i(\mathbf{x})$ is the $i$th output value of the hidden layer in a three-layer neural network, or the $i$th output value of another two-layer model such as least squares regression or logistic regression. $\boldsymbol{\beta}_i$ is the parameter vector that connects $h_i(\mathbf{x})$ and the final output value. In order to obtain a scalar output easily, a linear or nonlinear transformation needs to be conducted on the input space; thus $h(\mathbf{x}) = [h_1(\mathbf{x}), \ldots, h_L(\mathbf{x})]$ is sometimes regarded as a point in the feature space. The function $F$ controls the final output value according to the specific learning task. The feature mapping function is defined as
$$h_i(\mathbf{x}) = G(\mathbf{w}_i \cdot \mathbf{x} + b_i), \tag{2}$$
where $\mathbf{w}_i$ is the weight vector that connects the input nodes and the $i$th hidden node in neural network models, and $b_i$ is the bias of the $i$th hidden node in this case. $G$ is typically a nonlinear continuous function. For linear regression models as well as back-propagation networks, the $(\mathbf{w}_i, b_i)$ are the main parameters that need to be learned. The feature mapping stage here applies $G$ to a linear transformation, and the argument of each hidden node is a linear combination of the input units and the corresponding weights.
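The feature mapping in (2) can be sketched as follows. This is a minimal illustration assuming a sigmoid activation for $G$; all names and dimensions are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    # One common choice for the nonlinear continuous function G
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output(x, W, b):
    """Outputs of L hidden nodes for an input vector x.
    W: (L, d) matrix whose i-th row is the weight vector w_i;
    b: (L,) vector of biases b_i.  Implements h_i(x) = G(w_i . x + b_i)."""
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
d, L = 5, 3                      # input dimension and number of hidden nodes
x = rng.normal(size=d)
W = rng.normal(size=(L, d))
b = rng.normal(size=L)

h = hidden_output(x, W, b)       # the feature-space representation of x
assert h.shape == (L,)
```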

Similarly to the vector case, the feature mapping function for the matrix pattern takes a different form [4]:
$$h_i(\mathbf{X}) = G(\mathbf{u}_i^{T} \mathbf{X} \mathbf{v}_i + b_i), \tag{3}$$
where $\mathbf{u}_i$ and $\mathbf{v}_i$ are two weight vectors playing the role of $\mathbf{w}_i$ in the vector pattern. This is perhaps the simplest way to transform a matrix into a scalar using vector inner products, similar to (2), since a matrix-vector product is essentially a sum of several vector inner products.
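The matrix-pattern mapping, and its relationship to the vector mapping in (2), can be sketched as follows. Names and dimensions are illustrative; the equivalence check shows that $\mathbf{u}^{T}\mathbf{X}\mathbf{v}$ equals a vector inner product against $\mathrm{vec}(\mathbf{X})$ with a rank-one weight matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output_2d(X, u, v, b):
    # Implements h_i(X) = G(u^T X v + b) for one hidden node
    return sigmoid(u @ X @ v + b)

rng = np.random.default_rng(0)
m, n = 6, 4
X = rng.normal(size=(m, n))      # a matrix sample, e.g., an image
u = rng.normal(size=m)           # left weight vector (length m)
v = rng.normal(size=n)           # right weight vector (length n)

hval = hidden_output_2d(X, u, v, 0.5)

# u^T X v equals the flattened outer product of u and v dotted with vec(X):
# the matrix pattern is the vector pattern with a rank-one weight constraint.
lhs = u @ X @ v
rhs = np.outer(u, v).ravel() @ X.ravel()
assert np.isclose(lhs, rhs)
```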

We can see that only $m + n$ parameters are needed instead of the $m \times n$ in (2) for each hidden node. From this point of view, using the matrix pattern could reduce model complexity with fewer parameters, even if the original sample is not a matrix, as long as the vector can be reshaped into one. Take the single hidden layer feedforward neural network (SLFN) as an example here: Figure 1 shows the differences between the two input patterns, where (a) needs $m \times n$ nodes in the input layer while (b) just needs $m + n$ for the same input sample.
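The parameter comparison above can be checked with a few lines (the image size and hidden-layer width are the illustrative numbers used earlier):

```python
# Per-hidden-node weight counts: m * n for the vectorized pattern in (2)
# versus m + n for the matrix pattern in (3).
m, n, hidden_nodes = 1024, 1024, 1000

vector_params_per_node = m * n        # one weight per input pixel
matrix_params_per_node = m + n        # one u entry per row, one v entry per column

vector_total = vector_params_per_node * hidden_nodes
matrix_total = matrix_params_per_node * hidden_nodes

assert matrix_params_per_node < vector_params_per_node
```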