Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 918105, 9 pages

http://dx.doi.org/10.1155/2015/918105

## Multiclass AdaBoost ELM and Its Application in LBP Based Face Recognition

^{1}Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, China

^{2}School of Information & Engineering, Huzhou Teachers College, Huzhou 313000, China

^{3}School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

Received 22 August 2014; Revised 11 November 2014; Accepted 18 November 2014

Academic Editor: Jiuwen Cao

Copyright © 2015 Yunliang Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Extreme learning machine (ELM) is a competitive machine learning technique that is simple in theory and fast to implement; it can identify faults quickly and precisely compared with traditional identification techniques such as support vector machines (SVM). As verified by simulation results, ELM tends to scale better and can achieve much better generalization performance and much faster learning speed than traditional SVM. In this paper, we introduce a multiclass AdaBoost based ELM ensemble method. In our approach, the ELM algorithm is selected as the basic ensemble predictor because of its speed and good performance. Unlike the existing boosting ELM algorithms, our algorithm can be applied directly to multiclass classification problems. We also carried out comparative experiments on face recognition datasets. The experimental results show that the proposed algorithm not only makes the prediction results more stable but also achieves better generalization performance.

#### 1. Introduction

Much research has been done on feedforward neural networks, showing that they can not only approximate complex nonlinear mappings but also provide models for natural and artificial problems that classic parametric techniques are unable to handle.

Recently, Huang et al. [1] proposed a simple new algorithm based on single-hidden-layer feedforward networks (SLFNs) called extreme learning machine (ELM). Because ELM randomly generates the hidden-layer parameters of the network, its learning speed can be thousands of times faster than that of traditional feedforward network learning algorithms such as the back-propagation (BP) algorithm, which needs many iterations to obtain optimal parameters.

In addition, Huang [2] shows that, in theory, ELMs (with the same kernels) tend to outperform SVM and its variants in both regression and classification applications, with much easier implementation. Building on this conclusion, Wong et al. [3] explore the advantage of ELM in fault identification time.

In view of these advantages, Cao et al. applied ELM to areas such as landmark recognition [4] and protein sequence classification [5]. In addition, Cao et al. [6] proposed an improved learning algorithm that incorporates a voting method into ELM for classification applications and outperforms the original ELM algorithm as well as several recent classification algorithms.

AdaBoost [7] is one of the most popular classifier ensemble algorithms for improving generalization performance. Wang and Li [8] proposed an algorithm named dynamic AdaBoost ensemble ELM (referred to as DAEELM in this paper), which takes ELM as the basic classifier and applies AdaBoost to solve binary classification problems. Similarly, Tian and Mao [9] combined the modified AdaBoost.RT [10] with ELM to propose a new hybrid artificial intelligence technique called ensemble ELM, which aims to improve ELM’s performance in regression problems.

However, little work has been done so far on applying AdaBoost to ELM directly for multiclass classification problems. Freund and Schapire [11] give two extensions of their boosting algorithm to multiclass prediction problems, in which each example belongs to one of several possible classes (rather than just two). Since ELM can work directly on multiclass classification problems, this paper proposes an algorithm named multiclass AdaBoost ELM (MAELM). This new algorithm applies multiclass AdaBoost as an ensemble method to a number of ELMs. In addition, this paper proposes a structure for applying ELM and MAELM to local binary patterns (LBP) [12] based face recognition. Experiments on LBP based face recognition show that the proposed algorithm outperforms the original ELM.

This paper is an extension of our previous work [13]. Here we propose a new way to combine ELM with PCA, instead of using random values for the weights between the input layer and the hidden layer and for the biases of the activation function. Experiments on LBP based face recognition show that the extended approach performs stably and well.

The rest of the paper is organized as follows. Section 2 gives a brief review of ELM, PCA, the original and multiclass AdaBoost, and LBP. The proposed MAELM is presented in Section 3. The experimental results are shown in Section 4, and a short discussion of the proposed algorithm is presented in Section 5. Finally, in Section 6, we conclude the paper.

#### 2. A Review of Related Work

This section reviews the original ELM algorithm, PCA, the original and multiclass AdaBoost, and LBP based face recognition.

##### 2.1. ELM

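For $N$ arbitrary distinct samples $(\mathbf{x}_j, \mathbf{t}_j)$, where $\mathbf{x}_j = [x_{j1}, \ldots, x_{jn}]^T \in \mathbb{R}^n$ and $\mathbf{t}_j = [t_{j1}, \ldots, t_{jm}]^T \in \mathbb{R}^m$, standard SLFNs with $L$ hidden nodes and activation function $g(x)$ are mathematically modeled as follows:

$$\sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j,$$

where $j = 1, \ldots, N$.

Here, $\mathbf{w}_i = [w_{i1}, \ldots, w_{in}]^T$ is the weight vector connecting the $i$th hidden node and the input nodes, $\boldsymbol{\beta}_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ is the weight vector connecting the $i$th hidden node and the output nodes, and $b_i$ is the threshold of the $i$th hidden node.

The standard SLFNs with $L$ hidden nodes with activation function $g(x)$ can be compactly written as follows:

$$\mathbf{H} \boldsymbol{\beta} = \mathbf{T},$$

where

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L}, \quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_L^T \end{bmatrix}_{L \times m}, \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{N \times m}.$$

Different from the conventional gradient-based solution of SLFNs, ELM simply solves the function by

$$\boldsymbol{\beta} = \mathbf{H}^{\dagger} \mathbf{T}.$$

$\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of matrix $\mathbf{H}$. As Huang et al. have pointed out in [14], $\mathbf{H}^{\dagger}$ can be represented by

$$\mathbf{H}^{\dagger} = \mathbf{H}^T \left( \frac{\mathbf{I}}{C} + \mathbf{H} \mathbf{H}^T \right)^{-1},$$

where $\mathbf{I}$ is an identity matrix, which has the same dimension as $\mathbf{H} \mathbf{H}^T$. $C$ is a constant number which can be set by the user. Adding $\mathbf{I}/C$ can avoid the situation that $\mathbf{H} \mathbf{H}^T$ is singular. Huang et al. [1] successfully applied ELM to solve binary classification problem and Huang et al. [14] extended the ELM to directly solve the multiclass classification problem.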
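As an illustration, the closed-form ELM solution $\boldsymbol{\beta} = \mathbf{H}^T (\mathbf{I}/C + \mathbf{H}\mathbf{H}^T)^{-1} \mathbf{T}$ can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function names `elm_train` and `elm_predict`, the sigmoid activation, the uniform $[-1, 1]$ initialization, and the default values of `n_hidden` and `C` are all assumptions made for the example.

```python
import numpy as np

def elm_train(X, T, n_hidden=20, C=1e3, seed=0):
    """Train a single-hidden-layer ELM and return (W, b, beta).

    X: (N, n) inputs, T: (N, m) targets (e.g. one-hot class labels).
    Sigmoid activation and uniform initialization are assumptions.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden thresholds
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # (N, L) hidden-layer outputs
    # beta = H^T (I/C + H H^T)^{-1} T, computed with a linear solve
    # instead of forming the inverse explicitly.
    A = np.eye(X.shape[0]) / C + H @ H.T
    beta = H.T @ np.linalg.solve(A, T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Return the (N, m) network outputs; argmax over columns gives the class."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

For classification, the targets would be one-hot rows and the predicted class is the index of the largest output. Only the output weights are fitted; the hidden parameters `W` and `b` stay at their random values, which is what makes training a single linear solve.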

Since the original ELM randomly generates the weights between the input layer and the hidden layer, as well as the biases of the activation function, its performance may be unstable. Alternatives to random generation, such as the PCA algorithm, are therefore worth trying.

##### 2.2. PCA

Principal component analysis (PCA) was invented in 1901 by Pearson [15], as an analogue of the principal axes theorem in mechanics, which was later independently developed (and named) by Hotelling in the 1930s [16]. Now, it is mostly used as a tool in exploratory data analysis and for making predictive models. PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using $Z$-scores) the data matrix for each attribute [17]. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point) and loadings (the weight by which each standardized original variable should be multiplied to get the component score).

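The procedure of PCA is as follows:

Step 1. Compute the matrix $\Sigma$, which is the covariance matrix of the data matrix $X$.

Step 2. Find the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ of $\Sigma$ by solving $|\Sigma - \lambda I| = 0$.

Step 3. Compute the standardized (unit-length) eigenvectors $u_1, \ldots, u_n$ of $\Sigma$.

Step 4. Yield the principal components $Y = XU$, where $U = [u_1, \ldots, u_n]$.

Here, $I$ is an identity matrix, which has the same dimension as $\Sigma$. The matrix $Y$ consists of $N$ row vectors, where each vector is the projection of the corresponding data vector from matrix $X$.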
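The four steps above can be sketched with an eigendecomposition of the covariance matrix. This is a minimal illustration, not the authors' code: the function name `pca` and its return values are hypothetical, and the data matrix is assumed to hold one sample per row and to be mean-centered before projection.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns (Y, U, eigvals): projected data, the selected unit-length
    eigenvectors as columns of U, and their eigenvalues (descending).
    """
    Xc = X - X.mean(axis=0)                    # mean centering (assumed)
    cov = np.cov(Xc, rowvar=False)             # Step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # Steps 2-3: eigenvalues and orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    U = eigvecs[:, order]
    Y = Xc @ U                                 # Step 4: principal components
    return Y, U, eigvals[order]
```

`np.linalg.eigh` is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the explicit reordering. The covariance of the projected data `Y` is diagonal, which is the "linearly uncorrelated" property of principal components.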

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. Principal components are guaranteed to be independent if the dataset is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.

##### 2.3. Original AdaBoost and Multiclass AdaBoost

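AdaBoost has been very successfully applied in binary classification problem. Original AdaBoost is proposed in [7]. Before proposing the AdaBoost algorithm, the indicator function $I(\cdot)$ is predefined as

$$I(x) = \begin{cases} 1, & \text{if } x \text{ is true}, \\ 0, & \text{if } x \text{ is false}. \end{cases}$$

AdaBoost algorithm is summarized as follows.

Given the training data $(x_1, y_1), \ldots, (x_N, y_N)$, where $x_i$ denotes the $i$th input feature vector with $n$ dimensions, $y_i$ denotes the label of the $i$th input feature vector, where $y_i \in \{-1, +1\}$. Use $f^{(m)}(x)$ to denote the $m$th weak classifier and suppose $M$ weak classifiers will be combined.

(1) Initialize the observation weights $w_i = 1/N$, $i = 1, \ldots, N$.

(2) For $m = 1, \ldots, M$,

(a) fit a classifier $f^{(m)}(x)$ to the training data using weights $w_i$;

(b) compute the weighted error

$$\mathrm{err}^{(m)} = \frac{\sum_{i=1}^{N} w_i \, I\left(y_i \neq f^{(m)}(x_i)\right)}{\sum_{i=1}^{N} w_i};$$

(c) compute the weight of the $m$th classifier

$$\alpha^{(m)} = \log \frac{1 - \mathrm{err}^{(m)}}{\mathrm{err}^{(m)}};$$

(d) update the weights of sample data, for all $i = 1, \ldots, N$,

$$w_i \leftarrow w_i \exp\left(\alpha^{(m)} I\left(y_i \neq f^{(m)}(x_i)\right)\right);$$

(e) renormalize $w_i$, for all $i = 1, \ldots, N$.

(3) Output

$$C(x) = \arg\max_{k} \sum_{m=1}^{M} \alpha^{(m)} I\left(f^{(m)}(x) = k\right).$$

Here, $k$ is $1$ or $-1$. In binary classification, any classifier whose generalization performance is better than $1/2$ is a weak classifier. For the original AdaBoost, we have the following.

(1) For the $p$th and the $q$th classifiers, if $\mathrm{err}^{(p)} < \mathrm{err}^{(q)}$, we have $\alpha^{(p)} > \alpha^{(q)}$, which means the final ensemble classifier values more of the $p$th classifier’s result. Specifically, if $\mathrm{err}^{(m)} = 1/2$, $\alpha^{(m)} = 0$, which means the final ensemble classifier just ignores the $m$th classifier since its effect is the same as random guess.

(2) If the $m$th classifier misclassifies the $i$th sample, the $i$th sample will have a big weight in the next iteration. As a result, the $(m+1)$th classifier will pay more attention to it. On the contrary, if the $m$th classifier classifies the $i$th sample correctly, the $i$th sample will have a small weight in the next iteration, which means the $(m+1)$th classifier will pay less attention to it.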
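To make the reweighting concrete, one round of the binary update (weighted error, classifier weight, sample-weight update, renormalization) can be sketched as follows. The helper name `adaboost_round` is hypothetical, and the sketch assumes the weighted error lies strictly between 0 and 1 so that the logarithm is finite.

```python
import numpy as np

def adaboost_round(y_true, y_pred, w):
    """One binary AdaBoost round; returns (err, alpha, renormalized weights).

    Assumes 0 < err < 1; a real implementation would need to handle
    a perfect or a completely wrong weak classifier separately.
    """
    miss = (y_true != y_pred).astype(float)   # indicator of misclassification
    err = np.sum(w * miss) / np.sum(w)        # weighted error
    alpha = np.log((1.0 - err) / err)         # weight of this classifier
    w_new = w * np.exp(alpha * miss)          # boost misclassified samples only
    return err, alpha, w_new / w_new.sum()    # renormalize to sum to 1
```

With four equally weighted samples and a single misclassification, the error is $0.25$, the classifier weight is $\log 3$, and after renormalization the misclassified sample carries weight $1/2$ while each correct sample carries $1/6$.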

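However, for a $K$-class classification problem, we have $y_i \in \{1, \ldots, K\}$ and $k \in \{1, \ldots, K\}$. If a classifier’s generalization performance is better than $1/K$ (maybe much smaller than $1/2$), it can be called a weak classifier. Since original AdaBoost only takes a classifier whose generalization performance is better than $1/2$ as a weak classifier, obviously, it cannot be directly applied to multiclass conditions in which $K$ is bigger than 2. Freund and Schapire [11] extend the original AdaBoost to multiclass condition. The weight of the $m$th classifier is modified as

$$\alpha^{(m)} = \log \frac{1 - \mathrm{err}^{(m)}}{\mathrm{err}^{(m)}} + \log (K - 1),$$

where $\mathrm{err}^{(m)}$ is the weighted error of the $m$th classifier.

Similar to the binary condition, for the $p$th and the $q$th classifiers, if $\mathrm{err}^{(p)} < \mathrm{err}^{(q)}$, we have $\alpha^{(p)} > \alpha^{(q)}$, which means the final ensemble classifier values more of the $p$th classifier’s result. In particular, if $\mathrm{err}^{(m)} = 1 - 1/K$, $\alpha^{(m)} = 0$.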
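A compact sketch of the multiclass boosting loop, using the classifier weight with the extra $\log(K-1)$ term, is given below. Several parts are assumptions made only for illustration: the weak learner is a hypothetical weighted nearest-centroid classifier (not ELM), labels are taken to be $0, \ldots, K-1$, the error is clipped to $[\epsilon, 1-\epsilon]$ to keep the logarithm finite, and every class is assumed to appear in the training data.

```python
import numpy as np

def classifier_weight(err, K, eps=1e-10):
    """Multiclass classifier weight: log((1 - err) / err) + log(K - 1)."""
    err = min(max(err, eps), 1.0 - eps)       # clipping is an assumption
    return np.log((1.0 - err) / err) + np.log(K - 1.0)

def fit_centroids(X, y, w, K):
    """Hypothetical weak learner: weighted nearest-centroid classifier."""
    centroids = np.stack([
        np.average(X[y == k], axis=0, weights=w[y == k]) for k in range(K)
    ])
    def predict(Z):
        d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)
    return predict

def multiclass_adaboost(X, y, K, M=5):
    """Boost M weak learners; returns (ensemble predict function, alphas)."""
    N = X.shape[0]
    w = np.full(N, 1.0 / N)                   # initial observation weights
    learners, alphas = [], []
    for _ in range(M):
        predict = fit_centroids(X, y, w, K)
        miss = (predict(X) != y).astype(float)
        err = np.sum(w * miss) / np.sum(w)    # weighted error
        alpha = classifier_weight(err, K)
        w = w * np.exp(alpha * miss)          # raise weights of misclassified samples
        w = w / w.sum()                       # renormalize
        learners.append(predict)
        alphas.append(alpha)
    def ensemble(Z):
        votes = np.zeros((Z.shape[0], K))
        for predict, alpha in zip(learners, alphas):
            votes[np.arange(Z.shape[0]), predict(Z)] += alpha  # weighted vote
        return votes.argmax(axis=1)
    return ensemble, alphas
```

The effect of the extra term shows up for weak classifiers with error between $1/2$ and $1 - 1/K$: for $K = 3$ and an error of $0.6$, the binary formula gives a negative weight, whereas `classifier_weight(0.6, 3)` is positive, and the weight reaches zero only at an error of $2/3$.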

##### 2.4. LBP Based Face Recognition

The original LBP operator goes through each $3 \times 3$ neighborhood in a picture. It takes the center pixel as the threshold value of the neighborhood, compares each of the eight surrounding pixels with it to obtain an 8-bit binary pattern, and considers the result as a decimal number. The LBP operator is shown in Figure 1. Then, the texture of the picture can be represented by the histogram of all the decimal numbers.
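The basic operator and the resulting histogram can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the neighbors are read clockwise starting from the top-left pixel (the bit ordering is an assumption, since any fixed ordering yields a valid code), a neighbor contributes a 1 when it is greater than or equal to the center, and border pixels without a full $3 \times 3$ neighborhood are skipped.

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D grayscale array."""
    img = np.asarray(img)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbor offsets (row, col), clockwise from top-left; ordering is an assumption.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Threshold against the center pixel and place the result at position `bit`.
        codes += (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def lbp_histogram(img):
    """256-bin histogram of LBP codes, usable as a texture descriptor."""
    return np.bincount(lbp_image(img).ravel(), minlength=256)
```

For the single neighborhood `[[6, 5, 2], [7, 5, 1], [9, 8, 7]]`, the neighbors 6, 5, 7, 8, 9, and 7 are at least as large as the center value 5, so with this bit ordering the code is $1 + 2 + 16 + 32 + 64 + 128 = 243$.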