Computational Intelligence and Neuroscience

Volume 2017 (2017), Article ID 7479140, 8 pages

https://doi.org/10.1155/2017/7479140

## Mexican Hat Wavelet Kernel ELM for Multiclass Classification

School of Electrical Engineering, Zhengzhou University, Zhengzhou, China

Correspondence should be addressed to Yi-Fan Song; 574839023@qq.com

Received 25 November 2016; Revised 23 January 2017; Accepted 24 January 2017; Published 21 February 2017

Academic Editor: José David Martín-Guerrero

Copyright © 2017 Jie Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Kernel extreme learning machine (KELM) is a novel feedforward neural network, which is widely used in classification problems. To some extent, it solves the existing problems of the invalid nodes and the large computational complexity in ELM. However, the traditional KELM classifier usually has a low test accuracy when it faces multiclass classification problems. In order to solve the above problem, a new classifier, Mexican Hat wavelet KELM classifier, is proposed in this paper. The proposed classifier successfully improves the training accuracy and reduces the training time in the multiclass classification problems. Moreover, the validity of the Mexican Hat wavelet as a kernel function of ELM is rigorously proved. Experimental results on different data sets show that the performance of the proposed classifier is significantly superior to the compared classifiers.

#### 1. Introduction

Extreme learning machine (ELM), which was proposed by Huang et al. [1] in 2004, is a single-hidden-layer feedforward neural network model. In this model, input weights and hidden layer biases are initialized randomly, and output weights are obtained by using the Moore-Penrose generalized inverse of the hidden layer output matrix. Compared with conventional BP neural networks, ELM has faster learning speed, higher testing accuracy, and lower computational complexity. Therefore, ELM is widely used in sales forecasting [2], image quality assessment [3], power loss analysis [4], and so on. In 2006, Huang et al. [5] proposed the incremental extreme learning machine (I-ELM), which continuously increases the number of hidden layer nodes to improve the training accuracy. Subsequently, Li [6] combined I-ELM with the convex optimization learning method and proposed ECI-ELM in 2014, which reduced the training time of I-ELM. This improvement overcame the weakness of randomly selecting weights in I-ELM and eventually improved the training accuracy. At the same time, Wang and Zhang [7] introduced the Gram-Schmidt orthogonalization method into I-ELM and reduced its training time to a large degree. In general, however, I-ELM and its variants only improve the training accuracy. Their numbers of hidden layer nodes are very likely to exceed the number of samples, so I-ELM greatly increases the training time. From another perspective, in order to achieve a higher training accuracy, Rong et al. [8] used statistical methods to measure the relevance of the hidden nodes of ELM and proposed P-ELM in 2008. Then, in 2010, Miche et al. [9] proposed OP-ELM, which is an improvement of P-ELM. In addition, Akusok et al. [10] proposed a high-performance ELM model in 2015, which provides a solid ground for tackling numerous Big Data challenges. However, none of these methods changes the random selection of input weights. In addition, the linear weighted mapping method of the original ELM is not replaced at all.

Therefore, both ELM and its variants have some inevitable problems. Because the input weights are selected randomly, some hidden nodes may be given input weights that are very close to 0; such nodes are commonly called dead nodes. These nodes have a minimal effect on the output and eventually reduce the output accuracy. As the number of samples increases, the number of hidden nodes also becomes large. Thus, some high-dimensional dot product operations appear in the training process, which increases the computational complexity and the training time. This problem is commonly called dimension explosion. For nonlinear samples, the linear weighted mapping method often has an inevitable error, which reduces the training accuracy.

In order to solve the above problems, Huang et al. [11] proposed the kernel extreme learning machine (KELM) in 2012, which uses a kernel function to replace the linear weighted mapping method. Initially, the kernel function they selected was a Gaussian function. Although [11] solves the problems of dead nodes and dimension explosion to some extent, the performance of the traditional kernel functions on multiclass classification problems is still not very good. From [12, 13], we know that wavelet functions, which have a strong fitting capability, can be used in SVM and ELM. Therefore, in this paper, we propose a Mexican Hat wavelet kernel ELM (MHW-KELM) classifier, which effectively solves the problems of the conventional classifier. Compared with the traditional KELM, the MHW-KELM classifier achieves better results on multiclass classification problems, because the new kernel function improves the training accuracy.

The basic principle of ELM and some theorems are presented in Section 2 of this paper. In Section 3, the Mexican Hat wavelet kernel ELM is proposed, and its validity is proved. The performance evaluation is presented in Section 4. The conclusion is given in Section 5.

#### 2. Preliminary Work

##### 2.1. ELM Model

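Let us suppose that there are $N$ arbitrary distinct samples $(x_i, t_i)$, $x_i \in \mathbb{R}^n$, $t_i \in \mathbb{R}^m$. If the number of hidden nodes is $L$ and the activation function is $g(x)$, then we can randomly select the initial values of the input weights $w_j$ and the hidden biases $b_j$. So, the hidden layer output function of ELM can be obtained. It is shown as

$$h(x) = \left[g(w_1 \cdot x + b_1), \ldots, g(w_L \cdot x + b_L)\right], \tag{1}$$

where $w_j \in \mathbb{R}^n$, $b_j \in \mathbb{R}$, $j = 1, \ldots, L$.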

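If the output weights are $\beta$, then, according to the proof given by Huang et al. [1], the smaller the norm of $\beta$ is, the better the generalization performance of ELM is. Therefore, the output weights can be obtained by finding the least squares solution of the problem

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^2 \quad \text{s.t.} \quad h(x_i)\beta = t_i^T - \xi_i^T, \quad i = 1, \ldots, N, \tag{2}$$

where $h(x_i)$ is the $i$th output vector of the hidden layer, $t_i$ is the $i$th label vector, $\xi_i$ is the error between the $i$th network output vector and the label vector, and $C$ is the penalty factor.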

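According to KKT theory, the above problem can be transformed into the Lagrange function

$$L_{ELM} = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^2 - \sum_{i=1}^{N}\sum_{k=1}^{m}\alpha_{i,k}\left(h(x_i)\beta_k - t_{i,k} + \xi_{i,k}\right), \tag{3}$$

where each of the Lagrange multipliers $\alpha_i$ corresponds to a sample $x_i$ and $\beta_k$ is the $k$th column of $\beta$. By setting the partial derivatives of (3) to zero, we get the following set of equations:

$$\frac{\partial L_{ELM}}{\partial \beta_k} = 0 \;\Rightarrow\; \beta = H^T\alpha, \tag{4a}$$

$$\frac{\partial L_{ELM}}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i = C\xi_i, \tag{4b}$$

$$\frac{\partial L_{ELM}}{\partial \alpha_i} = 0 \;\Rightarrow\; h(x_i)\beta - t_i^T + \xi_i^T = 0, \tag{4c}$$

where $\alpha = [\alpha_1, \ldots, \alpha_N]^T$, $H = [h(x_1)^T, \ldots, h(x_N)^T]^T$ is the hidden layer output matrix, and $T = [t_1, \ldots, t_N]^T$ is the label matrix. The least squares solution for $\beta$ can be obtained by solving the three equations (4a), (4b), and (4c). The solution is

$$\beta = H^T\left(\frac{I}{C} + HH^T\right)^{-1}T, \tag{5}$$

and the output function of ELM is

$$f(x) = h(x)\beta = h(x)H^T\left(\frac{I}{C} + HH^T\right)^{-1}T. \tag{6}$$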
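The training procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], one-hot label rows, and the regularized solution $\beta = H^T(I/C + HH^T)^{-1}T$. The helper names `solve`, `hidden_output`, `train_elm`, and `predict_elm` are hypothetical, and only the Python standard library is used so that the sketch stays dependency-free.

```python
import math
import random


def solve(A, B):
    """Solve A X = B (A is n x n, B is n x m) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def hidden_output(x, W, b):
    """h(x) = [g(w_1 . x + b_1), ..., g(w_L . x + b_L)] with a sigmoid g (assumed)."""
    out = []
    for w, bj in zip(W, b):
        z = sum(wi * xi for wi, xi in zip(w, x)) + bj
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def train_elm(X, T, L, C, seed=0):
    """Return random hidden parameters (W, b) and output weights beta (L x m)."""
    rng = random.Random(seed)
    n, N, m = len(X[0]), len(X), len(T[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(L)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    H = [hidden_output(x, W, b) for x in X]  # N x L hidden layer output matrix
    # A = I/C + H H^T (N x N)
    A = [[sum(p * q for p, q in zip(H[i], H[j])) + (1.0 / C if i == j else 0.0)
          for j in range(N)] for i in range(N)]
    alpha = solve(A, T)  # (I/C + H H^T)^{-1} T
    beta = [[sum(H[i][l] * alpha[i][k] for i in range(N)) for k in range(m)]
            for l in range(L)]  # beta = H^T alpha
    return W, b, beta


def predict_elm(x, W, b, beta):
    """f(x) = h(x) beta."""
    h = hidden_output(x, W, b)
    return [sum(h[l] * beta[l][k] for l in range(len(h))) for k in range(len(beta[0]))]


# Toy usage: XOR with one-hot labels (class 0 = inputs equal, class 1 = inputs differ).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
T = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
W, b, beta = train_elm(X, T, L=20, C=1e6)
outputs = [predict_elm(x, W, b, beta) for x in X]
train_labels = [out.index(max(out)) for out in outputs]
```

Solving the $N \times N$ system is convenient when the number of hidden nodes exceeds the number of samples; in practice a linear algebra library would replace the hand-written `solve`.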

##### 2.2. Translation-Invariant Kernel Theorem

The kernel function method is often used in SVM to replace the dot product. According to the Mercer theorem (see [14]), by introducing a kernel function $K(x, x')$, we can replace the calculation of the dot product in ELM. In order to reduce the computational complexity of high-dimensional dot products, it is necessary to ensure that $K(x, x')$ depends only on the relative position of the two input samples (see (7)).

$$K(x, x') = K(x - x'). \tag{7}$$

The kernel functions that satisfy (7) are called translation-invariant kernel functions. In fact, it is difficult to prove directly that a translation-invariant kernel function satisfies the Mercer theorem. Fortunately, for translation-invariant kernel functions, the following theorem provides a necessary and sufficient condition for being an admissible support vector kernel.

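Theorem 1 (translation-invariant kernel theorem; see [15, 16]). *A translation-invariant kernel $K(x, x') = K(x - x')$ is an admissible support vector kernel if and only if the Fourier transform* $$F[K](\omega) = (2\pi)^{-d/2}\int_{\mathbb{R}^d}\exp(-j(\omega \cdot x))K(x)\,dx \tag{8}$$ *is nonnegative.*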
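As a quick illustration of how Theorem 1 is applied, consider the Gaussian kernel written in translation-invariant form as $K(x) = \exp(-\|x\|^2/(2\sigma^2))$ with $x \in \mathbb{R}^d$. This parameterization and the $(2\pi)^{-d/2}$ normalization of the Fourier transform are assumptions made for the example; other scalings of $\sigma$ change only the constants. A standard Gaussian integral gives

$$F[K](\omega) = (2\pi)^{-d/2}\int_{\mathbb{R}^d}\exp(-j(\omega \cdot x))\exp\left(-\frac{\|x\|^2}{2\sigma^2}\right)dx = \sigma^d\exp\left(-\frac{\sigma^2\|\omega\|^2}{2}\right) > 0,$$

so the Fourier transform is nonnegative for every $\omega$ and the Gaussian kernel passes the test of Theorem 1.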

The kernel function selection method of ELM is the same as that of SVM. Therefore, the above theorem can also be used to determine whether a function is an admissible ELM kernel. The commonly used kernel functions are the Gaussian kernel function and the polynomial kernel function. Of these two, only the Gaussian kernel function is translation-invariant, because the polynomial kernel depends on the dot product $x \cdot x'$ rather than on the difference $x - x'$. The expressions of the two kernel functions can be given as

$$K_{Gauss}(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right), \qquad K_{Poly}(x, x') = (x \cdot x' + 1)^p. \tag{9}$$

In (9), $\sigma$ is the Gaussian kernel width and $p$ is an adjustable polynomial exponent.
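The difference between the two kernels with respect to translation invariance can be checked numerically. The sketch below assumes the common forms $\exp(-\|x - x'\|^2/(2\sigma^2))$ and $(x \cdot x' + 1)^p$; the function names and the sample points are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, p=2):
    """Polynomial kernel (x . y + 1)^p (assumed parameterization)."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** p


def shift(v, s):
    """Translate the point v by the offset s."""
    return [a + b for a, b in zip(v, s)]


x, y, s = [0.5, -1.0], [1.5, 2.0], [3.0, 3.0]

# Shifting both inputs by the same offset leaves x - y unchanged.
gauss_invariant = math.isclose(gaussian_kernel(x, y),
                               gaussian_kernel(shift(x, s), shift(y, s)))
poly_invariant = math.isclose(polynomial_kernel(x, y),
                              polynomial_kernel(shift(x, s), shift(y, s)))
```

Here `gauss_invariant` is true and `poly_invariant` is false, which is why Theorem 1 applies to the Gaussian kernel but not to the polynomial kernel.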

#### 3. Mexican Hat Wavelet Kernel ELM

##### 3.1. Kernel ELM

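In the original ELM model, the linear weighted hidden layer output function is usually not a satisfactory mapping for nonlinear samples. In order to solve this problem, we can replace $h(x)H^T$ and $HH^T$ in (6) with a kernel function $K$, where $H$ is the hidden layer output matrix. The result is

$$f(x) = \left[K(x, x_1), \ldots, K(x, x_N)\right]\left(\frac{I}{C} + \Omega\right)^{-1}T, \tag{10}$$

where $T$ is the label matrix, $C$ is the penalty factor, and $\Omega$ is the kernel function matrix that replaces $HH^T$ (see (11)).

$$\Omega = HH^T, \qquad \Omega_{i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j). \tag{11}$$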
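A kernel ELM of this form needs only the kernel matrix and one linear solve. The following sketch is an illustration under stated assumptions rather than the authors' code: it uses a Gaussian kernel with $\sigma = 1$, one-hot label rows, and hypothetical helper names (`solve`, `train_kelm`, `predict_kelm`), with only the Python standard library.

```python
import math


def solve(A, B):
    """Solve A X = B (A is n x n, B is n x m) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed parameterization)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def train_kelm(X, T, kernel, C):
    """Return alpha = (I/C + Omega)^{-1} T, where Omega[i][j] = kernel(x_i, x_j)."""
    N = len(X)
    A = [[kernel(X[i], X[j]) + (1.0 / C if i == j else 0.0) for j in range(N)]
         for i in range(N)]
    return solve(A, T)


def predict_kelm(x, X, alpha, kernel):
    """f(x) = [K(x, x_1), ..., K(x, x_N)] alpha."""
    k = [kernel(x, xi) for xi in X]
    return [sum(k[i] * alpha[i][c] for i in range(len(X)))
            for c in range(len(alpha[0]))]


# Toy usage: XOR with one-hot labels; no hidden layer weights are needed.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
T = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
alpha = train_kelm(X, T, gaussian_kernel, C=100.0)
outputs = [predict_kelm(x, X, alpha, gaussian_kernel) for x in X]
train_labels = [out.index(max(out)) for out in outputs]
```

Because the kernel is passed in as a function, any admissible kernel can be swapped in without changing the training or prediction code.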

##### 3.2. Mexican Hat Wavelet Kernel Function

In this part, the Mexican Hat wavelet kernel function is proposed. It is also proved that Mexican Hat wavelet function is an admissible ELM kernel.

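Theorem 2 (see [12]). *Let $\psi(x)$ be a mother wavelet. Let $a$ and $c$ denote the dilation and translation, respectively, and $x, a, c \in \mathbb{R}$. If $x, x' \in \mathbb{R}^d$, then the dot product wavelet kernel is* $$K(x, x') = \prod_{i=1}^{d}\psi\left(\frac{x_i - c_i}{a}\right)\psi\left(\frac{x'_i - c'_i}{a}\right). \tag{12}$$ *If it satisfies the translation-invariant kernel theorem, the following translation-invariant kernel function can be obtained:* $$K(x, x') = \prod_{i=1}^{d}\psi\left(\frac{x_i - x'_i}{a}\right). \tag{13}$$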

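The proof of Theorem 2 is given in [12]; we will not repeat it in this paper. We use the Mexican Hat wavelet as the mother wavelet (see (14)). Then, the Mexican Hat wavelet kernel function is derived (see (15)). In this paper, it is also proved that the Mexican Hat wavelet satisfies the translation-invariant kernel theorem. In other words, it is also an admissible ELM kernel.

$$\psi(x) = (1 - x^2)\exp\left(-\frac{x^2}{2}\right), \tag{14}$$

$$K(x, x') = \prod_{i=1}^{d}\left(1 - \frac{(x_i - x'_i)^2}{a^2}\right)\exp\left(-\frac{(x_i - x'_i)^2}{2a^2}\right). \tag{15}$$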
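The construction of a translation-invariant wavelet kernel from a mother wavelet can be sketched directly. The code below assumes the unnormalized Mexican Hat form $\psi(u) = (1 - u^2)\exp(-u^2/2)$ and the product over coordinates from Theorem 2; the names `mexican_hat` and `mhw_kernel` are hypothetical.

```python
import math


def mexican_hat(u):
    """Mother wavelet psi(u) = (1 - u^2) exp(-u^2 / 2) (unnormalized form, assumed)."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def mhw_kernel(x, y, a=1.0):
    """Translation-invariant wavelet kernel: product of psi((x_i - y_i) / a)."""
    k = 1.0
    for xi, yi in zip(x, y):
        k *= mexican_hat((xi - yi) / a)
    return k


k_same = mhw_kernel([1.0, 2.0], [1.0, 2.0])        # identical points
k_zero = mhw_kernel([1.0, 0.0], [0.0, 0.0], a=1.0)  # one coordinate differs by exactly a
k_neg = mhw_kernel([2.0], [0.0], a=1.0)             # the kernel may take negative values
k_shifted = mhw_kernel([4.0, 5.0], [3.5, 6.0], a=2.0)
k_original = mhw_kernel([1.0, 2.0], [0.5, 3.0], a=2.0)  # same difference x - y
```

Unlike the Gaussian kernel, this kernel changes sign: it is 1 for identical points, crosses zero when a coordinate difference equals the dilation `a`, and is negative beyond that. Admissibility depends on the Fourier transform being nonnegative, not on the kernel values themselves.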

Lemma 3. *As a kind of translation-invariant kernel function, Mexican Hat wavelet is an admissible ELM kernel.*

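*Proof.* First, it should be proved that the Fourier transform of the Mexican Hat wavelet kernel is nonnegative:

$$F[K](\omega) = (2\pi)^{-d/2}\int_{\mathbb{R}^d}\exp(-j(\omega \cdot x))K(x)\,dx \ge 0. \tag{16}$$

Substituting the kernel (15) into (16), the Fourier transform factorizes over the $d$ coordinates:

$$F[K](\omega) = (2\pi)^{-d/2}\prod_{i=1}^{d}\int_{-\infty}^{\infty}\left(1 - \frac{x_i^2}{a^2}\right)\exp\left(-\frac{x_i^2}{2a^2} - j\omega_i x_i\right)dx_i. \tag{17}$$

Equation (17) can therefore be decomposed into a set of one-dimensional integral inequalities, one for each factor. The integral term in (17) can be written as

$$I_i = \int_{-\infty}^{\infty}\exp\left(-\frac{x_i^2}{2a^2} - j\omega_i x_i\right)dx_i - \frac{1}{a^2}\int_{-\infty}^{\infty}x_i^2\exp\left(-\frac{x_i^2}{2a^2} - j\omega_i x_i\right)dx_i, \tag{18}$$

where $I_i$ is the $i$th integral term in (17). According to the translation invariance of the integral, the two integrals in (18) are easily evaluated by completing the square and integrating by parts. The results are

$$\int_{-\infty}^{\infty}\exp\left(-\frac{x_i^2}{2a^2} - j\omega_i x_i\right)dx_i = \sqrt{2\pi}\,a\exp\left(-\frac{a^2\omega_i^2}{2}\right), \qquad \int_{-\infty}^{\infty}x_i^2\exp\left(-\frac{x_i^2}{2a^2} - j\omega_i x_i\right)dx_i = \sqrt{2\pi}\,a^3\left(1 - a^2\omega_i^2\right)\exp\left(-\frac{a^2\omega_i^2}{2}\right). \tag{21}$$

Substituting (21) into (18), we have

$$I_i = \sqrt{2\pi}\,a^3\omega_i^2\exp\left(-\frac{a^2\omega_i^2}{2}\right). \tag{22}$$

Then, substituting (22) into (17), we obtain the Fourier transform

$$F[K](\omega) = \prod_{i=1}^{d}a^3\omega_i^2\exp\left(-\frac{a^2\omega_i^2}{2}\right). \tag{23}$$

From (23), it is known that if $a > 0$, then $F[K](\omega) \ge 0$. Therefore, according to the translation-invariant kernel theorem, the Mexican Hat wavelet is an admissible ELM kernel.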
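The nonnegativity claimed in the proof can also be checked numerically in one dimension. The sketch below assumes the kernel $K(x) = (1 - x^2/a^2)\exp(-x^2/(2a^2))$ and the $(2\pi)^{-1/2}$ Fourier convention, and compares a trapezoid-rule estimate with the closed form $a^3\omega^2\exp(-a^2\omega^2/2)$ that follows from those assumptions. The function names, the value of `a`, and the integration range are illustrative choices.

```python
import math


def mexican_hat_kernel_1d(x, a):
    """One-dimensional Mexican Hat wavelet kernel K(x) with dilation a (assumed form)."""
    u = x / a
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def fourier_transform_1d(omega, a, steps=4000):
    """Estimate (2 pi)^(-1/2) * integral of K(x) exp(-j omega x) dx by the trapezoid rule.

    K is even, so the sine (imaginary) part vanishes and only the cosine part is summed.
    The range [-12a, 12a] is wide enough that the truncated tails are negligible.
    """
    lo, hi = -12.0 * a, 12.0 * a
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * mexican_hat_kernel_1d(x, a) * math.cos(omega * x)
    return total * h / math.sqrt(2.0 * math.pi)


def closed_form(omega, a):
    """a^3 omega^2 exp(-a^2 omega^2 / 2), the per-coordinate factor of the transform."""
    return a ** 3 * omega ** 2 * math.exp(-0.5 * (a * omega) ** 2)


a = 1.5
omegas = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
numeric = [fourier_transform_1d(om, a) for om in omegas]
exact = [closed_form(om, a) for om in omegas]
max_error = max(abs(n - e) for n, e in zip(numeric, exact))
```

The estimate vanishes at $\omega = 0$, which reflects the zero mean of the wavelet, and stays nonnegative elsewhere; in $d$ dimensions the transform is the product of such factors.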

##### 3.3. MHW-KELM Classifier

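We have already proved that the Mexican Hat wavelet is an admissible ELM kernel. So, we can substitute (15) into (10) and construct the MHW-KELM classifier. For a binary classification problem, the output function of the new classifier is

$$f(x) = \operatorname{sign}\left(\left[K(x, x_1), \ldots, K(x, x_N)\right]\left(\frac{I}{C} + \Omega\right)^{-1}T\right), \tag{24}$$

where $K$ is the Mexican Hat wavelet kernel in (15), $\Omega$ is the kernel matrix with entries $\Omega_{i,j} = K(x_i, x_j)$, $T$ is the label matrix, and $C$ is the penalty factor.

This classifier can also be used for multiclass classification problems, in which case the output function is

$$\operatorname{label}(x) = \arg\max_{k \in \{1, \ldots, m\}} f_k(x), \tag{25}$$

where $f_k(x)$ is the $k$th component of the output vector $f(x) = [f_1(x), \ldots, f_m(x)]$.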

Equation (25) means that the classification result is the index of the maximum value in the output vector. In addition, we can combine the dilation parameter $a$ of the Mexican Hat wavelet and the penalty factor $C$ into an individual and use an evolutionary algorithm such as PSO [17, 18] to find the best values of these parameters. Next, we will analyze the performance of the proposed classifier.
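Putting the pieces together, a multiclass MHW-KELM classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the unnormalized Mexican Hat kernel, one-hot label rows, fixed values of `a` and `C` instead of a PSO search, and a hypothetical toy data set with three well-separated classes. All function names are illustrative.

```python
import math


def solve(A, B):
    """Solve A X = B (A is n x n, B is n x m) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def mhw_kernel(x, y, a):
    """Mexican Hat wavelet kernel: product of (1 - u^2) exp(-u^2 / 2), u = (x_i - y_i) / a."""
    k = 1.0
    for xi, yi in zip(x, y):
        u = (xi - yi) / a
        k *= (1.0 - u * u) * math.exp(-0.5 * u * u)
    return k


def train_mhw_kelm(X, T, a, C):
    """Return alpha = (I/C + Omega)^{-1} T for the Mexican Hat wavelet kernel."""
    N = len(X)
    A = [[mhw_kernel(X[i], X[j], a) + (1.0 / C if i == j else 0.0) for j in range(N)]
         for i in range(N)]
    return solve(A, T)


def classify(x, X, alpha, a):
    """Multiclass decision: index of the largest component of the output vector."""
    k = [mhw_kernel(x, xi, a) for xi in X]
    out = [sum(k[i] * alpha[i][c] for i in range(len(X))) for c in range(len(alpha[0]))]
    return out.index(max(out))


# Hypothetical toy data: three classes, two training points each.
X = [[0.0, 0.0], [0.2, 0.1], [3.0, 0.0], [3.1, 0.2], [0.0, 3.0], [0.1, 3.2]]
y = [0, 0, 1, 1, 2, 2]
T = [[1.0 if c == label else 0.0 for c in range(3)] for label in y]  # one-hot rows

a, C = 1.0, 100.0  # fixed here; the text suggests tuning them with PSO
alpha = train_mhw_kelm(X, T, a, C)
train_pred = [classify(x, X, alpha, a) for x in X]
test_pred = [classify(x, X, alpha, a) for x in [[0.1, 0.0], [2.9, 0.1], [0.0, 2.9]]]
```

For the binary case the same code applies with a single output column and a sign decision instead of the index of the maximum.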

#### 4. Performance Evaluation

This section analyzes the performance of MHW-KELM and compares it with the traditional Gauss-KELM, Poly-KELM, original ELM, and BP classifiers. All these algorithms are run in MATLAB R2014a. The operating environment is a Core i7 2.6 GHz CPU with 8 GB RAM. We choose the scaled conjugate gradient algorithm to optimize the BP neural network, which is faster than the normal BP neural network. In order to get excellent performance, the numbers of hidden nodes of the original ELM and BP are selected as 100% and 30% of the number of training samples, respectively. The data sets used in the experiment are from the UCI database [19]. They are Abalone, Auto MPG, Bank, Evaluation, Wine, Wine Quality, Iris, Glass, Image, Yeast, Zoo, and Letter. The basic features of these 12 data sets are shown in Table 1.