Abstract

Motivated by the strong correlation between the kernel function of an extreme learning machine (ELM) and its performance, a novel extreme learning machine based on a generalized triangular Hermite kernel function is proposed in this paper. First, the generalized triangular Hermite kernel function is constructed as the product of the triangular kernel and the generalized Hermite Dirichlet kernel, and the proposed kernel function is proved to be a valid kernel function for the extreme learning machine. Then, the learning methodology of the extreme learning machine based on the proposed kernel function is presented. The biggest advantage of the proposed kernel is that its kernel parameter values are chosen only from the natural numbers, which greatly shortens the computational time of parameter optimization and retains more of the structure information of the sample data. Experiments were performed on a number of binary classification, multiclass classification, and regression datasets from the UCI benchmark repository. The experimental results demonstrate that the robustness and generalization performance of the proposed method outperform those of other extreme learning machines with different kernels. Furthermore, the learning speed of the proposed method is faster than that of support vector machine (SVM) methods.

1. Introduction

The kernel extreme learning machine (KELM) was proposed by Huang et al. in 2010 by applying kernel functions to the ELM algorithm [1, 2], in which the random hidden layer feature mapping of ELM is replaced by a kernel mapping [3]. It effectively improves the undesirable generalization performance and stability caused by the stochastic nature of the hidden layer output matrix and greatly reduces the computational complexity. In KELM, optimizing the number of hidden layer nodes is avoided and the least-squares optimal solution can be obtained. Compared with SVM [4] and basic ELM [5, 6], it provides more stable and better generalization performance. Hence, KELM has been widely applied to classification and regression problems [7, 8].

It is well known that the learning ability and generalization performance of ELM depend mainly on the kernel function, and different kernel functions, or the same kernel function with different parameters, have different influences on the generalization performance. Besides, the time required to search for the optimal kernel parameters differs among kernel functions. Normally, the selection and optimization of kernel parameters are tedious and time-consuming [9, 10]. Liu et al. [11] pointed out that the common Gaussian and polynomial kernel functions are very sensitive to changes of their kernel parameters, so the selection range of the kernel parameters is large and the step size small, leading to high computational complexity. In order to solve this issue, a series of SVM kernel functions based on generalized orthogonal polynomials has been constructed in recent years [12–16]. Ozer et al. [12] introduced a set of Chebyshev kernel functions derived from the generalized Chebyshev polynomials; their test results show that the generalized Chebyshev kernel generally approaches the minimum number of support vectors for classification. Furthermore, Zhang et al. constructed a series of novel SVM kernel functions, such as the Laguerre kernel function from the generalized Laguerre polynomials and the Hermite kernel function from the generalized Hermite polynomials [13–15]. These algorithms shorten the time of parameter optimization; however, their treatment of the weight function parameters is so simple that the influence of the structure information of the sample data on the generalization performance is neglected. Tian and Wang [17] verified that the Gaussian Hermite kernel function achieves the highest classification accuracy on binary classification problems compared with the other orthogonal polynomial kernel functions above; however, its training efficiency and robustness are relatively low.

Based on the above analysis and on the Hermite orthogonal polynomials, a mixed kernel function called the Generalized Triangular Hermite kernel function is constructed as the product of the triangular kernel and the generalized Hermite Dirichlet kernel. This kernel function has only one free parameter, chosen from a small range of integers, so parameter optimization is greatly facilitated and more structure information of the sample data is retained. It is proved in theory that the generalized triangular Hermite kernel can be used as an admissible kernel function of the extreme learning machine. The effectiveness of the proposed method for regression and for binary and multiclass classification problems is demonstrated by numerical experiments on a number of real-world datasets from the UCI benchmark repository and by comparing the results with SVM and with other extreme learning machines with different kernels.

2. Kernel Extreme Learning Machine

2.1. Introduction to Kernel Extreme Learning Machine

Given a training set $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N} \subset \mathbb{R}^d \times \mathbb{R}^m$, the output function of the hidden nodes is chosen as $G(\mathbf{a}, b, \mathbf{x})$ with $L$ hidden nodes, and the ELM algorithm can be written as follows.

Step 1. Randomly generate the hidden node parameters $(\mathbf{a}_i, b_i)$, $i = 1, \dots, L$.

Step 2. Calculate the hidden layer output matrix $\mathbf{H}$ (ensuring that $\mathbf{H}$ has full column rank):
$$\mathbf{H} = \begin{bmatrix} h(\mathbf{x}_1) \\ \vdots \\ h(\mathbf{x}_N) \end{bmatrix} = \begin{bmatrix} G(\mathbf{a}_1, b_1, \mathbf{x}_1) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ G(\mathbf{a}_1, b_1, \mathbf{x}_N) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_N) \end{bmatrix}_{N \times L}.$$

Step 3. Calculate the output weight vector $\boldsymbol{\beta}$:
$$\boldsymbol{\beta} = \mathbf{H}^T \left( \frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^T \right)^{-1} \mathbf{T},$$
where $C$ is the regularization coefficient and $\mathbf{T} = [\mathbf{t}_1, \dots, \mathbf{t}_N]^T$.

The output function of ELM is
$$f(\mathbf{x}) = h(\mathbf{x}) \boldsymbol{\beta} = h(\mathbf{x}) \mathbf{H}^T \left( \frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^T \right)^{-1} \mathbf{T}.$$

If the hidden layer feature mapping $h(\mathbf{x})$ is unknown, a kernel matrix $\boldsymbol{\Omega}_{\mathrm{ELM}}$ can be defined to replace $\mathbf{H}\mathbf{H}^T$ by using Mercer's condition. Thus, the KELM algorithm is generated as follows:
$$\boldsymbol{\Omega}_{\mathrm{ELM}} = \mathbf{H}\mathbf{H}^T, \qquad \Omega_{\mathrm{ELM}\,i,j} = h(\mathbf{x}_i) \cdot h(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j).$$

Finally, the output function of KELM is defined as
$$f(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^T \left( \frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}} \right)^{-1} \mathbf{T}.$$
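To make the training and prediction rules above concrete, the following minimal numpy sketch implements them directly. The class name `KELM`, the helper `gaussian_kernel`, and all defaults are our own illustrative choices rather than part of the original formulation.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X (N x d) and Y (M x d)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

class KELM:
    """Kernel extreme learning machine trained by one regularized linear solve."""
    def __init__(self, kernel, C=1.0):
        self.kernel, self.C = kernel, C

    def fit(self, X, T):
        self.X = X
        omega = self.kernel(X, X)                  # kernel matrix Omega_ELM
        n = X.shape[0]
        # alpha = (I/C + Omega)^{-1} T, so that f(x) = [K(x, x_1..x_N)] alpha
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def predict(self, X_new):
        return self.kernel(X_new, self.X) @ self.alpha
```

Note that training reduces to a single N x N linear solve, which is why KELM sidesteps tuning the number of hidden nodes.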

2.2. Introduction to the Property of KELM Kernel Function

Some of the well-known common KELM kernel functions are:
(1) polynomial kernel: $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + c)^p$;
(2) Gaussian kernel: $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\|\mathbf{x} - \mathbf{y}\|^2 / (2\sigma^2)\right)$;
(3) Laplacian kernel: $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\|\mathbf{x} - \mathbf{y}\| / \sigma\right)$.

In addition to the above kernel functions, new kernel functions can also be constructed according to the following properties of kernel functions.

Property 1. Assume that $K_1(\mathbf{x}, \mathbf{y})$ and $K_2(\mathbf{x}, \mathbf{y})$ are valid kernel functions on $\mathbb{R}^d \times \mathbb{R}^d$; then the kernel functions $K_1(\mathbf{x}, \mathbf{y}) + K_2(\mathbf{x}, \mathbf{y})$ and $K_1(\mathbf{x}, \mathbf{y}) \cdot K_2(\mathbf{x}, \mathbf{y})$ are also valid on $\mathbb{R}^d \times \mathbb{R}^d$.

Theorem 1. Let $k$ be an integrable bounded continuous function on $\mathbb{R}^d$. Then the necessary and sufficient condition for the translation-invariant function $K(\mathbf{x}, \mathbf{y}) = k(\mathbf{x} - \mathbf{y})$ to be a kernel function is that its Fourier transform satisfies
$$F[k](\boldsymbol{\omega}) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} e^{-i \langle \boldsymbol{\omega}, \mathbf{u} \rangle} k(\mathbf{u}) \, d\mathbf{u} \geq 0.$$

Like SVM, a function is an allowed KELM kernel function as long as it satisfies Mercer's condition.

Mercer Theorem [18]. Assume that $K(\mathbf{x}, \mathbf{y})$ is a continuous symmetric real-valued function on $\mathcal{X} \times \mathcal{X}$ such that the following integral is always nonnegative for every $g \in L_2(\mathcal{X})$:
$$\int_{\mathcal{X}} \int_{\mathcal{X}} K(\mathbf{x}, \mathbf{y})\, g(\mathbf{x})\, g(\mathbf{y})\, d\mathbf{x}\, d\mathbf{y} \geq 0.$$
Then $K$ must be a valid kernel function.

3. Triangular Hermite Kernel Extreme Learning Machine

3.1. Construction of Triangular Kernel Function

The Laplace kernel function $K_{\mathrm{Lap}}(\mathbf{x}, \mathbf{y}) = \exp(-\|\mathbf{x} - \mathbf{y}\| / \sigma)$ is also a radial basis function. Its classification performance is nearly equivalent to that of the Gaussian kernel, but it is less sensitive to changes of the parameter $\sigma$. The Laplace kernel can be used as an alternative when using the Gaussian kernel becomes too expensive.

When $\|\mathbf{x} - \mathbf{y}\| / \sigma \ll 1$, applying the first-order Taylor formula $e^{-t} \approx 1 - t$, the Laplace kernel can be simplified as
$$K(\mathbf{x}, \mathbf{y}) = 1 - \frac{\|\mathbf{x} - \mathbf{y}\|}{\sigma}. \tag{7}$$

Figure 1 shows the function curves of (7) and the Laplace kernel for a fixed value of $\sigma$, with $\mathbf{y}$ held constant and $\mathbf{x}$ varying over an interval.

As seen from Figure 1, the two kernel functions are quite different, and (7) is relatively less sensitive to changes of the parameter $\sigma$. A typical triangle appears in its function curve; thus, it is called the triangular kernel function [19].

Given a sample set $\{\mathbf{x}_i\}_{i=1}^{N}$, where $\bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i$ is the sample mean and $N$ is the number of training samples, in order to further simplify (7), let [16] $\sigma$ be twice the maximum distance of all sample points to the sample mean, so that $1 - \|\mathbf{x} - \mathbf{y}\| / \sigma \geq 0$ holds with probability one (by the triangle inequality, $\|\mathbf{x} - \mathbf{y}\| \leq \|\mathbf{x} - \bar{\mathbf{x}}\| + \|\mathbf{y} - \bar{\mathbf{x}}\| \leq \sigma$ for any two training samples). Thus, the simplified triangular kernel function is obtained:
$$K_{\mathrm{tri}}(\mathbf{x}, \mathbf{y}) = 1 - \frac{\|\mathbf{x} - \mathbf{y}\|}{\sigma}, \quad \text{where } \sigma = 2 \max_{1 \leq i \leq N} \|\mathbf{x}_i - \bar{\mathbf{x}}\|. \tag{8}$$
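Under the construction just described, a short sketch of (8) could look as follows; `fit_sigma` and `triangular_kernel` are hypothetical helper names, not from the paper.

```python
import numpy as np

def fit_sigma(X):
    """sigma = twice the maximum distance from any training sample to the mean."""
    x_bar = X.mean(axis=0)
    return 2.0 * np.max(np.linalg.norm(X - x_bar, axis=1))

def triangular_kernel(X, Y, sigma):
    """K_tri(x, y) = 1 - ||x - y|| / sigma for all row pairs of X and Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return 1.0 - np.sqrt(np.maximum(sq, 0.0)) / sigma
```

With sigma fitted this way, all kernel values on training pairs stay in [0, 1].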

Equation (8) is a translation-invariant function; according to Theorem 1, $K_{\mathrm{tri}}$ is strictly proved to be a valid KELM kernel function in the following proof.

Proof. Let $k(u) = 1 - |u| / \sigma$ for $|u| \leq \sigma$ and $k(u) = 0$ otherwise, so that $K_{\mathrm{tri}}(\mathbf{x}, \mathbf{y}) = k(\|\mathbf{x} - \mathbf{y}\|)$ on the sample set. It is well known from functional analysis that $k$ is an integrable bounded continuous function on $\mathbb{R}$, and the Fourier transform of $k$ is
$$F[k](\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\sigma}^{\sigma} \left(1 - \frac{|u|}{\sigma}\right) e^{-i\omega u}\, du = \frac{\sigma}{\sqrt{2\pi}} \left( \frac{\sin(\sigma\omega/2)}{\sigma\omega/2} \right)^2 \geq 0.$$
The proof is completed.

References [12–16] constructed the weighting function as a Gaussian kernel whose parameter is set directly from $d$, the dimension of the input vector $\mathbf{x}$; since the parameter of the Gaussian kernel is fixed in this way, data structure information is lost, although the parameter optimization is simplified. However, the setting of the parameter $\sigma$ in the triangular kernel function proposed in this paper makes up for exactly this shortcoming.

In order to reflect the difference between the triangular kernel and the Gaussian weighting kernel more intuitively, Figures 2 and 3 show the graphs of the above two kernel functions for one-dimensional inputs over several value intervals.

As shown in Figures 2 and 3, both vectors $\mathbf{x}$ and $\mathbf{y}$ are one-dimensional ($d = 1$). When $\mathbf{y}$ takes a fixed value, the graph of the Gaussian kernel function is invariably constant, but the graph of the triangular kernel function changes as the value interval of $\mathbf{x}$ changes. It is well known that choosing a different kernel function amounts to selecting a different criterion to measure similarity and its degree [18].

Consequently, for the same point $\mathbf{x}$, the similarity measured in different intervals should be different: the value of the triangular kernel increases as the interval expands, while the value of the Gaussian kernel in the four intervals always remains unchanged. Therefore, it can be said that the parameter $\sigma$ of the triangular kernel function is set so as to retain more distance-similarity information of the sample data; in addition, its computational cost is very low.

3.2. Construction of Generalized Hermite Dirichlet Kernel

The Hermite polynomials [16] are a family of polynomials orthogonal with respect to the weighting function $e^{-x^2}$ on the interval $(-\infty, +\infty)$, defined as
$$H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}, \quad n = 0, 1, 2, \dots$$

They satisfy the orthogonality relationship
$$\int_{-\infty}^{+\infty} H_m(x) H_n(x) e^{-x^2}\, dx = 2^n n! \sqrt{\pi}\, \delta_{mn}.$$

They obey the recurrence relationship
$$H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x), \quad H_0(x) = 1, \quad H_1(x) = 2x.$$
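The recurrence gives a direct way to evaluate $H_n$ numerically; below is a minimal sketch (the function name `hermite` is ours), which also works elementwise on numpy arrays.

```python
import numpy as np

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence
    H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x), with H_0 = 1 and H_1 = 2x."""
    x = np.asarray(x, dtype=float)
    h_prev = np.ones_like(x)            # H_0
    if n == 0:
        return h_prev
    h = 2.0 * x                         # H_1
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h
```

For instance, `hermite(2, 1.0)` returns 2.0, matching $H_2(x) = 4x^2 - 2$.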

Owing to the orthogonality, variability, and universal function approximation capability of the Hermite polynomials, a generalized Hermite kernel function can be constructed as a good alternative to other common kernel functions (Gaussian kernel, polynomial kernel, etc.). For this purpose, let the scalar variable $x$ be replaced by the row vector $\mathbf{x}$, and let $x^2$ be substituted correspondingly by $\mathbf{x}\mathbf{x}^T$, where $\mathbf{x}^T$ is the transpose of $\mathbf{x}$.

Therefore, for vector input, the generalized Hermite polynomial can be defined as
$$H_n(\mathbf{x}) = (-1)^n e^{\mathbf{x}\mathbf{x}^T} \frac{d^n}{d\mathbf{x}^n} e^{-\mathbf{x}\mathbf{x}^T}.$$

By using the generalized Hermite polynomial, this paper defines the generalized $n$th order Hermite Dirichlet kernel [12] as
$$K_H^{(n)}(\mathbf{x}, \mathbf{y}) = \sum_{i=0}^{n} H_i(\mathbf{x}) H_i(\mathbf{y})^T.$$

The Mercer Theorem can be evaluated and verified for $K_H^{(n)}$ as follows, assuming that each element is independent from the others:
$$\int\!\!\int K_H^{(n)}(\mathbf{x}, \mathbf{y})\, g(\mathbf{x})\, g(\mathbf{y})\, d\mathbf{x}\, d\mathbf{y} = \sum_{i=0}^{n} \left\| \int H_i(\mathbf{x})\, g(\mathbf{x})\, d\mathbf{x} \right\|^2 \geq 0.$$

Therefore, $K_H^{(n)}$ is a valid KELM kernel, and its kernel parameter $n$ can only take natural numbers, which greatly simplifies the selection and optimization of kernel parameters.
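As an illustration, the sketch below evaluates one plausible reading of this kernel: the scalar recurrence is applied coordinatewise (the independence assumption above) and the inner products of the resulting vectors are summed. It reuses the `hermite` helper from the previous sketch, and the function name is ours.

```python
import numpy as np

def hermite_dirichlet_kernel(X, Y, n):
    """K_H^(n)(x, y) = sum_{i=0}^{n} H_i(x) H_i(y)^T, with H_i applied
    coordinatewise to the rows of X (N x d) and Y (M x d)."""
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i in range(n + 1):
        K += hermite(i, X) @ hermite(i, Y).T   # inner products over coordinates
    return K
```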

3.3. Generalized Triangular Hermite Kernel Extreme Learning Machine

According to Property 1, a new KELM kernel called the Generalized Triangular Hermite kernel is constructed as the product of the triangular kernel and the generalized Hermite Dirichlet kernel; it is defined as
$$K_{\mathrm{Tri\text{-}H}}^{(n)}(\mathbf{x}, \mathbf{y}) = K_{\mathrm{tri}}(\mathbf{x}, \mathbf{y}) \cdot K_H^{(n)}(\mathbf{x}, \mathbf{y}) = \left(1 - \frac{\|\mathbf{x} - \mathbf{y}\|}{\sigma}\right) \sum_{i=0}^{n} H_i(\mathbf{x}) H_i(\mathbf{y})^T.$$

The Generalized Triangular Hermite kernel combines the advantages of the triangular kernel and the generalized Hermite Dirichlet kernel: it not only retains more distance-similarity information of the sample data but also admits only natural numbers for its order parameter. Accordingly, it can greatly shorten the time of parameter optimization. Although it has two kernel parameters, $\sigma$ is computed directly from the training data and $n$ is chosen from a small set of integers, so both can be determined quickly and easily, which greatly reduces the parameter optimization cost. The Generalized Triangular Hermite kernel functions up to the 3rd order are listed in Table 1.
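Given the two kernels sketched earlier, the combined kernel is a one-line product; again, the function name is illustrative.

```python
def tri_hermite_kernel(X, Y, n, sigma):
    """Product kernel of Property 1: the triangular kernel times the n-th order
    generalized Hermite Dirichlet kernel (both sketched in earlier sections)."""
    return triangular_kernel(X, Y, sigma) * hermite_dirichlet_kernel(X, Y, n)
```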

Figures 4 and 5 show the Generalized Triangular Hermite kernel output up to the 3rd order at two different vertical-axis scales, where $\mathbf{x}$ varies within a fixed range and $\mathbf{y}$ is fixed at a constant value. Figure 4 shows the kernel function values on a common axis, while Figure 5 shows them on split axes, where the 0th and 1st orders correspond to the left vertical axis, and the 2nd and 3rd orders correspond to the right vertical axis.

Finally, the proposed kernel is introduced into the KELM algorithm; as a result, the Generalized Triangular Hermite kernel extreme learning machine algorithm is obtained as follows.

Given a training set $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, the output function is
$$f(\mathbf{x}) = \begin{bmatrix} K_{\mathrm{Tri\text{-}H}}^{(n)}(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K_{\mathrm{Tri\text{-}H}}^{(n)}(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^T \left( \frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}} \right)^{-1} \mathbf{T},$$
where $\Omega_{\mathrm{ELM}\,i,j} = K_{\mathrm{Tri\text{-}H}}^{(n)}(\mathbf{x}_i, \mathbf{x}_j)$.
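Stitching the sketches above together gives a toy end-to-end run; the synthetic data, the order n = 2, and C = 100 are arbitrary illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
# Toy binary labels: the sign of the product of the two features.
T_train = np.where(X_train[:, 0] * X_train[:, 1] > 0, 1.0, -1.0)[:, None]

sigma = fit_sigma(X_train)                 # data-driven triangular parameter
kernel = lambda A, B: tri_hermite_kernel(A, B, n=2, sigma=sigma)
model = KELM(kernel, C=100.0).fit(X_train, T_train)
accuracy = (np.sign(model.predict(X_train)) == T_train).mean()
print(f"training accuracy: {accuracy:.3f}")
```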

4. Experiments and Analysis

In order to test the performance of the Generalized Triangular Hermite kernel extreme learning machine (Tri-H KELM) algorithm, this section compares it with various other algorithms in terms of testing accuracy, training time, and the regression coefficient of determination on the Banana dataset and on real-world benchmark regression, binary classification, and multiclass classification datasets. Table 2 lists the algorithms used in the experiments and the range of the corresponding kernel parameter values; they include the Gaussian kernel (Gauss), polynomial kernel (Poly), and Gaussian Hermite kernel (Gau-H) [15] extreme learning machine algorithms and the triangular Hermite kernel support vector machine algorithm (Tri-H SVM). In the multiclass simulations, we use the LIBSVM toolbox and train an SVM for each class separately in a one-versus-all scheme. The SVM cost parameter value is 100 in each corresponding experiment. The better test results are given in boldface in Tables 2–9.

4.1. Classification Performance Comparison on Banana Dataset

The Banana dataset is a well-known binary classification dataset from the UCI benchmark repository used in many pattern recognition tests. Each vector of the dataset has 2 features. The training set consists of 800 randomly selected samples (400 for the (+1) class and 400 for the (−1) class); meanwhile, 400 samples are randomly chosen as the testing set, which contains 200 samples of each class. The same regularization coefficient $C$ is assumed at each trial of the simulation. The simulation results, the maximum testing accuracy, and the corresponding kernel parameter optimization time are given in Table 3.

Figures 6–9 show the classification boundaries of the different KELM algorithms on the benchmark Banana dataset.

As seen from Table 3 and Figures 6–9, Tri-H KELM achieves the highest testing accuracy on the Banana dataset compared with Gauss, Poly, and Gau-H KELM. Furthermore, in the parameter optimization process, the kernel parameter of the two Hermite KELM algorithms is only taken from 0 to 3, which greatly shortens the optimization time.

4.2. Classification Performance Comparison on UCI Benchmark Datasets

In order to verify the performance of the different algorithms more extensively, a wide range of datasets has been tested in the simulations, including 5 binary classification cases and 6 multiclass classification cases. Most of the datasets are taken from the UCI Machine Learning Repository. The specifications of the binary and multiclass classification cases are listed in Tables 4 and 5. Tri-H KELM is compared with Gauss, Poly, and Gau-H KELM and with Tri-H SVM; the experiment results, including the maximum testing accuracy, the corresponding kernel parameter value, and the training time, are given in Tables 6 and 7.

(1) Classification Performance Comparison on Binary Classification Cases. As demonstrated by the simulation results, Tri-H KELM achieves much better generalization performance on the binary classification cases than the other KELM algorithms (Gauss, Poly, and Gau-H KELM). More importantly, Tri-H KELM achieves the highest classification accuracy within a small range of integer kernel parameter values, which greatly reduces the parameter optimization time. Besides, it is observed that Tri-H KELM always achieves performance comparable to Tri-H SVM at a much faster learning speed.

(2) Classification Performance Comparison on Multiclass Classification Cases. The experiment results show that Tri-H KELM performs better on average than the other KELM algorithms and Tri-H SVM on the multiclass classification cases. Its biggest advantage is the short parameter selection time, because the best classification performance is achieved within a small set of given kernel parameters. A further study reveals that all KELM algorithms have better scalability and run at much faster learning speeds than the traditional SVM.

4.3. Regression Performance Comparison on UCI Benchmark Datasets

This subsection selects 5 regression cases from the UCI Machine Learning Repository; the data are described in Table 8. Simulations of the different algorithms were carried out on all the regression datasets. The regression performance was evaluated by the coefficient of determination, which lies within the interval $[0, 1]$; the closer it is to 1, the better the fit. The simulation results and the training time are given in Table 9.

As Table 9 shows, in comparison with the other KELM algorithms, Tri-H KELM obtains the maximum coefficient of determination on most regression datasets, and it achieves this maximum within a small range of kernel parameter values. It therefore has better generalization performance for regression problems. Note that Tri-H KELM achieves regression performance similar to SVM at much faster learning speeds.

5. Conclusion

In this work, a novel extreme learning machine based on the mixed kernel function of the triangular kernel and the generalized Hermite Dirichlet kernel (Tri-H KELM) has been put forward, introducing the triangular Hermite kernel function into the kernel extreme learning machine algorithm. Because the presented kernel has only one free parameter, chosen from a small set of integers, parameter optimization is facilitated greatly. Besides, more structure information of the sample data is retained in the proposed kernel. Numerical experiments have been performed with the different algorithms (Tri-H SVM and Gauss, Poly, Gau-H, and Tri-H KELM) on the Banana benchmark dataset and a number of real-world benchmark datasets for regression and for binary and multiclass classification. The comparable generalization and robustness of the proposed approach relative to the other methods considered, achieved at a much faster learning speed than Tri-H SVM, indicate its usefulness and effectiveness. Future work will study Tri-H KELM in practical applications.

Competing Interests

The authors declare that they have no competing interests.