Complexity

Volume 2019, Article ID 1806314, 18 pages

https://doi.org/10.1155/2019/1806314

## Globality-Locality Preserving Maximum Variance Extreme Learning Machine

^{1}Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China

^{2}School of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China

Correspondence should be addressed to Hongfei Lin; hflin@dlut.edu.cn

Received 5 December 2018; Revised 26 February 2019; Accepted 1 April 2019; Published 2 May 2019

Academic Editor: Michele Scarpiniti

Copyright © 2019 Yonghe Chu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

An extreme learning machine (ELM) is a useful technique for machine learning; however, existing ELM methods cannot exploit the geometric structure information or discriminant information of the data space well. Therefore, we propose a globality-locality preserving maximum variance extreme learning machine (GLELM) based on manifold learning. Building on the traditional ELM method, GLELM introduces the basic principles of linear discriminant analysis (LDA) and locality preserving projections (LPP) into ELM, fully taking into account the discriminant information contained in the samples. The method preserves both the global and local manifold structures of the data to optimize the projection direction of the classifier. Experiments on several widely used image databases and UCI datasets validate the performance of GLELM. The experimental results show that the proposed model achieves promising results compared to several state-of-the-art ELM algorithms.

#### 1. Introduction

Single-layer feedforward networks (SLFNs) have been intensively studied over the past several decades. The best-known algorithm for training SLFNs is the backpropagation (BP) algorithm proposed by Rumelhart et al. [1] in 1986. The BP algorithm uses gradient descent to optimize the parameters of the neural network, but this optimization suffers from slow training speed and easily falls into local minima. Researchers have therefore proposed various improved algorithms to address these problems: Hagan et al. [2] proposed a second-order optimization method in 1994, Branke et al. [3] proposed a global optimization method in 1995, and Li et al. [4] proposed a subset selection method in 2005.

Recently, the extreme learning machine (ELM) [5] has attracted increasing attention from scholars. ELM is developed on the basis of single-hidden layer feedforward networks (SLFNs) and can be regarded as an extension of SLFNs. Traditional neural network algorithms, such as the BP [1] neural network, use gradient descent to adjust the input weights and biases of the hidden layer nodes in an iterative manner. However, gradient descent-based methods have the disadvantages of slow convergence and easily falling into locally optimal solutions. In contrast, ELM randomly generates the input weights and biases of the hidden layer nodes, so it trains faster and requires less human intervention during the training process. The literature [6, 7] showed that ELM, which randomly generates the hidden layer input weights and biases and analytically determines the output weights, maintains the universal approximation ability of SLFNs while obtaining a near-globally optimal solution. The literature [8, 9] notes that ELM has better classification performance than the support vector machine (SVM) [10]. Due to its good generalization ability, ELM has been widely used in pattern recognition [11–15].

In recent years, researchers have studied ELM in various ways and proposed various improvements. Huang et al. [6] further studied the universal approximation ability of ELM. Lin et al. and Liu et al. [16–18] used statistical learning theory to conduct in-depth research on the generalization ability of ELM. Wang et al. [19] proposed a local generalization error model for the problem of ELM generalization ability, and researchers have also compared ELM with other classification algorithms. Shi et al. [20] studied ELM and SVM and their improved algorithms in depth and concluded that ELM is superior to SVM in training speed and generalization ability. Many variants of ELM have been proposed to meet particular application requirements. For example, Wang et al. [21] analysed the influence of the hidden layer node output matrix on the ELM algorithm and proposed an improved algorithm. Zheng et al., Riccardo et al., and Zhang et al. [22–24] proposed various improvements to the ELM algorithm by analysing the influence of data on the ELM model from the perspective of cost sensitivity coefficients. Li et al. [25] studied the defects of ELM on unbalanced data and missing data to improve the ELM algorithm. Zhou et al. and Javier et al. [26–28] applied ELM to remote sensing images. Zhou et al. [29] proposed various improvements to ELM to solve problems in online continuous data applications. Recently, researchers have combined ELM with dimensionality reduction techniques: Castaño et al. [30] applied principal component analysis (PCA) to ELM, and Wang et al. [31] combined the local tangent space alignment (LTSA) dimensionality reduction algorithm with ELM. Researchers have also applied ensemble techniques to improve the robustness of ELM algorithms: Zhang et al. [32] applied AdaBoost to ELM, Liu et al. [33] proposed an ensemble extreme learning machine, and Deepak et al. [34] applied bagging to the ELM algorithm.

The above improvements in theory and application enhance the generalization capability of ELM and greatly expand its application range. However, the discriminant information of the data samples and the global and local manifold structures between the data samples have not yet been carefully studied, either mathematically or geometrically, in the ELM setting. Recently, researchers have noted that manifold learning methods [35, 36] can effectively reveal the intrinsic geometry of data points [9]. Assuming that data samples $x_i$ and $x_j$ are drawn from the same marginal distribution $P_X$, if the two points $x_i$ and $x_j$ are close to each other, then the conditional probabilities $P(y \mid x_i)$ and $P(y \mid x_j)$ should be similar as well. This assumption is widely referred to as the smoothness assumption in machine learning. Therefore, by mining the geometry of the data, it is possible to provide effective information for pattern classification. Recently, researchers have conducted thorough research on manifold learning and put forward different methods to preserve the local characteristics of data [37–39]. Aiming to solve the drawback that ELM ignores the intrinsic manifold structure of the data space, and inspired by manifold learning and the literature [40], we introduce the basic principles of linear discriminant analysis (LDA) [41] and locality preserving projections (LPP) [42] into ELM, proposing a novel learning algorithm called the globality-locality preserving maximum variance extreme learning machine (GLELM), in which the manifold structure within each class is explicitly considered. This method introduces the intraclass and interclass scatter matrices of LDA and the basic principle of LPP into ELM, so that it not only maintains the intrinsic local geometry of the samples but also maintains their global geometric structure to a certain extent and embodies the global discriminant information contained in the samples. GLELM retains the locality preserving characteristic of LPP and utilizes the global discriminative structure obtained from the maximum margin criterion (MMC), which maximizes the between-class distance and minimizes the within-class distance. We combine the idea of LPP and the principle of LDA in the ELM model to enhance the discriminant ability of ELM, so GLELM is superior to ELM for recognition tasks. Moreover, the experimental results show that exploiting the intrinsic manifold structure of the data samples can effectively improve the classification performance of the ELM algorithm. In addition, the literature [43] noted that recent research shows that images reside on a nonlinear submanifold; therefore, in this case, GLELM can usually achieve better performance than ELM. The contributions of the GLELM algorithm presented in this paper are as follows.

(1) While inheriting the characteristics of ELM, GLELM avoids the problem of insufficient learning to some extent.

(2) The basic principles of LDA and LPP are introduced into ELM, which effectively maintains the intrinsic local geometry and global geometry of the sample and introduces the global discriminant information of the data samples into the ELM model.

(3) The idea of manifold learning is applied to the ELM model, and the validity of the GLELM algorithm is verified by experiments.

The rest of the paper is organized as follows. In Section 2, this paper introduces related work. In Section 3, we introduce the basic principles and framework of the ELM algorithm. Section 4 presents the GLELM algorithm framework. Section 5 describes and analyses the experimental results. Section 6 summarizes the paper.

#### 2. Background and Notation

##### 2.1. Notations

Given a dataset $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{d \times N}$, $d$ is the data dimension, $N$ is the number of samples, and $c$ is the total number of categories in the dataset. The dataset label vector is $T = [t_1, t_2, \dots, t_N]$. Define the projection transformation matrix as $W \in \mathbb{R}^{d \times r}$. The data are reduced from the original space $\mathbb{R}^{d}$ to the low-dimensional subspace $\mathbb{R}^{r}$. The symbol $\|\cdot\|$ expresses the $\ell_2$-norm; $\mathrm{tr}(\cdot)$ denotes the trace operator; $N_k(\cdot)$ denotes the $k$ nearest neighbours operator.

##### 2.2. Related Works

In this section, we briefly review the related work. Iosifidis et al. used the principle of linear discriminant analysis to explore the geometric structure of the data, introduced the intraclass scatter matrix and the global scatter matrix into the ELM model, and proposed the minimum class variance extreme learning machine (MCVELM) [44] and the minimum variance extreme learning machine (MVELM) [45], respectively. On this basis, Iosifidis et al. proposed the graph embedded extreme learning machine (GEELM) [46] to optimize the network output weights of ELM. GEELM provides a unified way to incorporate subspace learning criteria formulated using graphs into ELM optimization; in their paper, both supervised and unsupervised subspace criteria are used. Liu et al. proposed the robust discriminative extreme learning machine (RDELM) [47] to address the deficiency of the MCVELM algorithm in discriminating information between data samples. The RDELM algorithm takes into account not only the intraclass discriminant information of the data samples but also their interclass discriminant information. The motivation for our paper is similar to that of the above papers, which also considered the geometry of ELM. However, they directly used the geometric structure information of the data to optimize the network output weights. We focus instead on the assumption that data samples $x_i$ and $x_j$ are drawn from the same marginal distribution $P_X$: if two points $x_i$ and $x_j$ are close to each other, then the conditional probabilities $P(y \mid x_i)$ and $P(y \mid x_j)$ should be similar as well. Therefore, the manifold structure information of the data samples is introduced into the ELM model, and the generalization ability of the ELM algorithm is enhanced.

The most closely related works are found in the literature [48–50]. Iosifidis et al. introduced local class information into the ELM model and proposed the local class variance extreme learning machine (LCVELM) classifier [48]. Based on the consistency property of data, which enforces similar samples to share similar properties, Peng et al. proposed the discriminative graph regularized extreme learning machine (GELM) [49]. GELM constructs a Laplacian Eigenmap (LE) [51] structure with the discriminant information of the data samples and introduces it into the ELM algorithm as a regularization term. In addition, Peng et al. proposed the discriminative manifold extreme learning machine (DMELM) [50] based on local intraclass discriminant information, local interclass discriminant information, and the geometric structure of the data. The GELM and DMELM algorithms enhance the classification performance and generalization ability of the ELM model by introducing the manifold structure and discriminant information of the data samples into the ELM model. However, both algorithms ignore the global geometry and global discriminant information of the data samples. The literature [52] shows that the intraclass scatter matrix, interclass scatter matrix, and global scatter matrix in linear discriminant analysis (LDA) maintain the global discriminant information and global geometric structure of the training samples. Therefore, based on the basic principles of the LDA and LPP algorithms, we introduce the global and local manifold structures and discriminant information into the ELM model and propose the GLELM model.

Our GLELM model is a natural extension of ELM with manifold regularization. Manifold learning methods have also been combined with other machine learning algorithms, such as globality-locality preserving projections (GLPP) [53, 54] and the support vector machine with globality-locality preserving (GLPSVM) [55]. GLPP first separates the data into a static part (subject-invariant factors) and a dynamic part (intrasubject factors) and then jointly learns the two graph Laplacians to yield a new graph Laplacian; GLPP thereby realizes dimensionality reduction. By using LPP to keep the local geometric information and LDA to keep the global geometric information of the data, GLELM unifies LPP and LDA in a manifold regularization framework. The proposed GLELM algorithm combines the manifold criterion and the Fisher criterion and thus has a stronger discriminative ability. GLPSVM introduces manifold structure information into SVM, using geometric and discriminative information to construct a manifold regularization framework. Both GLPSVM and GLELM use LPP to construct the manifold framework; however, GLPSVM uses the data sample mean vectors to obtain the global geometric structure information of the data, whereas GLELM uses LDA. In addition, the architecture of GLELM is completely different from those of GLPP and GLPSVM. In fact, GLPP is a dimensionality reduction algorithm, and GLPSVM, as a classification algorithm, performs classification by maximizing geometric margins. Based on a single hidden layer feedforward neural network, GLELM randomly generates the input weights and hidden layer biases and analytically determines the output weights to realize data classification. These different architectures lead to different recognition performance.

##### 2.3. Extreme Learning Machine

The extreme learning machine proposed by Huang et al. [5] is an efficient and practical learning mechanism for single-layer feedforward neural networks. For $N$ distinct samples $(x_i, t_i)$, where $x_i \in \mathbb{R}^{d}$ and $t_i \in \mathbb{R}^{m}$, the ELM model with $L$ hidden layer nodes and activation function $g(\cdot)$ is as follows:

$$o_i = \sum_{j=1}^{L} \beta_j\, g(w_j \cdot x_i + b_j), \quad i = 1, \dots, N \quad (1)$$

where $w_j$ is the input weight vector connecting the $j$th hidden layer node with the input nodes; $\beta_j$ is the output weight vector connecting the $j$th hidden layer node and the output nodes; $b_j$ is the offset value of the $j$th hidden layer node; $w_j \cdot x_i$ represents the inner product of $w_j$ and $x_i$; and $o_i$ is the network output corresponding to sample $x_i$. To integrate all data samples, (1) can be rewritten as follows:

$$H\beta = O \quad (2)$$

where $h(x_i) = [g(w_1 \cdot x_i + b_1), \dots, g(w_L \cdot x_i + b_L)]$ is the output vector of the hidden layer with respect to $x_i$,

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} \in \mathbb{R}^{N \times L} \quad (3)$$

is the network hidden layer node output matrix, $\beta = [\beta_1, \dots, \beta_L]^T$ is the output weight matrix, and

$$T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix} \in \mathbb{R}^{N \times m} \quad (4)$$

is the expected output matrix. Standard single hidden layer feedforward neural networks (SLFNs) compute appropriate $w_j$, $b_j$, and $\beta_j$ to satisfy

$$H\beta = T \quad (5)$$

Formula (5) can be solved by the gradient descent method. However, Huang et al. [22] proved that, in contrast to standard SLFNs, the input weights and hidden layer biases need no adjustment. In the ELM algorithm, the input weights and biases of the hidden layer nodes are generated randomly, so the nonlinear single hidden layer feedforward neural network model is converted into a linear model, and (5) can be solved by the least squares method. When the number of hidden layer nodes equals the number of training samples ($L = N$), we can directly obtain the optimal output weight matrix $\beta$ from the inverse of the matrix $H$ in (5). However, in most cases, the number of hidden layer nodes is much smaller than the number of training samples ($L \ll N$); in this case, $H$ is not invertible, and we solve (5) by the least squares solution:

$$\hat{\beta} = H^{\dagger} T \quad (6)$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $H$ and can be calculated by SVD or least squares.

To improve the stability and generalization capability of traditional ELM, Huang et al. [22] proposed the equality-constrained optimization-based ELM. This formulation not only minimizes the training error but also minimizes the norm of the output weights $\beta$, so the objective of the equality-constrained ELM can be written as

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^2, \quad \text{s.t.} \; h(x_i)\beta = t_i^T - \xi_i^T, \; i = 1, \dots, N \quad (7)$$

In (7), $\xi_i$ is the training error vector corresponding to the sample $x_i$, and $C$ is a penalty parameter.

In the computation of ELM, the number of training samples may be larger or smaller than the number of hidden layer nodes, and the corresponding expressions for the output weights differ in the two cases. Substituting the constraints into the objective, (7) can be rewritten as

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\|T - H\beta\|^2 \quad (8)$$

When the number of training samples is less than the number of hidden layer nodes ($N < L$), the solution to (8) is

$$\beta = H^T\left(\frac{I}{C} + HH^T\right)^{-1}T \quad (9)$$

When the number of training samples is greater than the number of hidden layer nodes ($N > L$), the solution to (8) is

$$\beta = \left(\frac{I}{C} + H^TH\right)^{-1}H^TT \quad (10)$$

The ELM algorithm solving process can be summarized as follows:

(1) Initialize the training sample set $\{(x_i, t_i)\}_{i=1}^{N}$

(2) Randomly specify the network input weights $w_j$ and the offset values $b_j$, $j = 1, \dots, L$

(3) Calculate the hidden layer node output matrix $H$ by the activation function $g(\cdot)$

(4) Calculate the output weight matrix $\beta$ according to (9) or (10); a code sketch of these steps is given below
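To make these steps concrete, the following is a minimal NumPy sketch of ELM training with the regularized solutions (9) and (10). The function names, the sigmoid activation, and the default values of `L` and `C` are our own illustrative choices rather than settings from the paper:

```python
import numpy as np

def elm_train(X, T, L=200, C=1.0, rng=np.random.default_rng(0)):
    """Minimal ELM training sketch; names and defaults are illustrative.

    X : (N, d) training samples; T : (N, m) one-hot target matrix.
    Returns the random feature map (W, b) and the output weights beta.
    """
    N, d = X.shape
    # Steps (1)-(2): randomly draw input weights W and biases b.
    W = rng.uniform(-1.0, 1.0, size=(d, L))
    b = rng.uniform(-1.0, 1.0, size=(1, L))
    # Step (3): hidden layer output matrix H via a sigmoid activation.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Step (4): closed-form output weights.
    if N < L:   # fewer samples than hidden nodes, eq. (9)
        beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    else:       # more samples than hidden nodes, eq. (10)
        beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network outputs for new samples; the predicted class is the argmax."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Note that both branches solve a small, well-conditioned linear system instead of forming an explicit matrix inverse, which is the standard numerically stable way to evaluate (9) and (10).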

##### 2.4. Linear Discriminant Analysis

The main idea of LDA is to enhance the global class separability after projection: it seeks a subspace that distinguishes different categories by maximizing the between-class scatter matrix while minimizing the within-class scatter matrix. According to the derivation of LDA in the literature [48], $S_w$ and $S_b$ are defined as follows:

$$S_w = \sum_{i=1}^{c}\sum_{j=1}^{N_i}\left(x_j^{(i)} - \mu_i\right)\left(x_j^{(i)} - \mu_i\right)^T \quad (11)$$

$$S_b = \sum_{i=1}^{c} N_i\left(\mu_i - \mu\right)\left(\mu_i - \mu\right)^T \quad (12)$$

In (11) and (12), $N_i$ is the number of samples in the $i$th class, and $x_j^{(i)}$ is the $j$th sample in the $i$th class. $\mu_i$ is the mean vector of the $i$th class, $\mu$ represents the mean vector of all samples, and $c$ is the total number of categories in the dataset. LDA has the following optimization criterion:

$$W^* = \arg\max_{W} \frac{\mathrm{tr}\left(W^T S_b W\right)}{\mathrm{tr}\left(W^T S_w W\right)} \quad (13)$$

Equation (13) finds the projection transformation matrix $W$ by the Lagrange multiplier method, and the corresponding low-dimensional expression of $x_i$ is then obtained via $y_i = W^T x_i$.
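As a worked illustration of (11) and (12), the sketch below computes the two scatter matrices with NumPy; the function name and the integer label encoding are assumptions made for this example:

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter, eqs. (11)-(12).

    X : (N, d) samples in rows; y : (N,) integer class labels.
    """
    d = X.shape[1]
    mu = X.mean(axis=0)                      # global mean vector
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]                       # samples of class c
        mu_c = Xc.mean(axis=0)               # class mean vector
        Dc = Xc - mu_c
        Sw += Dc.T @ Dc                      # eq. (11): within-class scatter
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)      # eq. (12): between-class scatter
    return Sw, Sb
```

The MMC surrogate used later in Section 3.2 combines these two matrices as $\mathrm{tr}(W^T(S_b - S_w)W)$, which avoids inverting $S_w$ and hence the singularity problem of the ratio criterion (13).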

##### 2.5. Locality Preserving Projections

As a linear variant of the LE algorithm, the LPP algorithm overcomes the difficulty that LE has in obtaining a low-dimensional projection mapping for new test data [51], and it can discover the high-dimensional nonlinear manifold structure in which the data are embedded. LPP achieves dimensionality reduction by maintaining the neighbourhood structure of the data samples and is obtained by a linear transformation on the basis of the LE algorithm. The LPP model can be expressed as follows:

$$\min_{W} \; \mathrm{tr}\left(W^T X L X^T W\right), \quad \text{s.t.} \; W^T X D X^T W = I \quad (14)$$

In formula (14), $W$ is the projection transformation matrix, $I$ is the identity matrix, and $L = D - S$ represents the Laplacian matrix, where $D$ is the diagonal matrix with $D_{ii} = \sum_j S_{ij}$. $S$ is the sparse affinity matrix: if $x_i$ and $x_j$ are not near neighbours, then $S_{ij} = 0$; if $x_i$ and $x_j$ are near neighbours, then $S_{ij} = e^{-\|x_i - x_j\|^2/t}$. By learning a projection $W$, the objective function minimizes the distance between those data points with a neighbourhood relation in the raw data space.
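The graph construction underlying (14) can be sketched as follows. The k-nearest-neighbour rule, the heat-kernel weight, and the parameter defaults are illustrative assumptions, since the paper does not fix them at this point:

```python
import numpy as np

def lpp_laplacian(X, k=5, t=1.0):
    """Builds the LPP graph: affinity S, degree D, and Laplacian Lap = D - S.

    X : (N, d) samples in rows. k-NN graph with heat-kernel weights
    exp(-||xi - xj||^2 / t); k and t are illustrative choices.
    """
    N = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    S = np.zeros((N, N))
    for i in range(N):
        # The k nearest neighbours of sample i, excluding i itself.
        nn = np.argsort(dist2[i])[1:k + 1]
        S[i, nn] = np.exp(-dist2[i, nn] / t)
    S = np.maximum(S, S.T)                 # symmetrize the affinity matrix
    D = np.diag(S.sum(axis=1))             # diagonal degree matrix D_ii
    return D - S, D, S                     # Laplacian, degree, affinity
```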

#### 3. Globality-Locality Preserving Maximum Variance ELM

##### 3.1. Motivation of Globality-Locality Preserving Maximum Variance ELM

The local geometry of the samples can be used as side information to improve the performance of learning models. Assuming data samples $x_i$ and $x_j$ are drawn from the same marginal distribution $P_X$, if the two points $x_i$ and $x_j$ are close to each other, then the conditional probabilities $P(y \mid x_i)$ and $P(y \mid x_j)$ should be similar as well. Based on the local geometry of the samples, many locality preserving methods have been proposed [56]. Zhao et al. proposed a new and effective semisupervised dimensionality reduction method, called learning from local and global discriminative information (LLGDI) [56], to utilize the underlying discriminative information. The literature [37] addresses the sensitivity of traditional subspace learning methods to outliers by proposing a series of dimensionality reduction methods based on the L2,1-norm. The literature [38] studies the problem that ridge regression-based methods are sensitive to variations in the data and can learn only a limited number of projections for feature extraction and recognition; the authors propose a new method called robust discriminant regression (RDR) for feature extraction. In the literature [39], LLE and ONPP are combined to form a sparse subspace learning framework that is suitable for both sparse linear and sparse nonlinear subspace learning. Essentially, our method can be viewed as a type of manifold learning, which aims to preserve the local geometric structure during feature learning or classification.

##### 3.2. Manifold Regularization Framework

A manifold regularization framework can be obtained based on the LE algorithm [51]. However, the LE algorithm has difficulty obtaining a low-dimensional projection mapping for new test data [42], and the LPP algorithm solves this problem; inspired by the literature [40], this paper therefore proposes a manifold regularization framework based on the LPP algorithm. At the same time, considering that the LPP algorithm cannot maintain the global geometry of the data samples or the discriminant information contained in the data, we introduce the basic principles of the LDA algorithm into the manifold regularization framework. Compared with the literature [49, 50], the advantages of the proposed algorithm are as follows: (1) not only the local manifold structure but also the global manifold structure and the global discriminant information of the data samples are considered; (2) to address the singularity of the manifold regularization framework, the maximum margin criterion (MMC) [57] is used. After the projection transformation, the LDA algorithm brings similar samples closer together while pushing heterogeneous samples further apart, and the LPP algorithm has the advantage of maintaining the neighbourhood structure of the samples. Therefore, combining Sections 2.4 and 2.5, the manifold regularization framework loss function is shown in (15):

$$\min_{W} \; \mathrm{tr}\left(W^T X L X^T W\right) - \mathrm{tr}\left(W^T \left(S_b - S_w\right) W\right), \quad \text{s.t.} \; W^T W = I \quad (15)$$

where $S_w$ is the intraclass scatter matrix and $S_b$ is the interclass scatter matrix, as described in Section 2.4; $W$ is the projection transformation matrix and $I$ is the identity matrix; $L$ represents the Laplacian matrix and $D$ is the diagonal degree matrix, as described in Section 2.5.
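Under our reconstruction of (15), the framework's loss matrix can be assembled from the two sketches above (`scatter_matrices` from Section 2.4 and `lpp_laplacian` from Section 2.5); the quadratic form $\mathrm{tr}(W^T M W)$ then gives the loss for any candidate projection $W$. This is a sketch of the reconstructed form, not the paper's exact weighting:

```python
import numpy as np

def framework_loss_matrix(X, y, k=5, t=1.0):
    """Matrix M of the reconstructed loss (15): tr(W^T M W) (illustrative).

    Combines the LPP Laplacian term with the MMC difference Sb - Sw.
    X is (N, d) with samples in rows, so X.T @ Lap @ X matches the
    column-convention expression X L X^T in (15).
    """
    Lap, _, _ = lpp_laplacian(X, k=k, t=t)   # (N, N) graph Laplacian
    Sw, Sb = scatter_matrices(X, y)          # (d, d) scatter matrices
    return X.T @ Lap @ X - (Sb - Sw)         # (d, d) loss matrix
```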

##### 3.3. GLELM

The existing ELM algorithm cannot make good use of the intrinsic manifold structure information of the data, which can lead to insufficient learning. To overcome this problem, we propose the globality-locality preserving maximum variance extreme learning machine (GLELM) based on manifold learning. The optimization problem of GLELM is formulated using the manifold regularization framework.

Based on manifold learning [9], assuming data samples $x_i$ and $x_j$ are drawn from the same marginal distribution $P_X$, if two points $x_i$ and $x_j$ are close to each other, then the conditional probabilities $P(y \mid x_i)$ and $P(y \mid x_j)$ should be similar as well. This assumption is widely known as the smoothness assumption in machine learning. In this subsection, we introduce the manifold regularization framework into the ELM model. In the ELM algorithm, $h(x_i)\beta = t_i^T - \xi_i^T$; therefore, the GLELM model can be written as follows:

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^2 + \frac{\lambda}{2}\,\mathrm{tr}\left(\beta^T M \beta\right), \quad \text{s.t.} \; h(x_i)\beta = t_i^T - \xi_i^T, \; i = 1, \dots, N \quad (16)$$

where $\mathrm{tr}(\beta^T M \beta)$, with $M = H^T L H - (S_b^h - S_w^h)$, is the manifold regularization term of Section 3.2 evaluated in the ELM feature space ($S_w^h$ and $S_b^h$ denote the within-class and between-class scatter matrices of the hidden layer outputs $h(x_i)$); $\frac{1}{2}\|\beta\|^2$ is the $\ell_2$-norm regularization term; $\sum_{i}\|\xi_i\|^2$ is the training error term; $C$ is a penalty constant on the training errors; $\lambda$ is a penalty constant on the manifold regularization term; and $h(x_i)$ is the output vector of the hidden layer with respect to $x_i$, as described in Section 2.3.

Substituting the constraint $\xi_i^T = t_i^T - h(x_i)\beta$ into (16), we rewrite (16) in the following unconstrained form:

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\|T - H\beta\|^2 + \frac{\lambda}{2}\,\mathrm{tr}\left(\beta^T M \beta\right) \quad (17)$$

Let $M = H^T L H - (S_b^h - S_w^h)$. Setting the gradient of (17) with respect to $\beta$ to zero yields

$$\beta + C H^T\left(H\beta - T\right) + \lambda M \beta = 0 \quad (18)$$

According to formula (18), we can obtain the following formula:

$$\left(I + C H^T H + \lambda M\right)\beta = C H^T T \quad (19)$$

The output weight matrix $\beta$ is obtained by solving formula (19) as follows:

$$\beta = \left(I + C H^T H + \lambda M\right)^{-1} C H^T T \quad (20)$$

Based on the above derivation, the specific steps of the GLELM algorithm are as shown in Algorithm 1; a code sketch of the closed-form training procedure follows.
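The sketch below implements GLELM training under this reconstruction; it reuses the `lpp_laplacian` and `scatter_matrices` helpers from Sections 2.4-2.5, builds the graph and scatter matrices on the hidden layer outputs, and applies the closed form (20). All parameter defaults are illustrative assumptions:

```python
import numpy as np

def glelm_train(X, y, T, L=200, C=1.0, lam=0.1, k=5, t=1.0,
                rng=np.random.default_rng(0)):
    """GLELM output weights via the closed form of eq. (20) (a sketch).

    X : (N, d) samples; y : (N,) integer labels; T : (N, m) one-hot targets.
    beta = (I + C H^T H + lam M)^{-1} C H^T T, with
    M = H^T Lap H - (Sb - Sw) computed on the hidden features H.
    """
    N, d = X.shape
    # Random ELM feature map, as in Section 2.3.
    W = rng.uniform(-1.0, 1.0, size=(d, L))
    b = rng.uniform(-1.0, 1.0, size=(1, L))
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # (N, L) hidden outputs

    Lap, _, _ = lpp_laplacian(H, k=k, t=t)        # graph Laplacian on H
    Sw, Sb = scatter_matrices(H, y)               # (L, L) scatter on H
    M = H.T @ Lap @ H - (Sb - Sw)                 # manifold regularizer

    # Closed-form solution (20), evaluated via a linear solve.
    beta = np.linalg.solve(np.eye(L) + C * (H.T @ H) + lam * M,
                           C * (H.T @ T))
    return W, b, beta
```

At test time, classification proceeds exactly as in standard ELM: compute the hidden layer output for a new sample and take the argmax of $h(x)\beta$.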