#### Abstract

The extreme learning machine is a fast learning algorithm for single hidden layer feedforward neural networks. However, an improper number of hidden neurons and the random parameters have a great effect on its performance. To select a suitable number of hidden neurons, this paper proposes a novel hybrid learning algorithm based on a two-step process. First, the parameters of the hidden layer are adjusted by a self-organized learning algorithm. Next, the weight matrix of the output layer is determined using the Moore–Penrose inverse method. Nine classification datasets are used to demonstrate the efficiency of the proposed approach compared with the original extreme learning machine, the Tikhonov regularization optimally pruned extreme learning machine, and backpropagation algorithms. The results show that the proposed method is fast and achieves better accuracy and generalization performance.

#### 1. Introduction

The extreme learning machine (ELM) is an important supervised machine learning algorithm for training single hidden layer feedforward neural networks (SLFNs), which have been successfully used in many engineering disciplines [1–8]. The main drawbacks of ELM are the selection of the optimal number of hidden nodes, the random choice of the input parameters, and the choice of the activation function. These factors directly affect the performance of the neural network [9, 10]. Therefore, several algorithms have been developed to optimize the hidden nodes of ELM and thereby enhance the performance of SLFNs [11–23]. In [11], the authors proposed the self-adaptive extreme learning machine (SaELM), which selects the optimal number of hidden neurons to construct the neural network. In [12], Huang et al. proposed the incremental extreme learning machine (I-ELM), which randomly adds hidden neurons one at a time and analytically determines the output weights. In [13], Huang and Chen proposed an improved version of I-ELM, the enhanced random search-based incremental algorithm (EI-ELM), which at each learning step chooses the hidden neuron that leads to the smallest residual error. A further improvement of I-ELM is the convex incremental extreme learning machine (CI-ELM) [14], whose output weights are updated after each new hidden neuron is added. In [15], an effective learning algorithm known as the self-adaptive evolutionary extreme learning machine is presented; it adjusts the ELM input parameters adaptively, which improves the generalization performance of ELM. An improved evolutionary extreme learning machine based on particle swarm optimization was proposed to find the optimal input weights and hidden biases [16]. The error minimized extreme learning machine (EM-ELM) [17] randomly adds neurons to the hidden layer one by one or group by group and updates the output weights recursively.
Pruned ELM (P-ELM) [18] determines the number of hidden neurons using statistical methods. In [19], Miche et al. proposed the optimally pruned extreme learning machine (OP-ELM), in which the hidden neurons are ranked using the multiresponse sparse regression algorithm and the best number of neurons is then selected by leave-one-out validation. In [20], a constructive hidden neuron selection ELM (CS-ELM) was proposed, in which the hidden neurons are selected according to a given criterion. The work in [21] used ELM with adaptive growth of hidden neurons (AG-ELM) to automate network design. In [22], Bayesian models were combined with ELM in the Bayesian ELM (BELM), which optimizes the output layer weights using a probability distribution. In [23], Miche et al. proposed a doubly regularized ELM (TROP-ELM) using least-angle regression (LARS) and Tikhonov regularization. The bidirectional extreme learning machine (B-ELM), in which some hidden neurons are not randomly selected, was presented in [24]. In [25], Cao et al. proposed an enhanced bidirectional extreme learning machine (EB-ELM), in which several hidden neurons are randomly generated at each step and only the neuron yielding the largest reduction in residual error is added to the existing network. An online sequential learning mode based on ELM (OS-ELM) was presented in [26], and a fuzziness-based OS-ELM in [27]. In [28], a dynamic forgetting factor is utilised to adjust the OS-ELM parameters, yielding the DOS-ELM algorithm. Many other algorithms have been proposed to extend the basic ELM and make it more efficient [29–35].

Motivated by the need for a fast and efficient training algorithm for SLFNs, this paper presents a new hybrid approach in which the weights between the input layer and the hidden layer are optimized by a self-organizing map algorithm [36], and the output weights are calculated using the Moore–Penrose generalized inverse, as in ELM [1]. Simulation results on different classification problems demonstrate the efficiency of the proposed method in terms of classification accuracy and computation time. The main contributions of our work can be summarized as follows:

(1) We propose a hybrid algorithm that combines the self-organizing map algorithm with the extreme learning machine algorithm to optimize SLFN weights. In this algorithm, the self-organizing map is first used to optimize the weights connecting the input and hidden layers. Then, ELM is applied to determine the weights connecting the hidden and output layers. The main objective of the proposed approach is to achieve higher accuracy and faster convergence with a compact network.

(2) We compare the performance of our algorithm with various methods in terms of classification accuracy and convergence speed over different types of datasets.

The remainder of this paper is organized as follows. Section 2 recalls the basic ELM. Section 3 provides a detailed description of the hybrid learning algorithm. Section 4 presents simulation results and comparisons with the BP algorithm, basic ELM, and TROP-ELM. Finally, Section 5 draws conclusions.

#### 2. Basic ELM Algorithm

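Huang et al. [1] proposed an efficient learning algorithm, called the extreme learning machine (ELM), for single hidden layer feedforward neural networks (SLFNs). In ELM, the input weights and biases of the hidden nodes are chosen randomly, and the output weights of the SLFN are then computed using the pseudoinverse of the hidden layer output matrix. A single hidden layer feedforward neural network is illustrated in Figure 1. The numbers of neurons in the input, hidden, and output layers are $n$, $\tilde{N}$, and $m$, respectively.

Given $N$ training samples $(\mathbf{x}_j, \mathbf{t}_j)$, $j = 1, \ldots, N$, where $\mathbf{x}_j = [x_{j1}, \ldots, x_{jn}]^T \in \mathbb{R}^n$ and $\mathbf{t}_j = [t_{j1}, \ldots, t_{jm}]^T \in \mathbb{R}^m$, the output of an SLFN can be represented by

$$\mathbf{o}_j = \sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \quad j = 1, \ldots, N, \tag{1}$$

where $\mathbf{w}_i = [w_{i1}, \ldots, w_{in}]^T$ is the weight vector connecting the $i$th hidden node and the input nodes.

The total input weight matrix $W$ is

$$W = [\mathbf{w}_1, \ldots, \mathbf{w}_{\tilde{N}}]^T \in \mathbb{R}^{\tilde{N} \times n}, \tag{2}$$

where $\boldsymbol{\beta}_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ is the weight vector connecting the $i$th hidden node and the output nodes, $b_i$ is the threshold of the $i$th hidden node, $\mathbf{o}_j = [o_{j1}, \ldots, o_{jm}]^T$ is the output vector of the neural network, and $g(\cdot)$ denotes an activation function, typically the sigmoid function

$$g(x) = \frac{1}{1 + e^{-x}}. \tag{3}$$

Equation (1) can be written compactly as

$$H \boldsymbol{\beta} = O,$$

where $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{\tilde{N}}]^T \in \mathbb{R}^{\tilde{N} \times m}$, $O = [\mathbf{o}_1, \ldots, \mathbf{o}_N]^T \in \mathbb{R}^{N \times m}$, and $H$ is the output matrix of the hidden layer, defined as follows:

$$H = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}. \tag{4}$$

The criterion function to be minimized is the sum of the squared errors over all the training samples:

$$E = \sum_{j=1}^{N} \left\| \mathbf{o}_j - \mathbf{t}_j \right\|^2. \tag{5}$$

The output weight matrix $\boldsymbol{\beta}$ can therefore be determined analytically by minimizing the least-squares error

$$\min_{\boldsymbol{\beta}} \left\| H \boldsymbol{\beta} - T \right\|, \tag{6}$$

that is, by finding a least-squares solution of the linear system

$$H \boldsymbol{\beta} = T. \tag{7}$$

The minimum-norm least-squares solution of the linear system (7), $\hat{\boldsymbol{\beta}}$, can be computed as follows:

$$\hat{\boldsymbol{\beta}} = H^{\dagger} T, \tag{8}$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse of matrix $H$ and $T$ is the desired output matrix, expressed as

$$T = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^T \in \mathbb{R}^{N \times m}. \tag{9}$$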

The ELM algorithm can be summarized as follows:

Step 1. Randomly assign the input weights $\mathbf{w}_i$ and biases $b_i$ of every hidden node $i$.

Step 2. Calculate the hidden layer output matrix $H$ using equation (4).

Step 3. Calculate the output weight matrix using equation (8).
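Steps 1–3 above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB code: it assumes a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and one-hot target rows in $T$; the function names `elm_train` and `elm_predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation g(z) = 1 / (1 + exp(-z)), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Basic ELM sketch. X: N x n inputs, T: N x m targets."""
    n_features = X.shape[1]
    # Step 1: random input weights (n_hidden x n) and biases (assumed uniform in [-1, 1])
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_features))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix H (N x n_hidden), H[j, i] = g(w_i . x_j + b_i)
    H = sigmoid(X @ W.T + b)
    # Step 3: output weights via the Moore-Penrose pseudoinverse, beta = pinv(H) T
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    # Network outputs O = H beta; the predicted class is the largest output unit
    return sigmoid(X @ W.T + b) @ beta

# Usage on a toy, linearly separable binary problem with one-hot targets
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[labels]
W, b, beta = elm_train(X, T, n_hidden=50, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```

Because `np.linalg.pinv` computes the pseudoinverse through a singular value decomposition, Step 3 yields the minimum-norm least-squares output weights even when $H$ is rank-deficient, and no iterative training is needed.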

#### 3. Proposed Learning Algorithm

In this study, the architecture of the proposed single hidden layer feedforward neural network (SLFN) is shown in Figure 2.

It is composed of an input layer, a one-dimensional Kohonen layer, and an output layer. To train this network structure effectively, an appropriate hybrid learning algorithm is presented. This algorithm is a fusion of the self-organizing map [36] and the extreme learning machine [1]. During training, the network operates in a two-stage sequence. In the first stage, the hidden layer weights are obtained by SOM clustering. In the second stage, ELM is initialized with the weights obtained in the first stage. A schematic of the proposed method is shown in Figure 3.

The learning algorithm can be described as follows.

##### 3.1. Stage 1: SOM-Based Initialization

The self-organizing map (SOM) is an unsupervised learning method that maps high-dimensional data vectors onto a regular low-dimensional grid, grouping similar input vectors into a number of clusters. In our work, the basic SOM network consists of two layers: an input layer and a one-dimensional Kohonen layer, in which the neurons are arranged along a line. Each neuron $i$ on the map is represented by an $n$-dimensional weight vector $\mathbf{w}_i = [w_{i1}, \ldots, w_{in}]^T$, where $n$ is the dimension of the input vector $\mathbf{x}$. The steps of the SOM learning algorithm are as follows:

*d* of the winning neuron and is the learning rate.

Step 4. If all input data have been presented to the network, go to Step 5; otherwise, go to Step 2.
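As a reference point for Stage 1, a standard online one-dimensional SOM can be sketched as follows. It is an illustration under stated assumptions, not the paper's exact procedure: weights are initialized from randomly chosen training samples, the winner is the neuron with the smallest Euclidean distance to the input, every neuron within map distance $d$ of the winner is moved toward the input (a "bubble" neighbourhood), and both the learning rate and $d$ shrink linearly over training. The function name `train_som_1d` and its default parameters are hypothetical.

```python
import numpy as np

def train_som_1d(X, n_neurons, n_epochs, rng, lr0=0.5):
    """Online 1-D Kohonen SOM with a shrinking bubble neighbourhood.

    Returns the codebook: one n-dimensional weight vector per map neuron.
    """
    N = X.shape[0]
    # Assumed initialisation: copy randomly chosen training samples
    W = X[rng.choice(N, size=n_neurons, replace=N < n_neurons)].astype(float)
    positions = np.arange(n_neurons)             # neuron indices on the 1-D map
    radius0 = max(n_neurons / 2.0, 1.0)          # initial neighbourhood radius d
    total_steps, step = n_epochs * N, 0
    for _ in range(n_epochs):
        for j in rng.permutation(N):             # present every input once per epoch
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            d = int(round(radius0 * (1.0 - frac)))  # radius shrinks toward 0
            # Winner: the neuron whose weight vector is closest to the input
            winner = np.argmin(np.linalg.norm(W - X[j], axis=1))
            # Move the winner and every neuron within map distance d toward the input
            neighbours = np.abs(positions - winner) <= d
            W[neighbours] += lr * (X[j] - W[neighbours])
            step += 1
    return W

# Usage: two well-separated 2-D clusters and a map with two neurons
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-5.0, 0.5, size=(100, 2)),
               rng.normal(5.0, 0.5, size=(100, 2))])
codebook = train_som_1d(X, n_neurons=2, n_epochs=10, rng=rng)
print(codebook)
```

With a large radius early on, all neurons move together and the map stays ordered; once the radius shrinks to zero, each neuron settles near the centre of the inputs it wins.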

##### 3.2. Stage 2: ELM with Subset of Neurons

In the first stage, SOM is used to reduce the size of the ELM input weight matrix *W*, so that only a subset of the hidden neurons is retained.

**x**:

Step 8. Calculate the weights between the hidden layer and the output layer:

where is the new weight vector connecting the hidden node and the output layer.
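The sketch below chains the two stages under explicit assumptions that go beyond the text: the SOM codebook vectors are used directly as the hidden-layer input weights, the retained subset of neurons is taken to be those that win at least one training sample (a hypothetical pruning rule), and hidden biases are drawn at random as in basic ELM. All function names are hypothetical; the code illustrates the SOM-then-pseudoinverse idea and is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def som_codebook(X, n_neurons, n_epochs, rng, lr0=0.5):
    # Compact online 1-D SOM with a shrinking bubble neighbourhood (assumed schedule)
    N = X.shape[0]
    W = X[rng.choice(N, size=n_neurons, replace=N < n_neurons)].astype(float)
    positions = np.arange(n_neurons)
    radius0 = max(n_neurons / 2.0, 1.0)
    total_steps, step = n_epochs * N, 0
    for _ in range(n_epochs):
        for j in rng.permutation(N):
            frac = step / total_steps
            d = int(round(radius0 * (1.0 - frac)))
            winner = np.argmin(np.linalg.norm(W - X[j], axis=1))
            neighbours = np.abs(positions - winner) <= d
            W[neighbours] += lr0 * (1.0 - frac) * (X[j] - W[neighbours])
            step += 1
    return W

def som_elm_train(X, T, n_map, n_epochs, rng):
    # Stage 1: SOM codebook vectors become candidate hidden-layer input weights
    codebook = som_codebook(X, n_map, n_epochs, rng)
    # Hypothetical pruning rule: keep only neurons that win at least one sample
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    W = codebook[np.unique(dists.argmin(axis=1))]
    # Assumption: hidden biases drawn at random, as in basic ELM
    b = rng.uniform(-1.0, 1.0, size=W.shape[0])
    # Stage 2: hidden layer output matrix H and Moore-Penrose output weights
    H = sigmoid(X @ W.T + b)
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def som_elm_predict(X, W, b, beta):
    # Network outputs O = H beta; argmax over columns gives the class
    return sigmoid(X @ W.T + b) @ beta

# Usage: two Gaussian blobs in 4-D with one-hot targets
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 4)),
               rng.normal(2.0, 1.0, size=(100, 4))])
labels = np.repeat([0, 1], 100)
T = np.eye(2)[labels]
W, b, beta = som_elm_train(X, T, n_map=6, n_epochs=5, rng=rng)
pred = som_elm_predict(X, W, b, beta).argmax(axis=1)
print("retained hidden neurons:", W.shape[0])
print("training accuracy:", (pred == labels).mean())
```

The number of hidden neurons here is bounded by the map size `n_map`, which mirrors the limitation noted in the conclusion: the SOM structure must be fixed before clustering.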

#### 4. Simulation Results

In this section, simulation results are presented and discussed to evaluate the performance of the proposed algorithm and to compare it with the conventional BP algorithm, basic ELM, and TROP-ELM on classification problems. Our method has been tested on nine datasets; the first eight are from the UCI Machine Learning Repository. The ninth dataset, Jaffe, is composed of images and is provided by the Psychology Department of Kyushu University. The algorithms were run on a computer with a 2.4 GHz Core i5 CPU and 8 GB of RAM, using MATLAB R2018a.

##### 4.1. Datasets Description

There are many benchmarks for classification; we selected nine classification datasets, which are summarized in Table 1 and described as follows.

Dataset 1: the ionosphere dataset is used for binary classification. The objective is to determine the type of a given radar signal (good or bad) from the free electrons in the ionosphere. It has 351 instances divided into two classes, with 34 integer and real attributes.

Dataset 2: Iris is the most popular and best-known dataset for classification and pattern recognition, based on the size of the petals and sepals of the plant. It contains 150 instances in total, equally divided among three classes. Each instance is characterized by four real attributes.

Dataset 3: the wine dataset is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. It contains 178 instances with 13 continuous attributes.

Dataset 4: the balance dataset was generated to model psychological experimental results. Its 625 instances are described by four categorical attributes and divided into three classes indicating the state of the balance scale.

Dataset 5: the Zoo dataset is a simple dataset of 101 animals. The task is to predict seven animal classes from 16 Boolean attributes.

Dataset 6: the image segmentation dataset includes 2310 instances divided into seven classes, drawn from images that were hand-segmented to create a classification for every pixel. Each instance is described by 19 attributes.

Dataset 7: the objective of the Ecoli dataset is to predict the localization site of proteins from measurements on the cell. It has 336 instances described by seven attributes and divided into eight imbalanced classes.

Dataset 8: the multiple features dataset aims to classify handwritten numerals. It has 2000 instances in total, equally divided among 10 classes, with 649 attributes.

Dataset 9: the Jaffe dataset is composed of 213 grayscale images of size 256 × 256, posed by 10 Japanese female models. Each model has two to four examples of each expression. The objective is to assign each image one of seven facial expressions: angry, disgust, fear, happy, neutral, sad, and surprised. One example of each of the seven facial expressions from the Jaffe dataset is shown in Figure 4.

*[Figure 4: Jaffe facial expression samples, panels (a)–(g)]*

For all datasets, 70% of the data are used for training and the remaining 30% are reserved for testing. Three performance metrics are listed in Table 2; accuracy is calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP is the number of positive elements correctly classified as positive, FP is the number of negative elements incorrectly classified as positive, FN is the number of positive elements incorrectly classified as negative, and TN is the number of negative elements correctly classified as negative.
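The evaluation protocol can be illustrated with a small sketch. It assumes a random, non-stratified 70/30 split (the text does not say whether splitting was stratified or repeated) and computes accuracy from the four confusion-matrix counts, with label 1 treated as the positive class; the helper names are hypothetical.

```python
import numpy as np

def train_test_split_70_30(X, y, rng):
    # Random (non-stratified) split: 70% for training, the rest for testing
    idx = rng.permutation(len(y))
    n_train = int(round(0.7 * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], X[te], y[tr], y[te]

def binary_accuracy(y_true, y_pred):
    # Confusion-matrix counts with label 1 as the positive class
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return (tp + tn) / (tp + tn + fp + fn)

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2
X_train, X_test, y_train, y_test = train_test_split_70_30(X, y, rng)
print(len(y_train), len(y_test))  # 7 3

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(binary_accuracy(y_true, y_pred))  # TP=3, TN=3, FP=1, FN=1 -> 0.75
```

For the multiclass datasets, the same quantity reduces to the fraction of test samples whose predicted class matches the true class.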

##### 4.2. Results and Discussion

The performance of the standard ELM depends on the input weights and biases, which are randomly initialized. To address this problem, the hybrid approach described above uses SOM clustering to determine the number of hidden neurons automatically. Unlike basic ELM, our method generally needs fewer hidden neurons. The comparison results in Table 2 indicate that our approach uses fewer hidden neurons than the standard ELM and TROP-ELM in all cases. In addition, the proposed approach requires less training time than the standard ELM, TROP-ELM, and backpropagation algorithms. Box-and-whisker plots of the compared methods are shown in Figure 5. Table 2 and Figure 5 show that the proposed algorithm achieves higher accuracy than the backpropagation, ELM, and TROP-ELM algorithms. Together, these results indicate that the hybrid algorithm can reduce the network to a suitable size with fewer hidden nodes while classifying the datasets with better accuracy.

*[Figure 5: box-and-whisker plots of the compared methods, panels (a)–(i)]*

#### 5. Conclusion

This paper proposed a novel hybrid algorithm for single hidden layer feedforward neural networks that couples a self-organizing map algorithm with the extreme learning machine. The learning process consists of two steps: first, the weights connecting the input and hidden layers are trained by a self-organizing map algorithm; second, the weights connecting the hidden and output layers are calculated using the Moore–Penrose inverse method. To evaluate the hybrid approach, it was applied to several popular classification problems. A comparison with BP, ELM, and TROP-ELM shows that the proposed method achieves better generalization performance and faster learning. The main disadvantage of the proposed method is that it uses a fixed self-organizing map structure, in which the number of neurons and the size of the neighbourhood function must be determined before clustering. This is a significant limitation for many applications. In future work, we plan to extend the proposed method to image classification. Another direction for future research is to study the proposed approach with different types of self-organizing maps and a wider range of activation functions.

#### Data Availability

The data used to support the findings of this study are available from the UCI Machine Learning Repository and from the Psychology Department of Kyushu University (Jaffe dataset).

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.