Abstract

The voting-based extreme learning machine (V-ELM) was proposed to improve learning performance by applying majority voting over an ensemble of independently trained ELM classifiers. V-ELM assumes that all individual classifiers contribute equally to the decision ensemble. However, in many real-world scenarios this assumption does not hold. In this paper, we enhance V-ELM by introducing weights that distinguish the importance of each individual ELM classifier in decision making. A genetic algorithm is used to optimize these weights. The resulting evolutionary V-ELM is named EV-ELM. Results on several benchmark databases show that EV-ELM achieves higher classification accuracy than both V-ELM and ELM.

1. Introduction

The extreme learning machine (ELM) [1, 2] was proposed as a new type of learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). ELM adopts a training mechanism that does not require iterative parameter tuning, so it learns faster than traditional gradient-based neural networks while achieving comparable or even better classification/regression performance [3]. ELM has received increasing attention in recent years; many efforts have been dedicated to improving its performance [4–7] and applying it in various applications [8–14]. Among these ELM extensions, ensemble learning-based methods [15–23] offer advantages such as better accuracy and lower variance compared with the original ELM algorithm.

Ensemble learning has been popular for decades. Jain et al. [24] wrote a concise yet informative introduction to classifier combination. Polikar [25] comprehensively reviewed multiple classifier systems (ensemble systems) for decision making. Majority voting [26] is one of the most commonly used combining strategies: the class that receives the largest number of votes is assigned as the predicted label for the testing pattern. Since the classifiers in an ensemble do not necessarily contribute equally to the final decision, Littlestone and Warmuth [27] proposed weighting individual classifiers to differentiate their contributions. Ensemble learning-based ELM algorithms [16, 18, 19] have been reported to successfully alleviate the problems of predictive instability and overfitting.

Among these ensemble-based extensions, the voting-based ELM (V-ELM) [18] performs multiple independent ELM trainings within a simple and effective learning architecture, and the decision is made by majority voting. V-ELM not only enhances classification performance but also lowers the variance. In this paper, we investigate introducing weights for the individual ELM classifiers to enhance the V-ELM algorithm. The hypothesis is that each individual ELM classifier exhibits a different level of confidence in decision making. In the proposed method, each weight represents the importance of one ELM classifier, and the final decision is made with a weighted majority voting scheme.

The remainder of this paper is organized as follows. Section 2 briefly reviews ELM and V-ELM algorithms. Section 3 presents the proposed evolutionary V-ELM (EV-ELM) algorithm. Section 4 demonstrates the performance of EV-ELM and compares it with ELM and V-ELM. Section 5 concludes this study.

2. Background

2.1. Extreme Learning Machine

In the process of SLFN learning, ELM randomly selects the weights and biases of the hidden nodes and then analytically determines the output weights by finding the least-squares solution. Given a training set of $N$ samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^n$ is an input vector and $\mathbf{t}_i \in \mathbb{R}^m$ is a target vector, an SLFN with $L$ hidden nodes is formulated as
$$\sum_{j=1}^{L} \boldsymbol{\beta}_j\, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) = \mathbf{o}_i, \quad i = 1, \ldots, N, \tag{1}$$
where the additive hidden node with activation $g(\cdot)$ is employed and the weight vector $\mathbf{w}_j$ connects the $j$th hidden node to the input neurons. If the SLFN with $L$ hidden nodes approximates the $N$ samples with zero error, then there exist $\boldsymbol{\beta}_j$, $\mathbf{w}_j$, and $b_j$ such that $\sum_{j=1}^{L} \boldsymbol{\beta}_j\, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) = \mathbf{t}_i$ for $i = 1, \ldots, N$. Consequently, (1) can be written compactly as
$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T},$$
where $\mathbf{H}$ is the hidden layer output matrix of the network, whose entry $h_{ij} = g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j)$ is the output of the $j$th hidden neuron with respect to $\mathbf{x}_i$, and $\boldsymbol{\beta}$ and $\mathbf{T}$ are the output weight matrix and the target matrix, respectively.

The ELM algorithm can be summarized in three steps: (1) randomly generate the hidden node parameters $\mathbf{w}_j$ and $b_j$ for $j = 1, \ldots, L$; (2) calculate the hidden layer output matrix $\mathbf{H}$; and (3) calculate the output weights as $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$. It has been shown in [28] that any continuous target function can be universally approximated by an SLFN with randomly chosen additive hidden nodes.
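For concreteness, the three-step procedure can be written as a short sketch. This is a minimal illustration assuming sigmoid additive hidden nodes and one-hot encoded targets; the variable names are ours and not part of the original algorithm description.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=None):
    """Train a single ELM: random hidden parameters, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly generate input weights w_j and biases b_j.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: compute the hidden layer output matrix H (sigmoid additive nodes).
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Step 3: output weights beta = pinv(H) T (Moore-Penrose generalized inverse).
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Return predicted class indices for the rows of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```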

2.2. Voting-Based Extreme Learning Machine

In ELM, randomized hidden nodes are used and remain unchanged during training. Some testing samples may therefore be misclassified in certain situations, for example, when they lie near the classification boundary. To tackle this issue, V-ELM incorporates multiple individual ELMs and makes decisions by majority voting. V-ELM uses a fixed number of hidden nodes for all individual ELMs. All these ELMs are trained on the same dataset, and the learning parameters of each ELM are randomly and independently initialized. The predicted class label is then determined by majority voting over the results obtained from all individual ELMs.

3. Evolutionary Voting-Based ELM

Given a learning set of $N$ samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $y_i \in \{1, \ldots, C\}$ is the class label, we assume that $\mathbf{x}$ is the input and $y$ is predicted by a classifier $\varphi(\mathbf{x})$. In V-ELM, the aim is to predict $y$ better with multiple ELMs than with a single one. Suppose that the $k$th classifier $\varphi_k$, $k = 1, \ldots, K$, predicts a class label $\varphi_k(\mathbf{x}) \in \{1, \ldots, C\}$; the ensemble decision can then be defined as
$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} \mathbb{I}(\varphi_k(\mathbf{x}) = c),$$
where $\mathbb{I}(\cdot)$ is the indicator function. The voting is the plurality version, meaning that the output is the class with the highest number of votes whether or not that count exceeds half of the total votes.
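The plurality rule above can be sketched directly. The snippet below is illustrative only; it assumes the predicted labels of the $K$ classifiers have already been collected into an array and that labels are 0-indexed integers, which is our own convention rather than the paper's.

```python
import numpy as np

def plurality_vote(predictions, n_classes):
    """predictions: (K, N) array of predicted labels from K classifiers."""
    K, N = predictions.shape
    votes = np.zeros((N, n_classes), dtype=int)
    for k in range(K):
        votes[np.arange(N), predictions[k]] += 1   # one vote per classifier
    return np.argmax(votes, axis=1)                # class with the most votes
```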

In many applications, not all classifiers contribute equally to decision making. The overall performance of the ensemble system can be improved by weighting the decisions prior to combination [27]. In this section, an evolutionary voting-based ELM (EV-ELM) using weighted majority voting is proposed. The general algorithm is elaborated in Algorithm 1.

Inputs: Training samples with labels $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$; $P_g$: population of candidate weight vectors at generation $g$; $S$: population size of each generation; $p_c$: crossover probability;
$p_m$: mutation probability
Initialization: Set $g = 0$ and initialize the population $P_0$ at random.
Evolutionary Process: Evaluate the fitness values of $P_g$ using (5). Increase $g$ by 1 in each
iteration. To create a new generation $P_{g+1}$, the operations of selection, crossover, and mutation
on $P_g$ are used. Repeat the following steps until the termination criterion of the genetic algorithm
(GA) is met.
(i) First, members of $P_g$ are probabilistically selected into $P_{g+1}$ according to their
fitness.
(ii) Second, the crossover operator is applied to half of the candidates in $P_g$ not selected in step (i). The
offspring produced by crossover are added to $P_{g+1}$.
(iii) Last, a number of chromosomes in $P_{g+1}$ are subjected to mutation with probability $p_m$.
Store the weights $w_k^{\text{opt}}$, $k = 1, \ldots, K$, as the outputs, where "opt" indicates "optimal".
Decision Making: Given a testing sample $\mathbf{x}$, use the following equation to predict its label:
$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} w_k^{\text{opt}}\, \mathbb{I}(\varphi_k(\mathbf{x}) = c).$$

We denote by $w_k$ the weight assigned to the $k$th individual ELM. The weighted voting rule is then
$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} w_k\, \mathbb{I}(\varphi_k(\mathbf{x}) = c).$$
In this weighted majority voting framework, the weights $w_1, \ldots, w_K$ need to be optimized to improve generalization performance. If we knew which classifiers worked better, we could assign larger weights to them; however, such knowledge is usually unavailable in advance.
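As a sketch, the weighted rule changes only how each classifier's vote is accumulated. The array conventions below mirror the earlier unweighted sketch and are illustrative assumptions.

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """predictions: (K, N) predicted labels; weights: (K,) nonnegative classifier weights."""
    K, N = predictions.shape
    scores = np.zeros((N, n_classes))
    for k in range(K):
        scores[np.arange(N), predictions[k]] += weights[k]   # weighted vote per classifier
    return np.argmax(scores, axis=1)                         # class with the largest weighted score
```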

Conventional parameter-updating methods can find weights that provide better generalization performance, but such optimization risks getting trapped in local optima. Methods that search for the global optimum can be employed to further improve classification accuracy. The genetic algorithm (GA) [29] is a class of optimization procedures inspired by the biological mechanisms of reproduction. Many applications exploit GA to find optimal solutions, for example, face recognition [30] and clustering techniques [31]. GA is adopted in this paper for demonstration purposes. In practice, many emerging techniques are potential alternatives, such as differential evolution [32] and particle swarm optimization [33].

In order to use GA to select proper weights, each chromosome is formed by the weight vector $(w_1, \ldots, w_K)$. At the beginning, a population of chromosomes is generated randomly. Then, the fitness of each chromosome is calculated as
$$F(w_1, \ldots, w_K) = \frac{\sum_{k=1}^{K} w_k a_k}{\sum_{k=1}^{K} w_k}, \tag{5}$$
where $w_k$ and $a_k$ are the weight and training accuracy of the $k$th ELM, respectively. Maximizing this fitness implies that, by choosing appropriate weights, we achieve the best weight-normalized training accuracy across all ELM classifiers in the ensemble. Such a set of optimal weights provides the decision ensemble with good generalization performance on unseen testing samples. The fitness function is the most important measure for determining the composition of the next generation and for guiding the entire evolutionary process.

After chromosome selection, part of the current population is inherited into the next generation. The remaining chromosomes are reproduced, and some of the parent chromosomes undergo the crossover operation with crossover probability $p_c$. To extend the search space toward the global optimum, mutation is applied to some offspring at random with a very small mutation probability $p_m$. The mutation introduces a degree of diversity into the population and prevents premature convergence. The evolutionary process is terminated after a predefined number of generations, which is considered sufficient for convergence.
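The GA loop described above can be sketched as follows. This is a simplified illustration assuming roulette-wheel selection, one-point crossover on consecutive pairs, uniform random mutation, and the fitness of (5); the operator details and the weight bounds (w_lo, w_hi) are our assumptions and may differ from the original implementation.

```python
import numpy as np

def fitness(w, train_acc):
    # Eq. (5): weight-normalized sum of per-classifier training accuracies.
    return np.dot(w, train_acc) / np.sum(w)

def evolve_weights(train_acc, pop_size=100, n_gen=150, p_c=0.65, p_m=0.004,
                   w_lo=0.1, w_hi=1.0, seed=None):
    """Return the fittest weight vector found by a simple GA (illustrative)."""
    rng = np.random.default_rng(seed)
    K = len(train_acc)
    pop = rng.uniform(w_lo, w_hi, size=(pop_size, K))        # random initial chromosomes
    for _ in range(n_gen):
        fit = np.array([fitness(ind, train_acc) for ind in pop])
        # Selection: roulette wheel with probability proportional to fitness.
        parents = pop[rng.choice(pop_size, size=pop_size, p=fit / fit.sum())]
        children = parents.copy()
        # Crossover: one-point crossover on consecutive pairs with probability p_c.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_c:
                cut = rng.integers(1, K)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Mutation: reset individual genes with a small probability p_m.
        mask = rng.random(children.shape) < p_m
        children[mask] = rng.uniform(w_lo, w_hi, size=int(mask.sum()))
        pop = children
    fit = np.array([fitness(ind, train_acc) for ind in pop])
    return pop[np.argmax(fit)]                               # optimized weights w_opt
```

For instance, given an array `train_acc` holding the training accuracies of the individual ELMs, `w_opt = evolve_weights(train_acc)` returns the weights that would then be passed to the weighted voting rule.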

4. Performance Evaluation

Experiments were carried out in the MATLAB 7 environment on a desktop with an Intel 3.2 GHz CPU and 4 GB of RAM. The learning and testing processes were repeated 50 times, and the means and standard deviations are reported in the results. In the experiments, the weights were restricted to a bounded range, and the exponential function was used to enhance the difference between the lower-bound and upper-bound values of the weight. Moreover, the crossover probability $p_c$ and mutation probability $p_m$ were chosen as 0.65 and 0.004, respectively. The population size was 100 and the number of GA generations was 150 for all experiments.

4.1. Databases

We evaluated the proposed EV-ELM on two types of data: UCI machine learning data and face recognition data. The UCI data were used to test the methods on general-purpose classification problems, while the face data examined how well the methods handled the small-sample-size problem (i.e., each pattern class had only a few samples). Details of these databases are presented in Table 1. A total of 12 real-world datasets were downloaded from the UCI repository [34]. Five benchmark face databases were used, namely, ORL [35], UMIST [36], Yale [37], FERET [38], and the Georgia Tech face database (GTFD) [39]. ORL, UMIST, and Yale were merged into a combo database for testing. The combo set contained 575 testing samples, with the remaining samples used for training, and all images belonged to 75 different classes with large variations in illumination, pose, and facial expression. The FERET database used in this paper was a preprocessed subset [40] composed of 2713 images from 320 subjects. In GTFD, each of the 50 subjects had 15 images. Before the experimental evaluation, we cropped and resized the images in the Yale and GTFD databases to 112 × 92 pixels to make their dimensions identical to those of the samples in ORL and UMIST. Furthermore, we applied the discrete cosine transform (DCT) [41] to convert the 2D face images into low-dimensional vectors of DCT coefficients.
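As an illustration of the DCT preprocessing step, the sketch below computes a 2D type-II DCT of a grayscale face image and keeps a low-frequency block of coefficients as the feature vector; the block size and the coefficient-selection scheme are our assumptions and may differ from the exact procedure of [41].

```python
import numpy as np
from scipy.fft import dctn

def dct_features(image, block=8):
    """image: 2D grayscale array (e.g., 112 x 92); returns a flat feature vector."""
    coeffs = dctn(np.asarray(image, dtype=float), norm='ortho')  # 2D type-II DCT
    return coeffs[:block, :block].ravel()      # keep the low-frequency coefficient block
```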

4.2. Results and Discussion

Table 2 presents the comparison results, where both training time and testing accuracy are averaged over the 50 repetitions of the evaluation process. ELM is the fastest learner but performs the worst in classification; V-ELM and EV-ELM achieve much better performance. Since both V-ELM and EV-ELM create an ensemble of individual ELM classifiers, they run slower than ELM unless a parallel computing structure is implemented. On all databases, EV-ELM outperforms V-ELM in terms of both accuracy and variance, although this improvement is smaller than that of the voting-based methods over the original ELM algorithm. The results also show that the evolutionary weighting method increases classification accuracy while reducing variance. In general, EV-ELM needs more time than V-ELM to train a model; in applications where online training is not required, EV-ELM is a good alternative to V-ELM.

Figure 1 illustrates five examples of how classification performance changes during the evolutionary process. Within 150 generations, the GA usually converges; as shown in the figure, the classification accuracy tends to become stable toward the end of the evolutionary process. The weights selected after 150 generations are not guaranteed to achieve the best classification accuracy, but they provide a consistent output. This characteristic is important because testing accuracy is not available during classifier training and therefore cannot be used to guide the evolutionary process. Figure 2 depicts the evolution of three example weights during the GA run. Initially, these weights are randomly generated; as the generations progress, they tend to converge to either the upper-bound or lower-bound value so that the fitness is maximized. The variety among the weights creates a diverse ELM ensemble for decision making.

5. Conclusions

In this paper, we proposed an enhanced V-ELM method. Weights were introduced to distinguish the contributions of the individual ELM classifiers, and a genetic algorithm was used to optimize them. Experimental results demonstrated the effectiveness of EV-ELM in terms of classification accuracy. However, its slow training speed prohibits the use of EV-ELM in applications that require online training. This study is preliminary research on optimizing ELM ensembles; many other evolutionary algorithms are potentially useful for optimizing the weights, and reducing the training time is of great interest for future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to thank the editor and reviewers for the constructive comments and suggestions.