Mathematical Problems in Engineering

Volume 2014, Article ID 808292, 7 pages

http://dx.doi.org/10.1155/2014/808292
Research Article

Evolutionary Voting-Based Extreme Learning Machines

1Department of Emergency Medicine, Singapore General Hospital, Singapore 169608

2Institute of Information and Control, Hangzhou Dianzi University, Zhejiang 310018, China

3School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798

4Health Services and Systems Research, Duke-NUS Graduate Medical School, Singapore 169857

Received 13 June 2014; Accepted 29 July 2014; Published 14 August 2014

Academic Editor: Tao Chen

Copyright © 2014 Nan Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The voting-based extreme learning machine (V-ELM) was proposed to improve learning performance, with majority voting employed to combine individual classifiers. V-ELM assumes that all individual classifiers contribute equally to the decision ensemble. However, in many real-world scenarios, this assumption does not hold. In this paper, we aim to enhance V-ELM by introducing weights that distinguish the importance of each individual ELM classifier in decision making. A genetic algorithm is used to optimize these weights. This evolutionary V-ELM is named EV-ELM. Results on several benchmark databases show that EV-ELM achieves higher classification accuracy than both V-ELM and ELM.

1. Introduction

The extreme learning machine (ELM) [1, 2] was proposed as a new type of learning algorithm for the single-hidden layer feedforward neural network (SLFN). ELM adopts a training mechanism that does not require parameter tuning, so it learns faster than traditional gradient-based neural networks while achieving comparable or even better classification/regression performance [3]. ELM has received increasing attention in recent years; many efforts have been dedicated to improving its performance [4–7] and applying it in various applications [8–14]. Among these ELM extensions, ensemble learning-based methods [15–23] have shown advantages such as better accuracy and lower variance compared with the original ELM algorithm.

Ensemble learning has been popular for decades. Jain et al. [24] wrote a concise yet informative introduction to classifier combination. Polikar [25] comprehensively reviewed multiple classifier systems (ensemble systems) for decision making. Majority voting [26] is one of the most commonly used combining strategies: this rule finds the class that receives the highest number of votes and assigns it as the predicted label for the testing pattern. Since each classifier in an ensemble does not necessarily contribute equally to the final decision, Littlestone and Warmuth [27] proposed weighting individual classifiers to make the combination more discriminative. Ensemble learning-based ELM algorithms [16, 18, 19] were reported to successfully mitigate predictive instability and overfitting.

Among these ensemble-based extensions, the voting-based ELM (V-ELM) [18] was proposed to train multiple independent ELMs within a simple and effective learning architecture, with the final decision made by majority voting. V-ELM not only enhanced classification performance but also lowered the variance. In this paper, we investigate introducing weights for the individual ELM classifiers to enhance the V-ELM algorithm. The hypothesis is that each individual ELM classifier exhibits a different level of confidence in decision making. In our proposed method, each weight represents the importance of one ELM classifier, and the final decision is made with a weighted majority voting scheme.

The remainder of this paper is organized as follows. Section 2 briefly reviews ELM and V-ELM algorithms. Section 3 presents the proposed evolutionary V-ELM (EV-ELM) algorithm. Section 4 demonstrates the performance of EV-ELM and compares it with ELM and V-ELM. Section 5 concludes this study.

2. Background

2.1. Extreme Learning Machine

In the process of SLFN learning, ELM randomly selects the weights and biases of the hidden nodes and then analytically determines the output weights by finding the least squares solution. Given a training set of $N$ samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{n}$ is an input vector and $\mathbf{t}_i \in \mathbb{R}^{m}$ is a target vector, an SLFN with $L$ hidden nodes is formulated as
$$\sum_{j=1}^{L} \boldsymbol{\beta}_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) = \mathbf{o}_i, \quad i = 1, \ldots, N, \qquad (1)$$
where the additive hidden node with activation function $g(\cdot)$ is employed. The weight vector $\mathbf{w}_j$ connects the $j$th hidden node and the input neurons, $b_j$ is the bias of the $j$th hidden node, and $\boldsymbol{\beta}_j$ is the output weight vector of the $j$th hidden node. If the $N$ samples can be approximated with zero error using the $L$ hidden nodes, then $\boldsymbol{\beta}_j$, $\mathbf{w}_j$, and $b_j$ exist such that $\sum_{j=1}^{L} \boldsymbol{\beta}_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) = \mathbf{t}_i$ for $i = 1, \ldots, N$. Consequently, (1) can be written compactly as
$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \qquad (2)$$
where $\mathbf{H}$ is the hidden layer output matrix of the network, whose entry $h_{ij} = g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j)$ is the output of the $j$th hidden neuron with respect to $\mathbf{x}_i$; $\boldsymbol{\beta}$ and $\mathbf{T}$ are the output weight matrix and the target matrix, respectively.

The ELM algorithm can be summarized in three steps: (1) randomly generate the hidden node parameters $(\mathbf{w}_j, b_j)$ for $j = 1, \ldots, L$; (2) calculate the hidden layer output matrix $\mathbf{H}$; and (3) calculate the output weight matrix as $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$. It has been shown in [28] that any continuous target function can be universally approximated by a single SLFN with randomly chosen additive hidden nodes.
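To make the three steps concrete, a minimal NumPy sketch of ELM training and prediction is given below. It assumes a sigmoid additive hidden node and one-hot target encoding; the function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def train_elm(X, T, L, rng=np.random):
    """Basic ELM training. X: (N, n) inputs, T: (N, m) one-hot targets, L: hidden nodes."""
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, L))   # step 1: random input weights
    b = rng.uniform(-1.0, 1.0, size=L)        # step 1: random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # step 2: hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T              # step 3: beta = H^dagger T (least squares)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Return the predicted class index for each row of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```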

2.2. Voting-Based Extreme Learning Machine

In ELM, randomized hidden nodes are used and remain unchanged during the training. Some testing samples could be misclassified in certain situations, for example, when they are near the classification boundary. To tackle this issue, V-ELM incorporates multiple individual ELMs and makes decisions with majority voting. V-ELM uses a fixed number of hidden nodes for all individual ELMs. All these ELMs are trained with the same dataset and the learning parameters of each ELM are randomly initialized. The predicted class label is then determined by majority voting on all results obtained from ELMs.

3. Evolutionary Voting-Based ELM

Given a learning set of $N$ samples $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, N$, where $y_i \in \{1, \ldots, C\}$ is the class label, a classifier predicts the label $y$ from the input $\mathbf{x}$. In V-ELM, the aim is to predict $y$ better with multiple ELMs than with a single one. Suppose the ensemble contains $K$ individual classifiers and the prediction of the $k$th classifier for input $\mathbf{x}$ is $h_k(\mathbf{x})$, $k = 1, \ldots, K$; the ensemble decision can then be defined as
$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} \mathbb{I}\bigl(h_k(\mathbf{x}) = c\bigr),$$
where $\mathbb{I}(\cdot)$ is the indicator function. The voting is the plurality version: the output is the class with the highest number of votes, whether or not that number exceeds half of the total votes.
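The plurality rule can be sketched as follows, reusing the illustrative train_elm/predict_elm helpers from the Section 2.1 sketch: K independent ELMs are trained on the same data with different random parameters, and the most-voted class wins.

```python
import numpy as np

def train_velm(X, T, L, K, seed=0):
    """Train K independent ELMs with the same architecture but different random parameters."""
    return [train_elm(X, T, L, rng=np.random.default_rng(seed + k)) for k in range(K)]

def velm_predict(models, X, n_classes):
    """Plurality voting: each ELM casts one vote per sample; the most-voted class wins."""
    votes = np.zeros((X.shape[0], n_classes))
    for W, b, beta in models:
        pred = predict_elm(X, W, b, beta)
        votes[np.arange(X.shape[0]), pred] += 1.0
    return np.argmax(votes, axis=1)
```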

In many applications, not all classifiers contribute equally to decision making. The overall performance of the ensemble system can be improved by weighting the decisions prior to combination [27]. In this section, an evolutionary voting-based ELM (EV-ELM) using weighted majority voting is proposed. The general algorithm is elaborated in Algorithm 1.

alg1
Algorithm 1: Evolutionary voting-based ELM.

We denote $w_k$ as the weight of the $k$th individual ELM. The weighted voting rule can then be written as
$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} w_k \, \mathbb{I}\bigl(h_k(\mathbf{x}) = c\bigr).$$
In the weighted majority voting framework, the weights need to be optimized to improve generalization performance. If we knew which classifiers performed better, we could assign larger weights to them; however, such knowledge is usually absent.
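A weighted version of the earlier voting sketch only changes the increment: classifier k contributes its weight w_k instead of a single vote. This is a sketch under the same illustrative naming as above.

```python
import numpy as np

def weighted_vote(models, weights, X, n_classes):
    """Weighted majority voting: classifier k adds weight w_k to its predicted class."""
    votes = np.zeros((X.shape[0], n_classes))
    for (W, b, beta), w_k in zip(models, weights):
        pred = predict_elm(X, W, b, beta)
        votes[np.arange(X.shape[0]), pred] += w_k
    return np.argmax(votes, axis=1)
```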

Conventional parameter-updating methods search for the optimal weights to provide better generalization performance, but the optimization process risks getting trapped in local optima. Methods that seek the global optimum can be used to further improve classification accuracy. The genetic algorithm (GA) [29] is a class of optimization procedures inspired by the biological mechanisms of reproduction. Many applications exploit GA to find optimal solutions, for example, face recognition [30] and clustering [31]. GA is implemented in this paper for demonstration purposes. In practice, many emerging techniques are potential alternatives, such as differential evolution [32] and particle swarm optimization [33].

In order to use GA to select proper weights, each chromosome is formed by the weight vector $(w_1, \ldots, w_K)$. At the beginning, a population of chromosomes is generated randomly. Then the fitness of each chromosome is calculated as
$$f = \frac{\sum_{k=1}^{K} w_k a_k}{\sum_{k=1}^{K} w_k},$$
where $w_k$ and $a_k$ are the weight and the training accuracy of the $k$th ELM, respectively. Maximizing the fitness implies that, by choosing appropriate weights, we achieve the best normalized training accuracy across all ELM classifiers in the ensemble. Such a set of optimal weights provides the decision ensemble with good generalization performance on unseen testing samples. The fitness function is the most important measure for determining the composition of the next generation and guiding the entire evolutionary process.
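As a concrete illustration, the sketch below evaluates one chromosome under the fitness defined above (a weight-normalized average of the per-classifier training accuracies; since the original formula was lost in extraction, this reconstruction should be treated as an assumption).

```python
import numpy as np

def fitness(weights, train_acc):
    """Weight-normalized average training accuracy of the ensemble.
    weights: K candidate weights (one chromosome); train_acc: K training accuracies."""
    weights = np.asarray(weights, dtype=float)
    train_acc = np.asarray(train_acc, dtype=float)
    return float(np.sum(weights * train_acc) / np.sum(weights))
```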

After chromosome selection, part of the current population is carried over into the next generation. The remaining chromosomes are reproduced, and some parent chromosomes undergo a crossover operation with crossover probability $p_c$. To extend the search toward the global optimum, mutation is applied to some offspring at random with a very small mutation probability $p_m$. Mutation introduces diversity into the population and prevents premature convergence. The evolutionary process terminates after $G$ generations, which is considered sufficient for convergence.
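A compact sketch of one GA generation is given below (elitism, tournament selection, single-point crossover with probability p_c, and per-gene mutation with probability p_m). The specific operators are standard textbook choices and an assumption here, since the paper does not spell out its exact variants.

```python
import numpy as np

def next_generation(pop, fits, pc, pm, lo, hi, rng):
    """Produce the next population of real-valued weight chromosomes.
    pop: (P, K) population, fits: (P,) fitness values, [lo, hi]: weight bounds."""
    P, K = pop.shape
    new_pop = [pop[np.argmax(fits)].copy()]            # elitism: keep the best chromosome
    while len(new_pop) < P:
        i = rng.integers(0, P, size=2)                 # tournament selection of parent 1
        j = rng.integers(0, P, size=2)                 # tournament selection of parent 2
        p1 = pop[i[np.argmax(fits[i])]].copy()
        p2 = pop[j[np.argmax(fits[j])]].copy()
        if rng.random() < pc:                          # single-point crossover
            cut = rng.integers(1, K)
            p1[cut:], p2[cut:] = p2[cut:].copy(), p1[cut:].copy()
        for child in (p1, p2):                         # per-gene mutation within bounds
            mask = rng.random(K) < pm
            child[mask] = rng.uniform(lo, hi, size=mask.sum())
            new_pop.append(child)
    return np.stack(new_pop[:P])
```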

4. Performance Evaluation

Experiments were carried out in the MATLAB 7 environment on a desktop equipped with an Intel 3.2 GHz CPU and 4 GB RAM. The learning and testing processes were repeated 50 times, and the means and standard deviations are reported in the results. In the experiments, each weight was restricted to a fixed range, and an exponential function was used to enhance the difference between the lower-bound and upper-bound values of the weight. Moreover, the crossover probability and mutation probability were chosen as 0.65 and 0.004, respectively. The population size was 100 and the number of GA generations was 150 for all experiments.
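Tying the reported settings (100 chromosomes, 150 generations, p_c = 0.65, p_m = 0.004) to the sketches from Section 3, the optimization loop might look as follows. The weight bounds lo and hi are placeholders, since the exact range is not reproduced here, and models/train_accs are assumed to be the trained ELMs and their training accuracies.

```python
import numpy as np

rng = np.random.default_rng(42)
P, G, pc, pm = 100, 150, 0.65, 0.004
lo, hi = 0.01, 1.0                          # placeholder weight bounds (assumption)
K = len(models)                             # individual ELMs trained beforehand
pop = rng.uniform(lo, hi, size=(P, K))      # random initial population of weight chromosomes
for _ in range(G):
    fits = np.array([fitness(chrom, train_accs) for chrom in pop])
    pop = next_generation(pop, fits, pc, pm, lo, hi, rng)
fits = np.array([fitness(chrom, train_accs) for chrom in pop])
best_weights = pop[np.argmax(fits)]         # chromosome used for weighted voting at test time
```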

4.1. Databases

We evaluated the proposed EV-ELM with two types of data: UCI machine learning data and face recognition data. The UCI data was used to test the methods on general-purpose classification problems, while the face data examined how well the methods handled the small sample size problem (i.e., each pattern class had only a few samples). Details of these databases are presented in Table 1. A total of 12 real-world datasets were downloaded from the UCI repository [34]. Five benchmark face databases were used, namely, ORL [35], UMIST [36], Yale [37], FERET [38], and the Georgia Tech face database (GTFD) [39]. ORL, UMIST, and Yale formed a combo database for testing. The combo set consisted of training samples and 575 testing samples in total, and all images belonged to 75 different classes with large variations in illumination, pose, and facial expression. The FERET database used in this paper was a preprocessed subset [40] composed of 2713 images from 320 subjects. In GTFD, each of 50 subjects had 15 images. Before the experimental evaluation, we cropped and resized the images in the Yale and GTFD databases to 112 × 92 so that their dimensions matched those of the samples in ORL and UMIST. Furthermore, we applied the discrete cosine transform (DCT) [41] to convert the 2D face images to low-dimensional vectors of DCT coefficients.
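As an illustration of the DCT preprocessing step, one common approach is to apply a 2D DCT to each 112 × 92 image and keep a low-frequency block of coefficients as the feature vector. The block size below is an arbitrary choice for the sketch, as the number of retained coefficients is not specified here.

```python
import numpy as np
from scipy.fft import dct

def dct_features(img, block=8):
    """2D DCT of a grayscale image; keep the top-left (low-frequency) block x block
    coefficients and flatten them into a feature vector."""
    coeffs = dct(dct(img.astype(float), axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:block, :block].ravel()
```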

tab1
Table 1: Databases used in the experiments.
4.2. Results and Discussion

Table 2 presents the comparison results, where both training time and testing accuracy are averaged over the 50 repeats of the evaluation process. ELM is the fastest learner but performs poorly in classification, while V-ELM and EV-ELM achieve much better performance. Since both V-ELM and EV-ELM create an ensemble of individual ELM classifiers, they run slower than ELM unless a parallel computing structure is implemented. On all databases, EV-ELM outperforms V-ELM in terms of both accuracy and variance, although this improvement is smaller than the gap between the voting-based methods and the original ELM algorithm. The results also show that the evolutionary weighting method increases classification accuracy while reducing variance. In general, EV-ELM needs more time than V-ELM to train a model; in applications where online training is not required, EV-ELM is a good alternative to V-ELM.

tab2
Table 2: Comparison results among ELM, V-ELM, and EV-ELM algorithms using benchmark UCI and face databases.

Figure 1 illustrates five examples of the changes in classification performance during the evolutionary process. Within 150 generations, the GA is usually able to converge; as reflected in the figure, the classification accuracy tends to become stable toward the end of the evolutionary process. The weights selected after 150 generations do not guarantee the best classification accuracy, but they provide a consistent output. This characteristic is important because testing accuracy is not available during classifier training and therefore cannot be used to guide the evolutionary process. Figure 2 depicts the changes of three example weights during the GA evolution. Initially, these weights are randomly generated. As the generations proceed, they tend to converge to either the upper- or lower-bound value so that the fitness is maximized. The variety among the weights creates a dynamic ELM ensemble for decision making.

fig1
Figure 1: Five examples showing the changes in classification performance during the evolutionary process. The x-axis is the number of generations and the y-axis is the classification accuracy at each generation. The results are based on the UCI Heart dataset.
fig2
Figure 2: Three example weights and their changes in value during the evolutionary process. The results are based on the UCI Heart dataset.

5. Conclusions

In this paper, we proposed an enhanced V-ELM method. Weights were introduced to distinguish the differences among the individual ELM classifiers, and a genetic algorithm was used to optimize them. Experimental results demonstrated the effectiveness of EV-ELM in terms of classification accuracy. However, its slow training speed prohibits the use of EV-ELM in applications that require online training. This study is preliminary research on optimizing ELM ensembles; many other evolutionary algorithms are potentially useful for optimizing the weights, and reducing training time is of great interest in future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to thank the editor and reviewers for the constructive comments and suggestions.

References

  1. G. Huang, Q. Zhu, and C. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
  2. G. Huang, D. H. Wang, and Y. Lan, “Extreme learning machines: a survey,” International Journal of Machine Learning and Cybernetics, vol. 2, no. 2, pp. 107–122, 2011.
  3. G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning machine for regression and multiclass classification,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.
  4. Y. Miche, A. Sorjamaa, P. Bas, O. Simula, C. Jutten, and A. Lendasse, “OP-ELM: optimally pruned extreme learning machine,” IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.
  5. Q.-Y. Zhu, A. K. Qin, P. N. Suganthan, and G.-B. Huang, “Evolutionary extreme learning machine,” Pattern Recognition, vol. 38, no. 10, pp. 1759–1763, 2005.
  6. Z. Sun, K.-F. Au, and T.-M. Choi, “A neuro-fuzzy inference system through integration of fuzzy logic and extreme learning machines,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 37, no. 5, pp. 1321–1331, 2007.
  7. W. Zong, G.-B. Huang, and Y. Chen, “Weighted extreme learning machine for imbalance learning,” Neurocomputing, vol. 101, pp. 229–242, 2013.
  8. R. Zhang, G. Huang, N. Sundararajan, and P. Saratchandran, “Multicategory classification using an extreme learning machine for microarray gene expression cancer diagnosis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, pp. 485–495, 2007.
  9. Z.-L. Sun, T.-M. Choi, K.-F. Au, and Y. Yu, “Sales forecasting using extreme learning machine with applications in fashion retailing,” Decision Support Systems, vol. 46, no. 1, pp. 411–419, 2008.
  10. N. Liu and H. Wang, “Evolutionary extreme learning machine and its application to image analysis,” Journal of Signal Processing Systems, vol. 73, pp. 1–9, 2013.
  11. S. Suresh, R. V. Babu, and H. J. Kim, “No-reference image quality assessment using modified extreme learning machine classifier,” Applied Soft Computing, vol. 9, no. 2, pp. 541–552, 2009.
  12. Y. Jin, J. Cao, Q. Ruan, and X. Wang, “Cross-modality 2D-3D face recognition via multiview smooth discriminant analysis based on ELM,” Journal of Electrical and Computer Engineering, vol. 2014, Article ID 584241, 9 pages, 2014.
  13. L. Mao, L. Zhang, X. Liu, C. Li, and H. Yang, “Improved extreme learning machine and its application in image quality assessment,” Mathematical Problems in Engineering, vol. 2014, Article ID 426152, 7 pages, 2014.
  14. J. Cao and L. Xiong, “Protein sequence classification with improved extreme learning machine algorithms,” BioMed Research International, vol. 2014, Article ID 103054, 12 pages, 2014.
  15. Y. Lan, Y. C. Soh, and G.-B. Huang, “Ensemble of online sequential extreme learning machine,” Neurocomputing, vol. 72, no. 13–15, pp. 3391–3395, 2009.
  16. N. Liu and H. Wang, “Ensemble based extreme learning machine,” IEEE Signal Processing Letters, vol. 17, no. 8, pp. 754–757, 2010.
  17. M. van Heeswijk, Y. Miche, E. Oja, and A. Lendasse, “GPU-accelerated and parallelized ELM ensembles for large-scale regression,” Neurocomputing, vol. 74, no. 16, pp. 2430–2437, 2011.
  18. J. Cao, Z. Lin, G. Huang, and N. Liu, “Voting based extreme learning machine,” Information Sciences, vol. 185, pp. 66–77, 2012.
  19. J.-H. Zhai, H.-Y. Xu, and X.-Z. Wang, “Dynamic ensemble extreme learning machine based on sample entropy,” Soft Computing, vol. 16, no. 9, pp. 1493–1502, 2012.
  20. D. Wang and M. Alhamdoosh, “Evolutionary extreme learning machine ensembles with size control,” Neurocomputing, vol. 102, pp. 98–110, 2013.
  21. Q. Yu, M. van Heeswijk, Y. Miche et al., “Ensemble delta test-extreme learning machine (DT-ELM) for regression,” Neurocomputing, vol. 129, pp. 153–158, 2014.
  22. H.-J. Lu, C.-L. An, E.-H. Zheng, and Y. Lu, “Dissimilarity based ensemble of extreme learning machine for gene expression data classification,” Neurocomputing, vol. 128, pp. 22–30, 2014.
  23. X. Xue, M. Yao, Z. Wu, and J. Yang, “Genetic ensemble of extreme learning machine,” Neurocomputing, vol. 129, pp. 175–184, 2014.
  24. A. K. Jain, R. P. W. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4–37, 2000.
  25. R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.
  26. L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, New York, NY, USA, 2005.
  27. N. Littlestone and M. K. Warmuth, “The weighted majority algorithm,” Information and Computation, vol. 108, no. 2, pp. 212–261, 1994.
  28. G. Huang, L. Chen, and C. Siew, “Universal approximation using incremental constructive feedforward networks with random hidden nodes,” IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879–892, 2006.
  29. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.
  30. C. Liu and H. Wechsler, “Evolutionary pursuit and its application to face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 570–582, 2000.
  31. U. Maulik and S. Bandyopadhyay, “Genetic algorithm-based clustering technique,” Pattern Recognition, vol. 33, no. 9, pp. 1455–1465, 2000.
  32. R. Storn and K. Price, “Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces,” Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
  33. J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, Perth, Australia, December 1995.
  34. K. Bache and M. Lichman, “UCI machine learning repository,” 2013, http://archive.ics.uci.edu/ml.
  35. F. S. Samaria, Face Recognition Using Hidden Markov Models [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1994.
  36. D. B. Graham and N. M. Allinson, “Characterizing virtual eigensignatures for general purpose face recognition,” in Face Recognition: From Theory to Applications, H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, Eds., vol. 163 of NATO ASI Series F: Computer and Systems Sciences, pp. 446–456, 1998.
  37. P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
  38. P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, “The FERET evaluation methodology for face-recognition algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104, 2000.
  39. L. Chen, H. Man, and A. V. Nefian, “Face recognition based on multi-class mapping of fisher scores,” Pattern Recognition, vol. 38, no. 6, pp. 799–811, 2005.
  40. H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “Uncorrelated multilinear discriminant analysis with regularization and aggregation for tensor object recognition,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 103–123, 2009.
  41. W. Chen, M. J. Er, and S. Wu, “Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 36, no. 2, pp. 458–466, 2006.