Computational Intelligence and Neuroscience
Volume 2018, Article ID 5078268, 8 pages
https://doi.org/10.1155/2018/5078268
Research Article

A Novel Feature Selection Method Based on Extreme Learning Machine and Fractional-Order Darwinian PSO

1Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, China
2Zhejiang Provincial Key Laboratory of Cardio-Cerebral Vascular Detection Technology and Medicinal Effectiveness Appraisal, Hangzhou, China

Correspondence should be addressed to Shun-Ren Xia; shunren_xia@163.com

Received 26 January 2018; Revised 12 March 2018; Accepted 27 March 2018; Published 6 May 2018

Academic Editor: Pedro Antonio Gutierrez

Copyright © 2018 Yuan-Yuan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper presents a novel approach to feature selection for regression problems based on the extreme learning machine (ELM) and fractional-order Darwinian particle swarm optimization (FODPSO). The proposed method constructs a fitness function from the mean square error (MSE) obtained by ELM, and the optimal solution of the fitness function is searched for by an improved particle swarm optimization, FODPSO. To evaluate the performance of the proposed method, comparative experiments with related methods are conducted on seven public datasets. The proposed method obtains the lowest MSE values on six of the seven datasets. The experimental results demonstrate that the proposed method achieves lower MSE with feature subsets of the same size, or requires smaller feature subsets to reach a similar MSE.

1. Introduction

In the field of artificial intelligence, more and more variables or features are involved in learning tasks. An excessive set of features may lead to lower accuracy, slower computation, and additional memory consumption. Feature selection is used to choose smaller but sufficient feature subsets that improve, or at least do not significantly harm, prediction accuracy. Many studies have been conducted to optimize feature selection [1–4]. There are two key components in the search-based feature selection process: the learning algorithm and the optimization algorithm, and many techniques can be involved in each.

Various learning algorithms can be included in this process. Classical algorithms such as the k-nearest neighbors algorithm [5] and the generalized regression neural network [6] were adopted for their simplicity and generality, while more sophisticated algorithms are needed to predict complicated data well. The support vector machine (SVM) is one of the most popular nonlinear learning algorithms and has been widely used in feature selection [7–11]. The extreme learning machine (ELM) is one of the most popular single hidden layer feedforward networks (SLFN) [12]. It possesses faster calculation speed and better generalization ability than traditional learning methods [13, 14], which highlights the advantages of employing ELM in feature selection, as reported in several studies [15–17].

In order to better locate optimal feature subsets, an efficient global search technique is needed. Particle swarm optimization (PSO) [18, 19] is an extremely simple yet effective optimization algorithm and has produced encouraging results in feature selection [7, 20, 21]. Xue et al. treated feature selection as a multiobjective optimization problem [5] and were the first to apply multiobjective PSO [22, 23] to feature selection. Improved PSO variants such as the hybridization of GA and PSO [9], micro-GA embedded PSO [24], and fractional-order Darwinian particle swarm optimization (FODPSO) [10] were also introduced and achieved good performance in feature selection.

Training speed and optimization ability are two essential elements of feature selection. In this paper, we propose a novel feature selection method which employs ELM as the learning algorithm and FODPSO as the optimization algorithm. The proposed method is compared with an SVM-based feature selection method in terms of the training speed of the learning algorithm, and with a traditional PSO-based feature selection method in terms of the searching ability of the optimization algorithm. In addition, the proposed method is compared with several well-known feature selection methods. All comparisons are conducted on seven public regression datasets.

The remainder of the paper is organized as follows: Section 2 presents the technical details of the proposed method, Section 3 reports the comparative experiments on the seven datasets, and Section 4 concludes our work.

2. Proposed Method

2.1. Learning Algorithm: Extreme Learning Machine (ELM)

The schematic of the ELM structure is depicted in Figure 1, where $\mathbf{w}$ denotes the weights connecting the input layer to the hidden layer and $\boldsymbol{\beta}$ denotes the weights connecting the hidden layer to the output layer. $b$ is the threshold of the hidden layer, and $g(\cdot)$ is a nonlinear piecewise continuous activation function, which could be sigmoid, RBF, Fourier, and so forth. $\mathbf{H}$ represents the hidden layer output matrix, $\mathbf{x}$ is the input, and $\mathbf{T}$ is the expected output. Let $\mathbf{Y}$ be the real output; the ELM network is used to choose appropriate parameters to make $\mathbf{Y}$ and $\mathbf{T}$ as close to each other as possible:

$\|\mathbf{Y} - \mathbf{T}\| = \|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\| \rightarrow \min.$ (1)

Figure 1: Schematic of extreme learning machine.

$\mathbf{H}$ is called the hidden layer output matrix, computed from $\mathbf{w}$ and $b$ as in (2), in which $L$ denotes the number of hidden layer nodes, $n$ denotes the dimension of the input $\mathbf{x}$, and $N$ denotes the number of training samples:

$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L}, \quad \mathbf{w}_i \in \mathbb{R}^{n}.$ (2)

As rigorously proven in [13], for any randomly chosen $\mathbf{w}$ and $b$, $\mathbf{H}$ can always be full-rank provided that the activation function is infinitely differentiable in any interval. As a general rule, one needs to find appropriate solutions of $\mathbf{w}$, $b$, and $\boldsymbol{\beta}$ to train a regular network. However, given an infinitely differentiable activation function, the continuous output can be approximated with randomly generated hidden layer neurons whenever tuned hidden layer neurons could successfully estimate the output, as proven by the universal approximation theory [24, 25]. Thus, in ELM, the only parameter that needs to be solved is $\boldsymbol{\beta}$; $\mathbf{w}$ and $b$ can be generated randomly.

By minimizing the absolute numerical value in (1), ELM calculates the analytical solution as follows:

$\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T},$ (3)

where $\mathbf{H}^{\dagger}$ is the Moore–Penrose pseudoinverse of the matrix $\mathbf{H}$. The ELM network tends to reach not only the smallest training error but also the smallest norm of output weights, which indicates that ELM possesses good generalization ability.
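To make the training procedure above concrete, the following minimal sketch (in Python/NumPy, not the authors' released implementation; function names such as elm_fit and elm_predict are ours) draws the input weights and thresholds at random and solves for the output weights with the Moore–Penrose pseudoinverse:

import numpy as np

def elm_fit(X, T, L=150, seed=0):
    """Train an ELM regressor with L sigmoid hidden neurons.
    X: (N, n) inputs; T: (N,) or (N, m) targets."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))   # random input weights w
    b = rng.uniform(-1.0, 1.0, size=L)                 # random hidden thresholds b
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))             # hidden layer output matrix H, as in (2)
    beta = np.linalg.pinv(H) @ T                       # beta = H^† T, as in (3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

Because only $\boldsymbol{\beta}$ is computed, training reduces to a single pseudoinverse, which is the main source of the speed advantage over iteratively trained learners reported later in Table 2.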

2.2. Optimization Algorithm: Fractional-Order Darwinian Particle Swarm Optimization (FODPSO)

Particle swarm optimization (PSO) [18, 19] is a population-inspired metaheuristic that searches for the optimum using a population of individuals, where the population is called a "swarm" and the individuals are called "particles." During the evolutionary process, each particle updates its moving direction according to the best position it has found (pbest) and the best position found by the whole population (gbest), formulated as follows:

$v_i^{t+1} = \omega v_i^{t} + c_1 r_1 (p_i^{t} - x_i^{t}) + c_2 r_2 (g^{t} - x_i^{t}),$ (4)
$x_i^{t+1} = x_i^{t} + v_i^{t+1},$ (5)

where $x_i^{t}$ is the position of particle $i$ at generation $t$ in the $d$-dimensional search space and $v_i^{t}$ is its moving velocity. $p_i^{t}$ denotes the cognition part called pbest, and $g^{t}$ represents the social part called gbest [18]. $\omega$, $c_1$ and $c_2$, and $r_1$ and $r_2$ denote the inertia weight, the learning factors, and random numbers, respectively. The searching process terminates when the number of generations reaches a predefined value.

Darwinian particle swarm optimization (DPSO) simulates natural selection over a collection of many swarms [25]. Each swarm individually behaves like an ordinary PSO, and all the swarms run simultaneously in case any single one becomes trapped in a local optimum. The DPSO algorithm spawns particles or extends a swarm's life when the swarm finds a better optimum; otherwise, it deletes particles or reduces the swarm's life. DPSO has been shown to be superior to the original PSO in preventing premature convergence to local optima [25].

Fractional-order particle swarm optimization (FOPSO) introduces fractional calculus to model the particles' trajectories, which provides a means of controlling the convergence of the algorithm [26]. The velocity function in (4) is rearranged with $\omega = 1$, namely,

$v_i^{t+1} - v_i^{t} = c_1 r_1 (p_i^{t} - x_i^{t}) + c_2 r_2 (g^{t} - x_i^{t}).$ (6)

The left side of (6) can be seen as a discrete version of the derivative of the velocity with order $\alpha = 1$. The discrete-time implementation of the Grünwald–Letnikov derivative is introduced and expressed as

$D^{\alpha}[v(t)] = \dfrac{1}{T^{\alpha}} \sum_{k=0}^{r} \dfrac{(-1)^{k}\,\Gamma(\alpha + 1)\, v(t - kT)}{\Gamma(k + 1)\,\Gamma(\alpha - k + 1)},$ (7)

where $T$ is the sample period and $r$ is the truncation order. Bringing (7) into (6) with $T = 1$ and $r = 4$ yields the following:

$v_i^{t+1} = \alpha v_i^{t} + \tfrac{1}{2}\alpha(1 - \alpha)v_i^{t-1} + \tfrac{1}{6}\alpha(1 - \alpha)(2 - \alpha)v_i^{t-2} + \tfrac{1}{24}\alpha(1 - \alpha)(2 - \alpha)(3 - \alpha)v_i^{t-3} + c_1 r_1 (p_i^{t} - x_i^{t}) + c_2 r_2 (g^{t} - x_i^{t}).$ (8)

Employing (8) to update each particle's velocity in DPSO generates a new algorithm named fractional-order Darwinian particle swarm optimization (FODPSO) [27, 28]. Different values of $\alpha$ control the convergence speed of the optimization process. The literature [27] illustrates that FODPSO outperforms FOPSO and DPSO in searching for the global optimum.
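As an illustration only (not the authors' code), a minimal Python sketch of the fractional velocity update in (8) is given below. It assumes that the four most recent velocities of a particle are kept in a list and that the Darwinian swarm bookkeeping is handled elsewhere; the value of alpha is illustrative, since the paper schedules it via (9):

import numpy as np

def fodpso_velocity(v_hist, x, pbest, gbest, alpha=0.6, c1=2.0, c2=2.0, rng=None):
    """Fractional-order velocity update of (8).
    v_hist: [v_t, v_{t-1}, v_{t-2}, v_{t-3}], most recent velocity first."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    return (alpha * v_hist[0]
            + 0.5 * alpha * (1 - alpha) * v_hist[1]
            + (1.0 / 6.0) * alpha * (1 - alpha) * (2 - alpha) * v_hist[2]
            + (1.0 / 24.0) * alpha * (1 - alpha) * (2 - alpha) * (3 - alpha) * v_hist[3]
            + c1 * r1 * (pbest - x)     # cognition part (pbest)
            + c2 * r2 * (gbest - x))    # social part (gbest)

The position is then advanced by $x_i^{t+1} = x_i^{t} + v_i^{t+1}$ as in (5), and the four-term velocity history is shifted by one generation before the next update.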

2.3. Procedure of ELM_FODPSO

Each feature is assigned a coefficient within a predefined interval. A feature is selected when its corresponding coefficient is greater than 0; otherwise, the feature is abandoned. Assuming the features lie in a $d$-dimensional space, $d$ variables are involved in the FODPSO optimization process. The procedure of ELM_FODPSO is depicted in Figure 2, and a sketch of the fitness evaluation is given after the figure.

Figure 2: Procedure of the proposed methodology.
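A minimal sketch of the wrapper fitness evaluation implied by Figure 2 is shown below, assuming a particle position vector of per-feature coefficients and an MSE objective computed by an ELM trained on the selected features (function names and the validation split are illustrative, not the authors' code):

import numpy as np

def fitness(position, X_train, T_train, X_val, T_val, L=150, seed=0):
    """Fitness of one FODPSO particle: MSE of an ELM trained on the selected features.
    position: per-feature coefficients; a feature is kept when its coefficient > 0."""
    mask = position > 0.0
    if not mask.any():                     # penalize empty feature subsets
        return np.inf
    Xtr, Xva = X_train[:, mask], X_val[:, mask]
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(Xtr.shape[1], L))
    b = rng.uniform(-1.0, 1.0, size=L)
    sigmoid = lambda A: 1.0 / (1.0 + np.exp(-A))
    beta = np.linalg.pinv(sigmoid(Xtr @ W + b)) @ T_train   # analytical ELM solution, as in (3)
    pred = sigmoid(Xva @ W + b) @ beta
    return float(np.mean((pred - T_val) ** 2))               # MSE to be minimized by FODPSO

FODPSO then evolves the coefficient vectors with the fractional velocity update of (8), and the particle with the smallest fitness defines the selected feature subset.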

3. Results and Discussions

3.1. Comparative Methods

Four methods, ELM_PSO [15], ELM_FS [29], SVM_FODPSO [10], and RReliefF [30], are used for comparison. All code used in this study is implemented in MATLAB 8.1.0 (The MathWorks, Natick, MA, USA) on a desktop computer with an eight-core Pentium CPU (4 GHz) and 32 GB of memory.

3.2. Datasets and Parameter Settings

Seven public datasets for regression problems are adopted: four from [29], in which ELM_FS is proposed, and three additional ones from [31]. Information about the seven datasets and the methods involved in the comparisons is shown in Table 1. Only the datasets adopted in [29] can be tested along their feature selection paths; thus, D5, D6, and D7 in Table 1 are tested by the four methods other than ELM_FS.

Table 1: Information about datasets and comparative methods. A1, A2, A3, A4, and A5 represent ELM_PSO, ELM_FS, SVM_FODPSO, RReliefF, and ELM_FODPSO, respectively.

Each dataset is split into a training set and a testing set. Unless otherwise specified, 70% of the instances are used for training and the rest for testing. During the training process, each particle carries a vector of feature coefficients. The number of hidden layer neurons is set to 150, and the activation (kernel) type to sigmoid. 10-fold cross-validation is performed to obtain a relatively stable MSE.

For the FODPSO searching process, the parameters are set as follows. The fractional coefficient $\alpha$ is formulated by (9), where $t_{\max}$ denotes the maximal number of iterations and equals 200; a larger $\alpha$ increases the convergence speed in the early stage of the iterations. The number of swarms and the population size are set to 5 and 10, respectively. The learning factors $c_1$ and $c_2$ in (8) are both initialized to 2. We run FODPSO 30 independent times to obtain relatively stable results. Parameters for ELM_PSO, ELM_FS, SVM_FODPSO, and RReliefF are set according to the original literature.
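For readability, the settings listed above can be gathered in a single configuration structure; the sketch below simply restates the reported values (the key names and the structure itself are ours, and the schedule of $\alpha$ in (9) is not reproduced here):

# Parameter settings reported in Section 3.2 (values taken from the text;
# everything else about the original implementation is assumed).
CONFIG = {
    "elm": {"hidden_neurons": 150, "activation": "sigmoid", "cv_folds": 10},
    "fodpso": {
        "max_iterations": 200,   # t_max in (9)
        "num_swarms": 5,
        "population_size": 10,
        "c1": 2.0,               # learning factors in (8), both initialized to 2
        "c2": 2.0,
    },
    "independent_runs": 30,      # repetitions used to obtain stable results
    "train_fraction": 0.70,      # 70% training / 30% testing split
}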

The convergence rate is analyzed to ensure that the algorithm converges within 200 generations. The median of the fitness evolution of the best global particle is taken for the convergence analysis, as depicted in Figure 3. To show the convergence on all seven datasets clearly in one figure, the fitness values in Figure 3 are normalized to a common range.

Figure 3: Convergence analysis of seven datasets.
3.3. Comparative Experiments

On the testing set, the MSE acquired by ELM is utilized to evaluate the performance of the methods. For all methods, the minimal MSE is recorded when more than one feature subset exists at the same feature scale. The MSEs for D1–D7 are depicted in Figures 4–10, respectively. The x-axis represents the increasing number of selected features, while the y-axis represents the minimum MSE value calculated with the features selected by each method at that scale. Feature selection aims at selecting smaller feature subsets that obtain similar or lower MSE. Thus, in Figures 4–10, the closer a curve gets to the lower-left corner of the plot, the better the corresponding method performs.

Figure 4: The evaluation results of Dataset 1.
Figure 5: The evaluation results of Dataset 2.
Figure 6: The evaluation results of Dataset 3.
Figure 7: The evaluation results of Dataset 4.
Figure 8: The evaluation results of Dataset 5.
Figure 9: The evaluation results of Dataset 6.
Figure 10: The evaluation results of Dataset 7.

ELM_FODPSO and SVM_FODPSO adopt the same optimization algorithm yet employ ELM and SVM, respectively, as the learning algorithm. For each dataset, the training time of ELM and of SVM is obtained by running each of them 30 times within the two methods; the averaged training times over the seven datasets are recorded in Table 2. ELM achieves faster training on six of the seven datasets. Compared with SVM, the single hidden layer and the analytical solution make ELM more efficient. The faster speed of ELM favors its use in feature selection because FODPSO involves many iterative fitness evaluations.

Table 2: Running time of SVM and ELM on seven datasets.

ELM_FODPSO, ELM_PSO, and ELM_FS adopt the same learning algorithm yet employ FODPSO, PSO, and gradient descent search, respectively, as the optimization algorithm. For D1, D2, and D3, ELM_FODPSO and ELM_PSO perform better than ELM_FS; the former two acquire lower MSE than ELM_FS at similar feature scales. For D4, the three methods achieve comparable performance.

Table 3 shows the minimum MSE values acquired by the five methods and the corresponding numbers of selected features, separated by a vertical bar. The last column reports the MSE value calculated with all features and the total number of features. The lowest MSE values on each dataset are labeled in bold. Over all datasets, ELM_FODPSO obtains the lowest MSE six times, ELM_PSO twice, and RReliefF once. For D3, ELM_FODPSO and ELM_PSO obtain comparable MSE values with the same feature subset; therefore, 0.0099 and 0.0098 are both labeled as lowest MSE values. For D5, ELM_PSO and RReliefF obtain the lowest MSE of 0.0838 using all 8 features, while ELM_FODPSO obtains a similar MSE of 0.0841 with only 6 features.

Table 3: Minimum MSE values and the corresponding number of selected features.

4. Conclusions

Feature selection techniques have been widely studied and are commonly used in machine learning. The proposed method consists of two steps: constructing a fitness function with ELM and searching for the optimal solution of the fitness function with FODPSO. ELM is a simple yet effective single hidden layer neural network that is suitable for feature selection owing to its gratifying computational efficiency. FODPSO is an intelligent optimization algorithm with good global search ability.

The proposed method is evaluated on seven regression datasets and achieves better performance than the other comparative methods on six of them. In future work, we plan to explore ELM_FODPSO in various regression and classification applications.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (no. 2016YFC1306600).

References

  1. T. Lindeberg, “Feature detection with automatic scale selection,” International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.
  2. M. Dash and H. Liu, “Feature selection for classification,” Intelligent Data Analysis, vol. 1, no. 1–4, pp. 131–156, 1997.
  3. I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
  4. A. Jović, K. Brkić, and N. Bogunović, “A review of feature selection methods with applications,” in Proceedings of the 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2015), pp. 1200–1205, Croatia, May 2015.
  5. B. Xue, M. Zhang, and W. N. Browne, “Particle swarm optimization for feature selection in classification: a multi-objective approach,” IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1656–1671, 2013.
  6. I. A. Gheyas and L. S. Smith, “Feature subset selection in large dimensionality domains,” Pattern Recognition, vol. 43, no. 1, pp. 5–13, 2010.
  7. X.-W. Chen, X. Zeng, and D. van Alphen, “Multi-class feature selection for texture classification,” Pattern Recognition Letters, vol. 27, no. 14, pp. 1685–1691, 2006.
  8. S.-W. Lin, K.-C. Ying, S.-C. Chen, and Z.-J. Lee, “Particle swarm optimization for parameter determination and feature selection of support vector machines,” Expert Systems with Applications, vol. 35, no. 4, pp. 1817–1824, 2008.
  9. P. Ghamisi and J. A. Benediktsson, “Feature selection based on hybridization of genetic algorithm and particle swarm optimization,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 2, pp. 309–313, 2015.
  10. P. Ghamisi, M. S. Couceiro, and J. A. Benediktsson, “A novel feature selection approach based on FODPSO and SVM,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2935–2947, 2015.
  11. Q. Li, H. Chen, H. Huang et al., “An enhanced grey wolf optimization based feature selection wrapped kernel extreme learning machine for medical diagnosis,” Computational and Mathematical Methods in Medicine, vol. 2017, Article ID 9512741, 15 pages, 2017.
  12. G.-B. Huang and H. A. Babri, “Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions,” IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224–229, 1998.
  13. G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
  14. G.-B. Huang, “What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John Von Neumann's puzzle,” Cognitive Computation, vol. 7, no. 3, pp. 263–278, 2015.
  15. S. Saraswathi, S. Sundaram, N. Sundararajan, M. Zimmermann, and M. Nilsen-Hamilton, “ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented,” IEEE Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 452–463, 2011.
  16. D. Chyzhyk, A. Savio, and M. Graña, “Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI,” Neurocomputing, vol. 128, pp. 73–80, 2014.
  17. R. Ahila, V. Sadasivam, and K. Manimala, “An integrated PSO for parameter determination and feature selection of ELM and its application in classification of power system disturbances,” Applied Soft Computing, vol. 32, pp. 23–37, 2015.
  18. Y. H. Shi and R. C. Eberhart, “A modified particle swarm optimizer,” in Proceedings of the IEEE International Conference on Evolutionary Computation (ICEC '98), pp. 69–73, Anchorage, Alaska, USA, May 1998.
  19. S. Kiranyaz, T. Ince, and M. Gabbouj, “Multi-dimensional particle swarm optimization,” in Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition, vol. 15 of Adaptation, Learning, and Optimization, pp. 83–99, Springer, Berlin, Heidelberg, Germany, 2014.
  20. L. Shang, Z. Zhou, and X. Liu, “Particle swarm optimization-based feature selection in sentiment classification,” Soft Computing, vol. 20, no. 10, pp. 3821–3834, 2016.
  21. H. B. Nguyen, B. Xue, I. Liu, P. Andreae, and M. Zhang, “New mechanism for archive maintenance in PSO-based multi-objective feature selection,” Soft Computing, vol. 20, no. 10, pp. 3927–3946, 2016.
  22. C. A. Coello Coello and M. S. Lechuga, “MOPSO: a proposal for multiple objective particle swarm optimization,” in Proceedings of the Congress on Evolutionary Computation (CEC '02), pp. 1051–1056, May 2002.
  23. J. J. Durillo, J. García-Nieto, A. J. Nebro, C. A. Coello, F. Luna, and E. Alba, “Multi-objective particle swarm optimizers: an experimental comparison,” in Evolutionary Multi-Criterion Optimization, vol. 5467 of Lecture Notes in Computer Science, pp. 495–509, Springer, Berlin, Germany, 2009.
  24. K. Mistry, L. Zhang, S. C. Neoh, C. P. Lim, and B. Fielding, “A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition,” IEEE Transactions on Cybernetics, vol. 47, no. 6, pp. 1496–1509, 2017.
  25. J. Tillett, R. Rao, and F. Sahin, “Cluster-head identification in ad hoc sensor networks using particle swarm optimization,” in Proceedings of the IEEE International Conference on Personal Wireless Communications (ICPWC 2002), pp. 201–205, New Delhi, India, 2002.
  26. E. J. S. Pires, J. A. T. Machado, P. B. de Moura Oliveira, J. B. Cunha, and L. Mendes, “Particle swarm optimization with fractional-order velocity,” Nonlinear Dynamics, vol. 61, no. 1-2, pp. 295–301, 2010.
  27. M. S. Couceiro, R. P. Rocha, N. M. F. Ferreira, and J. A. T. Machado, “Introducing the fractional-order Darwinian PSO,” Signal, Image and Video Processing, vol. 6, no. 3, pp. 343–350, 2012.
  28. M. S. Couceiro, F. M. L. Martins, R. P. Rocha, and N. M. F. Ferreira, “Mechanism and convergence analysis of a multi-robot swarm approach based on natural selection,” Journal of Intelligent & Robotic Systems, vol. 76, no. 2, pp. 353–381, 2014.
  29. F. Benoît, M. van Heeswijk, Y. Miche, M. Verleysen, and A. Lendasse, “Feature selection for nonlinear models with extreme learning machines,” Neurocomputing, vol. 102, pp. 111–124, 2013.
  30. M. Robnik-Šikonja and I. Kononenko, “Theoretical and empirical analysis of ReliefF and RReliefF,” Machine Learning, vol. 53, no. 1-2, pp. 23–69, 2003.
  31. L. Bravi, V. Piccialli, and M. Sciandrone, “An optimization-based method for feature ranking in nonlinear regression problems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 4, pp. 1005–1010, 2016.