Applied Computational Intelligence and Soft Computing
Volume 2012 (2012), Article ID 347157, 21 pages
http://dx.doi.org/10.1155/2012/347157
Research Article

Data and Feature Reduction in Fuzzy Modeling through Particle Swarm Optimization

1Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada T6G 2G7
2Systems Research Institute, Polish Academy of Sciences, 01-447 Warsaw, Poland

Received 15 August 2011; Revised 1 November 2011; Accepted 8 December 2011

Academic Editor: Miin-Shen Yang

Copyright © 2012 S. Sakinah S. Ahmad and Witold Pedrycz. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The study is concerned with data and feature reduction in fuzzy modeling. As these reduction activities are advantageous to fuzzy models in terms of both the effectiveness of their construction and the interpretability of the resulting models, their realization deserves particular attention. The formation of a subset of meaningful features and a subset of essential instances is discussed in the context of fuzzy-rule-based models. In contrast to existing studies, which focus predominantly on feature selection (namely, a reduction of the input space), the position advocated here is that reduction has to involve both data and features to be effective in the design of fuzzy models. The reduction problem is combinatorial in nature and, as such, calls for the use of advanced optimization techniques. In this study, we use particle swarm optimization (PSO) as the optimization vehicle for forming a subset of features and data (instances) to design a fuzzy model. Given the dimensionality of the problem (the search space involves both features and instances), we discuss a cooperative version of PSO along with a clustering mechanism for forming a partition of the overall search space. Finally, a series of numeric experiments using several machine learning data sets is presented.

1. Introduction

In fuzzy modeling, the two main approaches for generating rules rely on knowledge acquisition from human experts and knowledge discovery from data [1, 2]. In recent years, knowledge discovery from data, or data-driven fuzzy modeling, has become more important [2–4]. In many cases, the ability to develop models efficiently is hampered by the dimensionality of the input space as well as by the number of data. For rule-based models, the high dimensionality of the feature space along with the topology of the rules gives rise to the curse of dimensionality [1, 4]: the number of rules increases exponentially and is equal to $p^n$, where $n$ is the number of features and $p$ stands for the number of fuzzy sets defined for each feature.
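To make the exponential growth concrete, the short Python sketch below evaluates the rule count $p^n$ of a complete grid partition ($p$ fuzzy sets per feature, $n$ features) for $p = 3$. The function name `grid_rule_count` is hypothetical and the chosen values of $n$ are purely illustrative.

```python
def grid_rule_count(n_features: int, sets_per_feature: int) -> int:
    """Number of rules in a complete grid partition: one rule per combination of fuzzy sets."""
    return sets_per_feature ** n_features


# Illustrative values only: p = 3 fuzzy sets per feature.
for n in (2, 5, 10, 13):
    print(f"n = {n:2d} features -> {grid_rule_count(n, 3):,} rules")
```

With 13 features, three fuzzy sets per feature already yield 1,594,323 rules, which illustrates why reducing the feature space matters before a rule base is built.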

The factors that contribute most to the accuracy of data-driven fuzzy modeling are associated with the size of the input space and the decomposition of the input data. A large number of data points (instances) in a continuous input-output domain has a significant impact on fuzzy models. It is well known that more training data do not always lead to better performance of data-driven models. Large amounts of training data have important implications for the modeling capabilities: since the number of fuzzy sets determines the family of realizable approximation functions, larger datasets present the possibility of overfitting the training data [1, 4]. Thus, the effectiveness of fuzzy models relies on the quality of the training data. In addition, the main drawback of fuzzy models is their relative inefficiency as the size of the data increases, in terms of both the number of data points and the number of features. Moreover, the fuzzy C-means (FCM) algorithm, one of the most widely used approaches for constructing the antecedents of the rules in fuzzy modeling, is itself affected by the curse of dimensionality [5, 6].

The dimensionality problem can be addressed by reducing the number of constructed fuzzy rules. Such reduction plays two important roles: it increases the effectiveness of the learning algorithm, since the learning algorithm concentrates only on the most useful subset of data, and it improves computational efficiency, as the learning algorithm involves only a subset of data smaller than the original dataset [7]. This reduction can be realized by removing redundant fuzzy rules by exploiting the concept of fuzzy similarity [3, 7, 8]. Evolutionary algorithms have also been used for building compact fuzzy rule bases [9–12] and for tuning the structure and the rule parameters of fuzzy systems [13, 14]. However, in numerous cases, some variables are not crucial to the realization of the fuzzy model. A suitable way to overcome this problem is to perform feature selection before constructing the fuzzy model. Accordingly, during the last decade, feature selection methods combined with the construction of fuzzy models have been developed to alleviate the curse of dimensionality [15–22]. This process reduces the fuzzy rule search space and increases the accuracy of the model.

As mentioned above, forming the best input data as the training set for constructing the fuzzy model is also important. However, to the best of our knowledge, no research has simultaneously selected the best subset of features and input data for constructing a fuzzy model. Most research focuses on reducing the fuzzy rules, and the system is simplified only once the design has been completed. Here we propose a method that reduces the complexity of the system starting from the design stage: both the antecedent and the consequent parts of the fuzzy model are constructed using the best subset of input data.

In this paper, a comprehensive framework is proposed to construct fuzzy models from a subset of numerical input-output data. First, we develop a data-driven fuzzy modeling framework for high-dimensional large datasets, which is capable of generating a rule base automatically from numerical data. Second, we integrate feature selection and data selection in a unified form to further refine (reduce) the fuzzy models. In this regard, PSO is applied to search for the best subset of data. To increase the effectiveness of PSO, we introduce a new cooperative PSO method based on the information granulation approach. Third, we develop a flexible setup to cope with the optimization of the variables and data used in the design of the fuzzy model. The proposed approach allows the user to choose, in advance, the fraction of variables and data used to construct the fuzzy models.

This paper is organized as follows. We briefly elaborate on selected approaches to data and feature space reduction in Section 2, and then, in Section 3, we recall the main algorithmic features of PSO and its cooperative version, CPSO, which is of interest in high-dimensional problems. The proposed fuzzy modeling framework along with its main algorithmic developments is presented in Section 4. Experimental studies are presented in Section 5, and conclusions are provided in Section 6.

2. Selected Approaches to Data and Space Reduction

In general, reduction processes involve feature selection (FS), instance (data) selection (IS), and a combination of these two processes: feature and instance selection (FIS). Feature selection has been the main focus of reduction efforts. The goal of FS, which is commonly encountered in system modeling and pattern recognition, is to select the best subset of features so that the model formed in this new feature (input) space exhibits the highest accuracy (classification rate) while also offering increased transparency [23]. The process aims to discard irrelevant and/or redundant features [24]. FS algorithms can be classified into three main categories: filters, wrappers, and embedded methods. In filter methods, the selection criterion is independent of the learning algorithm. In contrast, in wrapper methods, the selection criterion depends on the learning algorithm and uses its performance index as the evaluation criterion. Embedded methods incorporate feature selection as part of the training process. The reader can refer to [23–25] for more details.

Instance selection (IS), another category of reduction approaches, is concerned with selecting the relevant data (instances) that reflect the knowledge pertinent to the problem at hand [26, 27]. The three main functions of IS are enabling, focusing, and cleaning [26].

In this study, as stated earlier, instead of approaching feature selection and instance selection separately, we integrate them in the construction of fuzzy models. Both processes are applied simultaneously to the initial dataset to obtain a suitable subset of features and data from which the parameters of the fuzzy model are constructed. In the literature, methods that integrate feature and instance selection have focused mainly on classification problems [28, 29].

The ideas of feature and data reduction, as well as hybrid approaches, have been discussed in the realm of fuzzy modeling. Table 1 offers a snapshot of the diversity of the existing approaches and the advantages gained by completing the reduction processes.

Table 1: A summary of selected studies in data and feature reduction in fuzzy modeling.

3. Particle Swarm Optimization and Its Cooperative Version

Population-based algorithms provide interesting solutions since any constructive method can be used to generate the initial population, and any local search technique can be used to improve each solution in the population [30]. In addition, population-based methods can combine good solutions to obtain potentially better ones. Most population-based approaches to FS and IS are based on genetic algorithms (GAs). Some recent studies [28, 29, 31] have employed population-based optimization techniques to search for the best subset of variables and data, but all of them address classification problems. In this study, therefore, we use a population-based technique to select the best subset of features and data for regression problems. Specifically, we use particle swarm optimization (PSO) to search for the best subset of features and data (instances).

PSO, developed by Kennedy and Eberhart and inspired by the collective behavior of bird flocks and fish schools [32], is a population-based algorithm in which each individual, referred to as a particle, represents a candidate solution. Each particle moves through the search space with a velocity that is dynamically adjusted according to its own experience, summarized by its local best (lb) position, and by the flying experience of the other particles, summarized by the global best (gb) position. The velocity and position of particle $i$ are updated in successive generations as follows:

$$v_{ij}(t+1) = w\,v_{ij}(t) + c_1 r_{1j}\left(lb_{ij} - x_{ij}(t)\right) + c_2 r_{2j}\left(gb_{j} - x_{ij}(t)\right),$$
$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1),$$

where $j = 1, 2, \ldots, N + n$ (the dimensionality of the search space is equal to the sum of the dimensionality of the feature space, $n$, and the size of the data, $N$). The inertia weight $w$ is confined to the range [0, 1]; its value can decrease over time. The cognitive factor $c_1$ and the social factor $c_2$ determine the relative impact of the particle's own experience (the local best) and of the swarm's experience (the global best). $r_{1j}$ and $r_{2j}$ are numbers drawn from a uniform distribution over the unit interval, which introduce randomness into the search process.
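A minimal NumPy sketch of one application of the velocity and position updates to a whole swarm is given below. It draws $r_1$ and $r_2$ independently for every particle and dimension, and it clips positions to the unit hypercube used for the particle encoding; both choices are assumptions of this sketch rather than details stated in the text, and the function name `pso_step` is hypothetical.

```python
import numpy as np


def pso_step(x, v, lb, gb, w, c1, c2, rng):
    """One PSO generation for a swarm of S particles in D dimensions.

    x, v, lb : arrays of shape (S, D) with positions, velocities, and local bests
    gb       : array of shape (D,) with the global best position
    """
    r1 = rng.random(x.shape)  # uniform on [0, 1), one draw per particle and dimension
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (lb - x) + c2 * r2 * (gb - x)
    # Assumption: keep particles inside the hypercube [0, 1]^D used for decoding.
    x_new = np.clip(x + v_new, 0.0, 1.0)
    return x_new, v_new


# Toy usage: 5 particles in a D = N + n = 4 dimensional space.
rng = np.random.default_rng(0)
x = rng.random((5, 4))
v = np.zeros_like(x)
x, v = pso_step(x, v, lb=x.copy(), gb=x[0], w=0.9, c1=0.5, c2=1.5, rng=rng)
```

A particle that already sits at both its local best and the global best, with zero velocity, stays where it is; all movement comes from the attraction terms and the inertia of the previous velocity.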

In this research, we employ the PSO-based method to handle two optimization tasks, namely, (1) selection of the optimal subset of features and (2) selection of the optimal subset of instances based on the concept of information granularity. To reduce the computational complexity of the standard PSO, we employ a cooperative PSO method that solves the two optimization tasks simultaneously. The motivation behind the use of cooperative PSO, as advocated in [33], is to deal effectively with the dimensionality of the search space, which becomes a serious concern when a large number of high-dimensional data are involved. This curse of dimensionality is a significant impediment to the effectiveness of the standard PSO. The cooperative version of PSO essentially performs a parallel search for the optimal subset of features and the optimal subset of instances. Cooperation is achieved by dividing the candidate solution vector into components handled by separate swarms, called subswarms, where each subswarm represents a small part of the overall optimization process. In this divide-and-conquer manner, the optimization becomes more efficient and faster.

The mechanism of information sharing in CPSO is shown in Figure 1. The cooperative search between subswarms is achieved by sharing the global best position (gb) of each subswarm across all subswarms. The algorithm thus has the advantage that each candidate solution is formed from the best positions of all the other subswarms, combined with the position being evaluated in the current subswarm. Therefore, the algorithm does not spend too much time optimizing features or instances that have little effect on the overall solution. The rate at which each subswarm converges to a solution is significantly higher than the rate of convergence of the standard PSO.

Figure 1: The schematic diagram of information sharing in CPSO.

The essence of the cooperative version of PSO is to split the variables into several groups so that each group is handled by a separate PSO. The main design question is how to split the variables into groups. A sound guideline is to keep related (associated) variables within the same group. Obviously, such relationships are not known in advance. Several methods are available for addressing this issue in the context of the problem at hand.
(a) As we are concerned with a collection of features and data (instances), a natural way to split the variables is to form two groups (K = 2), one for the features and another for the instances. This split is legitimate if the dimensionalities of both subsets are quite similar.
(b) In some situations, one of the subsets (either the data or the features) might be significantly larger than the other. We often encounter a large number of data, but in some situations a large number of features might be present (for instance, in microarray data analysis). This larger collection of data or features is then split into K groups. Clustering such items is a viable algorithmic approach: running K-means or fuzzy C-means produces clusters (groups) of variables that are used in the individual PSOs.
(c) In case both subsets are large, clustering is realized for both the features and the data, and the resulting structure (partition) is used to run cooperative PSO.

Algorithm 1 presents the pseudocode of cooperative PSO implementing the optimization process [33]. First, the search space is divided into subspaces, each explored by its own swarm, called a subswarm. In our case, the first subswarm represents the feature search space and the remaining ones represent the instance search space. $P_j.\mathbf{x}_i$ refers to the position of particle $i$ of subswarm $P_j$. The global best of each subswarm is denoted by $P_j.\hat{\mathbf{y}}$, and the local best of particle $i$ by $P_j.\mathbf{y}_i$. The cooperation between the subswarms is realized by the function $b(j, \mathbf{z})$, which returns an $(N + n)$-dimensional vector formed by concatenating the global best vectors of all $K$ subswarms, except for the $j$th component, which is replaced by $\mathbf{z}$:

$$b(j, \mathbf{z}) = \left(P_1.\hat{\mathbf{y}}, \ldots, P_{j-1}.\hat{\mathbf{y}}, \mathbf{z}, P_{j+1}.\hat{\mathbf{y}}, \ldots, P_K.\hat{\mathbf{y}}\right),$$

where $\mathbf{z}$ represents the position of any particle from subswarm $P_j$.

Algorithm 1: Pseudocode for cooperative PSO.
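The cooperative scheme described above can be sketched compactly in Python. The sketch implements the context-vector function $b(j, \mathbf{z})$, which concatenates the global bests of all subswarms with block $j$ replaced by a candidate $\mathbf{z}$, and a cooperative loop that scores each particle inside that context before applying the usual velocity update within its own subswarm. It is an illustrative reading of the description, not a transcription of Algorithm 1: the names `context_vector` and `cpso`, the toy objective, the default parameter values, and the clipping to [0, 1] are all assumptions.

```python
import numpy as np


def context_vector(j, z, global_bests):
    """b(j, z): concatenate all subswarm global bests, with block j replaced by z."""
    parts = [np.asarray(g, dtype=float) for g in global_bests]
    parts[j] = np.asarray(z, dtype=float)
    return np.concatenate(parts)


def cpso(f, block_sizes, n_particles=10, iters=30, w=0.7, c1=0.5, c2=1.5, seed=0):
    """Minimize f over [0, 1]^D, D = sum(block_sizes), with one subswarm per block."""
    rng = np.random.default_rng(seed)
    X = [rng.random((n_particles, d)) for d in block_sizes]  # positions P_j.x_i
    V = [np.zeros_like(x) for x in X]                         # velocities
    Y = [x.copy() for x in X]                                 # local bests P_j.y_i
    G = [x[0].copy() for x in X]                              # global bests P_j.y_hat
    start = f(np.concatenate(G))
    for _ in range(iters):
        for j in range(len(block_sizes)):
            for i in range(n_particles):
                # Each particle is scored inside the context of the other subswarms' bests.
                if f(context_vector(j, X[j][i], G)) < f(context_vector(j, Y[j][i], G)):
                    Y[j][i] = X[j][i].copy()
                if f(context_vector(j, Y[j][i], G)) < f(context_vector(j, G[j], G)):
                    G[j] = Y[j][i].copy()
            r1, r2 = rng.random(X[j].shape), rng.random(X[j].shape)
            V[j] = w * V[j] + c1 * r1 * (Y[j] - X[j]) + c2 * r2 * (G[j] - X[j])
            X[j] = np.clip(X[j] + V[j], 0.0, 1.0)
    return np.concatenate(G), f(np.concatenate(G)), start


# Toy usage: a sphere objective split into a 3-dimensional and a 2-dimensional block.
best, best_fit, start_fit = cpso(lambda z: float(np.sum((z - 0.3) ** 2)), [3, 2])
```

Because a subswarm's global best is replaced only when the full context vector improves, the overall objective value in this sketch never increases from one generation to the next.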

4. PSO-Integrated Feature and Data Reduction in Fuzzy-Rule-Based Models

As the problem of feature and data reduction is inherently combinatorial in nature, PSO provides an interesting and computationally viable optimization alternative. In the following subsections, we start with a general optimization setting and then discuss the PSO realization of the search process (here, a crucial design phase is the formation of the search space with a suitable encoding mechanism). Although the proposed methodology is of a general nature, we concentrate on rule-based models, which are commonly used in fuzzy modeling, to offer a detailed view of the overall design process.

4.1. An Overall Reduction Process

As is usual in system modeling, we consider a supervised learning scenario in which we are given a finite set of training data $(\mathbf{x}_k, y_k)$, $k = 1, 2, \ldots, N$. By stressing the nature of the data and their dimensionality, the data space with $n$-dimensional feature vectors can be viewed as the Cartesian product $\mathbf{D} \times \mathbf{F}$ of the data and the features. The essence of the reduction is to arrive at the Cartesian product of the reduced data and feature spaces, $\mathbf{D}' \times \mathbf{F}'$, where $\mathbf{D}' \subseteq \mathbf{D}$ and $\mathbf{F}' \subseteq \mathbf{F}$. The cardinalities of the reduced spaces are $|\mathbf{D}'| = N'$ and $|\mathbf{F}'| = n'$, where $N' \le N$ and $n' \le n$.

The overall scheme of the reduction process, outlining the role of the PSO-guided reduction, is illustrated in Figure 2. The scheme consists of two parts.
(a) Reduction process via PSO: the reduction process tackles feature reduction and data reduction simultaneously. The PSO algorithm searches for the best features and data for constructing the fuzzy model. Here, the numbers of selected features ($n'$) and data ($N'$) are provided in advance by the user. When PSO reaches the maximum number of generations, the process stops, and the best subset of features and data found is used to construct the fuzzy model.
(b) Evaluation process: the fuzzy C-means algorithm is used to convert the numerical data into information granules. Here, the granulation process deals only with the selected subset of data and features ($\mathbf{D}' \times \mathbf{F}'$). Next, the fuzzy model, whose consequent parameters are constructed from this subset, is used to evaluate the quality of the selected data and features. At this stage, we assess the performance of the constructed fuzzy model in terms of its ability to fit all instances in the original data set.

Figure 2: The scheme of the proposed data and feature reduction for fuzzy modeling.

As becomes apparent, the original space $\mathbf{D} \times \mathbf{F}$ is reduced to $\mathbf{D}' \times \mathbf{F}'$, and in this Cartesian product a fuzzy model, denoted by FM, is designed in the usual way (we elaborate on the form of the fuzzy model in the subsequent section). Its quality, however, is judged by an objective function expressed over all the original instances: the quality of the reduced space is assessed by quantifying the performance of the fuzzy model operating over the original, nonreduced space. The same performance index as used in the construction of the fuzzy model in the reduced space is used to describe the quality of the fuzzy model:

$$Q = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(\mathrm{FM}(\mathbf{x}_k) - y_k\right)^2}.$$

Note that the summation shown above is taken over all the elements forming the data space $\mathbf{D}$, not only over the selected data. Taking another look at the overall reduction scheme, it is worth noting that the reduction is realized in wrapper mode, in which a fuzzy model is used to evaluate the quality of the reduction mechanism.
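The wrapper-mode evaluation can be illustrated with a short sketch: a model is built only on the selected block $\mathbf{D}' \times \mathbf{F}'$, but its RMSE is computed over all $N$ instances (restricted to the selected features). Purely to keep the example self-contained, an affine least-squares model stands in for the fuzzy model FM; the helper names `evaluate_reduction`, `ls_fit`, and `ls_predict` are hypothetical, and indexes are zero-based.

```python
import numpy as np


def evaluate_reduction(X, y, data_idx, feat_idx, fit, predict):
    """Wrapper-mode quality Q: design on D' x F', evaluate RMSE over all N instances."""
    model = fit(X[np.ix_(data_idx, feat_idx)], y[data_idx])  # only selected data and features
    y_hat = predict(model, X[:, feat_idx])                   # every instance, selected features
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


# Stand-in model (NOT the fuzzy model FM): affine least squares.
def ls_fit(Xr, yr):
    A = np.hstack([np.ones((Xr.shape[0], 1)), Xr])
    coef, *_ = np.linalg.lstsq(A, yr, rcond=None)
    return coef


def ls_predict(coef, Xf):
    return np.hstack([np.ones((Xf.shape[0], 1)), Xf]) @ coef


# Toy data: only feature 0 is relevant; the model is built on the first 20 instances.
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = 3.0 * X[:, 0] + 2.0
q_good = evaluate_reduction(X, y, np.arange(20), [0], ls_fit, ls_predict)
q_bad = evaluate_reduction(X, y, np.arange(20), [1], ls_fit, ls_predict)
```

Any fit/predict pair can be passed in, which mirrors the wrapper idea: the reduction is scored only through the performance of the model built on it.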

4.2. The PSO-Based Representation of the Search Space

The reduction of the data and feature spaces involves selecting a subset of the data and a subset of the features; therefore, the problem is combinatorial in nature. PSO is used here to form subsets of integers, which are the indexes of the data or features used in the formation of $\mathbf{D}' \times \mathbf{F}'$. For instance, $\mathbf{D}'$ is represented as a set of indexes that is a subset of the integers $\{1, 2, \ldots, N\}$. From the perspective of PSO, a particle is formed as a string of real numbers in [0, 1] of length $N + n$; effectively, the search space is the hypercube $[0, 1]^{N+n}$. The first substring, of length $N$, represents the data; the second one (having $n$ entries) is used to optimize the subset of features. The particle is decoded as follows. Each substring is processed (decoded) separately. Its real-number entries are ranked, and the result is a list of integers viewed as the indexes of the data. The first $N'$ entries of this $N$-position list are selected to form $\mathbf{D}'$. The same process is applied to the substring representing the features, yielding $\mathbf{F}'$ with $n'$ elements. The overall decoding scheme is illustrated in Figure 3.

Figure 3: From a particle in search space to a subset of instances and features.
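The rank-based decoding of a particle can be sketched as follows. The text states only that the entries are ranked; sorting them in descending order (with ties broken by position) is an assumption of this sketch, as is the hypothetical function name `decode_particle`. Indexes are zero-based, as is usual in Python.

```python
import numpy as np


def decode_particle(particle, n_data, n_sel_data, n_sel_features):
    """Map a particle in [0, 1]^(N + n) to zero-based index sets for D' and F'."""
    particle = np.asarray(particle, dtype=float)
    data_part, feat_part = particle[:n_data], particle[n_data:]
    # Rank entries from largest to smallest (assumption) and keep the leading positions.
    data_idx = np.argsort(-data_part, kind="stable")[:n_sel_data]
    feat_idx = np.argsort(-feat_part, kind="stable")[:n_sel_features]
    return np.sort(data_idx), np.sort(feat_idx)


particle = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8, 0.3]  # N = 4 data entries, then n = 3 feature entries
data_idx, feat_idx = decode_particle(particle, n_data=4, n_sel_data=2, n_sel_features=1)
print(data_idx, feat_idx)  # [0 3] [1]
```

Because only the ordering of the entries matters, any real-valued particle decodes to valid subsets of exactly the requested sizes, which is what makes a continuous optimizer usable for this combinatorial problem.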

The information given by PSO is used to determine the subset of features and data from which the data-driven fuzzy models are constructed. The numerical data are then represented in terms of a collection of information granules (fuzzy sets) produced through fuzzy clustering. The information about the granules (clusters) is then used to construct the fuzzy models.

In the cooperative PSO, the search space is formed in a more sophisticated way. The cooperative facet mainly involves exchanging information about the best positions found by the different subswarms. Here, we present a new cooperative PSO (CPSO) algorithm for the data and feature reduction process. The choice of the number of cooperating swarms is important because it affects the performance of the cooperative PSO model. Subswarm 1 represents the feature columns and subswarm 2 represents the instance rows of the particular data set. Figure 4 illustrates the main difference between standard PSO and cooperative PSO. The standard PSO contains one swarm with a high-dimensional search space. In contrast, in the cooperative PSO, we divide the search space between two subswarms: subswarm 1 for the feature representation and subswarm 2 for the instance representation. All subswarms share the same basic particle definition illustrated in Figure 4.

Figure 4: The particle scheme of the “standard” PSO (a) and cooperative PSO (b).

In general, the dimensionality of data (instance) selection is higher than that of feature selection. To reduce the impact of the curse of dimensionality, we decompose the data into several groups by using the information granulation approach. In this paper, we use fuzzy C-means (FCM) to construct the information granules, so the number of decomposition groups equals the number of clusters used in FCM. For example, to decompose the data into three groups, we set the number of clusters to three. As a result, instead of having only two subswarms, we introduce additional subswarms that represent different groups of data.

Figure 5 presents the process of constructing the subswarms for cooperative PSO by decomposing the instances into several subswarms. As mentioned earlier, we apply the concept of information granulation to decompose the data. To identify the data belonging to each decomposed group, we use the membership degrees of the information granules. Here, we employ a winner-takes-all scheme that assigns each instance to a single group: the instance is placed in the group whose information granule yields the highest degree of activation (membership). We denote the set of data associated with the $i$th granule by $X_i$:

$$X_i = \left\{\mathbf{x}_k \;\middle|\; u_{ik} = \max_{l = 1, 2, \ldots, g} u_{lk},\; k = 1, 2, \ldots, N\right\}, \quad i = 1, 2, \ldots, g,$$

where $X_i$ is the $i$th decomposition group, $u_{ik}$ is the membership degree of the $k$th data point in the $i$th information granule, $\mathbf{x}_k$ are the data (instances), and $N$ and $g$ are the number of data and the level of information granulation (the number of clusters), respectively.

Figure 5: The particle scheme of cooperative PSO with more subswarms.
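A minimal sketch of the winner-takes-all assignment follows. It assumes a membership matrix `U` of shape (g, N), as produced by FCM (the clustering itself is not implemented here), and returns the zero-based indexes of the instances in each group; ties are resolved in favor of the first granule, which is a choice of this sketch. The function name is hypothetical.

```python
import numpy as np


def winner_takes_all_groups(U):
    """Split instances into groups X_i; U has shape (g, N), U[i, k] = membership of x_k in granule i."""
    U = np.asarray(U, dtype=float)
    winner = np.argmax(U, axis=0)  # most activated granule per instance (first one on ties)
    return [np.flatnonzero(winner == i) for i in range(U.shape[0])]


# Toy membership matrix: g = 3 granules, N = 4 instances (columns sum to 1, as in FCM).
U = [[0.7, 0.2, 0.1, 0.5],
     [0.2, 0.6, 0.3, 0.4],
     [0.1, 0.2, 0.6, 0.1]]
groups = winner_takes_all_groups(U)
print([grp.tolist() for grp in groups])  # [[0, 3], [1], [2]]
```

Each resulting index group would then be handled by its own subswarm, so the instance part of the search space is split into $g$ lower-dimensional pieces.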
4.3. A Category of Fuzzy-Rule-Based Models

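To make the overall presentation more focused, we consider a class of fuzzy-rule-based models governed by a collection of $c$ rules:

$$\text{If } \mathbf{x} \text{ is } A_i \text{ then } y = f_i(\mathbf{x}, \mathbf{a}_i),$$

where $i = 1, 2, \ldots, c$ ($c$ is the number of clusters), $A_i$ are the information granules formed in the input space, and $f_i$ is a local linear function with parameters $\mathbf{a}_i$ associated with the corresponding information granule. The information granules are constructed by means of fuzzy clustering, namely, fuzzy C-means (FCM). The corresponding membership functions are thus described as

$$A_i(\mathbf{x}) = \frac{1}{\sum_{j=1}^{c}\left(\dfrac{\|\mathbf{x} - \mathbf{v}_i\|}{\|\mathbf{x} - \mathbf{v}_j\|}\right)^{2/(m-1)}},$$

where $\mathbf{v}_i$, $i = 1, 2, \ldots, c$, are the prototypes formed through clustering, and $m$, $m > 1$, is a fuzzification coefficient.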
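The membership formula above and a rule-based model built on it can be sketched in NumPy. The prototypes $\mathbf{v}_i$ are taken as given (in the paper they come from FCM; the clustering step is omitted here). Two further points are assumptions of this sketch rather than statements of the text: the model output is taken as the membership-weighted sum $\sum_i A_i(\mathbf{x})\, f_i(\mathbf{x}, \mathbf{a}_i)$ with affine $f_i$, and the consequent parameters are estimated by ordinary least squares. All function names are hypothetical.

```python
import numpy as np


def fcm_memberships(X, V, m=2.0, eps=1e-12):
    """A_i(x_k) for every row x_k of X (N, d) and prototype v_i of V (c, d); returns (N, c)."""
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps  # ||x_k - v_i||
    # ratio[k, i, j] = (||x_k - v_i|| / ||x_k - v_j||) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def rule_design_matrix(X, U):
    """Columns A_i(x) * [1, x] for each rule i (local affine consequents)."""
    Xe = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.hstack([U[:, [i]] * Xe for i in range(U.shape[1])])


def fit_consequents(X, y, V, m=2.0):
    """Least-squares estimate of all consequent parameters a_i (assumption)."""
    Z = rule_design_matrix(X, fcm_memberships(X, V, m))
    a, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return a


def predict(X, V, a, m=2.0):
    """Model output sum_i A_i(x) * f_i(x, a_i)."""
    return rule_design_matrix(X, fcm_memberships(X, V, m)) @ a


# Toy usage: two prototypes on a 1-D input, target y = 2x + 1.
X = np.linspace(0.0, 1.0, 20)[:, None]
y = 2.0 * X[:, 0] + 1.0
V = np.array([[0.2], [0.8]])
a = fit_consequents(X, y, V)
print(np.max(np.abs(predict(X, V, a) - y)))
```

Since FCM memberships sum to one for every input, rules with identical affine consequents reproduce a single global affine function, so this toy target is fitted essentially exactly.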

5. Experimental Studies

In this section, we report our results from a set of experiments using several machine learning data sets (see http://www.ics.uci.edu/~mlearn/MLRepository.html and http://lib.stat.cmu.edu/datasets/). The main objective of these experiments is to demonstrate the capabilities of the proposed approach, quantify the performance of the selected subsets of features and instances, and arrive at some general conclusions. A concise summary of the data sets used in the experiments is presented in Table 2. All data sets have a continuous output variable.

Table 2: Description of data used in the experiments.
5.1. Parameter Setup

The values of the PSO and CPSO parameters were set to commonly used values as follows. The inertia weight, $w$, was decreased linearly from 1 to 0 over the course of the optimization. The cognitive factor, $c_1$, and the social factor, $c_2$, were set to 0.5 and 1.5, respectively. Table 3 also lists the numeric values of the other parameters of the PSO and CPSO environments. As to the population size and the number of generations, we used a larger population and more generations for the generic version of PSO than for CPSO because of the larger search space in which it operates.
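The linearly decreasing inertia weight can be written as $w(t) = w_{\text{start}} + (w_{\text{end}} - w_{\text{start}})\,t/t_{\max}$ with $w_{\text{start}} = 1$ and $w_{\text{end}} = 0$. Indexing generations so that $t$ runs from 0 to $t_{\max}$ is an assumption of this small sketch, as is the helper name `inertia`.

```python
C1, C2 = 0.5, 1.5  # cognitive and social factors used in the experiments


def inertia(t: int, t_max: int, w_start: float = 1.0, w_end: float = 0.0) -> float:
    """Inertia weight decreasing linearly from w_start (t = 0) to w_end (t = t_max)."""
    return w_start + (w_end - w_start) * t / t_max


print([inertia(t, 4) for t in range(5)])  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

A large early inertia favors exploration of the search space, while values near zero at the end let the particles settle around the best positions found.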

Table 3: The values of the parameters used in the experiments; CPSO1: swarms located in the feature space. CPSO2: swarms located in the instance (data) space.

The number of subswarms used in each optimization method is also shown in Table 3. The PSO method comprises a single swarm whose individuals concatenate features and instances. In contrast, for CPSO, we divided the search space among several subswarms that cooperate with each other, where the individuals in each subswarm represent a portion of the search space. CPSO1 contains two subswarms that cover the data and the features, respectively. In CPSO2, we used three subswarms to represent the data points; since in the data used here the number of data is larger than the number of features, this achieves a better balance of the dimensionality of the spaces. The data (instance) search space is divided into three subswarms, and the decomposition is realized by running fuzzy clustering (each cluster forms a subswarm). In the table, the number of generations is smaller than the number of particles. This is because Shi and Eberhart [34] reported that the population size does not have any significant impact on the performance of PSO. Nevertheless, the number of particles is high given the size of the search space: more particles are required to cover the large search space of instance selection when using the standard PSO, and, as a result, the best solution can be found faster than with a smaller number of particles. On the other hand, the number of particles is decreased when we implement the CPSO method, because the original large search space is divided into several groups and the search for the best subset is carried out in parallel.

5.2. Results of the Experiments

In the experiments, we looked at the performance, expressed as the average root mean squared error (RMSE), obtained for the selected combinations of the number of features and data (instances). The results obtained for the Housing data, PM10 data, and Parkinson’s data for the selected numbers of clusters are summarized in Tables 4, 5, and 6, respectively. The experiments were repeated 10 times, and the reported results are the average RMSE values. We also report the standard deviation of the performance index to offer better insight into the variability of the performance. Notably, the standard deviation decreases as more data are involved and as the dimensionality of the feature space decreases.

Table 4: Results for housing data; the number of clusters is set to 4; κ is the ratio of the number of selected data to the number of selected features.
Table 5: Results for PM10 dataset.
Table 6: Results for Parkinson’s data.

The visualization of the results in the form of a series of heat maps (see Figures 6, 7, and 8) helps us arrive at a number of qualitative observations as well as look at some quantitative relationships. In most cases, the performance index remains relatively low in some regions of the heat map. This finding indicates that the available data come with evident redundancy, which has a negative impact on the designed model. For the PM10 data, the performance of the model deteriorates significantly when the number of features grows while only a low percentage of data is used. This effect is present for different numbers of clusters, and the same tendency is noticeable for the other data sets. There is a sound explanation for this phenomenon: the structure formed by fuzzy clustering does not fully reflect the dependencies in the data (due to the sparsity of the data), which in turn results in deteriorating performance of the fuzzy model. In this case, one would be better off considering a suitably reduced set of features. In all cases experimented with, we noted an optimal combination of features and data that led to the best performance of the model. Table 7 summarizes the optimal combinations of features and data.

Table 7: The optimal percentages of features and data for different numbers of clusters.
Figure 6: Heat map for PM10 data for the number of clusters varying from 3 to 6.
Figure 7: Heat map for body fat data for the number of clusters varying from 3 to 6.
Figure 8: Heat map for housing data for , 4, 5, and 7.

The relationships between the percentages of data and features used and the resulting RMSE values are displayed in Figures 9 and 10, respectively. Some interesting tendencies are worth noting. A critical amount of data is required to form a fuzzy model. Beyond it, increasing the amount of data does not produce any improvement: the curves plotted in Figures 9(a)–9(c) reach a plateau or even show some increase in RMSE.

Figure 9: The values of RMSE versus the percentage of data for selected numbers of clusters: (a) housing data, (b) PM10 data, and (c) body fat data.
Figure 10: Plots of RMSE versus the percentage of features for selected numbers of clusters: (a) housing data, (b) PM10 data, and (c) body fat data.

Considering a fixed percentage of the data used, we look at the nature of the feature sets. Tables 8, 9, and 10 display the best features for the PM10 data, body fat data, and Housing data, respectively. Overall, the selected subsets of features are almost the same for different numbers of clusters. Furthermore, we observe that in most cases the reduced feature spaces exhibit an interesting “nesting” property, meaning that each extended feature space subsumes the one formed previously. For example, for the Housing data, we obtain a nested series of feature subsets (Table 10) involving the following features: 6: average number of rooms per dwelling, 9: index of accessibility to radial highways, 13: percentage of lower status population, and 10: full-value property-tax rate per $10,000. This combination is quite convincing.

Table 8: Best subsets of features for PM10 data.
Table 9: Best subsets of features for body fat data.
Table 10: Best subsets of features for housing data.

For the PM10 data, we arrive at a series of nested collections of features (Table 8), where the corresponding features include 1: the concentration of PM10 (particles), 7: hour of experiment per day, 6: wind direction, and 2: the number of cars per hour.

Turning to the comparative analysis of the performance of the swarm optimization methods, we summarize the obtained results in Figure 11. For all data sets, the CPSO performed better than the standard PSO. Although both algorithms show the same tendency when 100% of the features are used, the RMSE produced by the CPSO is lower than that obtained with the PSO. Furthermore, the CPSO algorithm is more stable than the standard PSO: in most cases, the standard deviations of the error produced by the CPSO are smaller than those obtained with the standard PSO (see Table 11).

Table 11: Standard deviations for PSO and CPSO (housing and PM10 data sets).
Figure 11: Values of RMSE versus the percentage of features selected when running PSO and CPSO on the housing dataset: (a) 20% of selected data, (b) 30% of selected data, (c) 50% of selected data, and (d) 70% of selected data.

Figure 12 shows the subsets of features selected for different percentages of features used in the construction of the fuzzy model. The CPSO algorithm is more consistent when selecting an increasing number of features. For example, features 6 and 13 were selected when using both 30% and 70% of the data. In contrast, the subsets of features selected by the PSO algorithm are less stable, especially when only 30% of the data are used.

Figure 12: Comparison of the sets of features selected by PSO and CPSO2 for the housing dataset: (a) PSO method with 30% of selected data, (b) CPSO2 method with 30% of selected data, (c) PSO method with 70% of selected data, and (d) CPSO2 method with 70% of selected data.

Table 12 presents the percentage improvement in RMSE achieved by the CPSO algorithm relative to the PSO algorithm, covering all combinations of the feature and data percentages used. The improvement is higher when smaller percentages of features and data are selected. For example, the improvement is 34% when 10% of the instances and 50% of the features are selected, whereas it is less than 10% when 60% of both the instances and the features are used. These results arise because the PSO method has to explore a large search space when selecting a small subset of features and instances. In the CPSO, by contrast, this large search space is decomposed among multiple subswarms, each of which searches a lower-dimensional part of the original space.
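The decomposition argument above can be illustrated with a minimal sketch in the spirit of the cooperative PSO of van den Bergh and Engelbrecht [33]. It assumes a 0/1 mask over features and instances, a sigmoid-based binary PSO update, and contiguous blocks of the mask assigned to subswarms. The paper instead forms the partition with a clustering mechanism, and its fitness is the RMSE of the fuzzy model; the toy Hamming-distance objective here is only a hypothetical stand-in, and the function name and all parameter defaults are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple

def cooperative_binary_pso(fitness: Callable[[List[int]], float], n_dims: int,
                           n_groups: int = 4, n_particles: int = 10, iters: int = 50,
                           w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                           v_max: float = 4.0, seed: int = 0) -> Tuple[List[int], float]:
    """Minimize fitness(mask) over 0/1 masks of length n_dims.

    The mask is split into n_groups contiguous blocks, one subswarm per block.
    A particle is scored by splicing its block into the shared context vector,
    which holds the best block found so far by every subswarm.
    """
    rng = random.Random(seed)
    cuts = [round(g * n_dims / n_groups) for g in range(n_groups + 1)]
    blocks = [(cuts[g], cuts[g + 1]) for g in range(n_groups)]
    context = [rng.randint(0, 1) for _ in range(n_dims)]
    context_fit = fitness(context)

    swarms = []
    for lo, hi in blocks:
        pos = [[rng.randint(0, 1) for _ in range(hi - lo)] for _ in range(n_particles)]
        swarms.append({"pos": pos,
                       "vel": [[0.0] * (hi - lo) for _ in range(n_particles)],
                       "pbest": [p[:] for p in pos],
                       "pbest_fit": [math.inf] * n_particles})

    for _ in range(iters):
        for (lo, hi), sw in zip(blocks, swarms):
            # Evaluate each particle in the context of the other subswarms' best blocks.
            for i, part in enumerate(sw["pos"]):
                trial = context[:lo] + part + context[hi:]
                f = fitness(trial)
                if f < sw["pbest_fit"][i]:
                    sw["pbest_fit"][i], sw["pbest"][i] = f, part[:]
                if f < context_fit:
                    context, context_fit = trial, f
            gbest = context[lo:hi]
            # Binary PSO update: clamped velocity -> sigmoid -> probability of a 1 bit.
            for i, part in enumerate(sw["pos"]):
                for d in range(hi - lo):
                    v = (w * sw["vel"][i][d]
                         + c1 * rng.random() * (sw["pbest"][i][d] - part[d])
                         + c2 * rng.random() * (gbest[d] - part[d]))
                    v = max(-v_max, min(v_max, v))
                    sw["vel"][i][d] = v
                    part[d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
    return context, context_fit

# Hypothetical toy objective: Hamming distance to a hidden 0/1 pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

def hamming_to_target(mask: List[int]) -> float:
    return sum(a != b for a, b in zip(mask, target))

mask, err = cooperative_binary_pso(hamming_to_target, len(target))
print(mask, err)
```

Each subswarm only ever moves within its own block, so the dimensionality each swarm faces is roughly `n_dims / n_groups`, which is the effect the paragraph above credits for the larger gains at small selection percentages.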

Table 12: Percentage improvement in RMSE obtained with CPSO over PSO; housing data set.

Tables 13, 14, 15, and 16 compare the RMSE obtained with the proposed method against that of the standard fuzzy modeling method. Here, the standard fuzzy model is constructed without any feature or instance selection; instead, the holdout method is used to select the given percentage of the data. The experiment with the standard fuzzy model is repeated 50 times. The tables show that the proposed method outperforms the standard method of constructing the fuzzy model from the dataset. This is most evident when the CPSO method is used to search for the best subset of features and instances. For example, in Table 13, the RMSE obtained with the CPSO method using 70% of the data is 3.413, whereas the RMSE for the standard method is 8.312. The same tendency occurs for all datasets used here.
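The baseline protocol described above can be sketched as repeated random holdout. The sketch assumes that the model is trained on the selected percentage of the data and evaluated on the remaining instances, which the text does not state explicitly; `fit_mean` and `predict_mean` are a hypothetical mean predictor standing in for the fuzzy model, and all names are illustrative. Returning the standard deviation alongside the mean also gives the kind of stability measure compared in Table 11.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def repeated_holdout(X: Sequence, y: Sequence[float], train_fraction: float,
                     fit: Callable, predict: Callable,
                     runs: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Mean and sample standard deviation of the test RMSE over `runs` random splits."""
    if runs < 2:
        raise ValueError("need at least two runs for a standard deviation")
    rng = random.Random(seed)
    n = len(X)
    n_train = max(1, min(n - 1, round(train_fraction * n)))  # keep both parts non-empty
    scores: List[float] = []
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        scores.append(rmse([y[i] for i in test], preds))
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical stand-in for the fuzzy model: always predict the training mean.
def fit_mean(X_train: Sequence, y_train: Sequence[float]) -> float:
    return sum(y_train) / len(y_train)

def predict_mean(model: float, x) -> float:
    return model

X = list(range(20))
y = [float(v) for v in X]
mean_err, std_err = repeated_holdout(X, y, 0.7, fit_mean, predict_mean)
print(mean_err, std_err)
```

Fixing `seed` makes the 50 splits reproducible, so the baseline can be rerun against any selected subset under identical conditions.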

Table 13: The comparison of RMSE obtained when using standard PSO, CPSO, and standard fuzzy model with holdout method for housing data with .
Table 14: The comparison of RMSE obtained when using standard PSO, CPSO, and standard fuzzy model with holdout method for body fat data with .
Table 15: The comparison of RMSE obtained when using standard PSO, CPSO, and standard fuzzy model with holdout method for PM10 data with .
Table 16: The comparison of RMSE obtained when using CPSO and standard fuzzy model with holdout method for computer data with .

Figure 13 compares the proposed method with the "standard" fuzzy modeling approach. In most cases, the proposed method yields a lower RMSE.

Figure 13: Comparison of RMSE by using proposed method (straight line) and standard fuzzy model (dotted line): (a) Housing dataset with , (b) body fat dataset with , (c) Parkinson’s dataset with , and (d) computer dataset with .

These results make it clear that the input data can be reduced in terms of both the number of features and the number of instances. Moreover, the flexibility of choosing the reduction level helps the user focus on the most essential subsets of data and features (variables). The knowledge acquired about the best subset of data can be used to guide future data collection. In addition, the user can concentrate the analysis on the best subset of data, that is, the subset with the greatest impact on the overall prediction.

6. Conclusions

In this paper, we proposed a simple framework for constructing fuzzy models from high-dimensional and large data. This framework has several advantages that make it better suited than other frameworks for handling various real-life problems. Firstly, the simultaneous selection of features and instances is easily adapted to constructing the structure of the fuzzy model. Secondly, the best subset of data selected with this framework is capable of representing the original large data set. Thirdly, we construct an optimal (or suboptimal) collection of features and data based on the PSO. In addition, a cooperative PSO is developed in order to overcome the limitations of the standard PSO when dealing with a high-dimensional search space. The numbers of selected features and data used to construct the fuzzy model can be adjusted based on feedback about the performance of the model constructed for the currently accepted subsets.

The effectiveness of the framework was validated using four well-known regression data sets. The experimental results showed that the proposed fuzzy modeling framework is able to handle high dimensionality and large data sets simultaneously. Moreover, the effect of the curse of dimensionality in fuzzy modeling was substantially reduced.

In future work, one could concentrate on improving the cooperative PSO by fine-tuning its parameters, such as the cognitive and social parameters.
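For reference, the cognitive and social parameters are the coefficients $c_1$ and $c_2$ in the commonly used inertia-weight form of the continuous PSO velocity and position updates, written below in generic notation: $w$ is the inertia weight, $p_{id}$ the personal best of particle $i$ in dimension $d$, $g_d$ the best position found by the swarm, and $r_1, r_2$ uniform random numbers in $[0, 1]$. The symbols may differ from those used earlier in this paper.

$$
v_{id}(t+1) = w\,v_{id}(t) + c_1 r_1 \bigl(p_{id} - x_{id}(t)\bigr) + c_2 r_2 \bigl(g_{d} - x_{id}(t)\bigr), \qquad x_{id}(t+1) = x_{id}(t) + v_{id}(t+1)
$$

A larger $c_1$ pulls each particle toward its own best experience, while a larger $c_2$ pulls it toward the swarm's best; tuning their balance trades exploration against convergence speed.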

Acknowledgments

Support from the Ministry of Higher Education (MOHE) Malaysia and Universiti Teknikal Malaysia Melaka (UTeM) is gratefully acknowledged.

References

  1. W. Pedrycz and F. Gomide, Fuzzy Systems Engineering, John Wiley & Sons, Hoboken, NJ, USA, 2007.
  2. G. Castellano, C. Castiello, A. M. Fanelli, and C. Mencar, “Knowledge discovery by a neuro-fuzzy modeling framework,” Fuzzy Sets and Systems, vol. 149, no. 1, pp. 187–207, 2005.
  3. Y. Jin, “Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement,” IEEE Transactions on Fuzzy Systems, vol. 8, no. 2, pp. 212–221, 2000.
  4. Q. Zhang and M. Mahfouf, “A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: a special application for the prediction of mechanical properties of alloy steels,” Applied Soft Computing Journal, vol. 11, no. 2, pp. 2419–2443, 2011.
  5. G. E. Tsekouras, “On the use of the weighted fuzzy c-means in fuzzy modeling,” Advances in Engineering Software, vol. 36, no. 5, pp. 287–300, 2005.
  6. A. G. Di Nuovo, M. Palesi, and V. Catania, “Multi-objective evolutionary fuzzy clustering for high-dimensional problems,” in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1–6, July 2007.
  7. M. Setnes, R. Babuška, U. Kaymak, and H. R. Van Nauta Lemke, “Similarity measures in fuzzy rule base simplification,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 28, no. 3, pp. 376–386, 1998.
  8. M. Y. Chen and D. A. Linkens, “Rule-base self-generation and simplification for data-driven fuzzy models,” Fuzzy Sets and Systems, vol. 142, no. 2, pp. 243–265, 2004.
  9. H. Wang, S. Kwong, Y. Jin, W. Wei, and K. F. Man, “Multi-objective hierarchical genetic algorithm for interpretable fuzzy rule-based knowledge extraction,” Fuzzy Sets and Systems, vol. 149, no. 1, pp. 149–186, 2005.
  10. F. J. Berlanga, A. J. Rivera, M. J. del Jesus, and F. Herrera, “GP-COACH: genetic Programming-based learning of COmpact and ACcurate fuzzy rule-based classification systems for High-dimensional problems,” Information Sciences, vol. 180, no. 8, pp. 1183–1200, 2010.
  11. J. Alcalá-Fdéz, R. Alcalá, and F. Herrera, “A fuzzy associative classification system with genetic rule selection for high-dimensional problems,” in Proceedings of the 4th International Workshop on Genetic and Evolutionary Fuzzy Systems (GEFS '10), pp. 33–38, March 2010.
  12. Y. Chen, B. Yang, A. Abraham, and L. Peng, “Automatic design of hierarchical Takagi-Sugeno type fuzzy systems using evolutionary algorithms,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 3, pp. 385–397, 2007.
  13. M. R. Delgado, F. V. Zuben, and F. Gomide, “Coevolutionary genetic fuzzy systems: a hierarchical collaborative approach,” Fuzzy Sets and Systems, vol. 141, no. 1, pp. 89–106, 2004.
  14. N. Xiong and L. Litz, “Reduction of fuzzy control rules by means of premise learning—method and case study,” Fuzzy Sets and Systems, vol. 132, no. 2, pp. 217–231, 2002.
  15. A. E. Gaweda, J. M. Zurada, and R. Setiono, “Input selection in data-driven fuzzy modeling,” in Proceedings of the 10th IEEE International Conference on Fuzzy Systems, vol. 3, pp. 1251–1254, December 2001.
  16. M. L. Hadjili and V. Wertz, “Takagi-Sugeno fuzzy modeling incorporating input variables selection,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 6, pp. 728–742, 2002.
  17. R. Šindelář and R. Babuška, “Input selection for nonlinear regression models,” IEEE Transactions on Fuzzy Systems, vol. 12, no. 5, pp. 688–696, 2004.
  18. M. H. F. Zarandi, I. B. Türkşen, and B. Rezaee, “A systematic approach to fuzzy modeling for rule generation from numerical data,” in Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS '04), pp. 768–773, June 2004.
  19. H. Du and N. Zhang, “Application of evolving Takagi-Sugeno fuzzy model to nonlinear system identification,” Applied Soft Computing Journal, vol. 8, no. 1, pp. 676–686, 2008.
  20. S. N. Ghazavi and T. W. Liao, “Medical data mining by fuzzy modeling with selected features,” Artificial Intelligence in Medicine, vol. 43, no. 3, pp. 195–206, 2008.
  21. Y. Zhang, X. B. Wu, Z. Y. Xing, and W. L. Hu, “On generating interpretable and precise fuzzy systems based on Pareto multi-objective cooperative co-evolutionary algorithm,” Applied Soft Computing Journal, vol. 11, no. 1, pp. 1284–1294, 2011.
  22. F. Wan, H. Shang, L. X. Wang, and Y. X. Sun, “How to determine the minimum number of fuzzy rules to achieve given accuracy: a computational geometric approach to SISO case,” Fuzzy Sets and Systems, vol. 150, no. 2, pp. 199–209, 2005.
  23. I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
  24. H. Liu and L. Yu, “Toward integrating feature selection algorithms for classification and clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005.
  25. H. Liu and H. Motoda, Computational Methods of Feature Selection, Chapman & Hall/CRC, Boca Raton, Fla USA, 2008.
  26. J. A. Olvera-López, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, and J. Kittler, “A review of instance selection methods,” Artificial Intelligence Review, vol. 34, no. 2, pp. 133–143, 2010.
  27. H. Liu and H. Motoda, Instance Selection and Construction for Data Mining, Kluwer Academic Publishers, Boston, Mass USA, 2001.
  28. H. Ishibuchi, T. Nakashima, and M. Nii, “Genetic-Algorithm-Based instance and feature selection,” in Instance Selection and Construction for Data Mining, H. Liu and H. Motoda, Eds., pp. 95–112, Kluwer Academic Publishers, Boston, Mass, USA, 2001.
  29. J. Derrac, S. García, and F. Herrera, “IFS-CoCo: instance and feature selection based on cooperative coevolution with nearest neighbor rule,” Pattern Recognition, vol. 43, no. 6, pp. 2082–2105, 2010.
  30. A. Hertz and D. Kobler, “Framework for the description of evolutionary algorithms,” European Journal of Operational Research, vol. 126, no. 1, pp. 1–12, 2000.
  31. J. R. Cano, F. Herrera, and M. Lozano, “Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study,” IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 561–575, 2003.
  32. J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, pp. 1942–1948, Perth, Australia, December 1995.
  33. F. van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle swarm optimization,” IEEE Transactions on Evolutionary Computation, vol. 8, no. 3, pp. 225–239, 2004.
  34. Y. Shi and R. C. Eberhart, “Empirical study of particle Swarm Optimization,” Congress on Evolutionary Computing, vol. 3, pp. 1945–1950, 1999.