Feature Selection in Decision Systems: A Mean-Variance Approach

Yang, Chengdong; Zhang, Wenyin; Zou, Jilin; Hu, Shunbo; Qiu, Jianlong

DOI: https://doi.org/10.1155/2013/268063

Mathematical Problems in Engineering


Special Issue

Distributed Control and Estimation of Networked Agent Systems


Research Article | Open Access

Volume 2013 | Article ID 268063 | https://doi.org/10.1155/2013/268063



Academic Editor: Guanghui Wen

Received 06 Apr 2013

Accepted 24 Apr 2013

Published 26 May 2013

Abstract

Uncertainty measures are important tools for characterizing the degree of uncertainty, and they have been extensively applied in pattern recognition and data clustering. Because traditional uncertainty measures are unstable, the mean-variance measure (MVM) is used to perform feature selection, since it can effectively suppress disturbances and noise. A novel evaluation function based on MVM is therefore designed, and the forward greedy search algorithm (FGSA) with the proposed evaluation function is used to perform feature selection. Experimental analysis shows the validity and effectiveness of MVM.

1. Introduction

Rough set theory, originated by Pawlak [1] in the 1980s, is a powerful mathematical tool for dealing with inexact, uncertain, and vague knowledge in information systems. It has drawn extensive attention in both theory and applications, including artificial intelligence, pattern recognition, data mining, intelligent information processing, decision support, image processing, feature selection, neural computing, conflict analysis, and knowledge discovery [2–10].

Uncertainty measures are important tools for characterizing the degree of uncertainty in rough set theory, and they have been extensively applied in pattern recognition and data clustering. However, this paper reveals that classical uncertainty measures are sensitive to disturbances and noise. To address this, a novel uncertainty measure, called the mean-variance measure (MVM), was proposed in [11] to characterize the degree of uncertainty of rough sets. Since it takes full account of the information in the boundary region, MVM is more robust and effective than classical uncertainty measures in suppressing disturbances and noise.

As an important application of rough sets in artificial intelligence and machine learning, feature selection (attribute reduction) in information systems has drawn wide attention, because excessive features or attributes usually confuse learning algorithms, significantly slow down learning, and increase the risk that learned classifiers over-fit the training data [4, 12].

Unfortunately, it has been proved that finding all reducts or finding an optimal reduct (a reduct with the least number of attributes) is an NP-complete problem [13]. Many researchers have therefore sought efficient reducts by means of optimization techniques. The forward greedy search algorithm (FGSA), also called the hill-climbing algorithm or greedy algorithm, is one such technique for finding a reduct quickly and has been extensively investigated [14–16]. A key ingredient of FGSA is an evaluation function that measures the importance of each feature or attribute in a database. Evaluation functions induced by classical uncertainty measures, that is, Pawlak's roughness or dependency, have been successfully applied in rough set based feature selection [17, 18]. Along with the development of rough sets, attribute reduction has been studied extensively in the past decade, for example, fuzzy rough set based attribute reduction [19–23], neighborhood rough set based attribute reduction [24], cross-entropy based attribute reduction [25], tolerance rough set based attribute reduction [26], cost based attribute reduction [27, 28], dynamic attribute reduction [29, 30], extended rough set based attribute reduction [31], covering rough set based attribute reduction [14, 32], covering generalized rough set based attribute reduction [33], and variable precision rough set based attribute reduction [34]. Nevertheless, the classical uncertainty measure is not robust and may fluctuate greatly under only minor disturbances: even a small change in an information system may produce an unpredictable fluctuation of this measure.

The mean value and variance from probability theory, which can be used to analyze data precisely, have been widely discussed in portfolio optimization and portfolio selection [35–38]. They can serve as an arbiter for determining whether a group of data is robust and stable. For example, suppose two shooters obtain the same mean score. If one of them has to be chosen for a tournament, which one is the reasonable choice? Clearly, the one whose scores have the smaller variance. In this paper, the notions of mean value and variance are introduced into information systems as such an arbiter to evaluate the degree of uncertainty, and a novel uncertainty measure, called the mean-variance measure (MVM), is proposed. MVM first calculates the mean value of every object and then takes the variances of all objects into account. The effect of data disturbances in decision systems on MVM is therefore reduced, since a tiny alteration of values does not result in a large change of variance.
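The shooter example can be checked in a few lines of Python. The scores below are hypothetical, chosen only so that both shooters share the same mean; the population variance from the standard `statistics` module then separates them.

```python
from statistics import mean, pvariance

# Hypothetical scores (not data from the paper): equal means, different spread
shooter_a = [8, 9, 10, 9, 9]
shooter_b = [7, 10, 10, 8, 10]

for name, scores in (("A", shooter_a), ("B", shooter_b)):
    # The mean alone cannot distinguish the shooters; the variance can
    print(name, mean(scores), pvariance(scores))
```

Shooter A, whose scores have the smaller variance, is the steadier choice even though both means are 9.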

Based on MVM, an evaluation function for decision systems, called D-MVM, is further designed. It takes full account of the information in both the positive region and the boundary region.

This paper is organized as follows. Some elementary concepts on rough sets and MVM are reviewed in Section 2. Section 3 investigates feature selection in decision systems based on MVM. Experimental results and analysis are given in Section 4, and Section 5 concludes the paper.

2. Preliminaries

2.1. Rough Sets

This section briefly outlines some basic notions on rough sets.

Definition 1. An information system is a pair $IS = (U, A)$ satisfying (1) $U$ is a nonempty finite set of objects; (2) $A$ is a nonempty finite set of attributes; (3) for every $a \in A$, there is a mapping $a: U \to V_a$, where $V_a$ is the set of values of $a$.

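Definition 2. Given an information system $IS = (U, A)$ and $B \subseteq A$, an indiscernibility relation $IND(B)$ on $U$ is defined by

$$IND(B) = \{(x, y) \in U \times U : a(x) = a(y) \text{ for all } a \in B\}.$$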

Obviously, $IND(B)$ is an equivalence relation induced by the attribute set $B$. $[x]_B = \{y \in U : (x, y) \in IND(B)\}$ is referred to as the equivalence class of $x$ with respect to $B$. The partition of $U$ induced by $IND(B)$ can be denoted by $U/IND(B) = \{X_1, X_2, \ldots, X_m\}$, where $X_i$ ($i = 1, 2, \ldots, m$) is an equivalence class of $IND(B)$ in $U$. For short, $IND(B)$ and $U/IND(B)$ are denoted by $B$ and $U/B$, respectively, when no ambiguity arises.
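As a minimal sketch of how $U/B$ can be computed, assuming a decision table stored as a Python dictionary from object to attribute values, the hypothetical helper `partition` below groups objects that share the same values on every attribute of $B$. The table is invented for illustration; it also shows that adding attributes refines the partition, as Proposition 5 states.

```python
from collections import defaultdict

def partition(table, attrs):
    """Return U/B: objects grouped by identical values on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        description = tuple(row[a] for a in attrs)  # the values of obj on B
        blocks[description].add(obj)
    return list(blocks.values())

# Hypothetical toy information system: objects 1..6, attributes "a" and "b"
table = {
    1: {"a": 0, "b": 1},
    2: {"a": 0, "b": 1},
    3: {"a": 1, "b": 0},
    4: {"a": 1, "b": 1},
    5: {"a": 1, "b": 0},
    6: {"a": 0, "b": 0},
}

print(partition(table, ["a"]))       # two equivalence classes
print(partition(table, ["a", "b"]))  # a finer partition
```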

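Definition 3. Given an information system $IS = (U, A)$, $B \subseteq A$, and $X \subseteq U$, the lower approximation and the upper approximation of $X$ with respect to $B$ are defined, respectively, by

$$\underline{B}X = \{x \in U : [x]_B \subseteq X\}, \qquad \overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\}.$$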
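Because both approximations are unions of equivalence classes, they can be computed class by class. The sketch below assumes the partition $U/B$ is already given as a list of Python sets; the function name `approximations` and the data are hypothetical.

```python
def approximations(blocks, X):
    """Lower and upper approximations of X from the partition U/B (Definition 3)."""
    lower, upper = set(), set()
    for block in blocks:          # each block is an equivalence class [x]_B
        if block <= X:            # [x]_B is contained in X
            lower |= block
        if block & X:             # [x]_B has a nonempty intersection with X
            upper |= block
    return lower, upper

# Hypothetical partition U/B of U = {1, ..., 6} and target set X
blocks = [{1, 2}, {3, 5}, {4}, {6}]
X = {1, 2, 3}
lower, upper = approximations(blocks, X)
print(sorted(lower), sorted(upper))  # [1, 2] [1, 2, 3, 5]
```

The difference `upper - lower`, here `{3, 5}`, is the boundary region on which the later MVM definitions focus.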

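Definition 4. Given an information system $IS = (U, A)$, a partial ordering relation $\preceq$ in the family $\{U/B : B \subseteq A\}$ is defined as $U/P \preceq U/Q$ if and only if for any $P_i \in U/P$ there exists a $Q_j \in U/Q$ such that $P_i \subseteq Q_j$, where $U/P = \{P_1, P_2, \ldots, P_m\}$ and $U/Q = \{Q_1, Q_2, \ldots, Q_n\}$ are the partitions induced by $P, Q \subseteq A$, respectively.
$U/Q$ is said to be coarser than $U/P$, or $U/P$ is finer than $U/Q$, if $U/P \preceq U/Q$. $U/P$ is said to be strictly finer than $U/Q$, denoted by $U/P \prec U/Q$, if $U/P \preceq U/Q$ but $U/P \neq U/Q$.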

Proposition 5. Given an information system $IS = (U, A)$ and $P, Q \subseteq A$, if $P \subseteq Q$, then $U/Q \preceq U/P$.

From Proposition 5, the more attributes are used, the finer the induced partition is. Therefore, $U/A$ is the finest among the partitions induced by all subsets of $A$.

The classical uncertainty measure is defined as follows.

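Definition 6. Given an information system (or an incomplete information system) $IS = (U, A)$, $B \subseteq A$, and a nonempty set $X \subseteq U$, the roughness of $X$ with respect to $B$ is defined as

$$\rho_B(X) = 1 - \frac{|\underline{B}X|}{|\overline{B}X|}.$$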

The quantity $\rho_B(X)$ characterizes the degree of uncertainty of $X$ with respect to $B$. When $\rho_B(X) = 0$, that is, $\underline{B}X = \overline{B}X$, $X$ is said to be definable; otherwise, it is said to be rough.
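Pawlak's roughness can be computed directly from the same kind of partition. This is a minimal sketch assuming $U/B$ is given as a list of sets and $X$ is nonempty; the name `roughness` and the data are illustrative.

```python
def roughness(blocks, X):
    """Pawlak roughness 1 - |lower| / |upper| (Definition 6); X must be nonempty."""
    lower = set().union(*(b for b in blocks if b <= X))
    upper = set().union(*(b for b in blocks if b & X))
    return 1 - len(lower) / len(upper)

blocks = [{1, 2}, {3, 5}, {4}, {6}]  # hypothetical partition U/B
print(roughness(blocks, {1, 2, 3}))  # 0.5: X is rough
print(roughness(blocks, {1, 2, 4}))  # 0.0: X is a union of classes, hence definable
```

Note that moving a single object into or out of $X$ can change the approximations, and hence the roughness, by whole equivalence classes, which is the kind of sensitivity the introduction attributes to classical measures.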

When $A$ is divided into two nonempty sets $C$ and $D$ such that $C \cup D = A$ and $C \cap D = \emptyset$, the information system, denoted by $DS = (U, C \cup D)$, is called a decision system; $C$ is called the conditional attribute set, and $D$ is called the decision attribute set.

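Definition 7. Given a decision system $DS = (U, C \cup D)$ and $B \subseteq C$, the dependency degree of $D$ on $B$ is defined by

$$\gamma_B(D) = \frac{|POS_B(D)|}{|U|},$$

where $POS_B(D) = \bigcup_{Y \in U/D} \underline{B}Y$ is the positive region of $D$ with respect to $B$ and $|\cdot|$ denotes the cardinality of a set.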
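Since $POS_B(D)$ is the union of the lower approximations of all decision classes, an equivalence class of $U/B$ belongs to it exactly when it fits inside a single decision class. The sketch below uses that observation; the partitions and the name `dependency` are hypothetical.

```python
def dependency(cond_blocks, dec_blocks, n_objects):
    """gamma_B(D) = |POS_B(D)| / |U| (Definition 7)."""
    pos = set()
    for block in cond_blocks:
        # A class of U/B lies in the positive region iff it fits inside one decision class
        if any(block <= Y for Y in dec_blocks):
            pos |= block
    return len(pos) / n_objects

cond_blocks = [{1, 2}, {3, 5}, {4}, {6}]  # hypothetical U/B
dec_blocks = [{1, 2, 3}, {4, 5, 6}]       # hypothetical U/D
print(dependency(cond_blocks, dec_blocks, 6))  # POS = {1, 2, 4, 6}, so 4/6
```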

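Definition 8. Given a decision system $DS = (U, C \cup D)$ and $B \subseteq C$, $B$ is independent if $\gamma_{B - \{a\}}(D) \neq \gamma_B(D)$ for any $a \in B$.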

Attribute reduction in decision systems is defined as follows.

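Definition 9. Given a decision system $DS = (U, C \cup D)$ and $B \subseteq C$, $B$ is called a reduct if (1) $\gamma_B(D) = \gamma_C(D)$; (2) $\gamma_{B - \{a\}}(D) \neq \gamma_B(D)$ for any $a \in B$.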
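For a tiny table, the two reduct conditions can be checked by brute force: (1) $B$ keeps the dependency degree of the full set $C$, and (2) removing any single attribute changes it. The sketch below enumerates all subsets, which is exponential in $|C|$ and only illustrates why heuristics such as FGSA are needed. The decision table and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def partition(table, attrs):
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def dependency(table, cond, dec):
    """gamma_B(D) computed directly from a decision table."""
    dec_blocks = partition(table, dec)
    pos = set()
    for block in partition(table, cond):
        if any(block <= Y for Y in dec_blocks):
            pos |= block
    return len(pos) / len(table)

def is_reduct(table, B, C, dec):
    """Condition (1): same dependency as C; condition (2): every attribute is indispensable."""
    g = dependency(table, B, dec)
    if g != dependency(table, C, dec):
        return False
    return all(dependency(table, [x for x in B if x != a], dec) != g for a in B)

# Hypothetical decision table: conditional attributes a, b, c; decision d
rows = [(0, 1, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 1), (1, 0, 1, 1), (0, 0, 1, 0)]
table = {i + 1: dict(zip("abcd", r)) for i, r in enumerate(rows)}
C = ["a", "b", "c"]

# Exhaustive search: exponential in |C|, so only viable for toy tables
reducts = [B for k in range(1, len(C) + 1)
           for B in combinations(C, k) if is_reduct(table, list(B), C, ["d"])]
print(reducts)  # [('a',)]
```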

2.2. A Novel Uncertainty Measure of Rough Sets

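Given an information system $IS = (U, A)$ and $X \subseteq U$, the characteristic function of $X$ on $U$ can be denoted by

$$\chi_X(x) = \begin{cases} 1, & x \in X, \\ 0, & x \notin X, \end{cases}$$

where $x \in U$.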

Let $\chi_X = \{\chi_X(x) : x \in U\}$; then $\chi_X$ can be considered as a special fuzzy set derived from $X$ on $U$.

In rough set theory, objects in the same equivalence class cannot be distinguished from each other, since they have the same description. However, in the boundary region of a rough set, objects in the same class have different characteristic function values: some belong to the set and some do not. In this case, the mean value over the objects in a class is used to characterize each object.

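Definition 10. Given an information system $IS = (U, A)$, $B \subseteq A$, $X \subseteq U$, and $x \in U$, the mean value of $x$ in $X$, denoted by $\mu_X^B(x)$, is defined by

$$\mu_X^B(x) = \frac{1}{|[x]_B|} \sum_{y \in [x]_B} \chi_X(y) = \frac{|[x]_B \cap X|}{|[x]_B|}.$$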

We denote $\{\mu_X^B(x) : x \in U\}$ by $\mu_X^B$. It is evident that $\mu_X^B$ is a fuzzy set on $U$. As an example, given $U$, $B$, and $X$ as shown in Figure 1(a), the corresponding quantities are calculated by (8) and (9), respectively, as shown in Figures 1(b) and 1(c).

Figure 1: Example for Definition 10, panels (a), (b), and (c).

As mentioned above, when an object $x$ is not in $X$, its mean value is nonzero if and only if its equivalence class has a nonempty intersection with $X$; when $x$ is in $X$, its mean value is 1 if and only if its equivalence class is contained in $X$. From Definition 10, it is easy to verify that the mean value of $x$ is the degree to which its equivalence class $[x]_B$ is included in $X$.

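Proposition 11. Given an information system $IS = (U, A)$, $B \subseteq A$, $X \subseteq U$, and $x \in U$, the following conclusions hold: (1) if $x \in \underline{B}X$, then $\mu_X^B(x) = 1$; (2) if $x \in U - \overline{B}X$, then $\mu_X^B(x) = 0$; (3) if $x \in \overline{B}X - \underline{B}X$, then $0 < \mu_X^B(x) < 1$.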
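The mean values of Definition 10 depend only on an object's equivalence class, so one ratio per class suffices. This sketch, using a hypothetical partition and a hypothetical helper `mean_values`, reproduces the three cases of Proposition 11 on a small example.

```python
def mean_values(blocks, X):
    """mu_X^B(x) = |[x]_B & X| / |[x]_B| for every object x (Definition 10)."""
    mu = {}
    for block in blocks:
        value = len(block & X) / len(block)  # shared by all objects in the class
        for x in block:
            mu[x] = value
    return mu

blocks = [{1, 2}, {3, 5}, {4}, {6}]  # hypothetical U/B
X = {1, 2, 3}
mu = mean_values(blocks, X)
# Proposition 11: 1 on the lower approximation, 0 outside the upper one,
# strictly between 0 and 1 on the boundary region
print({x: mu[x] for x in sorted(mu)})
```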

Note that the mean value of $x$ is either 0 or 1 when $x$ is in the positive region or the negative region. It is obvious that the mean value lies strictly between 0 and 1 only when $x$ is in the boundary region.
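To see why only boundary objects carry variance, assume (as an illustration, not the paper's stated formula) that the variance attached to an object is the population variance of the 0/1 values $\chi_X(y)$ over its class $[x]_B$. For 0/1 data with mean $\mu$ this variance equals $\mu(1-\mu)$, which vanishes whenever $\mu$ is 0 or 1. The helper `class_variance` and the data are hypothetical.

```python
from statistics import pvariance

def class_variance(block, X):
    """Population variance of chi_X(y) over the objects y of one class [x]_B."""
    return pvariance([1 if y in X else 0 for y in block])

blocks = [{1, 2}, {3, 5}, {4}, {6}]  # hypothetical U/B
X = {1, 2, 3}
for block in blocks:
    mu = len(block & X) / len(block)
    # For 0/1 values the variance equals mu * (1 - mu): zero unless 0 < mu < 1
    print(sorted(block), mu, class_variance(block, X), mu * (1 - mu))
```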

Definition 12. Given an information system $IS = (U, A)$, $B \subseteq A$, and $X \subseteq U$, the mean-variance uncertainty measure (MVM) of $X$ with respect to $B$, denoted by $MVM_B(X)$, is defined as

It is clear that

Assume when or . From Definition 12, one can see that only objects in the boundary region of $X$ contribute to the value of MVM. In this sense, MVM takes full account of the information in the boundary region, and it is therefore a proper measure for evaluating the uncertainty of $X$.
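The following sketch shows one way per-object variances could be aggregated into a single number. The aggregate used here, the average of $\mu(1-\mu)$ over all objects of $U$, is a hypothetical stand-in chosen for illustration and is not claimed to be the formula of Definition 12. It does share the behavior described above: only boundary objects contribute, and finer partitions give smaller values.

```python
def mvm_sketch(blocks, X, n_objects):
    """HYPOTHETICAL aggregate: average over U of each object's class variance mu*(1-mu).
    A stand-in for illustration only, not the formula of Definition 12."""
    total = 0.0
    for block in blocks:
        mu = len(block & X) / len(block)
        total += len(block) * mu * (1 - mu)  # every object in the class contributes the same
    return total / n_objects

X = {1, 2, 3}
coarse = [{1, 2, 6}, {3, 4, 5}]          # hypothetical U/{a}
fine = [{1, 2}, {3, 5}, {4}, {6}]        # hypothetical U/{a, b}
discrete = [{x} for x in range(1, 7)]    # finest partition: no boundary region
print(mvm_sketch(coarse, X, 6), mvm_sketch(fine, X, 6), mvm_sketch(discrete, X, 6))
```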

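Definition 13. Given an information system $IS = (U, A)$, $P, Q \subseteq A$, and $X \subseteq U$, (1) $X$ is said to be MVM-definable with respect to $P$ if $MVM_P(X) = 0$; (2) $X$ is said to be MVM-rough with respect to $P$ if $MVM_P(X) > 0$; (3) $X$ is said to be coarser with respect to $P$ than with respect to $Q$ if $MVM_P(X) \geq MVM_Q(X)$, in which case $X$ is called finer with respect to $Q$ than with respect to $P$.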

Next, we investigate properties of MVM and show its efficiency in evaluating the uncertainty of a set in information systems.

Proposition 14. Given an information system $IS = (U, A)$, $B \subseteq A$, and $X \subseteq U$, the following conclusions hold: (1) ; (2) if , then , where is the finest partition of $U$; (3) if , then , where is the coarsest partition of $U$.

3. Feature Selection in Decision Systems

In this section, the proposed uncertainty measure is further investigated to perform feature selection in decision systems.

Definition 15. Given a decision system $DS = (U, C \cup D)$ and $B \subseteq C$, the MVM of the decision attribute set $D$ with respect to the conditional attribute subset $B$, called D-MVM or the evaluation function and denoted by $DMVM_B(D)$, is defined by

where $m$ is the number of decision classes induced by the decision attribute set $D$, $U/D = \{D_1, D_2, \ldots, D_m\}$, $MVM_B(D_j)$ ($j = 1, 2, \ldots, m$) reflects the uncertainty of each decision class, and $DMVM_B(D)$ describes the integrated uncertainty degree of the blocks $D_1, D_2, \ldots, D_m$.
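D-MVM combines one MVM value per decision class $D_j$. The sketch below shows only that structure: it computes a per-class value with the hypothetical `mvm_sketch` (average of $\mu(1-\mu)$ over $U$) and assumes, for illustration, that the classes are combined by a plain average. Neither choice is claimed to match the paper's formula; the decision table is also hypothetical.

```python
from collections import defaultdict

def partition(table, attrs):
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def mvm_sketch(blocks, X, n_objects):
    # HYPOTHETICAL stand-in for MVM_B(X): average over U of mu * (1 - mu)
    total = 0.0
    for block in blocks:
        mu = len(block & X) / len(block)
        total += len(block) * mu * (1 - mu)
    return total / n_objects

def dmvm_sketch(table, cond, dec):
    """HYPOTHETICAL D-MVM: mean of mvm_sketch over the decision classes D_1, ..., D_m."""
    cond_blocks = partition(table, cond)
    dec_blocks = partition(table, dec)       # U/D = {D_1, ..., D_m}
    n = len(table)
    return sum(mvm_sketch(cond_blocks, Dj, n) for Dj in dec_blocks) / len(dec_blocks)

rows = [(0, 1, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 1), (1, 0, 1, 1), (0, 0, 1, 0)]
table = {i + 1: dict(zip("abcd", r)) for i, r in enumerate(rows)}
for cond in ([], ["b"], ["a"]):
    print(cond, dmvm_sketch(table, cond, ["d"]))
```

With no conditional attributes every object is in the boundary, while attribute `a` alone separates the two decision classes and drives the value to zero.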

In the following, some properties of D-MVM are studied.

Proposition 16. Given a decision system $DS = (U, C \cup D)$ and $B \subseteq C$, the following conclusions hold: (1) ; (2) if , then ; (3) if , then .

Proof. The proof is analogous to that of ([11], property 2.14).

Definition 17. Given a decision system $DS = (U, C \cup D)$ and $B \subseteq C$, $B$ is independent if $DMVM_{B - \{a\}}(D) \neq DMVM_B(D)$ for any $a \in B$.

By D-MVM, a relative reduct can be defined as follows.

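Definition 18. Given a decision system $DS = (U, C \cup D)$ and $B \subseteq C$, $B$ is a relative reduct of $C$ with respect to $D$ if and only if (1) $DMVM_B(D) = DMVM_C(D)$; (2) $DMVM_{B - \{a\}}(D) \neq DMVM_B(D)$ for any $a \in B$.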

A relative reduct is a minimal subset of conditional attributes that has the same discriminating power as the original decision system.

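Definition 19 (significance based on D-MVM). Given a decision system $DS = (U, C \cup D)$, $B \subseteq C$, and a feature $a \in C - B$, the significance of $a$ with respect to $B$, denoted by $Sig(a, B, D)$, is defined as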

Notice that if $B$ is the empty set, the significance of $a$ is a nonnegative real number; otherwise, it is nonpositive.
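The sign behavior described above can be reproduced with a simple assumption: the significance is the change in D-MVM when $a$ is added to $B$, and for $B = \emptyset$ it is taken to be the D-MVM of $\{a\}$ alone. This is a hypothetical reading for illustration, not the paper's formula, and it reuses the hypothetical average-of-$\mu(1-\mu)$ stand-in for D-MVM. Because refining a partition cannot increase that stand-in, later significances come out nonpositive.

```python
from collections import defaultdict

def partition(table, attrs):
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def dmvm_sketch(table, cond, dec):
    # HYPOTHETICAL D-MVM: mean over decision classes of the average of mu * (1 - mu) over U
    n, cond_blocks, dec_blocks = len(table), partition(table, cond), partition(table, dec)
    total = 0.0
    for Dj in dec_blocks:
        for block in cond_blocks:
            mu = len(block & Dj) / len(block)
            total += len(block) * mu * (1 - mu) / n
    return total / len(dec_blocks)

def significance(table, a, B, dec):
    """HYPOTHETICAL Sig(a, B, D): change in D-MVM when feature a is added to B.
    For B empty it is assumed to be the D-MVM of {a} alone, so the first step is nonnegative."""
    with_a = dmvm_sketch(table, B + [a], dec)
    return with_a if not B else with_a - dmvm_sketch(table, B, dec)

rows = [(0, 1, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 1), (1, 0, 1, 1), (0, 0, 1, 0)]
table = {i + 1: dict(zip("abcd", r)) for i, r in enumerate(rows)}
print(significance(table, "a", [], ["d"]))     # first iteration: nonnegative
print(significance(table, "b", [], ["d"]))
print(significance(table, "c", ["b"], ["d"]))  # later iterations: nonpositive
```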

With the proposed evaluation function, a forward greedy search algorithm for feature selection can be designed as follows.

In the first iteration, we start with an empty set of selected features. The significance is nonpositive in every iteration except the first one. In each iteration, all remaining features are evaluated, and the one with the minimal significance is chosen. The algorithm stops when adding any of the remaining features to the selected feature set no longer brings a change larger than the threshold in Algorithm 1, which controls the precision of the algorithm.

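Algorithm 1: Forward Greedy Search Algorithm of Feature Selection Based on
Mean-Variance in Decision Systems (FGSA-MVM)
Input: a decision system DS = (U, C ∪ D) and a threshold ε
Output: a reduct B
(1) B ← ∅
(2) while C − B ≠ ∅
(3)   for each a ∈ C − B
(4)     compute the significance Sig(a, B, D)
(5)   end for
(6)   select the attribute a_k such that
(7)     Sig(a_k, B, D) = min_{a ∈ C − B} Sig(a, B, D)
(8)   if |Sig(a_k, B, D)| > ε
(9)     B ∪ {a_k} → B
(10)  else
(11)    break
(12)  end if
(13) end while
(14) return B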

FGSA-MVM searches for a subset of conditional attributes with minimal D-MVM. Consider step 7 of FGSA-MVM. In the first iteration, the attribute with the minimal significance is chosen, where the significance is a nonnegative number. In the remaining iterations, the attribute with the minimal significance, that is, the one giving the biggest step length, is also selected, since the significance is nonpositive for every remaining attribute.
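The greedy loop can be sketched in Python with the evaluation function passed in as a parameter. Several details are assumptions: `dmvm_sketch` is the hypothetical average-of-$\mu(1-\mu)$ stand-in for D-MVM, the significance is the change in that value with the empty-set convention described after Definition 19, the stopping test is skipped in the first round so that one attribute is always selected, and `eps` plays the role of the threshold. Ties are broken by the order of the attribute list.

```python
from collections import defaultdict

def partition(table, attrs):
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def dmvm_sketch(table, cond, dec):
    # HYPOTHETICAL D-MVM stand-in (see the assumptions above)
    n, cond_blocks, dec_blocks = len(table), partition(table, cond), partition(table, dec)
    total = 0.0
    for Dj in dec_blocks:
        for block in cond_blocks:
            mu = len(block & Dj) / len(block)
            total += len(block) * mu * (1 - mu) / n
    return total / len(dec_blocks)

def fgsa_sketch(table, cond, dec, evaluate, eps=1e-9):
    """Greedy forward selection following the steps of FGSA-MVM (sketch)."""
    selected = []
    while len(selected) < len(cond):                                # step (2): C - B nonempty
        base = evaluate(table, selected, dec) if selected else 0.0  # assumed convention for B empty
        sig = {a: evaluate(table, selected + [a], dec) - base
               for a in cond if a not in selected}                  # steps (3)-(5)
        best = min(sig, key=sig.get)                                # steps (6)-(7)
        if selected and abs(sig[best]) <= eps:                      # step (8), skipped in round one
            break
        selected.append(best)                                       # step (9)
    return selected

rows = [(0, 1, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 1), (1, 0, 1, 1), (0, 0, 1, 0)]
table = {i + 1: dict(zip("abcd", r)) for i, r in enumerate(rows)}
print(fgsa_sketch(table, ["a", "b", "c"], ["d"], dmvm_sketch))  # ['a']
```

Keeping `evaluate` as a parameter makes it easy to swap in a dependency-based evaluation function for comparison, as in the experiments below.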

4. Experiments and Analysis

To test the validity of the proposed method for feature selection, comparative experiments on the efficiency and convergence of the proposed algorithm were conducted against two of the most important methods: feature selection based on dependency [39] and feature selection based on mutual information (MI) [40].

As shown in Table 1, four standard data sets from the UCI Machine Learning Repository, University of California, Irvine, CA, USA [41], are used in our experiments.

The CART and RBF-kernel support vector machine (SVM) learning algorithms are used to test the classification performance on the raw feature sets and on the selected feature sets. As a widely used technique for evaluating classification performance in machine learning, 10-fold cross-validation [42] is carried out in our experiments by dividing the samples into 10 subsets: nine of them are used as the training set, and the remaining one is used as the test set. After 10 rounds, the average value and variation are computed as the final classification performance.
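The 10-fold procedure can be sketched without any machine learning library. In the sketch below the folds are contiguous index ranges, and the population standard deviation is assumed as the measure of variation; in practice the samples are usually shuffled (often stratified by class) before splitting, which the text does not specify. The evaluator `share_of_test` is a hypothetical placeholder where a CART or RBF-SVM classifier would be trained on `train` and scored on `test`.

```python
from statistics import mean, pstdev

def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k contiguous folds of nearly equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n_samples, evaluate_fold, k=10):
    """Each fold is the test set once; the remaining k - 1 folds form the training set.
    Returns the mean score and its population standard deviation over the k rounds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        scores.append(evaluate_fold(train, test))
    return mean(scores), pstdev(scores)

# Hypothetical placeholder evaluator: a real one would fit a classifier on `train`
# and return its accuracy on `test`; here it just reports the test-set share
def share_of_test(train, test):
    return len(test) / (len(train) + len(test))

print(cross_validate(25, share_of_test))
```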

Classification performance is evaluated by CART in Table 2 and by RBF-SVM in Table 3. Bold marks the highest classification performance among those obtained by the methods based on the three uncertainty measures. The number of selected features with the highest classification performance by the new measure is larger than that by dependency and by MI: it is 12, 4, and 12, respectively, via the CART algorithm, whereas it is 14, 8, and 11 via the SVM algorithm.

The experiments show that the proposed measure performs best not only in terms of the smallest average number of selected features in the reducts but also in terms of the highest classification performance.

In the remainder of this section, we focus on the convergence of the proposed method. Figure 2 shows how the evaluation functions change with the number of selected features, where the significance of the selected features is calculated based on dependency, MI, and MVM, respectively. The four data sets are used to show the convergence of the different techniques. The selection orders on the four data sets under the different evaluation functions are shown in Table 4: the sequences of selected features differ, even when the number of selected features in the optimal reducts is the same. Overall, the significance degrees based on dependency and MI increase, whereas those based on MVM decrease. With MVM, the evaluation functions of all four data sets decrease rapidly at the beginning of the selection process. The evaluation function of the credit data set decreases slowly, a pattern of behavior that differs from the other three data sets; in this case, feature selection algorithms may stop very early if a threshold is specified to stop the search. The results show both convergence and good classification performance.

Figure 2: Evaluation functions versus the number of selected features: (a) dependency; (b) MI; (c) MVM.

5. Conclusion

This contribution studied feature selection based on MVM in decision systems, one of the most important applications of rough set theory. A novel approach to feature selection was proposed by introducing an evaluation function based on MVM. Theoretical analysis and experimental results show that the proposed method outperforms the dependency-based and MI-based methods both in the number of selected features and in classification precision.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants 61273012 and 61102040, the Shandong Province College Science and Technology Plan Project under Grant J12LN91, the Shandong Natural Science Foundation of China under Grant ZR2011FL014, and the Shandong Province Science and Technology Development Plan Projects under Grant 2012YD01052.

References

[1] Z. Pawlak, “Rough sets,” International Journal of Computer and Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[2] P. Fortemps, S. Greco, and R. Słowiński, “Multicriteria decision support using rules that represent rough-graded preference relations,” European Journal of Operational Research, vol. 188, no. 1, pp. 206–223, 2008.
[3] T. Deng, Y. Chen, W. Xu, and Q. Dai, “A novel approach to fuzzy rough sets based on a fuzzy covering,” Information Sciences, vol. 177, no. 11, pp. 2308–2326, 2007.
[4] W. Z. Wu, “Attribute reduction based on evidence theory in incomplete decision systems,” Information Sciences, vol. 178, no. 5, pp. 1355–1371, 2008.
[5] M. C. Lee, “Using support vector machine with a hybrid feature selection method to the stock trend prediction,” Expert Systems with Applications, vol. 36, no. 8, pp. 10896–10904, 2009.
[6] Z. Pawlak, “Some remarks on conflict analysis,” European Journal of Operational Research, vol. 166, no. 3, pp. 649–654, 2005.
[7] H. Xiumei, F. Haiyan, and S. Kaiquan, “S-rough sets and the discovery of F-hiding knowledge,” Journal of Systems Engineering and Electronics, vol. 19, no. 6, pp. 1171–1177, 2008.
[8] C. Wang, C. Wu, D. Chen, Q. Hu, and C. Wu, “Communicating between information systems,” Information Sciences, vol. 178, no. 16, pp. 3228–3239, 2008.
[9] D. Chen, Y. Yang, and X. Zhang, “Parameterized local reduction of decision systems,” Journal of Applied Mathematics, vol. 2012, Article ID 857590, 14 pages, 2012.
[10] Md. M. Kabir, Md. Shahjahan, and K. Murase, “A new local search based hybrid genetic algorithm for feature selection,” Neurocomputing, vol. 74, no. 17, pp. 2914–2928, 2011.
[11] C. Yang, W. Zhang, J. Zou et al., “A novel uncertainty measure on rough sets: a mean-variance approach,” in Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge Discovery, pp. 900–904, 2012.
[12] L. Yu and H. Liu, “Efficiently handling feature redundancy in high-dimensional data,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 685–690, Washington, DC, USA, August 2003.
[13] S. K. M. Wong and W. Ziarko, “On optimal decision rules in decision tables,” Bulletin of the Polish Academy of Sciences, vol. 33, no. 11-12, pp. 693–696, 1985.
[14] F. Li and Y. Yin, “Approaches to knowledge reduction of covering decision systems based on information theory,” Information Sciences, vol. 179, no. 11, pp. 1694–1704, 2009.
[15] J. Guan and D. Bell, “Rough computational methods for information systems,” Artificial Intelligence, vol. 105, pp. 77–103, 1998.
[16] Z. Geng and Q. Zhu, “Rough set-based heuristic hybrid recognizer and its application in fault diagnosis,” Expert Systems with Applications, vol. 36, no. 2, pp. 2711–2718, 2009.
[17] Z. Pawlak, “Rough sets and fuzzy sets,” Fuzzy Sets and Systems in Information Science and Engineering, vol. 17, no. 1, pp. 99–102, 1985.
[18] X. Hu and N. Cercone, “Learning in relational databases: a rough set approach,” Computational Intelligence, vol. 11, no. 2, pp. 323–338, 1995.
[19] Q. Shen and R. Jensen, “Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring,” Pattern Recognition, vol. 39, no. 7, pp. 1351–1363, 2004.
[20] Q. Hu, Z. Xie, and D. Yu, “Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation,” Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[21] R. Jensen and Q. Shen, “New approaches to fuzzy-rough feature selection,” IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 824–838, 2009.
[22] Q. He, C. Wu, D. Chen, and S. Zhao, “Fuzzy rough set based attribute reduction for information systems with fuzzy decisions,” Knowledge-Based Systems, vol. 24, no. 5, pp. 689–696, 2011.
[23] D. Zhang, Y. Wang, and H. Huang, “Rough neural network modeling based on fuzzy rough model and its application to texture classification,” Neurocomputing, vol. 72, no. 10–12, pp. 2433–2443, 2009.
[24] Q. Hu, D. Yu, J. Liu, and C. Wu, “Neighborhood rough set based heterogeneous feature subset selection,” Information Sciences, vol. 178, no. 18, pp. 3577–3594, 2008.
[25] J. Zheng and R. Yan, “Attribute reduction based on cross entropy in rough set theory,” Journal of Information & Computational Science, vol. 9, no. 3, pp. 745–750, 2012.
[26] N. M. Parthaláin and Q. Shen, “Exploring the boundary region of tolerance rough sets for feature selection,” Pattern Recognition, vol. 42, no. 5, pp. 655–667, 2009.
[27] X. Jia, W. Liao, Z. Tang, and L. Shang, “Minimum cost attribute reduction in decision-theoretic rough set models,” Information Sciences, vol. 219, pp. 151–167, 2013.
[28] F. Min, H. He, Y. Qian, and W. Zhu, “Test-cost-sensitive attribute reduction,” Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[29] F. Wang, J. Liang, and C. Dang, “Attribute reduction for dynamic data sets,” Applied Soft Computing, vol. 13, no. 1, pp. 676–689, 2013.
[30] J. Zhang, T. Li, D. Ruan, and D. Liu, “Rough sets based matrix approaches with dynamic attribute variation in set-valued information systems,” International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 620–635, 2012.
[31] Z. Meng and Z. Shi, “Extended rough set-based attribute reduction in inconsistent incomplete decision systems,” Information Sciences, vol. 204, pp. 44–69, 2012.
[32] W. Zhu and F. Y. Wang, “Reduction and axiomization of covering generalized rough sets,” Information Sciences, vol. 152, pp. 217–230, 2003.
[33] E. C. C. Tsang, C. Degang, and D. S. Yeung, “Approximations and reducts with covering generalized rough sets,” Computers & Mathematics with Applications, vol. 56, no. 1, pp. 279–289, 2008.
[34] J. S. Mi, W. Z. Wu, and W. X. Zhang, “Approaches to knowledge reduction based on variable precision rough set model,” Information Sciences, vol. 159, no. 3-4, pp. 255–272, 2004.
[35] X. T. Deng, Z. F. Li, and S. Y. Wang, “A minimax portfolio selection strategy with equilibrium,” European Journal of Operational Research, vol. 166, no. 1, pp. 278–292, 2005.
[36] X. Huang, “Mean-variance model for fuzzy capital budgeting,” Computers and Industrial Engineering, vol. 55, no. 1, pp. 34–47, 2008.
[37] P. Gupta, M. K. Mehlawat, and A. Saxena, “Asset portfolio optimization using fuzzy mathematical programming,” Information Sciences, vol. 178, no. 6, pp. 1734–1755, 2008.
[38] X. Y. Zhou and G. Yin, “Markowitz's mean-variance portfolio selection with regime switching: a continuous-time model,” SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1466–1482, 2003.
[39] M. Modrzejewski, “Feature selection using rough sets theory,” in Proceedings of the European Conference on Machine Learning, pp. 213–226, Vienna, Austria, 1993.
[40] F. F. Xu, D. Q. Miao, and L. Wei, “Fuzzy-rough attribute reduction via mutual information with an application to cancer classification,” Computers and Mathematics with Applications, vol. 57, no. 6, pp. 1010–1017, 2009.
[41] D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz, “UCI Repository of Machine Learning Databases,” University of California, Department of Information and Computer Science, 1998, http://mlearn.ics.uci.edu/MLRepository.html.
[42] M. van der Gaag, T. Hoffman, M. Remijsen et al., “The five-factor model of the positive and negative syndrome scale II: a ten-fold cross-validation of a revised model,” Schizophrenia Research, vol. 85, no. 1–3, pp. 280–287, 2006.

Copyright

Copyright © 2013 Chengdong Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
