Computational Intelligence and Neuroscience

Volume 2017 (2017), Article ID 6573623, 7 pages

https://doi.org/10.1155/2017/6573623

## A Novel Strategy for Minimum Attribute Reduction Based on Rough Set Theory and Fish Swarm Algorithm

¹School of Information Science and Technology, Southwest Jiao Tong University, Chengdu 610031, China

²School of Mathematics and Statistics, Sichuan University of Science & Engineering, Zigong 643000, China

Correspondence should be addressed to Yuebin Su

Received 28 March 2017; Revised 25 June 2017; Accepted 5 July 2017; Published 15 August 2017

Academic Editor: Naveed Ejaz

Copyright © 2017 Yuebin Su and Jin Guo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

For data mining, reducing unnecessary redundant attributes, known as attribute reduction (AR), and in particular finding reducts with minimal cardinality, is an important preprocessing step. In this paper, using a coding method for combinations of subsets of the attribute set, we propose a novel search strategy for minimal attribute reduction based on rough set theory (RST) and the fish swarm algorithm (FSA). The method first identifies the core attributes using the discernibility matrix, and all subsets of the noncore attribute set with the same cardinality are encoded as integers that serve as the individuals of FSA. The coding method then restricts the evolutionary direction of each individual to a certain extent. The fitness function of an individual is defined based on the attribute dependency of RST, and FSA is used to find the optimal set of reducts. In each loop, if the maximum attribute dependency equals the attribute dependency of the full condition attribute set, the algorithm terminates; otherwise, a single attribute is added for the next loop. Several well-known datasets from the UCI repository were selected to verify the method. The experimental results show that the proposed method finds minimal attribute reducts effectively and has excellent global search ability.

#### 1. Introduction

Data mining, also known as knowledge discovery in databases, involves extracting knowledge, discovering new patterns, and predicting future trends from large amounts of data. Nowadays, with an increasing number of applications in different fields, massive volumes of very high-dimensional data are being produced, which poses a great challenge for data mining. It is well known that many datasets contain unnecessary redundant attributes, which not only consume extensive computing resources but also seriously affect the decision-making process. Reducing such attributes is therefore essential for data mining [1]. Attribute reduction (AR) in rough set theory (RST) removes redundant or insignificant knowledge while keeping the classification ability of the information system unchanged. It was proposed by Pawlak and Sowinski [2]. Today, RST is widely used in fields such as machine learning, data mining, and knowledge discovery [3–6].

AR is one of the core problems in RST. In particular, the minimal reduction problem, which seeks a reduct whose cardinality is the smallest among all possible reducts, is an important part of AR and has attracted much attention from researchers. One basic way to find minimal reducts is to construct a discernibility function from the dataset via the discernibility matrix and then simplify it [7–9]. Unfortunately, it has been shown that finding a minimal reduct is NP-hard and that the time required to generate all reducts is exponential [10]. Because exact methods become prohibitively expensive as the problem size grows, while heuristic algorithms can handle many kinds of NP-hard problems, heuristic attribute reduction algorithms have recently become the main research direction in the field of AR [11].

In general, swarm intelligence algorithms are a class of heuristic approaches widely used to solve the attribute reduction problem, including genetic algorithms (GA) [12–14], particle swarm optimization (PSO) [15–18], ant colony optimization (ACO) [19, 20], and the fish swarm algorithm (FSA) [11, 21, 22]. FSA is an evolutionary algorithm inspired by the natural schooling behaviors of fish, such as random, swarming, following, and preying behaviors, which it uses to generate candidate solutions for optimization problems. It has a strong ability to avoid local optima and thus achieve global optimization [23]. Owing to this performance, FSA has received much attention in recent years.

In this paper, we propose a new coding method for subsets of the attribute set and, based on it, a novel minimal attribute reduction algorithm built on FSA and RST. The algorithm first identifies the core attributes using the discernibility matrix. Then all subsets of the noncore attributes are encoded as integers by the proposed coding method, and an initial population is generated for FSA, which is used to find the optimal set of reducts. The fitness function of a subset is defined based on the attribute dependency of the resulting rough set. In each loop, the coding method restricts the evolutionary direction of each individual to a certain extent. If the maximum attribute dependency equals the attribute dependency of the full condition attribute set, the algorithm terminates; otherwise, a single attribute is added for the next loop. Numerical results on different benchmark datasets show that the proposed method is robust and economical in the number of fitness function calls.
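The outer loop described above (core first, then core plus $k$ noncore attributes for $k = 0, 1, \dots$ until the dependency matches that of the full condition set) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the FSA search over encoded $k$-subsets is replaced by exhaustive enumeration, so it only scales to small tables; the decision table is a hypothetical list of dictionaries; and the names `gamma`, `core_attributes`, `minimal_reduct`, and `TOY_TABLE` are our own.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
TOY_TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]

def gamma(table, attrs, decision):
    """Dependency degree gamma_P(D) = |POS_P(D)| / |U| as an exact fraction."""
    classes = defaultdict(list)
    for row in table:
        classes[tuple(row[a] for a in attrs)].append(tuple(row[d] for d in decision))
    # An IND(P)-class lies in POS_P(D) iff all of its objects share one decision value.
    pos = sum(len(c) for c in classes.values() if len(set(c)) == 1)
    return Fraction(pos, len(table))

def core_attributes(table, cond, decision):
    """Core = attributes that occur as singleton entries of the discernibility matrix."""
    core = set()
    for x, y in combinations(table, 2):
        if any(x[d] != y[d] for d in decision):
            diff = {a for a in cond if x[a] != y[a]}
            if len(diff) == 1:
                core |= diff
    return core

def minimal_reduct(table, cond, decision):
    """Grow core + k noncore attributes, k = 0, 1, ..., until gamma equals gamma_C(D)."""
    target = gamma(table, cond, decision)
    core = core_attributes(table, cond, decision)
    noncore = [a for a in cond if a not in core]
    for k in range(len(noncore) + 1):
        # Stand-in for the FSA search: score every k-subset of the noncore attributes.
        best = max(combinations(noncore, k),
                   key=lambda s: gamma(table, sorted(core) + list(s), decision))
        if gamma(table, sorted(core) + list(best), decision) == target:
            return core | set(best)
    # Not reached: at k = len(noncore) the subset is all of C, which attains the target.

print(sorted(minimal_reduct(TOY_TABLE, ["a", "b", "c"], ["d"])))  # ['a', 'b']
```

On this toy table the core is {b}, whose dependency degree alone is 0, so one noncore attribute has to be added; the first candidate tried, {a}, already reaches the full dependency of 1 and the loop stops at $k = 1$.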

The rest of the paper is organized as follows. Section 2 introduces some basic concepts of rough sets and the fish swarm algorithm. Section 3 focuses on the coding method for combinations. Section 4 proposes a novel attribute reduction algorithm based on the fish swarm algorithm and rough sets. Section 5 tests the performance of the proposed method on several well-known datasets. Finally, Section 6 concludes the paper and discusses areas for further research.

#### 2. Background

##### 2.1. Basic Notions of Rough Set Theory

In this section, we review some basic notions and propositions of rough set theory.

A decision table can be represented as $S = (U, A, V, f)$, where $U$ is a nonempty finite set of objects; $A = C \cup D$, where $C$ is the set of condition attributes and $D$ is the decision attribute set; $V = \bigcup_{a \in A} V_a$ is the union of the domains of the attributes in $A$; and $f: U \times A \to V$ is a function assigning attribute values to objects in $U$.

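For any $P \subseteq A$, there is an associated indiscernibility relation $IND(P)$:

$$IND(P) = \{(x, y) \in U \times U : f(x, a) = f(y, a),\ \forall a \in P\}.$$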

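Let $X \subseteq U$; the $P$-lower approximation of $X$ is defined as

$$\underline{P}X = \{x \in U : [x]_P \subseteq X\},$$

where $[x]_P$ denotes the equivalence class of $IND(P)$ determined by object $x$. The notation $POS_P(D)$ refers to the $P$-positive region, given by $POS_P(D) = \bigcup_{X \in U/IND(D)} \underline{P}X$. The $P$-approximation quality with respect to the decision attribute set $D$ is defined as follows:

$$\gamma_P(D) = \frac{|POS_P(D)|}{|U|},$$

and the core attribute set is defined as

$$CORE(C) = \{a \in C : \gamma_{C - \{a\}}(D) < \gamma_C(D)\}.$$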
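These definitions can be checked on a small example. The following Python sketch computes $U/IND(P)$, the $P$-lower approximation, the positive region, and $\gamma_P(D)$ on a hypothetical five-object decision table. The core is computed with the dependency-based characterization (an attribute belongs to the core if removing it from $C$ lowers $\gamma$), which is an assumption here; for consistent decision tables it coincides with the core obtained from the discernibility matrix. All function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical decision table: condition attributes a, b, c; decision d.
TOY = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
]

def ind_classes(table, attrs):
    """U/IND(P): object indices grouped by their values on the attributes in P."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def lower_approx(table, attrs, X):
    """P-lower approximation of X: union of the IND(P)-classes contained in X."""
    return set().union(*(c for c in ind_classes(table, attrs) if c <= X))

def positive_region(table, attrs, decision):
    """POS_P(D): union of the P-lower approximations of all decision classes."""
    return set().union(*(lower_approx(table, attrs, X)
                         for X in ind_classes(table, decision)))

def gamma(table, attrs, decision):
    """Approximation quality gamma_P(D) = |POS_P(D)| / |U|."""
    return Fraction(len(positive_region(table, attrs, decision)), len(table))

def core(table, cond, decision):
    """Assumed dependency-based core: attributes whose removal from C lowers gamma."""
    full = gamma(table, cond, decision)
    return {a for a in cond
            if gamma(table, [b for b in cond if b != a], decision) < full}

C, D = ["a", "b", "c"], ["d"]
print(gamma(TOY, C, D), gamma(TOY, ["a", "c"], D), core(TOY, C, D))  # 1 1/5 {'b'}
```

Using `Fraction` keeps $\gamma$ exact, so the equality tests between dependency degrees that the algorithm relies on are not affected by floating-point rounding.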

##### 2.2. The Principle of FSA

FSA is a bionic optimization algorithm that simulates fish swarm behaviors such as preying, swarming, and following, and records the maximum fitness value found so far on a bulletin board. In FSA, let $N$ be the population size. The artificial fish (AFs) are generated by a random function, and each $AF_i$ is represented by an $n$-dimensional position $X_i = (x_{i1}, x_{i2}, \dots, x_{in})$, with $X_{next}$ denoting the updated value of $X_i$. The food satisfaction of $AF_i$ is represented by the fitness function value $Y_i = f(X_i)$. The Euclidean distance $d_{ij} = \|X_i - X_j\|$ describes the relationship between $AF_i$ and $AF_j$. Other parameters include $Step$ (the maximum step length), $Visual$ (the visual distance of a fish), $Rand()$ (a random number), and $\delta$ (a crowd factor).

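Preying is the basic behavior of FSA. As shown in (4), for $AF_i$ we randomly select a position $X_j$ within the current visual scope ($d_{ij} < Visual$). If $Y_j > Y_i$, $AF_i$ moves a step from $X_i$ toward $X_j$. Otherwise, another random $X_j$ is selected and tested against the condition $Y_j > Y_i$. If no such $X_j$ is found after a given number of trials, $X_i$ is replaced directly by a random position within the visual scope. This mechanism helps FSA escape from local optima. The preying behavior is defined as

$$X_{next} = \begin{cases} X_i + Rand() \cdot Step \cdot \dfrac{X_j - X_i}{\|X_j - X_i\|}, & \text{if } Y_j > Y_i, \\ X_i + Rand() \cdot Visual, & \text{otherwise}. \end{cases} \tag{4}$$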
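The preying rule can be sketched for a continuous fitness that is maximized. This is an illustrative Python sketch, not the paper's code: positions are plain lists of floats, the step fraction is drawn from $[0, 1)$ and the visual-range sample from $[-1, 1]$ (both assumptions, since $Rand()$ is not further specified), and `prey`, `try_number`, and `sphere` are names we introduce.

```python
import math
import random

def prey(x, f, visual, step, try_number, rng=random):
    """One preying move of an artificial fish at position x (fitness f is maximized)."""
    y = f(x)
    for _ in range(try_number):
        # Sample a candidate X_j inside the visual range of X_i.
        xj = [xi + visual * rng.uniform(-1, 1) for xi in x]
        if f(xj) > y:
            # Move a random fraction of Step along the unit direction from X_i to X_j.
            # xj differs from x here (f(xj) > f(x)), so the distance is positive.
            dist = math.dist(x, xj)
            r = rng.random()
            return [xi + r * step * (xji - xi) / dist for xi, xji in zip(x, xj)]
    # No better X_j found in try_number trials: jump randomly within the visual range.
    return [xi + visual * rng.uniform(-1, 1) for xi in x]

def sphere(p):
    """Hypothetical test fitness, maximized at the origin."""
    return -(p[0] ** 2 + p[1] ** 2)

print(prey([3.0, 4.0], sphere, visual=1.0, step=0.5, try_number=10, rng=random.Random(0)))
```

Passing the random generator as a parameter keeps runs reproducible and lets a test substitute a deterministic source.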

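Swarming behavior, described by (5), models the attraction of the swarm center to an individual. Let $n_f$ be the number of AFs within the current visual scope of $AF_i$, and let $X_c$ be the center position of those neighbors. If the swarm center has greater food satisfaction and is not too crowded (i.e., $Y_c / n_f > \delta Y_i$), $AF_i$ moves a step from $X_i$ toward $X_c$. Otherwise, the preying behavior is used to determine the next position of the current fish:

$$X_{next} = \begin{cases} X_i + Rand() \cdot Step \cdot \dfrac{X_c - X_i}{\|X_c - X_i\|}, & \text{if } Y_c / n_f > \delta Y_i, \\ \text{preying behavior (4)}, & \text{otherwise}. \end{cases} \tag{5}$$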

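Following behavior is described by (6). Let $n_f$ be the number of AFs within the current visual scope of $AF_i$, and let $X_{max}$ be the AF with the greatest food concentration among them. If $X_{max}$ has greater food satisfaction and is not too crowded (i.e., $Y_{max} / n_f > \delta Y_i$), $AF_i$ moves a step from $X_i$ toward $X_{max}$. Otherwise, the preying behavior is used to determine the next position of the current fish:

$$X_{next} = \begin{cases} X_i + Rand() \cdot Step \cdot \dfrac{X_{max} - X_i}{\|X_{max} - X_i\|}, & \text{if } Y_{max} / n_f > \delta Y_i, \\ \text{preying behavior (4)}, & \text{otherwise}. \end{cases} \tag{6}$$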
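The swarming and following rules can be sketched in the same style. This is again a hypothetical Python sketch rather than the authors' code. The crowding test $Y / n_f > \delta Y_i$ is the form commonly used in the artificial fish swarm literature and is assumed here; it presumes positive fitness values, as with a dependency degree in $[0, 1]$. Neighbors are taken as fish at positive distance less than $Visual$, and a compact preying fallback is repeated so that the block runs on its own.

```python
import math
import random

def _toward(x, target, step, rng):
    """Move a random fraction of step along the unit direction from x to target."""
    d = math.dist(x, target)
    if d == 0:
        return list(x)
    r = rng.random()
    return [xi + r * step * (ti - xi) / d for xi, ti in zip(x, target)]

def prey(x, f, visual, step, try_number, rng):
    """Compact preying fallback (maximization)."""
    y = f(x)
    for _ in range(try_number):
        xj = [xi + visual * rng.uniform(-1, 1) for xi in x]
        if f(xj) > y:
            return _toward(x, xj, step, rng)
    return [xi + visual * rng.uniform(-1, 1) for xi in x]

def neighbors(x, school, visual):
    """Fish within the visual range of x (x itself, at distance 0, is excluded)."""
    return [p for p in school if 0 < math.dist(x, p) < visual]

def swarm(x, school, f, visual, step, delta, try_number, rng=random):
    """Swarming: move toward the neighbors' center X_c if the crowding test passes."""
    nbrs = neighbors(x, school, visual)
    if nbrs:
        center = [sum(coords) / len(nbrs) for coords in zip(*nbrs)]
        if f(center) / len(nbrs) > delta * f(x):  # assumed test Y_c / n_f > delta * Y_i
            return _toward(x, center, step, rng)
    return prey(x, f, visual, step, try_number, rng)

def follow(x, school, f, visual, step, delta, try_number, rng=random):
    """Following: move toward the best neighbor X_max if the crowding test passes."""
    nbrs = neighbors(x, school, visual)
    if nbrs:
        best = max(nbrs, key=f)
        if f(best) / len(nbrs) > delta * f(x):  # assumed test Y_max / n_f > delta * Y_i
            return _toward(x, best, step, rng)
    return prey(x, f, visual, step, try_number, rng)

def food(p):
    """Hypothetical positive fitness, maximized at the origin."""
    return 10 - (p[0] ** 2 + p[1] ** 2)

school = [[1.0, 0.0], [0.0, 0.0], [0.5, 0.0]]
print(swarm(school[0], school, food, visual=1.5, step=0.2, delta=0.5, try_number=5))
```

In a full FSA, each fish would typically evaluate both moves and keep the one with the higher fitness, updating the bulletin board afterwards; that orchestration is omitted here.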

In addition, the $Visual$ and $Step$ parameters play an important role in FSA: they determine its convergence speed and help it escape from local optima. They are set as described in [11], using a Lorentzian function and the normal distribution function.

#### 3. A Coding Method for Combination

Let $E = \{1, 2, \dots, n\}$ be a set of $n$ integers. The number of permutations of $E$ is $n!$; sort them in lexicographic order from smallest to largest. The Cantor expansion and its inverse establish a one-to-one correspondence between the set of all permutations of $E$ and the integers $\{0, 1, \dots, n! - 1\}$. Converting each permutation into a decimal number in this way allows heuristic algorithms to be applied to the traveling salesman problem (TSP). Unlike the TSP, rough set attribute reduction concerns combinations (subsets) of the attribute set, so it is necessary to determine the rank of a combination in the sequence of all combinations.
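As a small illustration of the Cantor expansion and its inverse, the following Python sketch maps a permutation of $\{1, \dots, n\}$ to its 0-based lexicographic rank and back. The function names are ours; ranks start at 0, matching the range $\{0, 1, \dots, n! - 1\}$ of the standard Cantor expansion $X = \sum_{i} s_i \,(n - i)!$, where $s_i$ counts the elements to the right of position $i$ that are smaller than the element at position $i$.

```python
from math import factorial

def cantor_rank(perm):
    """0-based lexicographic rank of a permutation of 1..n (Cantor expansion)."""
    n = len(perm)
    rank = 0
    for i, p in enumerate(perm):
        # Number of later elements smaller than p, weighted by (n - 1 - i)!.
        smaller = sum(1 for q in perm[i + 1:] if q < p)
        rank += smaller * factorial(n - 1 - i)
    return rank

def cantor_unrank(rank, n):
    """Inverse Cantor expansion: the permutation of 1..n with the given 0-based rank."""
    remaining = list(range(1, n + 1))
    perm = []
    for i in range(n - 1, -1, -1):
        idx, rank = divmod(rank, factorial(i))
        perm.append(remaining.pop(idx))
    return perm

print(cantor_rank([2, 3, 1]), cantor_unrank(3, 3))  # 3 [2, 3, 1]
```

For $n = 3$ the lexicographic order is 123, 132, 213, 231, 312, 321, so 231 has rank 3, as printed.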

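Let $\mathcal{C}_n^k$ denote the set of all $k$-element subsets of $E$; then the cardinality of $\mathcal{C}_n^k$ is $\binom{n}{k}$. For each $A \in \mathcal{C}_n^k$, write $A = \{a_1, a_2, \dots, a_k\}$ with its elements sorted from smallest to largest, that is,

$$a_1 < a_2 < \cdots < a_k.$$

By lexicographic order, $\mathcal{C}_n^k$ can then be regarded as a sequence.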
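The lexicographic rank of a $k$-combination can be computed without enumerating the whole sequence. The Python sketch below uses the standard counting argument: with $a_0 = 0$, the rank is $\sum_{i=1}^{k} \sum_{j=a_{i-1}+1}^{a_i - 1} \binom{n-j}{k-i}$, because every value $j$ skipped at position $i$ accounts for $\binom{n-j}{k-i}$ earlier combinations. This is a generic ranking scheme offered for illustration; the exact coding formula of this paper (for example, whether codes start at 0 or 1) may differ, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def comb_rank(subset, n):
    """0-based lexicographic rank of a k-subset of {1, ..., n}."""
    a = sorted(subset)
    k = len(a)
    rank, prev = 0, 0
    for i, ai in enumerate(a, start=1):
        for j in range(prev + 1, ai):
            # Subsets agreeing on a_1..a_{i-1} whose i-th element is j precede `a`;
            # their remaining k - i elements come from {j+1..n}: C(n - j, k - i) ways.
            rank += comb(n - j, k - i)
        prev = ai
    return rank

def comb_unrank(rank, n, k):
    """Inverse of comb_rank: the k-subset of {1, ..., n} with the given 0-based rank."""
    subset, x = [], 1
    for i in range(1, k + 1):
        # Skip every candidate x whose whole block of subsets lies before `rank`.
        while comb(n - x, k - i) <= rank:
            rank -= comb(n - x, k - i)
            x += 1
        subset.append(x)
        x += 1
    return subset

# Usage check: codes agree with itertools' lexicographic order of the 3-subsets of {1..5}.
for code, s in enumerate(combinations(range(1, 6), 3)):
    assert comb_rank(s, 5) == code and comb_unrank(code, 5, 3) == list(s)
print(comb_rank({2, 4}, 4), comb_unrank(4, 4, 2))  # 4 [2, 4]
```

A bijection of this kind lets a search algorithm treat each $k$-subset of noncore attributes as a single integer in $\{0, \dots, \binom{n}{k} - 1\}$ and decode it only when the fitness has to be evaluated.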

*Example 1.* For particular values of $n$ and $k$, all the elements of $\mathcal{C}_n^k$ are listed in Table 1.