Abstract

Rough set theory has been successfully applied to many fields, such as data mining, pattern recognition, and machine learning. Kernel rough sets and neighborhood rough sets are two important models that differ in terms of granulation. The kernel rough sets model, which has fuzziness, is susceptible to noise in the decision system. The neighborhood rough sets model can handle noisy data well but cannot describe the fuzziness of the samples. In this study, we define a novel model called kernel neighborhood rough sets, which integrates the advantages of the neighborhood and kernel models. Moreover, the model is used in the problem of feature selection. The proposed method is tested on the UCI datasets. The results show that our model outperforms classic models.

1. Introduction

Rough set theory, which was proposed by Pawlak in 1982, is a powerful mathematical method to study incomplete and imprecise information. This theory has been successfully applied to many fields, such as data mining, decision-making, pattern recognition, machine learning, and intelligent control [1–4]. Kernel rough sets [5] and neighborhood rough sets [6] are two important models in rough set theory.

Hu innovatively proposed the kernel rough sets model [5, 7]. A Gaussian kernel rough sets model-based feature selection method was discussed in [8]. The information fusion problem of imperfect images has also been studied based on Hu’s research [9]. Ghosh et al. proposed an efficient Gaussian kernel-based fuzzy rough sets approach for feature selection [10]. A novel fuzzy rough sets model was constructed by combining the hybrid distance and the Gaussian kernel in [11]. A new feature selection method based on kernel fuzzy rough sets and a memetic algorithm was proposed for the transient stability assessment of power systems [12]. In these studies, the information granules are constructed in the kernel structure. The “min” and “max” aggregation operations are used in approximation calculations [13–15]. That is, the decision for a sample depends on its nearest sample [7]. The computation of the lower approximation therefore comes with risks if there is noise in the datasets [16]. Data noise can increase the classification error rate of the kernel rough sets model [16].

Neighborhood is an important concept in classification and clustering. To formulate the notion of approximation, the neighborhood system was introduced into the relational model by Lin [17–19]. Yao presented a framework for the formulation, interpretation, and comparison of neighborhood systems and rough sets approximations [20]. Hu et al. investigated the issue of heterogeneous feature subset selection based on neighborhood rough sets [6, 21]. Based on neighborhood granulation, samples are constructed as a family of neighborhood granules to approximate the object sets. The neighborhood model can handle noisy data well based on the tolerance neighborhood relation and probabilistic theory [22]. However, the main limitation of this model is that it cannot describe the fuzziness of samples [16].

Overall, the kernel rough sets model, which has fuzziness, is susceptible to noise in the decision system. The neighborhood rough sets model can handle noisy data but cannot describe the fuzziness of samples. Therefore, we can construct a new model that combines the advantages of the kernel and neighborhood rough sets models.

On the other hand, increasing amounts of high-dimensional data must be processed in real applications, and feature selection therefore plays an important role in machine learning and data mining. Neighborhood rough sets and kernel rough sets are widely used in feature selection [23–26]. We can also address the feature selection problem with the new rough sets model.

Based on the motivations above, the contributions of this paper include the following: (1) We define a novel model, the kernel neighborhood rough sets model, which integrates the advantages of the neighborhood and kernel models. (2) Moreover, the model is used in the problem of feature selection. (3) The proposed method is tested on the UCI datasets. The results show that our model yields a better performance than classic models.

This paper is organized as follows. In Section 2, some basic concepts regarding neighborhood rough sets and kernel rough sets are briefly reviewed. In Section 3, the kernel neighborhood rough sets (KNRS) model is investigated in detail. Section 4 shows the application of KNRS to feature evaluation and feature selection. Numerical experiments are reported in Section 5. Finally, Section 6 concludes the paper.

2. Preliminary Knowledge

In this section, we review the kernel rough sets (KRS) model [5] and the neighborhood rough sets (NRS) model [6].

2.1. Kernel Rough Sets (KRS) Model

Definition 1. Suppose is a nonempty finite set of objects and is a Gaussian kernel function , where is the Euclidean distance. Therefore, is a kernel approximation space, where (1),(2),(3).
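The kernel function and the properties (1)–(3) are not reproduced above. As a minimal illustration, assuming the standard Gaussian kernel $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\delta^2))$ used in [5, 7], the kernel relation matrix over a sample set can be sketched as follows (the function name and the parameter `delta` are ours):

```python
import numpy as np

def gaussian_kernel_matrix(X, delta=0.2):
    """Pairwise Gaussian kernel similarities k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*delta^2)).

    X is an (n_samples, n_features) array; the result is an (n, n) matrix whose
    entries lie in (0, 1] and equal 1 on the diagonal.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared Euclidean distances
    return np.exp(-sq_dists / (2.0 * delta ** 2))
```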

Definition 2. Given a kernel approximation space , is a fuzzy subset of , and we define the lower and upper approximations of on the space as follows:
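The approximation operators themselves are elided above. The sketch below assumes the usual fuzzy rough operators employed with kernel relations in [5, 7], namely the “min” and “max” aggregations mentioned in Section 1; the function name is ours.

```python
import numpy as np

def krs_lower_upper(K, A):
    """Fuzzy lower/upper approximations of a fuzzy set A under a kernel relation K.

    K : (n, n) kernel (fuzzy similarity) matrix with values in [0, 1].
    A : (n,) membership degrees of the fuzzy set.
    Assumed operators: lower(x) = min_y max(1 - K(x, y), A(y)),
                       upper(x) = max_y min(K(x, y), A(y)).
    """
    lower = np.min(np.maximum(1.0 - K, A[None, :]), axis=1)  # "min" aggregation
    upper = np.max(np.minimum(K, A[None, :]), axis=1)        # "max" aggregation
    return lower, upper
```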

2.2. Neighborhood Rough Sets (NRS) Model

Pawlak rough sets theory works only on data sets with discrete attributes [27]. Hu et al. introduced a neighborhood rough sets model for heterogeneous data to avoid discretization. The definition is as follows [6, 28].

Definition 3. Suppose is a nonempty finite set of objects and is a given distance function. is a set of features. Therefore, is a neighborhood approximation space, where (1), if and only if , ,(2),(3).

Definition 4. Given a neighborhood approximation space , , and , is a neighborhood of whose center is and the radius is , where .
Here, can be considered to be the neighborhood granule.
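To make Definition 4 concrete, the following sketch collects, for each sample, the indices of all samples whose Euclidean distance from it does not exceed the radius (the names `neighborhood_granules` and `delta` are ours):

```python
import numpy as np

def neighborhood_granules(X, delta=0.2):
    """delta-neighborhood granule of each sample: indices y with dist(x, y) <= delta."""
    dists = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    return [np.flatnonzero(row <= delta) for row in dists]
```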

Remark 1. Given two points $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ in $n$-dimensional Euclidean space, the distance between them can be computed as $\Delta(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.

Definition 5. Given a neighborhood approximation space , for any subset , we define the lower and upper approximations of on the space , respectively, as follows: The definitions of the lower and upper approximations are the most important concepts in KRS and NRS.
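Since the formulas of Definition 5 are not reproduced, the sketch below assumes the classical neighborhood approximations of [6]: a sample belongs to the lower approximation of a subset X when its whole neighborhood granule is contained in X, and to the upper approximation when its granule intersects X.

```python
import numpy as np

def nrs_lower_upper(granules, X_idx, n):
    """Crisp neighborhood lower/upper approximations of a subset X of U (assumed classical form).

    granules : list of index arrays, granules[i] = delta-neighborhood of sample i.
    X_idx    : indices of the samples that belong to X.
    n        : number of samples in U.
    """
    in_X = np.zeros(n, dtype=bool)
    in_X[list(X_idx)] = True
    lower = [i for i, g in enumerate(granules) if in_X[g].all()]  # granule contained in X
    upper = [i for i, g in enumerate(granules) if in_X[g].any()]  # granule meets X
    return lower, upper
```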

3. Kernel Neighborhood Rough Sets (KNRS) Model

In this section, we study the KNRS model. The definitions and theorems of KNRS are discussed in detail. The kernel neighborhood decision system is also investigated.

3.1. Kernel Neighborhood Rough Sets

Definition 6. Given a kernel neighborhood approximation space , , where is a Gaussian kernel function, ; thus, is a kernel neighborhood granule of , where
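The formula of Definition 6 is elided above, so the following is only one plausible reading of a kernel neighborhood granule: the Gaussian kernel memberships of the samples that fall inside the neighborhood of the center sample, and zero outside it. The exact combination rule should be taken from the original definition; this sketch is an assumption.

```python
import numpy as np

def kernel_neighborhood_granule(K, dists, i, delta=0.2):
    """One plausible kernel neighborhood granule of sample i (an assumption, see the lead-in).

    K     : (n, n) Gaussian kernel matrix.
    dists : (n, n) Euclidean distance matrix.
    Returns an (n,) fuzzy membership vector: kernel similarity inside the
    delta-neighborhood of sample i, zero outside it.
    """
    return np.where(dists[i] <= delta, K[i], 0.0)
```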

Definition 7. Given a kernel neighborhood approximation space , , where is a kernel function, for any fuzzy subset ; we define the lower and upper approximations of on the space , respectively, as follows: The method defined above is crisp and has no noise tolerance ability. Here, we propose an improved model that is called variable precision lower and upper approximation.

Definition 8. Given a kernel neighborhood approximation space , , where is a kernel function, for any fuzzy subset ; the variable precision lower and upper approximations of are defined as follows, where denotes the cardinality of the specified set: Then, as in [22].
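Definition 8 itself is not reproduced above. To illustrate the variable-precision idea in its simplest crisp form, as in variable-precision neighborhood models [22], a sample enters the beta-lower approximation of X when at least a fraction beta of its granule lies inside X; the paper's fuzzy formulation may differ, so treat this sketch as an assumption.

```python
import numpy as np

def vp_lower_upper(granules, X_idx, n, beta=0.8):
    """Variable-precision lower/upper approximations (crisp illustration, beta in (0.5, 1]).

    A sample is placed in the lower approximation when the inclusion degree
    |granule ∩ X| / |granule| is at least beta, and in the upper approximation
    when that degree exceeds 1 - beta.
    """
    in_X = np.zeros(n, dtype=bool)
    in_X[list(X_idx)] = True
    degrees = [in_X[g].mean() if len(g) else 0.0 for g in granules]  # inclusion degrees
    lower = [i for i, d in enumerate(degrees) if d >= beta]
    upper = [i for i, d in enumerate(degrees) if d > 1.0 - beta]
    return lower, upper
```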

Example 1. Consider a kernel neighborhood approximation space , where , , , , and is a kernel function. The details are listed in Table 1.

The relation matrix is presented in Table 2. Each line is the kernel neighborhood granule of .

Given a fuzzy set , we obtain .

Definition 9. Given a kernel neighborhood approximation space , where and is a kernel function, for any fuzzy subset , the positive, negative, and boundary regions of in the space are, respectively, expressed as follows:

Theorem 1. Given a kernel neighborhood approximation space , where , then (1),(2),(3).

Proof. If , then . Assuming that , then . That is, if but not vice versa. Similarly, we obtain .

3.2. Kernel Neighborhood Decision System

A kernel neighborhood approximation space is called a variable precision kernel neighborhood decision system which is denoted by . and are condition and decision features, respectively.

Definition 10. Consider a variable precision kernel neighborhood decision system , where , , and is a kernel function. Suppose and . We, respectively, define the lower and upper approximations of with respect to feature subset as follows:

Definition 11. Consider a variable precision kernel neighborhood decision system, , where , , and is a kernel function. Suppose and . The positive, negative, and boundary regions of in the space are, respectively, defined as follows: A larger boundary region increases the uncertainty in the decision system. The samples in the boundary generally have the same condition features but belong to different decision classes. This discrepancy leads to the poor classification performance of the employed algorithms. We should select a proper feature subset that increases the size of the positive region and decreases the size of the boundary region.

Definition 12. Consider a variable precision kernel neighborhood decision system , where , , and is a kernel function. Suppose and . The dependency degree of relative to is defined as follows: The dependency degree reflects the approximating power of a condition feature set. A higher dependency degree means that the samples that are described by feature subset are more consistent with decision .
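With the dependency formula elided, the sketch below uses the standard rough-set definition gamma_B(D) = |POS_B(D)| / |U|, where the positive region is the union of the (variable-precision) lower approximations of the decision classes; it reuses `vp_lower_upper` from the sketch above, and the function name is ours.

```python
import numpy as np

def dependency_degree(granules, y, beta=0.8):
    """gamma_B(D) = |POS_B(D)| / |U| under the variable-precision lower approximation."""
    y = np.asarray(y)
    n = len(y)
    positive = set()
    for label in np.unique(y):
        class_idx = np.flatnonzero(y == label)                 # one decision class
        lower, _ = vp_lower_upper(granules, class_idx, n, beta)
        positive.update(lower)                                 # union over decision classes
    return len(positive) / n
```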

Theorem 2. Given a variable precision kernel neighborhood decision system , where , then (1),(2),(3).

Proof. If , then . Assuming that , then . That is, if but not vice versa. Similarly, we obtain .

4. Feature Selection Based on KNRS

One of the most important applications of rough sets theory is the evaluation of the classification power of attributes. In this section, we define the significance of feature subsets. The feature selection algorithm is also discussed.

Definition 13. Consider a variable precision kernel neighborhood decision system , where , , and is a kernel function. Suppose and . The significance of feature in is defined as follows: is used to evaluate the significance of attribute in subset . That is, is an important feature if it increases the dependency degree of subset . As mentioned in Definition 11, we need subset to be more consistent with decision . Thus, we define the feature selection algorithm as follows.
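A common form of the significance measure in neighborhood-based feature selection [6, 21] is SIG(a, B, D) = gamma_{B ∪ {a}}(D) − gamma_B(D), i.e., the gain in dependency obtained by adding attribute a to subset B. The sketch below assumes that form and reuses the helpers introduced earlier; for brevity, it granulates with plain neighborhood granules.

```python
def significance(X, y, B, a, delta=0.2, beta=0.8):
    """SIG(a, B, D): gain in dependency when attribute a is added to subset B (assumed form)."""
    def gamma(cols):
        if not cols:
            return 0.0
        granules = neighborhood_granules(X[:, cols], delta)  # granulate on the chosen columns
        return dependency_degree(granules, y, beta)
    return gamma(B + [a]) - gamma(B)
```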

It is impractical to obtain the optimal subset of features from the $2^N$ candidate subsets through an exhaustive search, where $N$ is the number of features. We use a forward greedy search algorithm, which is usually more efficient than a standard brute-force exhaustive search [4]. That is, one starts with an empty set of attributes and adds features to the subset of selected attributes one by one. Each selected attribute maximizes the increment of the significance of the current subset.

Corollary 1. Given a variable precision kernel neighborhood decision system , where , if , then.

Proof. Please refer to the proof of Theorem 2.
Corollary 1 shows that an object must belong to the positive region with respect to the feature sets (such as ) if the object belongs to the positive region with respect to the feature subset (such as , where ). Therefore, it is unnecessary to consider every object when computing the positive region. Then, we can obtain a fast feature selection algorithm by improving Algorithm 1.

Input: decision system (U, C ∪ D) and stopping threshold ε
Output: selected features B
1. B ← ∅
2. while C − B ≠ ∅
3.  for each a ∈ C − B
4.   compute the dependency degree of B ∪ {a} with respect to D
5.   compute the significance SIG(a, B, D)
6.  end for
7.  find the maximal significance SIG(a*, B, D) and the corresponding attribute a*
8.  if SIG(a*, B, D) > ε
9.   B ← B ∪ {a*}
10.  else
11.   exit while
12.  end if
13. end while
14. return B
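For readers who prefer running code, a compact Python rendering of the greedy procedure above, built on the hypothetical `significance` helper from Section 4 (the threshold name `epsilon` is ours), might look as follows:

```python
def forward_greedy_selection(X, y, epsilon=1e-3, delta=0.2, beta=0.8):
    """Forward greedy feature selection: repeatedly add the attribute with the
    largest significance until the best gain drops to epsilon or below."""
    n_features = X.shape[1]
    selected, remaining = [], list(range(n_features))
    while remaining:
        gains = [(significance(X, y, selected, a, delta, beta), a) for a in remaining]
        best_gain, best_attr = max(gains)
        if best_gain > epsilon:
            selected.append(best_attr)
            remaining.remove(best_attr)
        else:
            break
    return selected
```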

5. Experimental Analysis

In this section, we evaluate the effectiveness of KNRS through a series of experiments. The data sets are downloaded from the UCI machine learning repository (http://archive.ics.uci.edu/ml/index.php) as in [3] and are described in Table 3. The numerical attributes of the samples are linearly normalized as follows: where and are the bounds of the given attribute. Two popular machine learning algorithms, namely, CART and RBF SVM, are used to evaluate the quality of the selected features. The experiments are run in a 10-fold cross-validation mode. The parameters of CART and RBF SVM are set to the default values (by using the MATLAB toolkit osu_svm3.00).
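The normalization formula is elided above; the text describes a linear scaling of each numerical attribute by its bounds, which corresponds to the usual min-max normalization sketched below (the function name is ours):

```python
import numpy as np

def min_max_normalize(X):
    """Linearly rescale each column to [0, 1] using its observed minimum and maximum."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
    return (X - col_min) / span
```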

5.1. Discussion of the Parameters

The neighborhood metric and precision degree are two key parameters that affect the feature selection results. Here, and are analyzed through a series of experiments. We set from 0.05 to 0.95 with a step size of 0.1. We set the precision degree from 0.5 to 0.95 with a step size of 0.05. The evaluation criterion is the classification accuracy in the selected feature subset with parameters and .

According to Figures 1–5, on most of the data sets, higher classification accuracy is achieved over a larger region of the parameter grid when is between 0.5 and 0.95 and is between 0.5 and 0.75. That is, the KNRS model is feasible and stable in most cases.

5.2. Effectiveness of the Fast Feature Selection

We propose a feature selection algorithm based on kernel neighborhood rough sets (Algorithm 1: FSKNRS) and a fast version (Algorithm 2: FFSKNRS). We evaluate the run times of the two methods. The results are listed in Table 4. The fast feature selection algorithm based on KNRS performs better than Algorithm 1 does.

Input: decision system (U, C ∪ D) and stopping threshold ε
Output: selected features B
1. B ← ∅, S ← U
2. while C − B ≠ ∅
3.  for each a ∈ C − B
4.   for each object x ∈ S
5.    if x belongs to the positive region of B ∪ {a}
6.     mark x as positive for B ∪ {a}
7.   end for
8.   compute the dependency degree of B ∪ {a} by the object subset S and the objects already in the positive region of B
9.   compute the significance SIG(a, B, D)
10.  end for
11.  find the maximal significance SIG(a*, B, D) and the corresponding attribute a*
12.  if SIG(a*, B, D) > ε
13.   B ← B ∪ {a*} and remove the objects in the positive region of B from S
14.  else
15.   exit while
16.  end if
17. end while
18. return B
5.3. Comparison of the Effectiveness in Feature Selection

In this section, we select KRS [5], NRS [22], and neighborhood entropy (NE) [28] as comparison models for KNRS. The feature subsets that are selected by the different algorithms are presented in Table 5. The features are presented in the order in which they were added to the feature space.

The KNS model cannot obtain any subsets of the data set “glass.” Most of the feature subsets are slightly different. A small difference in feature quality may lead to a completely different ranking. The orders of the feature subsets reflect the relative significance of the features in terms of the corresponding measures. Therefore, the large differences among these selected features are due to the differences in the qualities of the feature measures.

Then, we build classification models with the selected features and test their classification performance based on a 10-fold cross-validation. The average value and standard deviation are used to measure the classification performance. We compare KNRS, KRS, NRS, and NE in Tables 6 and 7, where the learning algorithms of CART and RBF SVM are used to evaluate the selected features.

The comprehensive comparison results are shown in Table 8. The numbers of wins for KNRS, KRS, NRS, and NE are 4, 0, 4, and 2, respectively. KNRS achieves the highest average classification accuracy while using fewer features. It is thus concluded that KNRS outperforms the other feature measures. We can interpret the results from two aspects. For the kernel model (KRS), the lower approximation is computed by the “min” operation, so the decision on a sample depends on its nearest sample. This procedure can lead to a decision error if the nearest sample is a noise point, and we obtain a lower classification accuracy when there is considerable data noise in the samples. In contrast, neighborhood-based binary relations can only take the values 0 or 1, so neighborhood models (such as NRS and NE) cannot describe fuzziness as delicately as the kernel model. KNRS is a better choice because it integrates the advantages of the kernel and neighborhood models.

6. Conclusion and Future Work

As is well known, genetic algorithms and neural networks are famous mathematical models for pattern recognition, machine learning, and intelligent control. However, rough sets theory has also been successfully applied to these fields [5, 13, 15]. In this study, we define a novel model, the kernel neighborhood rough sets model, which integrates the advantages of the neighborhood and kernel models. Moreover, the model is used in the problem of feature selection. The parameters of KNRS are also discussed in detail. Then, we evaluate the effectiveness of the fast feature selection algorithm. A comparison of the results shows that our model yields a better performance than classic models.

There are two potential directions for future work. First, many other rough sets models can be incorporated into KNRS, such as fuzzy rough sets and Pawlak rough sets. Evaluating the significance of features by using such combined models is an important issue. Second, the application of our model to big data is necessary. Consequently, the development of a version of KNRS within a distributed framework requires further attention.

Data Availability

The experimental data sets can be downloaded from the UCI machine learning repository (http://archive.ics.uci.edu/ml/index.php).

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 61702128) and the Foundation of the Guizhou Institute of Technology (Grant no. KJZX17-003).