Abstract

The existing fuzzy rough set (FRS) models all assume that the decision attribute divides the sample set into several "crisp" decision classes, and this way of processing data makes the models sensitive to noisy information when conducting feature selection. To solve this problem, this paper proposes a robust fuzzy rough set model based on representative samples (RS-FRS). Firstly, the fuzzy membership degree of a sample is defined to reflect its fuzziness and uncertainty, and the RS-FRS model is constructed to reduce the influence of noise samples. The RS-FRS model does not need parameters to be set in advance, which effectively reduces model complexity and human intervention. On this basis, the related properties of the RS-FRS model are studied, and a sample pair selection (SPS) algorithm based on RS-FRS is used for feature selection. RS-FRS is tested and analysed on 12 open datasets. The experimental results show that the proposed RS-FRS model can effectively select the most relevant features and has a certain robustness to noisy information. The proposed model is well suited to data processing and can effectively improve the performance of feature selection.

1. Introduction

In the current era of big data, data are massive in scale and high-dimensional in presentation. The high dimensionality mainly arises because data often contain a large number of redundant or irrelevant features, which seriously reduces the processing capacity and time efficiency of pattern classification as well as the resolution of decision making. High-dimensional data also make fast, timely, and accurate data mining a great challenge. Therefore, how to effectively select features from such data has become one of the hot topics in machine learning [1, 2]. The purpose of feature selection is to remove irrelevant and redundant features from the original feature set, on the premise of preserving learning performance, to find a subset of features containing all or most of the classification information of the original feature space, thereby reducing the impact of the "curse of dimensionality" and improving learning performance. Feature selection (or attribute reduction) is therefore essential and has become a research hotspot in machine learning. Meanwhile, fuzzy rough set (FRS) theory is not only an objective and effective mathematical tool for dealing with incomplete and uncertain information [3–7] but also a powerful and effective computing paradigm for feature selection [8–10]. In recent years, FRS theory has received wide attention and has been applied in data mining, machine learning, pattern recognition, and other fields [11–13].

FRS can effectively deal with the fuzziness and vagueness of data. However, the upper and lower approximations of the classical FRS model are computed from the nearest sample of a given target sample, so the classical FRS model, constrained by the nearest sample, is extremely sensitive to noise.

In order to improve the robustness of classical FRS and reduce the influence of noise samples on the approximations, many robust FRS models have been proposed [14–20]. An important application of FRS theory is feature selection, also known as attribute reduction, which preserves the discernibility between features and decision labels; inconsistency manifests as two samples having the same feature values but different decision labels. Redundant or irrelevant features can be deleted to improve the classification performance of learning algorithms and to save running time and space, so that the actual problems can be understood more clearly on the basis of FRS. Feature selection based on FRS refers to the removal of redundant and irrelevant features from data without changing the classification ability of the data. The advantage of the FRS method is that, during data processing, only the information of the data itself is used, without any other prior knowledge or additional information. Generally speaking, there are two types of FRS-based feature selection methods: heuristic methods based on dependency [21–23] and structured methods based on a discernibility matrix [24–28]. The dependency-based heuristic methods use the positive region and the dependency function [21–23] as the feature evaluation criterion and use heuristic search to obtain a feature subset. Specifically, the feature subset is computed by forward or backward search on the premise of keeping a certain metric constant. Pawlak [1] introduced the concept of the positive region to describe the consistency between features and decision labels for feature selection. Hu and Cercone [23] designed an algorithm that computes a reduct while keeping the positive region of the decision unchanged. The other kind of method is the structured method based on the discernibility matrix. By introducing the discernibility matrix, a Boolean discernibility function is constructed, and the minimal form of the discernibility function is obtained by logical operations, thus yielding all possible reducts of the decision table. Based on the discernibility information in the matrix, many scholars have studied the computational problems of reduction [24–28]. For example, Yao and Zhao [24] transformed a discernibility matrix into its minimal form by designing an absorption algorithm to compute reducts, so that the union of all elements constitutes a reduct. Because this algorithm needs to compute all the elements of the discernibility matrix, it consumes a lot of computation time. To save running time and storage space, Chen et al. [27, 28] determined the minimal elements of the discernibility matrix through the sample pair selection (SPS) method and designed a fast reduction algorithm based on the minimal elements.

In our previous work [29], we proposed a novel FRS model, namely, the fuzzy rough set with representative samples (RS-FRS) model, which is constructed to reduce the influence of noisy samples. In the RS-FRS model, the fuzziness of sample membership is taken into consideration. Using the fuzzy equivalence approximate space, other subsets of the domain space can be approximated more precisely than with a conventional FRS model. The RS-FRS model does not require preset parameters. Our pilot study indicates that such a framework can reduce model complexity and human intervention.

However, the previous work needs further research in several respects. Firstly, the theorems that support the model were not thoroughly derived, calling for additional theoretical work to strengthen the mathematical background. Moreover, the verification of the proposed model was not comprehensive, since the model was tested using only the KNN classifier. Lastly, the previous work lacked solid statistical validation, owing to its nature as a pilot study exploring the method's potential effectiveness.

In this manuscript, we extensively address these drawbacks. We conduct detailed derivations of Properties 2 and 3 to complete the RS-FRS model at the theoretical level. The completed theory also supports the design of the feature selection algorithm, which is central to the RS-FRS framework. In addition to the previous KNN classifier, the performance evaluation of the proposed method is extended by validating the algorithm with the CART and LSVM classifiers. Our new results show satisfactory accuracy and robustness, further supporting the method's effectiveness. Lastly, a comprehensive statistical analysis of the results is conducted and reported in this manuscript, applying the Friedman test and the Bonferroni-Dunn test to the model outputs. In summary, the previous pilot study is strengthened with theoretical, experimental, and statistical evidence demonstrating the performance and robustness of the proposed RS-FRS method.

2. Preliminaries

The equivalence classes of the Pawlak rough set are crisp subsets of the domain. These crisp information granules cannot reflect the fuzziness in reasoning. In practical classification learning, the features describing samples may be fuzzy, or the relations between samples may be fuzzy relations computed from numerical attributes. Therefore, FRS came into being as an extension of the Pawlak rough set model.

For a nonempty universe U, if R is a binary fuzzy relation that satisfies reflexivity, symmetry, and sup-min transitivity, then R is a fuzzy equivalence relation. The fuzzy equivalence class $[x]_R$ is generated by R with respect to a sample $x$. $[x]_R$ is a fuzzy set on U, which is also referred to as the fuzzy neighborhood of x, i.e., $[x]_R(y) = R(x, y)$ for all $y \in U$.

Definition 1. Given a nonempty finite domain $U = \{x_1, x_2, \ldots, x_n\}$, let R be a fuzzy equivalence relation on U. For $x \in U$, the fuzzy equivalence class of x is $[x]_R$, and it is a fuzzy subset of U. The membership degree of $y \in U$ to $[x]_R$ is $[x]_R(y) = R(x, y)$. The set of fuzzy equivalence classes forms a basic conceptual system for approximating any subset of the theoretic domain space, which is called the fuzzy equivalence approximate space $(U, R)$. The upper and lower approximations of a fuzzy set $X$ in $(U, R)$ are defined as

$$\overline{R}X(x) = \sup_{u \in U} \min(R(x, u), X(u)), \quad \underline{R}X(x) = \inf_{u \in U} \max(1 - R(x, u), X(u)). \quad (1)$$

In equation (1), the upper approximation of the sample x is determined by $R(x, u)$ and $X(u)$. The lower approximation of the sample x is determined by $1 - R(x, u)$ and $X(u)$, where $X(u)$ is the membership degree of the sample u with respect to the set X.
At present, the existing fuzzy rough set models [14–20] consider that the decision attribute D divides the sample set into several crisp decision classes $\{D_1, D_2, \ldots, D_r\}$, so the membership degree $D_i(u)$ in the upper and lower approximations of the FRS model is a binary function taking the value 0 or 1. Then, the upper and lower approximations of the classical FRS model reduce to

$$\overline{R}D_i(x) = \sup_{u \in D_i} R(x, u), \quad \underline{R}D_i(x) = \inf_{u \notin D_i} (1 - R(x, u)). \quad (2)$$
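To make equations (1) and (2) concrete, the following sketch computes both classical approximations for crisp decision classes. The Gaussian-kernel similarity `fuzzy_similarity` is only an illustrative stand-in for the fuzzy relation R (the paper does not commit to a particular relation here), and all function names are our own.

```python
import numpy as np

def fuzzy_similarity(X, sigma=1.0):
    # Illustrative fuzzy relation R(x, u) = exp(-||x - u||^2 / sigma);
    # a stand-in only, not the paper's prescribed relation.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / sigma)

def classical_approximations(R, labels):
    # Equation (2), crisp decision classes:
    #   lower_i(x) = min over u not in D_i of (1 - R(x, u))
    #   upper_i(x) = max over u in D_i of R(x, u)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    lower = np.empty((len(classes), len(labels)))
    upper = np.empty_like(lower)
    for i, c in enumerate(classes):
        in_c = labels == c
        lower[i] = (1.0 - R[:, ~in_c]).min(axis=1)
        upper[i] = R[:, in_c].max(axis=1)
    return lower, upper
```

For instance, `lower, upper = classical_approximations(fuzzy_similarity(X), y)` yields one row per decision class.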

3. The Proposed Model

As stated in Section 2, the existing FRS models [14–20] consider that the decision attribute D divides the sample set into several crisp decision classes, so the membership degree $D_i(u)$ in equation (1) is a binary function taking the value 0 or 1, and the upper and lower approximations of the classical FRS model degenerate into equation (2). However, this strategy of dividing the sample set into crisp decision classes is extremely sensitive to noise samples: if noise samples exist, $D_i(u)$ has fuzzy uncertainty. Defining $D_i(u)$ only as a binary function of 0 or 1 does not meet the requirements of practical applications, and it cannot well reflect the real membership relations between samples and each decision class. Therefore, determining the membership of samples is another important challenge for fuzzy rough set models.

3.1. Representative Sample

Unlike the existing FRS models, we consider that D divides U into several fuzzy decision classes and define a "representative sample" to calculate the fuzzy membership of samples. Specifically, a representative sample is found for each label. Then, the fuzzy membership degree of the target sample with respect to each label is calculated from the distance between the target sample and the representative samples, so as to design a robust FRS model.

Definition 2. (see [29]). Let $\langle U, A \cup \{D\} \rangle$ be a fuzzy decision system, where the sample set $U = \{x_1, x_2, \ldots, x_n\}$ has m attributes $A = \{a_1, a_2, \ldots, a_m\}$. The decision attribute D divides the sample set U into r crisp equivalent decision classes $\{D_1, D_2, \ldots, D_r\}$. The representative sample $\tilde{x}_i$ of the class $D_i$ is defined as

$$\tilde{x}_i = \arg\min_{x \in D_i} \sum_{y \in D_i} d(x, y), \quad (3)$$

where $d(x, y)$ is the distance between two samples in the class $D_i$. In this paper, Euclidean distance is used as a basic implementation of $d(\cdot, \cdot)$.

Definition 3. (see [29]). Let $\langle U, A \cup \{D\} \rangle$ be a fuzzy decision system, where the sample set $U = \{x_1, x_2, \ldots, x_n\}$ has m attributes $A = \{a_1, a_2, \ldots, a_m\}$. The decision attribute D divides the sample set U into r crisp equivalent decision classes $\{D_1, D_2, \ldots, D_r\}$. The representative sample of the class $D_i$ is $\tilde{x}_i$. The membership degree $D_i(x)$ of the sample x with respect to class $D_i$ is defined as

$$D_i(x) = \frac{1 / d(x, \tilde{x}_i)}{\sum_{j=1}^{r} 1 / d(x, \tilde{x}_j)}, \quad (4)$$

where $d(x, \tilde{x}_i)$ is the distance between sample x and representative sample $\tilde{x}_i$.
According to Definition 3, we can determine the membership degree of sample x with respect to each decision class by calculating the distance between sample x and the representative sample of each class. The membership degree satisfies $D_i(x) \in [0, 1]$ and $\sum_{i=1}^{r} D_i(x) = 1$. It can be seen that $D_i(x)$ can fully reflect the fuzziness of sample membership: the larger the value of $D_i(x)$, the higher the degree to which sample x belongs to class $D_i$; the smaller the value of $D_i(x)$, the lower that degree. In a dataset, the samples located in the boundary region may be noise samples that have been mislabeled. By calculating the fuzzy membership, the degree to which the boundary samples belong to each class can be determined.
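As a concrete illustration of Definitions 2 and 3, the sketch below computes the representative samples and the fuzzy memberships for a labeled dataset. It follows the medoid-style formula (3) and the normalized inverse-distance membership (4) as reconstructed above; the helper names and the `eps` guard against zero distances are our own additions.

```python
import numpy as np

def representative_samples(X, labels):
    # Definition 2 (as reconstructed): the representative of a class is
    # the sample with minimal total Euclidean distance to the rest of
    # its class (a medoid).
    labels = np.asarray(labels)
    reps = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        reps[c] = Xc[d.sum(axis=1).argmin()]
    return reps

def fuzzy_membership(X, reps, eps=1e-12):
    # Definition 3 (as reconstructed): normalized inverse distance to
    # each class representative; rows sum to 1, and a closer
    # representative yields a larger membership.
    classes = sorted(reps)
    d = np.stack([np.linalg.norm(X - reps[c], axis=1) for c in classes], axis=1)
    inv = 1.0 / (d + eps)  # eps guards the case x equals a representative
    return classes, inv / inv.sum(axis=1, keepdims=True)
```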

3.2. FRS Model with Representative Sample

Based on Definitions 2 and 3, we propose a FRS model with representative samples (RS-FRS).

Definition 4. (see [29]). Let $\langle U, A \cup \{D\} \rangle$ be a fuzzy decision system, where the sample set $U = \{x_1, x_2, \ldots, x_n\}$ has m attributes $A = \{a_1, a_2, \ldots, a_m\}$. The decision attribute D divides the sample set U into r crisp equivalent decision classes $\{D_1, D_2, \ldots, D_r\}$. R is a fuzzy equivalence relation on U. The representative sample of the class $D_i$ is $\tilde{x}_i$. The upper and lower approximations of the RS-FRS model are defined as

$$\overline{R}D_i(x) = \sup_{u \in U} \min(R(x, u), D_i(u)), \quad \underline{R}D_i(x) = \inf_{u \in U} \max(1 - R(x, u), D_i(u)), \quad (5)$$

where $D_i(u)$ is the fuzzy membership degree of u with respect to $D_i$ given by Definition 3.
The lower approximation of the FRS model indicates the certainty that a sample belongs to its decision class in the fuzzy approximation space, and the upper approximation indicates the possibility that a sample belongs to its decision class in the fuzzy approximation space. Therefore, the lower approximation can be used for classification and feature selection.
The samples of a dataset located in the classification boundary region are the most likely to be noise samples. Because these samples are close to samples of other classes, they are often the ones used to calculate the lower approximation of the classical FRS model, which makes the lower approximation smaller. In the proposed RS-FRS model, we consider not only the distance between the target sample and the nearest sample of a different class but also the fuzzy membership of that nearest different-class sample. The fuzzy membership degree enlarges the lower approximation of the model and reduces the influence of noise samples on it; thus, RS-FRS is robust.
The main difference between the RS-FRS model and the classical FRS is that the classical FRS ignores the fuzziness of sample membership when the upper and lower approximations are calculated. This easily leads to errors in the upper and lower approximations of the classical FRS model when dealing with datasets containing noise samples. Therefore, the classical FRS can only maintain the maximal fuzzy dependence but cannot handle noisy information well. In contrast, the RS-FRS model considers the fuzziness of sample membership and can approximate other subsets of the domain space more precisely with the fuzzy equivalence approximate space, so that the data are fitted better. Compared with existing robust FRS models, the RS-FRS model does not need parameters to be set in advance, which effectively reduces model complexity and human intervention.
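A minimal sketch of equation (5) follows, assuming the fuzzy relation R is given as an n × n similarity matrix and the fuzzy memberships D come from Definition 3 (e.g., the `fuzzy_membership` sketch above); the function name is illustrative.

```python
import numpy as np

def rs_frs_approximations(R, D):
    # Equation (5): R is an (n, n) fuzzy relation, D an (n, r) matrix of
    # fuzzy class memberships.
    #   lower_i(x) = min over u of max(1 - R(x, u), D_i(u))
    #   upper_i(x) = max over u of min(R(x, u), D_i(u))
    lower = np.maximum(1.0 - R[:, :, None], D[None, :, :]).min(axis=1)
    upper = np.minimum(R[:, :, None], D[None, :, :]).max(axis=1)
    return lower, upper  # each of shape (n, r)
```

Replacing the crisp memberships of equation (2) with the fuzzy matrix D is the only change relative to the classical computation, which is what makes the model parameter-free.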

3.3. Related Properties of FRS Model with Representative Sample

For the standard max operator $S(a, b) = \max(a, b)$, the standard min operator $T(a, b) = \min(a, b)$, and the standard complement operator $N(a) = 1 - a$, some properties of the RS-FRS model are discussed below. If other fuzzy operators are used [13], the relevant conclusions can be generalized similarly.

Property 1. (see [29]). For each decision class $D_i$ and $\forall x \in U$, the following statements hold:

$$\overline{R}D_i(x) = 1 - \underline{R}(\sim D_i)(x), \quad \underline{R}D_i(x) = 1 - \overline{R}(\sim D_i)(x),$$

where $\sim D_i$ is the fuzzy complement of $D_i$, i.e., $(\sim D_i)(u) = 1 - D_i(u)$.

Proof. $\forall x \in U$,

$$1 - \underline{R}(\sim D_i)(x) = 1 - \inf_{u \in U} \max(1 - R(x, u), 1 - D_i(u)) = \sup_{u \in U} \min(R(x, u), D_i(u)) = \overline{R}D_i(x).$$

Hence the first statement holds. The second statement follows by replacing $D_i$ with $\sim D_i$ and complementing both sides.

Property 2. For $\forall x \in U$, if x is a normal sample, the following statements hold for each decision class $D_i$:

$$\underline{R}D_i(x) \le D_i(x) \le \overline{R}D_i(x).$$

Proof. Because x is a normal sample, x can be used directly to compute its lower approximation; by the reflexivity of R, $R(x, x) = 1$, so

$$\underline{R}D_i(x) = \inf_{u \in U} \max(1 - R(x, u), D_i(u)) \le \max(1 - R(x, x), D_i(x)) = \max(0, D_i(x)) = D_i(x).$$

Therefore, $\underline{R}D_i(x) \le D_i(x)$. Because x is a normal sample, x can be used directly to compute its upper approximation,

$$\overline{R}D_i(x) = \sup_{u \in U} \min(R(x, u), D_i(u)) \ge \min(R(x, x), D_i(x)) = \min(1, D_i(x)) = D_i(x).$$

Therefore, $D_i(x) \le \overline{R}D_i(x)$.

Property 3. For $\forall x \in U$, let $\overline{R'}D_i$ and $\underline{R'}D_i$ be the upper and lower approximations of the classical FRS, respectively. The following statements hold:

$$\underline{R'}D_i(x) \le \underline{R}D_i(x), \quad \overline{R}D_i(x) \le \overline{R'}D_i(x).$$

Proof. $\forall x \in U$, according to equation (2) in Section 2, when the standard operators are given, $\underline{R'}D_i(x) = \inf_{u \notin D_i}(1 - R(x, u))$. Since the samples of $D_i$ take large membership values $D_i(u)$, the infimum in equation (5) is attained at samples outside $D_i$, so that $\underline{R}D_i(x) = \inf_{u \notin D_i} \max(1 - R(x, u), D_i(u))$, and for every $u \notin D_i$, $1 - R(x, u) \le \max(1 - R(x, u), D_i(u))$. Therefore, $\underline{R'}D_i(x) \le \underline{R}D_i(x)$. Dually, the supremum in equation (5) is attained at samples of $D_i$, so that $\overline{R}D_i(x) = \sup_{u \in D_i} \min(R(x, u), D_i(u))$, and for every $u \in D_i$, $\min(R(x, u), D_i(u)) \le R(x, u)$. Therefore, $\overline{R}D_i(x) \le \overline{R'}D_i(x)$.
Property 1 proves the duality between the upper and lower approximations. Property 2 demonstrates the inclusion relationship between the upper and lower approximations and the original sample set. Property 3 proves the relationship of the upper and lower approximations between the RS-FRS model and the classical FRS model. The presentation and derivation of these properties provide a theoretical basis for the establishment and design of subsequent algorithms.

4. Feature Selection Based on FRS Model with Representative Sample

Although the method based on the discernibility matrix [27] can be used for feature selection, it has two obvious disadvantages. (1) Wasted computing time: the discernibility matrix method obtains all feature subsets, but when the result of feature selection is used for pattern recognition or classification, a single feature subset is sufficient, so computing the rest is invalid time cost. (2) Wasted storage space: the discernibility matrix method must store all the elements of the matrix and reduce them through the absorption operator, yet the key to finding one feature subset is to determine the minimal elements; compared with storing only the minimal elements, the discernibility matrix method wastes storage space. Therefore, in this paper, we use the improved discernibility matrix method, named the sample pair selection algorithm (SPS) [28]. Since each minimal element is determined by at least one sample pair, the SPS algorithm can select the sample pairs corresponding to the minimal elements in the sample set. Based on this, we propose a feature selection algorithm based on the RS-FRS model with SPS to save time and space costs.

Let $\langle U, A \cup \{D\} \rangle$ be a fuzzy decision system, where the sample set $U = \{x_1, x_2, \ldots, x_n\}$ has m attributes $A = \{a_1, a_2, \ldots, a_m\}$. Each conditional attribute $a_k$ induces a fuzzy relation $R_{a_k}$. An attribute subset $B \subseteq A$ induces the fuzzy relation $R_B$ with $R_B(x, y) = \min_{a_k \in B} R_{a_k}(x, y)$. The discernibility matrix of this decision system is an $n \times n$ matrix, denoted by $M_D(U, A) = (c_{ij})_{n \times n}$, and the element $c_{ij}$ of the discernibility matrix is

$$c_{ij} = \begin{cases} \{a_k \in A : 1 - R_{a_k}(x_i, x_j) \ge \underline{R_A}D_{[x_i]}(x_i)\}, & x_j \notin [x_i]_D, \\ \emptyset, & x_j \in [x_i]_D, \end{cases}$$

where $\underline{R_A}D_{[x_i]}(x_i)$ is the lower approximation of the decision class $[x_i]_D$ of $x_i$ with respect to $R_A$.

In the RS-FRS model, the relative discrimination relation of conditional attribute $a_k$ with respect to decision attribute D is defined as a binary relation, which is calculated by the following equation:

$$DIS(R_{a_k}) = \{(x_i, x_j) \in U \times U : 1 - R_{a_k}(x_i, x_j) \ge \lambda_i,\ x_j \notin [x_i]_D\},$$

where $\lambda_i = \underline{R_A}D_{[x_i]}(x_i)$ and $[x_i]_D$ is the decision class of $x_i$.

We obtain the following results by examining the relationship between $DIS(R_{a_k})$ and the element $c_{ij}$ in the discernibility matrix $M_D(U, A)$. If $(x_i, x_j) \in DIS(R_{a_k})$, then $a_k \in c_{ij}$, and conversely. Let $n_{ij} = |c_{ij}|$, where $n_{ij}$ is the number of conditional attributes that satisfy $a_k \in c_{ij}$. Obviously, $0 \le n_{ij} \le m$.
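Under the construction above, the nonempty elements $c_{ij}$ can be collected directly. The sketch below assumes per-attribute fuzzy relations stacked in an array `R_single` of shape (m, n, n) and a vector `lower_own` holding each sample's RS-FRS lower approximation for its own class; these inputs, the function name, and the loop structure are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def discernibility_elements(R_single, lower_own, labels):
    # c_ij (as reconstructed above): attribute a_k discerns the pair
    # (x_i, x_j) from different classes when
    #   1 - R_{a_k}(x_i, x_j) >= lower_own[i].
    labels = np.asarray(labels)
    m, n, _ = R_single.shape
    elements = {}
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                continue
            attrs = [k for k in range(m)
                     if 1.0 - R_single[k, i, j] >= lower_own[i]]
            if attrs:
                elements[(i, j)] = attrs
    return elements
```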

According to the above definitions and analysis, a feature selection algorithm based on RS-FRS model with SPS can be described by Algorithm 1.

Input: a set of condition attributes $A = \{a_1, a_2, \ldots, a_m\}$; a set of samples $U = \{x_1, x_2, \ldots, x_n\}$.
Output: selected feature subset $red$.
(1) $red \leftarrow \emptyset$; according to equation (5), the lower approximation of the RS-FRS model is computed for every sample;
(2) Compute every $c_{ij}$ and $n_{ij} = |c_{ij}|$;
(3) Sort the sample pairs $(x_i, x_j)$ with $c_{ij} \ne \emptyset$ in ascending order of $n_{ij}$;
(4) while the list of sample pairs is not empty do
(5) Select the first sample pair $(x_i, x_j)$;
(6) Select one attribute $a_k$ such that $a_k \in c_{ij}$ and add $a_k$ into $red$;
(7) Remove every sample pair whose element $c_{ij}$ contains $a_k$;
(8) end while
(9) return $red$
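Algorithm 1's loop can be rendered compactly in Python. The sketch below consumes the `elements` mapping from the previous sketch; since steps (5) and (6) do not fix which qualifying attribute to take, it simply picks the first one.

```python
def sps_feature_selection(elements):
    # Greedy rendering of Algorithm 1: visit sample pairs in ascending
    # order of |c_ij| (minimal elements first); cover the current pair
    # with one of its attributes and drop every pair that attribute
    # already discerns.
    pending = sorted(elements.items(), key=lambda kv: len(kv[1]))
    red = set()
    while pending:
        _, attrs = pending[0]
        a = attrs[0]          # any attribute in c_ij would do
        red.add(a)
        pending = [(p, ats) for p, ats in pending if a not in ats]
    return red
```

Because each selected attribute removes every pair it discerns, the loop terminates after at most m iterations and touches only the minimal elements, which is the source of SPS's time and space savings.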

5. Experiments

In this section, we compare the proposed RS-FRS model with several existing models from the three aspects of classification accuracy, robustness, and statistical properties. The existing models include FRS [30], β-PFRS [14], K-trimmed FRS (k = 5) [18], K-means FRS (k = 5) [18], K-median FRS (k = 5) [18], and SFRS (r = 0.1) [15]. To test the performance of all FRS models in feature selection, the experiments are conducted with three common classifiers: k-nearest neighbor (KNN, K = 3) [31], classification and regression tree (CART) [32], and linear support vector machine (LSVM) [33].
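For reference, all three classifiers are available in scikit-learn; a minimal evaluation harness for a selected feature subset might look like the following (the harness itself, its name, and its arguments are our own; the 10-fold protocol matches Section 5.2.1).

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def evaluate_subset(X, y, selected, cv=10):
    # Mean 10-fold accuracy of KNN (K = 3), CART, and LSVM on the
    # feature columns in `selected`.
    Xs = X[:, sorted(selected)]
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "CART": DecisionTreeClassifier(),
        "LSVM": LinearSVC(),
    }
    return {name: cross_val_score(m, Xs, y, cv=cv).mean()
            for name, m in models.items()}
```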

5.1. Datasets

In our experiments, 12 datasets from the open-access UCI repository [34] are used, as described in Table 1. The UCI repository is a commonly used source of standard machine learning benchmarks. For example, the dataset "wine" contains the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines: alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline.

5.2. Results
5.2.1. Comparison of Classification Accuracy

On the three classifiers, we compare the classification accuracy of the different FRS models on the original data and on noisy data. The corresponding noise levels are 0%, 5%, and 10%: the 0% noise level means the original data (the original data itself is assumed to be noise-free), and a 5% (or 10%) noise level means that 5% (or 10%) of the samples in the original data are mislabeled at random. To ensure the reliability of the experiments, we carry out 10-fold cross-validation for each independent random noise process. Tables 2–4 show the classification performance of the different FRS models with KNN, CART, and LSVM on the original and noisy data. Bold values indicate the optimal performance among all FRS models, and italic values indicate the average classification accuracy of each model over all datasets.
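For reproducibility, a small sketch of the label-noise injection described above: a fixed fraction of samples is relabeled uniformly at random to a different class. The function name and RNG handling are our own choices.

```python
import numpy as np

def add_label_noise(y, rate, seed=None):
    # Relabel `rate` of the samples uniformly at random with a
    # *different* class, mirroring the 5% / 10% noise levels above.
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    classes = np.unique(y_noisy)
    idx = rng.choice(len(y_noisy), size=int(rate * len(y_noisy)),
                     replace=False)
    for i in idx:
        y_noisy[i] = rng.choice(classes[classes != y_noisy[i]])
    return y_noisy
```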

From a global perspective on the three classifiers, the average accuracy of the RS-FRS model is higher than that of the other FRS models on both the original data (0%) and the noisy data (5% and 10%). Furthermore, from the perspective of each dataset, classifier, and noise level: with KNN, the RS-FRS model is optimal on 8 out of 12 datasets on the original data (0%) and on 15 out of 24 dataset-noise combinations on the noisy data (5% and 10%) in Table 2. Similar results are shown in Tables 3 and 4. With CART, the RS-FRS model is superior on 7 out of 12 datasets on the original data (0%) and on 16 out of 24 combinations on the noisy data (5% and 10%) in Table 3. With LSVM, the RS-FRS model is optimal on 6 out of 12 datasets on the original data (0%) and on 16 out of 24 combinations on the noisy data (5% and 10%) in Table 4. It is worth noting that for wine, soy, hepatitis, ICU, and WPBC, the 7 FRS models have the same or similar performance at the 0% noise level. This is because, under our assumption that the original data contain no noise, the 7 FRS models find the same sample as the nearest sample of the target sample when calculating the upper and lower approximations of the target sample.

5.2.2. Robustness Analysis

In addition, the degree to which the average accuracy of the different models decreases as noise is gradually added (Tables 2–4) is used as the basis for comparing the robustness of these models. In Figures 1–3, the FRS, β-PFRS, K-trimmed FRS, K-means FRS, K-median FRS, SFRS, and RS-FRS models are numbered from 1 to 7 in sequence.

As shown in Figures 1–3, as the noise level increases, the performance of every FRS model declines on all three classifiers. This phenomenon is consistent with the actual situation, because a noisy sample may be regarded as the nearest sample of the given target sample when these models calculate the lower approximation of the target sample, which reduces the classification accuracy of these FRS models. However, among these models, the classical FRS model declines the most sharply and changes the most dramatically, because it depends the most on the nearest sample and therefore has the least robust performance. The proposed RS-FRS model presents the smallest decline and the least drastic change, indicating that our strategy of dividing the sample set into fuzzy decision classes is in line with reality and has the best robustness.

5.2.3. Statistical Analysis

To further explore whether there are significant differences in the average classification performance of the seven FRS models, we adopt the Friedman test [35] and the Bonferroni-Dunn test [36]. The Friedman statistic is defined as

$$\chi_F^2 = \frac{12N}{k(k+1)} \left( \sum_{i=1}^{k} R_i^2 - \frac{k(k+1)^2}{4} \right), \quad F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2},$$

where $R_i$ is the average ranking of model i over all datasets, k is the total number of models, and N is the number of datasets. $F_F$ obeys the Fisher distribution with $k - 1$ and $(k-1)(N-1)$ degrees of freedom. In this paper, $k = 7$ and $N = 12$. At significance level $\alpha = 0.05$, the critical value is $F_{0.05}(6, 66) \approx 2.24$. Table 5 shows the Friedman statistics and corresponding critical values under the different classifiers. If $F_F$ is greater than the critical value, the null hypothesis is rejected, and we conclude that there is a significant difference among the performances of the models. Otherwise, the null hypothesis cannot be rejected, and it can be considered that there is no significant difference among the performances of the models.

If Friedman's null hypothesis is rejected, the Bonferroni-Dunn test can further analyze the relative performance between each comparison model and the RS-FRS model, with the RS-FRS model regarded as the control model. If there is a significant difference between two models, the difference between their average rankings should be at least the critical difference (CD):

$$CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}},$$

where k is the total number of models, N is the number of datasets, and $q_\alpha$ is the critical value of the Bonferroni-Dunn test at the corresponding significance level. At significance level $\alpha = 0.05$ and $k = 7$, $q_{0.05} = 2.638$. Therefore, $CD = 2.638 \sqrt{7 \times 8 / (6 \times 12)} \approx 2.33$. If the average ranking difference between the RS-FRS model and a comparison model is greater than the value of CD, we consider their performance to be significantly different.
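Both statistics can be computed directly from the accuracy table. The sketch below follows the formulas above, ranking so that a higher rank means better accuracy, matching Figures 4–6; the function name is ours, and $q_{0.05} = 2.638$ is the standard Bonferroni-Dunn value for k = 7.

```python
import numpy as np
from scipy import stats

def friedman_cd(acc, q_alpha=2.638):
    # acc: (N datasets, k models) accuracy table. Ranks are assigned so
    # that a HIGHER rank means better accuracy.
    N, k = acc.shape
    ranks = stats.rankdata(acc, axis=1)
    R = ranks.mean(axis=0)                        # average rank per model
    chi2 = 12 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4)
    ff = (N - 1) * chi2 / (N * (k - 1) - chi2)    # compare with F(k-1, (k-1)(N-1))
    cd = q_alpha * np.sqrt(k * (k + 1) / (6 * N))
    return R, ff, cd
```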

Figures 4–6 show the CD diagrams for the three classifiers, in which the average rankings of all algorithms are drawn along the horizontal axis. In other words, the scale from 1 to 7 on the horizontal axis gives the average ranking of the seven algorithms over the 12 datasets; the higher the ranking, the better the algorithm, and the algorithm with the highest (optimal) ranking is located on the right side of the axis. If the average rankings of RS-FRS and a comparison algorithm are connected by a thick line (of length CD), RS-FRS has performance comparable to that algorithm. If the average ranking difference between the RS-FRS model and a comparison model is greater than the CD value, we consider their performance to be significantly different. From these figures, we obtain the following results: (1) under KNN, RS-FRS is clearly better than the FRS, β-PFRS, K-trimmed FRS, K-means FRS, K-median FRS, and SFRS models, and no comparison model matches the performance of the RS-FRS model under statistical verification. (2) Under CART, RS-FRS is clearly better than the FRS, β-PFRS, K-trimmed FRS, K-means FRS, K-median FRS, and SFRS models, and again no comparison model matches the RS-FRS model under statistical verification. (3) Under LSVM, the RS-FRS and SFRS models are significantly better than the FRS, β-PFRS, K-trimmed FRS, K-means FRS, and K-median FRS models, and RS-FRS and SFRS have statistically comparable performance. From the above analysis, the overall performance of the RS-FRS model is better than that of the FRS, β-PFRS, K-trimmed FRS, K-means FRS, K-median FRS, and SFRS models.

6. Conclusions

The development of robust FRS models is a hot spot in FRS theory and offers advantages for feature selection in the presence of noisy information. In this paper, a nonparametric fuzzy membership degree is defined by fuzzy granular computation, and an FRS model based on representative samples is proposed. Firstly, the representative sample of each class is found, and the distance between the target sample and each representative sample is calculated; the fuzzy membership degree of the target sample with respect to every class is then determined, the FRS model based on representative samples is constructed, and the relevant properties of the model are studied. On this basis, a feature selection algorithm based on the representative-sample FRS is designed. Experimental results show that the RS-FRS model and the feature selection algorithm proposed in this paper are feasible, effective, and robust for processing uncertain information systems with noisy information, which expands the application field of FRS and the research on feature selection. However, this work has some limitations. Feature selection is carried out using only the lower approximation of the RS-FRS model. The lower approximation of a sample reflects the degree of certainty with which the sample belongs to its class, while the upper approximation reflects the degree of possibility. In future research, we will consider designing feature selection algorithms that use both the upper and the lower approximations.

Data Availability

In this paper, the authors use the UCI database, which is publicly available at http://www.ics.uci.edu/mlearn/MLRepository.html.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Jiangsu Province of China (BK20200364), the National Natural Science Foundation of China (62001111), the National Key Research and Development Program of China (2019YFE0113800), and the National Natural Science Foundations of China (82072014 and 81901842).