A Novel Approach for Reducing Attributes and Its Application to Small Enterprise Financing Ability Evaluation

Shi, Baofeng; Meng, Bin; Yang, Hufeng; Wang, Jing; Shi, Wenli

doi:https://doi.org/10.1155/2018/1032643

Complexity

Research Article | Open Access

Volume 2018 | Article ID 1032643 | https://doi.org/10.1155/2018/1032643

Baofeng Shi,¹ Bin Meng,² Hufeng Yang,¹ Jing Wang,¹ and Wenli Shi³

Academic Editor: Dimitri Volchenkov

Received 28 Jun 2017

Accepted 05 Dec 2017

Published 15 Jan 2018

Abstract

Attribute reduction is a preprocessing step for reducing high dimensionality in the data mining of complex systems. Many researchers have proposed approaches to reduce attributes or select key features in multicriteria decision making evaluation. In practice, the existing approaches focus on improving the classification accuracy or saving computational time, without considering the influence of the reduction results on the original data set. To help address this gap, we develop a novel attribute reduction approach that combines Pearson correlation analysis with F test significance discrimination to screen and identify the key characteristics of the original data set. The proposed model has been verified using the financing ability evaluation data of 713 small enterprises of a city commercial bank in China. The experimental results show that the selected attributes reflect 94.7% of the original information using 27.54% of the original attributes. Moreover, our experimental findings help to locate qualified partners and alleviate the difficulties faced by enterprises when applying for loans.

1. Introduction

With the coming of the era of big data, the size of data sets has been increasing sharply, making it difficult for decision makers and managers to make decisions based on those data [1]. The most important task for decision makers is therefore to reduce the large number of attributes, or the high dimensionality, of the data sets. Attribute reduction, also called indicator selection or feature screening, identifies a subset of attributes in order to reduce the dimensionality of the original data sets. By reducing attributes, one can select the attributes with the highest information content and save computational time and memory [2]. It also helps to improve the classification accuracy, because noisy and irrelevant attributes are deleted [3]. In practice, attribute reduction has been applied in many fields, such as decision making, pattern recognition, and economic and social system evaluation [4–7].

The main attribute reduction approaches can be divided into three categories. The first, and one of the best known, is based on rough set theory. The rough set approach proposed by Pawlak provides useful tools for reasoning from data [8]. It has an advantage over other attribute reduction approaches, which typically use multivariate statistics that require specific parametric assumptions [9, 10]. Degang et al. established a model to reduce the attributes of covering decision systems by building on traditional rough sets. Their empirical study indicated that the proposed approach achieved better classification performance than existing rough set methods [11]. In order to improve the classification accuracy for data containing hybrid attribute types, such as discretized numerical attributes or categorical attributes, Hu et al. introduced a simple and efficient greedy algorithm for hybrid attribute reduction [12]. When decision or evaluation systems contain errors, missing data, or missing attributes in the observations, neither DRSA (dominance-based rough set approach) [13] nor VC-DRSA (variable-consistency dominance-based rough set approach) [14] works appropriately. Inuiguchi et al. created a variable-precision dominance-based rough set approach (VP-DRSA) to deal with these problems [15]. Tsang et al. presented an attribute reduction model with covering rough sets based on a discernibility matrix to compute all attribute reducts [16]. Furthermore, Wang et al. developed a novel approach for constructing a simpler discernibility matrix with covering rough sets, which improved some characterizations of attribute reduction proposed by Wang et al. [17]. In addition, two of the most important attribute reduction models extend Pawlak’s rough set: the neighborhood rough set (NRS) model [18] and the fuzzy rough set model [19]. They can handle continuous numeric data and fuzzy information granulation, and they allow some flexibility in determining which objects should be included in a rough set [20].

The second category comprises attribute reduction models based on statistical or econometric techniques. In order to obtain the preference information of the decision maker in multiobjective search, Zitzler and Künzli defined an optimization goal in terms of a binary performance measure and used this measure directly to select key information [21]. Polat and Krmac screened the most important attributes using a pairwise Fisher score attribute reduction approach (PFSAR) and correlation-based attribute reduction [22]. Ju and Sohn developed a technology attribute reduction model that uses logistic regression based on exploratory factor analysis (EFA) of 16 technology-related attributes [23]. Elliott et al. developed a model based on a double hidden Markov model (DHMM) to extract information about the “true” credit qualities of firms [24]. Shi et al. created an indicator extraction model for customer classification based on Pearson correlation analysis and logistic regression significance discrimination. The proposed approach ensured that the reserved indicators can effectively distinguish default customers from nondefault customers [25].

In addition, there are other attribute reduction methods, such as the concept lattice model, heuristic algorithms, and the ant colony optimization algorithm. Some researchers developed new attribute reduction models by using concept lattice classification theory [26–28]. Wei et al. discussed attribute reduction in information systems by establishing three equivalence relations on the attribute set and its power set [29]. In the overwhelming majority of data analysis and machine learning studies, existing attribute reduction work focused on improving the classification accuracy. However, these studies neglected the problem of how to decrease the test cost. Min et al. proposed a heuristic algorithm to handle this problem in attribute reduction [30]. Chi et al. created an indicator screening model based on correlation analysis and component analysis [31]. Minimal test cost attribute reduction is very important in cost-sensitive machine learning. However, in many cases these heuristic algorithms cannot find the optimal solution. In order to deal with this problem, Xu et al. established an ant colony optimization algorithm for attribute reduction. Experimental results on UCI data sets showed that the proposed method outperforms the information gain-based approach [32]. Following the principle of eliminating redundant information and the principle of maximum information content, Shi and Chi proposed an attribute reduction model combining cluster analysis and the coefficient of variation [33]. Because people are interested in the maximal rules implied in attribute reduction, Li et al. developed two new kinds of attribute reduction approaches in the decision formal context based on maximal rules [34].

The existing findings offer important references for reducing attributes. However, there are still some limitations. First, in the evaluation of complex systems, the aim of attribute reduction is to eliminate the factors that do not have a significant effect on the comprehensive evaluation results. However, the existing attribute reduction approaches have not established a comprehensive index (i.e., the comprehensive score vector $y$) that reflects the characteristics of all attributes. This means that they have not developed the relationship between the attributes and the comprehensive index (i.e., the comprehensive evaluation result). As a result, some reserved attributes do not have a significant effect on the comprehensive evaluation result. Second, most existing attribute reduction approaches judge their performance by the saving in computational time. This standard does not analyze the information contribution degree of the reserved attributes to the mass-election attributes. Third, most existing studies verify the applicability of the proposed attribute reduction methods using numerical simulation rather than actual data.

To address these shortcomings, this study creates a novel attribute reduction model to screen the key influencing factors. We advance in three aspects. First, this paper establishes an attribute reduction approach by combining Pearson correlation analysis with F test significance discrimination. Pearson correlation analysis is applied to calculate the correlation among attributes in order to delete similar attributes. F test significance discrimination is used to select the key attributes that have the greatest influence on the comprehensive index $y$. Second, we define an information contribution ratio to assess this attribute reduction approach from a statistical viewpoint. Third, the proposed attribute reduction approach has been verified using the financing ability evaluation data of 713 small enterprises of a city commercial bank in China. The empirical evidence shows that the selected attributes reflect 94.7% of the original information with 27.54% of the original attributes. Furthermore, this paper selects 19 key influencing factors for assessing the financing ability of small enterprises.

The remainder of this paper is organized as follows. Section 2 introduces the design and methodology of this study. Section 3 presents the data and empirical analysis of our attribute reduction model for 713 small enterprises. Section 4 concludes and highlights the future research directions of this paper.

2. Design and Methodology of the Study

In this section, we introduce a novel attribute reduction model that combines Pearson correlation analysis with the F test significance discrimination approach. First, in order to eliminate the influence of differences in attribute units and dimensions on attribute reduction, the original data are transformed into real numbers within the interval $[0, 1]$. Second, we use Pearson correlation analysis to delete highly correlated attributes from the whole mass-election attribute set, avoiding repeated information. Third, the F test significance discrimination approach is used to select the attributes with the highest information content, which ensures that the selected attributes have the greatest influence on the small enterprise financing performance. The step-by-step procedure is as follows.

2.1. Standardization of Attribute Data

In our attribute reduction model, the first step is the standardization of attribute data, so that the subsequent calculations and parameters use the same standard. According to their features, the attributes can be divided into two types: quantitative attributes and qualitative attributes. The quantitative attributes include positive attributes, negative attributes, and interval attributes. For positive attributes, the greater the value, the better the small enterprise financing capacity. For negative attributes, the smaller the value, the better the small enterprise financing capacity. Interval attributes are reasonable only when the original index data fall within a certain range.

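The standardization equations of positive attributes, negative attributes, and interval attributes are represented by (1), (2), and (3), respectively [35]:

$$x_{ij} = \frac{v_{ij} - \min_{1 \le i \le n} v_{ij}}{\max_{1 \le i \le n} v_{ij} - \min_{1 \le i \le n} v_{ij}}, \tag{1}$$

$$x_{ij} = \frac{\max_{1 \le i \le n} v_{ij} - v_{ij}}{\max_{1 \le i \le n} v_{ij} - \min_{1 \le i \le n} v_{ij}}, \tag{2}$$

$$x_{ij} = \begin{cases} 1 - \dfrac{q_1 - v_{ij}}{\max\left(q_1 - \min_{1 \le i \le n} v_{ij},\ \max_{1 \le i \le n} v_{ij} - q_2\right)}, & v_{ij} < q_1, \\ 1 - \dfrac{v_{ij} - q_2}{\max\left(q_1 - \min_{1 \le i \le n} v_{ij},\ \max_{1 \le i \le n} v_{ij} - q_2\right)}, & v_{ij} > q_2, \\ 1, & q_1 \le v_{ij} \le q_2, \end{cases} \tag{3}$$

where $x_{ij}$ is the standardized score of the $i$th small enterprise on the $j$th attribute, $v_{ij}$ is the original data of the $i$th small enterprise on the $j$th attribute, $n$ is the number of small enterprises, $q_1$ is the left boundary of the ideal interval, and $q_2$ is the right boundary of the ideal interval.

The qualitative attributes are those attributes whose values are described by text rather than by a numerical value. The standard scores of qualitative attributes can be obtained by rational analysis and expert investigation.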
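The three quantitative standardizations can be sketched in a few lines of Python. This is a minimal illustration that assumes the min-max forms commonly used for positive, negative, and interval attributes; the function names and the sample values (including the ideal interval 30 to 40) are hypothetical and are not taken from the paper's data.

```python
from typing import List


def standardize_positive(values: List[float]) -> List[float]:
    """Min-max scaling: larger raw values map closer to 1.
    Assumes the column is not constant (max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize_negative(values: List[float]) -> List[float]:
    """Reversed min-max scaling: smaller raw values map closer to 1."""
    lo, hi = min(values), max(values)
    return [(hi - v) / (hi - lo) for v in values]


def standardize_interval(values: List[float], q1: float, q2: float) -> List[float]:
    """Score 1 inside the ideal interval [q1, q2]; the score decays linearly
    with the distance from the interval, scaled by the largest such distance."""
    lo, hi = min(values), max(values)
    span = max(q1 - lo, hi - q2)
    scores = []
    for v in values:
        if v < q1:
            scores.append(1 - (q1 - v) / span)
        elif v > q2:
            scores.append(1 - (v - q2) / span)
        else:
            scores.append(1.0)
    return scores


# Hypothetical raw values, for illustration only.
print(standardize_positive([2, 4, 6]))
print(standardize_interval([20, 30, 40, 60], q1=30, q2=40))
```

All three functions return scores in the interval [0, 1], so attributes measured in different units become directly comparable.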

2.2. Pearson Correlation Coefficients

The Pearson product-moment correlation coefficient was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s [36]. It is a measure of the linear correlation (dependence) between two random variables. It is also called the PPMCC, PCC, or Pearson’s $r$. Historically, it is the first formal measure of correlation, and it is still one of the most widely used measures of relationship.

The Pearson correlation coefficient of two attributes $x_j$ and $x_k$ is defined as the covariance of the two variables divided by the product of their standard deviations. It is commonly represented by the letter $r$ and can be equivalently defined by [37]

$$r_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\ \sqrt{\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}}, \tag{4}$$

where $\bar{x}_j$ and $\bar{x}_k$ are the means of $x_j$ and $x_k$, respectively. Equation (4) is applied to calculate the correlation between the two variables $x_j$ and $x_k$. The coefficient ranges from −1 to 1 and is invariant to positive linear transformations of either variable. A value of 1 indicates a total positive correlation between $x_j$ and $x_k$, a value of 0 implies no linear correlation between them, and a value of −1 indicates a total negative correlation.

Some authors have offered guidelines for the interpretation of the Pearson correlation coefficient [38–41]. If the Pearson correlation coefficient of two attributes is greater than 0.8 [40, 41], we conclude that the two attributes carry redundant information. In this situation, we should remove one of them. Conversely, if the Pearson correlation coefficient is smaller than 0.8, the two attributes do not carry redundant information, and both should be kept.
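The redundancy screening described above can be sketched as follows. This is only an illustration: the helper names `pearson` and `redundant_pairs` and the toy columns are hypothetical, and the code merely flags pairs above the 0.8 threshold. Choosing which member of a flagged pair to delete is left to the analyst.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def redundant_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every attribute pair whose
    correlation coefficient exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if r > threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical standardized columns, for illustration only.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],  # nearly proportional to "a"
    "c": [4.0, 1.0, 3.0, 2.0],
}
for name_a, name_b, r in redundant_pairs(columns):
    print(name_a, name_b, round(r, 3))
```

Only the pair ("a", "b") is flagged here, because "c" is negatively correlated with both other columns.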

2.3. Attribute Reduction Model

In our attribute reduction model, the third step is to select the key attributes that have the greatest influence on the comprehensive index $y$ and to delete the uncorrelated attributes. In this part, we first calculate the attribute weights using the entropy weight approach. We then obtain the financing ability evaluation score (i.e., the comprehensive index $y$) for every small enterprise. Subsequently, we compute the multiple determination coefficient between the comprehensive index $y$ and all of these attributes, and the multiple determination coefficient between $y$ and the attributes that remain after removing an attribute. By applying F test significance discrimination to the difference between the two coefficients, the key attributes that have the greatest influence on small enterprise financing ability evaluation are selected. The reduction idea is that the larger the difference $\Delta R^2$ between the two multiple determination coefficients, the more significant the removed attribute is to the comprehensive evaluation results. This remedies the shortcoming that the existing attribute reduction approaches cannot reflect the influence of attributes on the comprehensive index $y$, because their reduction process has nothing to do with $y$.

2.3.1. Weighting Attributes Utilizing Entropy Weight Method

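Let $w_j$ denote the weight of the $j$th attribute, let $x_{ij}$ denote the standard score of the $i$th small enterprise on the $j$th attribute, let $n$ denote the number of small enterprises, and let $m$ denote the number of attributes.

The subordinate degree function of the $j$th attribute is given by

$$p_{ij} = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}}. \tag{5}$$

Then, the entropy of the $j$th attribute can be calculated with

$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}. \tag{6}$$

And then, the entropy weight of the $j$th attribute is [42]

$$w_j = \frac{1 - e_j}{\sum_{j=1}^{m} (1 - e_j)}, \tag{7}$$

where $0 \le w_j \le 1$ and $\sum_{j=1}^{m} w_j = 1$.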
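The entropy weight calculation can be sketched as below. The sketch assumes the standard form of the entropy weight method (column shares, entropy normalized by ln n, weights proportional to 1 minus the entropy) and the usual convention that a zero share contributes zero to the entropy; the function name and the toy scores are hypothetical.

```python
import math


def entropy_weights(scores):
    """scores[i][j] is the standardized score of enterprise i on attribute j.
    Returns one weight per attribute; the weights sum to 1.
    Assumes n >= 2, positive column sums, and at least one non-uniform column."""
    n, m = len(scores), len(scores[0])
    entropies = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        shares = [x / total for x in column]
        # A zero share contributes nothing (limit of p * ln p as p -> 0).
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        entropies.append(entropy)
    divergence = [1 - e for e in entropies]
    total_divergence = sum(divergence)
    return [d / total_divergence for d in divergence]


# Hypothetical scores: the first attribute is identical for both enterprises,
# so it cannot discriminate between them and receives zero weight.
print(entropy_weights([[0.5, 1.0], [0.5, 0.0]]))
```

An attribute whose scores are spread evenly across enterprises has entropy close to 1 and therefore a weight close to 0, which is the behaviour the method relies on.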

2.3.2. Reducing Attributes Based on F Test Significance Discrimination

After eliminating redundant information as described in Section 2.2, this section selects the key attributes that have the greatest influence on the comprehensive index $y$ by using the F test significance discrimination approach. We now outline the steps to build an attribute reduction model based on F test significance discrimination.

Step 1. Calculate the comprehensive index $y$. Let $y_i$ denote the comprehensive index, or comprehensive score, of the $i$th small enterprise in the financing ability evaluation. We have

$$y_i = \sum_{j=1}^{m} w_j x_{ij}. \tag{8}$$

The meanings of the other variables in (8) are the same as in (1) and (7): $x_{ij}$ is the standardized score of the $i$th small enterprise on the $j$th attribute, and $w_j$ is the entropy weight of the $j$th attribute.

Step 2. Calculate the Pearson correlation coefficients between each attribute $x_j$ and the comprehensive index $y$. Without loss of generality, we assume that the attributes are ranked $x_1, x_2, \ldots, x_m$ in descending order of the absolute value of their correlation coefficient with $y$.

Step 3. Calculate the multiple determination coefficient $R^2_{(1)}$ between the comprehensive index $y$ and the remaining attributes $x_2$, $x_3$, …, $x_m$ after removing the first attribute $x_1$, which has the largest absolute correlation coefficient.
Let $\beta_0, \beta_2, \ldots, \beta_m$ denote the parameters to be estimated, let $x_2, \ldots, x_m$ denote the attributes, and let $\varepsilon$ denote the random error term. The regression function is given by

$$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_m x_m + \varepsilon. \tag{9}$$

In (9), the estimated values of the parameters can be obtained using the least squares regression estimation method. Furthermore, the estimated value vector $\hat{y}$ of the comprehensive index $y$ can be calculated. Then, we have [43]

$$R^2_{(1)} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \tag{10}$$

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ and $n$ denotes the number of small enterprises.
It should be pointed out that the attribute $x_1$ is reserved in attribute reduction, because it has the maximum correlation with the comprehensive evaluation results. This also indicates that the attribute $x_1$ has the biggest impact on small enterprise financing ability evaluation.

Step 4. Calculate the multiple determination coefficient $R^2_{(2)}$ between the comprehensive index $y$ and the remaining attributes $x_3$, …, $x_m$ after removing both the first attribute $x_1$, which has the largest absolute correlation coefficient, and the second attribute $x_2$, which has the second largest.
Let $\beta'_0, \beta'_3, \ldots, \beta'_m$ denote the parameters to be estimated, let $x_3, \ldots, x_m$ denote the attributes, and let $\varepsilon'$ denote the random error term. The regression function is as follows:

$$y = \beta'_0 + \beta'_3 x_3 + \cdots + \beta'_m x_m + \varepsilon'. \tag{11}$$

In the same way, we can calculate the estimated value vector $\hat{y}'$ of the comprehensive index $y$ for (11). The multiple determination coefficient is given by

$$R^2_{(2)} = \frac{\sum_{i=1}^{n} (\hat{y}'_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}. \tag{12}$$

Step 5. Calculate $\Delta R^2_2$. Let $\Delta R^2_2$ denote the difference between the multiple determination coefficient $R^2_{(1)}$ and the multiple determination coefficient $R^2_{(2)}$; namely,

$$\Delta R^2_2 = R^2_{(1)} - R^2_{(2)}. \tag{13}$$

In (13), the difference $\Delta R^2_2$ reflects the influence of the attribute $x_2$ on the comprehensive index $y$. If $\Delta R^2_2$ differs significantly from zero, the attribute $x_2$ affects the comprehensive evaluation result significantly, and therefore it should be reserved. On the contrary, if $\Delta R^2_2$ does not differ significantly from zero, then $R^2_{(1)} \approx R^2_{(2)}$, which indicates that the attribute $x_2$ does not have a significant effect on the comprehensive evaluation result $y$, and it should be deleted.

Step 6. Reduce attributes by F test significance discrimination.
Hypothesis $H_0$: $\Delta R^2_2 = 0$; $H_1$: $\Delta R^2_2 \neq 0$.
Let $F_2$ denote the F test value of the attribute $x_2$; we have [44]

$$F_2 = \frac{R^2_{(1)} - R^2_{(2)}}{\left(1 - R^2_{(1)}\right) / (n - m)}, \tag{14}$$

where $n - m$ is the residual degrees of freedom of regression (9), which has $m - 1$ attributes and an intercept. We can understand (14) from the following three aspects. First, the bigger the multiple determination coefficient $R^2_{(1)}$ is, the smaller the deviation between the estimated value and the actual comprehensive index is. The smaller the multiple determination coefficient $R^2_{(2)}$ is, the bigger the deviation between the estimated value and the actual comprehensive index after removing the attribute $x_2$ is. That is to say, when we remove the attribute $x_2$, the ability of the attributes to explain the comprehensive evaluation score $y$ decreases significantly. This indicates that the attribute $x_2$ has a significant effect on the comprehensive evaluation result of small enterprises; thus it should be reserved.
Second, the bigger the difference between $R^2_{(1)}$ and $R^2_{(2)}$ is, the bigger the difference is between the explanation ability of the attributes $x_2, \ldots, x_m$ and the explanation ability of the attributes $x_3, \ldots, x_m$ with respect to the comprehensive evaluation score $y$. This means that the attribute $x_2$ affects the comprehensive evaluation result of small enterprises significantly and should not be deleted.
Third, the bigger the difference $\Delta R^2_2$ is, the bigger the test value $F_2$ is. In this situation, the F test is passed easily, which again shows that the attribute $x_2$ affects the comprehensive evaluation result significantly.
Under the hypothesis $H_0$, $F_2$ follows the F distribution; that is to say, $F_2 \sim F(1, n - m)$. Let the significance level $\alpha$ be equal to 0.05 [45]; the critical value $F_\alpha(1, n - m)$ can be looked up in statistical tables. If $F_2 > F_\alpha$, accept hypothesis $H_1$. This means that $\Delta R^2_2$ differs significantly from zero, and the attribute $x_2$ should be reserved. Conversely, if $F_2 \le F_\alpha$, reject hypothesis $H_1$, which indicates that $\Delta R^2_2$ does not differ significantly from zero, and the attribute $x_2$ should be deleted.

Step 7. Repeat Step 3 to Step 6, and select other attributes.

For the rest of the attributes $x_3, \ldots, x_m$, we reduce attributes by repeating Step 3 to Step 6, each time comparing the regression without $x_1, \ldots, x_{j-1}$ against the regression without $x_1, \ldots, x_j$. The procedure continues until we find the first attribute $x_j$ whose test value satisfies the inequality $F_j \le F_\alpha$. At this point, the attribute reduction stops. This suggests that the remaining attributes do not have a significant influence on the comprehensive evaluation result $y$.
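Steps 2 to 7 can be sketched in plain Python as follows. This is a hedged illustration, not the authors' implementation: it assumes the F statistic is the standard partial F test with one numerator degree of freedom, it solves the least squares problem through the normal equations (so the attributes must not be exactly collinear), and the names `r_squared`, `reduce_by_f_test`, and `f_crit` are hypothetical. The caller supplies `f_crit`, a function that returns the critical value for a given number of residual degrees of freedom, for example from a statistical table.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def r_squared(rows, y):
    """R^2 of a least squares fit of y on the given rows plus an intercept."""
    n = len(y)
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Augmented normal equations [A'A | A'y], solved by Gauss-Jordan elimination.
    aug = [[sum(d[p] * d[q] for d in design) for q in range(k)]
           + [sum(d[p] * yi for d, yi in zip(design, y))] for p in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    beta = [aug[c][k] / aug[c][c] for c in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def reduce_by_f_test(rows, y, f_crit):
    """Return the column indices of the reserved attributes."""
    n, m = len(y), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]
    # Step 2: rank attributes by |correlation with y|, descending.
    order = sorted(range(m), key=lambda j: -abs(pearson(cols[j], y)))

    def fit(idx):
        return r_squared([[r[j] for j in idx] for r in rows], y)

    kept = [order[0]]  # Step 3: the top-ranked attribute is always reserved.
    for pos in range(1, m):
        # Steps 3-4: R^2 before and after removing the attribute at `pos`.
        r2_big, r2_small = fit(order[pos:]), fit(order[pos + 1:])
        dof = n - (m - pos) - 1  # residual degrees of freedom of the larger model
        unexplained = 1.0 - r2_big
        # Steps 5-6: partial F statistic for the removed attribute.
        f_val = math.inf if unexplained <= 0 else (r2_big - r2_small) / (unexplained / dof)
        if f_val <= f_crit(dof):
            break  # Step 7: stop at the first non-significant attribute.
        kept.append(order[pos])
    return kept
```

For the paper's data, `f_crit` would return a value close to 3.8552, the critical value the empirical section reports for the 20th attribute. Note that once an attribute has been removed, its contribution to $y$ stays in the unexplained variance of every later regression, so later attributes face a stricter test than they would in an ordinary stepwise regression.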

2.4. The Judgment of Reasonability of the Proposed Attribute Reduction Approach

Because the multiple determination coefficient describes the ability of the independent variables to explain the dependent variable, this paper uses an information contribution ratio to assess the performance of the attribute reduction model. The information contribution ratio is defined as the ratio of the ability of the reserved attributes to explain the comprehensive evaluation score $y$ to the ability of the mass-election attributes to explain $y$.

Let $In$ denote the information contribution ratio of the reserved attributes to the mass-election attributes, let $R_s^2$ denote the multiple determination coefficient of the reserved attributes with respect to the comprehensive evaluation score $y$, and let $R_h^2$ denote the multiple determination coefficient of the mass-election attributes with respect to $y$. The information contribution ratio of the reserved attributes to the mass-election attributes is given by

$$In = \frac{R_s^2}{R_h^2}. \tag{15}$$

Equation (15) is applied to judge the reasonability of the proposed attribute reduction model. The numerator $R_s^2$ reflects the ability of the reserved attributes to explain the comprehensive evaluation score $y$, and the denominator $R_h^2$ reflects the ability of the mass-election attributes to explain $y$. Their ratio reveals the information contribution degree of the reserved attributes to the mass-election attributes.

As a decision criterion, the proposed attribute reduction model is considered reasonable if the reserved attributes carry more than 90% of the information of the mass-election attributes while using less than 30% of the attributes in the mass-election attribute set.
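The ratio and the decision criterion amount to two divisions and two comparisons, as the short sketch below shows. The function name and parameter names are hypothetical; the input values are the ones reported in Section 3.5 (determination coefficients 0.947 and 1.000, and 19 reserved attributes out of 69).

```python
def information_contribution(r2_reserved, r2_all, n_reserved, n_all,
                             min_info=0.90, max_share=0.30):
    """Return (information contribution ratio, share of attributes kept,
    whether the reduction meets the 90% / 30% criterion)."""
    ratio = r2_reserved / r2_all
    share = n_reserved / n_all
    return ratio, share, (ratio > min_info and share < max_share)


ratio, share, reasonable = information_contribution(0.947, 1.000, 19, 69)
print(f"information ratio = {ratio:.1%}, attribute share = {share:.2%}, reasonable = {reasonable}")
```

With these inputs the ratio is 94.7% and the attribute share is 27.54%, so both thresholds of the criterion are met.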

3. Empirical Study

3.1. Sample Selection and Data Sources

To verify the applicability of the proposed attribute reduction model, this subsection carries out an empirical study based on the financing ability data of 713 small enterprises. To ensure that the empirical results are representative, the data were collected from the headquarters and all of the branches of a city commercial bank in China, including the Beijing, Tianjin, Shanghai, Chongqing, Shenyang, Dalian, and Dandong branches. The data are shown in Column 5 to Column 717 of Table 1 [46].

The mass-election attribute set for small enterprise financing ability evaluation contains six criterion layers: $X_1$, enterprise basic situation; $X_2$, debt paying ability; $X_3$, enterprise profitability; $X_4$, operation ability; $X_5$, development potential; and $X_6$, enterprise external macroconditions, as shown in Column 2 of Table 1. All of the 69 attributes are listed in Column 3 of Table 1. As the fourth column of Table 1 shows, there are 46 positive attributes, 7 negative attributes, 2 interval attributes, and 14 qualitative attributes.

3.2. The Attribute Data Standardization

In this paper, we have two interval attributes: “the age of enterprise legal person” and “consumer price index (CPI).” The ideal range of “the age of enterprise legal person” follows [25]; if the age of the business owner is within this range, the repayment ability and repayment willingness of the small enterprise are strong. The ideal range of “consumer price index (CPI)” also follows [25]; when the CPI is within this range, there is neither deflation nor inflation.

The data standardization for quantitative attributes is as follows: in terms of the attribute type in Column 4 of Table 1, substituting the original data of positive attributes from Column 5 to 717 of Table 1 into (1), the original data of negative attributes into (2), and the original data of interval attributes into (3), the standardized data of attributes are obtained. The results are shown in Column 718 to 1430 of Table 1.

Subsequently, we will compute the standardized score for the qualitative attributes. Learning from a commercial bank nonfinancial attributes scoring standard [46], the scoring standard of qualitative attributes can be obtained by rational analysis, as shown in Table 2. Then, the standardized scores of qualitative attributes are obtained combined with the attribute type in Column 4 of Table 1, as shown in Column 718 to 1430 of Table 1.

3.3. Attribute Reduction Utilizing Pearson Correlation Analysis

In practice, attributes whose values happen to be correlated but whose meanings are unrelated might be mistakenly deleted if correlations were computed across the whole attribute set. This paper therefore calculates the Pearson correlation coefficients only among attributes in the same criterion layer. In order to explain the process of Pearson correlation analysis, we take the 10 attributes of the fourth criterion layer, “Operation ability,” as an example.

After substituting the data from Row 45 to 54 and Column 718 to 1430 of Table 1 into (4), the correlation coefficients can be obtained for any two attributes, as shown in Table 3. As Table 3 shows, the correlation coefficient 0.998 between “accounts payable turnover speed” and “cash cycle” is greater than the threshold value 0.8, which means that the two attributes carry highly repetitive information. Because there are other attributes representing cash flow in the attribute set, such as “the main business income cash ratio” and “all assets cash recovery rate,” we delete the attribute “cash cycle.”

Similarly, we can obtain the attributes’ Pearson correlation coefficients for the other five criterion layers and delete 15 further attributes. The 16 removed attributes are marked with “delete by correlation analysis” in the last column of Table 1.

There are 53 attributes after reducing by Pearson correlation analysis, and the corresponding attributes’ standard data are listed in Column 3 to 55 in Table 4.

3.4. Attribute Reduction Using F Test Significance Discrimination

Taking the data of Table 4 into (5) to (7), the entropy weights of the 53 attributes can be obtained.

Substituting the data of the first row in Table 4 and the entropy weights of the 53 attributes into (8), the comprehensive score $y_1$ of enterprise 1 can be calculated. Similarly, we can calculate the comprehensive scores of the other 712 enterprises, as shown in the last column of Table 4.

After taking the data from Table 4 into (4), the 53 Pearson correlation coefficients between the 53 attributes and the comprehensive score $y$ can be obtained. The attributes are ranked in descending order of the absolute value of the correlation coefficient, and the ranking results are listed in Table 5. The attribute ranked first in Table 5 has the maximum correlation coefficient with the comprehensive score $y$; therefore it should be reserved.

Following Step 3 in Section 2.3.2, substituting the data of the 52 attributes that remain after removing the first-ranked attribute, together with the comprehensive score $y$, into (9) and (10) gives the multiple determination coefficient $R^2_{(1)}$ between the comprehensive index $y$ and the remaining 52 attributes. In a similar way, we can calculate the multiple determination coefficient $R^2_{(2)}$ between the comprehensive index $y$ and the remaining 51 attributes after removing the two top-ranked attributes. The difference $\Delta R^2_2 = R^2_{(1)} - R^2_{(2)}$ then follows from (13), and substituting these quantities into (14) yields the test value $F_2$.

Let the significance level be equal to 0.05; the critical value $F_{0.05}$ can be looked up in statistical tables. Because $F_2 > F_{0.05}$, we accept the hypothesis $H_1$. This means that $\Delta R^2_2$ differs significantly from zero, so the second-ranked attribute affects small enterprise financing ability evaluation significantly, and therefore it should be reserved.

Similarly, by repeating the process described above, we select 17 other attributes, as shown in Row 3 to Row 19 of Table 5. For the 20th attribute, the test value $F_{20}$ equals 2.3591, while the corresponding critical value $F_{0.05}$ equals 3.8552. Since $F_{20} < F_{0.05}$, we accept the hypothesis $H_0$. This indicates that the 20th attribute does not have a significant influence on the comprehensive evaluation result $y$, and it should be deleted. At this point, the attribute reduction stops according to Step 7 in Section 2.3.2.

The selected 19 attributes are marked with “reserve” in the last column of Table 5, and the 34 deleted attributes are marked with “delete by F test significance discrimination” in the same column. The selected 19 key influencing factors of small enterprise financing ability are shown in Table 6. The detailed attribute reduction process of the financing ability evaluation for the 713 small enterprises is shown in Table 7.

3.5. The Reasonability Judgment for the Proposed Attribute Reduction Model

Taking the data of the 19 reserved attributes in Table 4 and the comprehensive evaluation score $y$ into (9) and (10), the multiple determination coefficient $R_s^2 = 0.947$ is obtained. Taking the data of the 69 original attributes in Table 1 and the comprehensive evaluation score $y$ into (9) and (10), the multiple determination coefficient $R_h^2 = 1.000$ is obtained. Thus, by (15), $In = R_s^2 / R_h^2 = 0.947 / 1.000 = 0.947$. This shows that, with the proposed attribute reduction model, the selected attributes reflect 94.7% of the original information using 27.54% of the attributes ($19/69 \approx 27.54\%$). Both values satisfy the criterion of Section 2.4 (more than 90% of the information with less than 30% of the attributes), so the proposed reduction model is efficient and effective.

3.6. Some Notes about the Proposed Model

In Section 2.3.1, this paper takes entropy weight method as an example for the purpose of illustrating the feasibility and rationality of the proposed attribute reduction idea. As a matter of fact, the weight methods can be substituted in terms of the needs of decision makers. They can select other weight methods, such as AHP, G1, G2, and interval numbers weight approaches [47].

In Section 2.3.2, the paper takes the linear regression model as an example in order to explain the feasibility of the proposed model. In practice, decision makers can select other regression models, including nonlinear ones [48].

4. Conclusions and Future Work

In order to reduce large dimensionality in complex data sets, we create an attribute reduction approach based on Pearson correlation analysis and test significance discrimination. First of all, we delete redundancy attributes using Pearson correlation coefficient, avoiding information chaos of the original attribute data sets. Secondly, developing attribute reduction methodology utilizing test significance discrimination can find the key attributes that have the greatest influence on the evaluation results. Thirdly, the paper also defines an information contribution ratio to assess the performance of attribute reduction model from a statistical viewpoint.

The proposed attribute reduction model has been verified using the financing ability evaluation data of 713 small enterprises of a city commercial bank in China. The empirical evidence supports the accuracy and applicability of the proposed model. Moreover, we establish an evaluation indicator system for small enterprise financing ability. It will help downstream organizations in the supply chain choose more qualified partners and alleviate the difficulties enterprises face when applying for loans. Further applications of the proposed model to real-world data are expected in the future.

Attribute reduction problems are ubiquitous in data mining. The empirical study in this paper is only one example, used to verify the accuracy of the proposed model. A topic for future research is the application of the proposed approach to data sets in other attribute reduction areas. Researchers can readily conduct attribute reduction through such cases and empirical studies.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of the paper.

Acknowledgments

The study was supported by the National Natural Science Foundation of China (nos. 71503199, 71471027, 71403215, 71373207, 71703122), the Key Project of National Natural Science Foundation of China (no. 71731003), the Project Funded by the China Postdoctoral Science Foundation (nos. 2016T90957, 2016M600209, 2015M572608), the Science and Technology Plan Project of Yangling Demonstration Zone (no. 2016RKX-04), and the Credit Rating and Loan Pricing Project for Small Enterprise of Bank of Dalian (no. 2012-01). Special thanks go to the Youth Talent Cultivation Program of Northwest A&F University (no. Z109021717). The first author would like to thank Professor Chunguang Bai from Dongbei University of Finance and Economics for her valuable comments and suggestions.

References

[1] J. Manyika, M. Chui et al., Big Data: The Next Frontier for Innovation, Competition, And Productivity, vol. 4, McKinsey Global Institute, 2011.
[2] C. Bai and J. Sarkis, “Determining and applying sustainable supplier key performance indicators,” Supply Chain Management Review, vol. 19, no. 3, pp. 275–291, 2014.
[3] B. Li, T. W. S. Chow, and P. Tang, “Analyzing rough set based attribute reductions by extension rule,” Neurocomputing, vol. 123, pp. 185–196, 2014.
[4] H. Li, G. Chen, T. Huang, and Z. Dong, “High-performance consensus control in networked systems with limited bandwidth communication and time-varying directed topologies,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2016.
[5] G. Chen, Z. Y. Dong, D. J. Hill, G. H. Zhang, and K. Q. Hua, “Attack structural vulnerability of power grids: a hybrid approach based on complex networks,” Physica A: Statistical Mechanics and its Applications, vol. 389, no. 3, pp. 595–603, 2010.
[6] S.-H. Teng, M. Lu, A.-F. Yang, J. Zhang, Y. Nian, and M. He, “Efficient attribute reduction from the viewpoint of discernibility,” Information Sciences, vol. 326, pp. 297–314, 2016.
[7] Z. Dong, M. Sun, and Y. Yang, “Fast algorithms of attribute reduction for covering decision systems with minimal elements in discernibility matrix,” International Journal of Machine Learning and Cybernetics, vol. 7, no. 2, pp. 297–310, 2016.
[8] Z. Pawlak, “Rough sets,” International Journal of Computer & Information Science, vol. 11, no. 5, pp. 341–356, 1982.
[9] M. T. Rezvan, A. Z. Hamadani, and S. R. Hejazi, “An exact feature selection algorithm based on rough set theory,” Complexity, vol. 20, no. 5, pp. 50–62, 2015.
[10] B. Shi, J. Zhao, and J. Wang, “A credit rating attribute reduction approach based on pearson correlation analysis and fuzzy-rough sets,” ICIC Express Letters, vol. 10, no. 2, pp. 519–525, 2016.
[11] C. Degang, W. Changzhong, and H. Qinghua, “A new approach to attribute reduction of consistent and inconsistent covering decision systems with covering rough sets,” Information Sciences, vol. 177, no. 17, pp. 3500–3518, 2007.
[12] Q. Hu, Z. Xie, and D. Yu, “Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation,” Pattern Recognition, vol. 40, no. 12, pp. 3509–3521, 2007.
[13] X. Yang, J. Yang, C. Wu, and D. Yu, “Dominance-based rough set approach and knowledge reductions in incomplete ordered information system,” Information Sciences, vol. 178, no. 4, pp. 1219–1234, 2008.
[14] S. Greco, B. Matarazzo, R. Slowinski, and J. Stefanowski, “Variable consistency model of dominance-based rough sets approach,” in Rough Sets and Current Trends in Computing, W. Ziarko and Y. Yao, Eds., vol. 2005 of Lecture Notes in Computer Science, pp. 170–181, Springer, Berlin, Germany, 2001.
[15] M. Inuiguchi, Y. Yoshioka, and Y. Kusunoki, “Variable-precision dominance-based rough set approach and attribute reduction,” International Journal of Approximate Reasoning, vol. 50, no. 8, pp. 1199–1214, 2009.
[16] E. C. C. Tsang, C. Degang, and D. S. Yeung, “Approximations and reducts with covering generalized rough sets,” Computers & Mathematics with Applications, vol. 56, no. 1, pp. 279–289, 2008.
[17] C. Wang, M. Shao, B. Sun, and Q. Hu, “An improved attribute reduction scheme with covering based rough sets,” Applied Soft Computing, vol. 26, pp. 235–243, 2015.
[18] Q. Hu, D. Yu, J. Liu, and C. Wu, “Neighborhood rough set based heterogeneous feature subset selection,” Information Sciences, vol. 178, no. 18, pp. 3577–3594, 2008.
[19] A. M. Radzikowska and E. E. Kerre, “A comparative study of fuzzy rough sets,” Fuzzy Sets and Systems, vol. 126, no. 2, pp. 137–155, 2002.
[20] C. Bai, J. Sarkis, X. Wei, and L. Koh, “Evaluating ecological sustainable performance measures for supply chain management,” Supply Chain Management Review, vol. 17, no. 1, pp. 78–92, 2012.
[21] E. Zitzler and S. Künzli, “Indicator-based selection in multiobjective search,” in Parallel Problem Solving from Nature - PPSN VIII, vol. 3242 of Lecture Notes in Computer Science, pp. 832–842, Springer, Berlin, Germany, 2004.
[22] K. Polat and V. Krmac, “Determining of gas type in counter flow vortex tube using pairwise fisher score attribute reduction method,” International Journal of Refrigeration, vol. 34, no. 6, pp. 1372–1386, 2011.
[23] Y.-H. Ju and S.-Y. Sohn, “Updating a credit-scoring model based on new attributes without realization of actual data,” European Journal of Operational Research, vol. 234, no. 1, pp. 119–126, 2014.
[24] R. J. Elliott, T. K. Siu, and E. S. Fung, “A Double HMM approach to Altman Z-scores and credit ratings,” Expert Systems with Applications, vol. 41, no. 4, pp. 1553–1560, 2014.
[25] B. F. Shi, J. Wang, J. Y. Qi, and Y. Q. Cheng, “A novel imbalanced data classification approach based on logistic regression and Fisher discriminant,” Mathematical Problems in Engineering, vol. 2015, Article ID 945359, 12 pages, 2015.
[26] L. Li and J. Zhang, “Attribute reduction in fuzzy concept lattices based on the T implication,” Knowledge-Based Systems, vol. 23, no. 6, pp. 497–503, 2010.
[27] T.-J. Li, M.-Z. Li, and Y. Gao, “Attribute reduction of concept lattice based on irreducible elements,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 11, no. 6, Article ID 1350046, 2013.
[28] S. M. Dias and N. J. Vieira, “Concept lattices reduction: Definition, analysis and classification,” Expert Systems with Applications, vol. 42, no. 20, pp. 7084–7097, 2015.
[29] L. Wei, H.-R. Li, and W.-X. Zhang, “Knowledge reduction based on the equivalence relations defined on attribute set and its power set,” Information Sciences, vol. 177, no. 15, pp. 3178–3185, 2007.
[30] F. Min, H. He, Y. Qian, and W. Zhu, “Test-cost-sensitive attribute reduction,” Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[31] G.-T. Chi, T.-T. Cao, and K. Zhang, “The establishment of human all-around development evaluation indicators system based on correlation-principle component analysis,” Xitong Gongcheng Lilun yu Shijian/System Engineering Theory and Practice, vol. 32, no. 1, pp. 111–119, 2012.
[32] Z. Xu, H. Zhao, F. Min, and W. Zhu, “Ant colony optimization with three stages for independent test cost attribute reduction,” Mathematical Problems in Engineering, vol. 2013, Article ID 510167, 2013.
[33] B.-F. Shi and G.-T. Chi, “Green industry evaluation indicators screening model based on the maximum information content and its application,” Xitong Gongcheng Lilun yu Shijian/System Engineering Theory and Practice, vol. 34, no. 7, pp. 1799–1810, 2014.
[34] L. Li, J. Mi, and B. Xie, “Attribute reduction based on maximal rules in decision formal context,” International Journal of Computational Intelligence Systems, vol. 7, no. 6, pp. 1044–1053, 2014.
[35] B. Shi, H. Yang, J. Wang, and J. Zhao, “City green economy evaluation: empirical evidence from 15 sub-provincial cities in China,” Sustainability, vol. 8, no. 6, article 551, 2016.
[36] K. Pearson, “Note on regression and inheritance in the case of two parents,” Proceedings of The Royal Society of London, vol. 58, pp. 240–242, 1895.
[37] Pearson Correlation Coefficients, “Pearson correlation coefficient,” http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient.
[38] A. Buda and A. Jarynowski, “Life time of correlations and its applications,” Wydawnictwo Niezalezne, p. 21, 2010.
[39] J. Cohen, Statistical power analysis for the behavioral sciences, Routledge Academic, 2013.
[40] X. Meng, R. Rosenthal, and D. B. Rubin, “Comparing correlated correlation coefficients,” Psychological Bulletin, vol. 111, no. 1, pp. 172–175, 1992.
[41] X.-M. Zhang, G.-L. Diao, Y.-P. Zhao, W.-M. Wang, and P.-Y. Shu, “Study on mantle shear wave velocity structures in North China,” Chinese Journal of Geophysics (Acta Geophysica Sinica), vol. 49, no. 6, pp. 1709–1719, 2006.
[42] Y. Liu, L. Zou, Y. Sun, and X. Yang, “Evaluation Model of Aluminum Alloy Welded Joint Low-Cycle Fatigue Data Based on Information Entropy,” Entropy, vol. 19, no. 1, p. 37, 2017.
[43] M. Lonnie, “R² measures based on Wald and likelihood ratio joint significance tests,” The American Statistician, vol. 44, no. 3, pp. 250–253, 1990.
[44] L. H. Herbach, “Properties of model II - type analysis of variance tests, A: Optimum nature of the F-test for model II in the balanced case,” Annals of Mathematical Statistics, vol. 30, pp. 939–959, 1959.
[45] E. Maris and R. Oostenveld, “Nonparametric statistical testing of EEG- and MEG-data,” Journal of Neuroscience Methods, vol. 164, no. 1, pp. 177–190, 2007.
[46] Bank of Dalian (DB), Small Enterprise Credit Rating System for Bank of Dalian, Bank of Dalian, 2016.
[47] B. Meng and G. Chi, “Evaluation index system of green industry based on maximum information content,” The Singapore Economic Review, pp. 1–20, 2016.
[48] W. Dillon and M. Goldstein, Multivariate analysis: Methods and applications, Wiley Press, New York, NY, USA, 1984.

Copyright

Copyright © 2018 Baofeng Shi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
