Establishment of the Credit Indicator System of Micro Enterprises Based on Support Vector Machine and R-Type Clustering

Li, Zhanjiang; Yang, Chengrong

doi:https://doi.org/10.1155/2018/6390720

Mathematical Problems in Engineering

Research Article | Open Access

Volume 2018 | Article ID 6390720 | https://doi.org/10.1155/2018/6390720

Academic Editor: Marco Mussetta

Received: 17 Aug 2017

Revised: 06 Feb 2018

Accepted: 20 Feb 2018

Published: 08 Apr 2018

Abstract

In the first round of indicator selection, micro enterprise credit indicators with credit identification ability are selected using the binary classification model of the Support Vector Machine (SVM). In the second round, credit indicators carrying redundant information are deleted by clustering the variables according to the principle of the minimum sum of squared deviations. This paper provides a screening model for micro enterprise credit evaluation indicators and applies it to the credit data of 860 micro enterprise samples from Inner Mongolia in western China. The results show, first, that the final micro enterprise credit indicator system is consistent with the 5C model and, second, that a validity test based on the ROC (Receiver Operating Characteristic) curve confirms that each of the screened credit evaluation indicators is valid.

1. Introduction

The large number of micro enterprises plays an irreplaceable role in promoting China’s economic growth, creating social employment, and supporting people’s livelihoods. However, the financing difficulties of micro enterprises are becoming increasingly prominent and seriously inhibit their healthy development. Constructing a scientific credit evaluation indicator system for micro enterprises, one that helps measure their credit risk, eases their financing problems, and thereby promotes their healthy development, has therefore become an urgent problem.

Regarding research outside China, SBSS (Small Business Scoring Service) is a credit evaluation model for micro enterprises created by Fair Isaac Corporation (USA) and built with mathematical statistics and historical data analysis. The SOHO (Small Office Home Office) model, a credit evaluation method for micro enterprises established by the Yachiyo Bank of Japan, focuses mainly on the analysis of qualitative nonfinancial indicators. The evaluation model of the CRD (Credit Risk Database) Operations Agreement rates each negative aspect of micro enterprises and their financing. Using its corporate asset credit database and its investigators, the Imperial Data Bank decides whether to lend to a micro enterprise through field interviews, visits, and indirect surveys. The micro enterprise credit indicators designed by the Indian credit evaluation company SMERA cover 6 aspects and apply different benchmarks to enterprises in different industries and with different registered capital sizes.

Regarding research within China, Zhanjiang [1] selected micro enterprise credit indicators using the Brown-Mood median test, the Moses variance test, and the Kendall rank correlation test. Guotai et al. [2] selected the indicator system based on probit regression, according to each evaluation indicator’s ability to discriminate an enterprise’s credit status. Zhang et al. [3] studied a comprehensive evaluation indicator system for low-carbon road transport using the analytic hierarchy process and the Delphi fuzzy evaluation method. Honghai [4] selected indicators that contain more information and less redundant information according to the relative discrete coefficient, Pearson’s correlation coefficient, and the cumulative information contribution rate criteria. Youxi [5] selected indicators by combining the chi-square test, - test, and test after the initial construction of the indicator system.

Previous studies have three shortcomings. First, research on enterprise credit evaluation has focused mainly on large enterprises, and research on micro enterprise credit evaluation is lacking. Second, the screened indicators cannot be guaranteed to identify the credit status of micro enterprises significantly, which leads to a higher false-positive rate in the final credit evaluation results. Third, the final credit evaluation indicator systems contain indicators with redundant information; that is, indicator selection does not consider eliminating indicators that repeat information.

In this paper, we first select indicators that can identify the credit status of micro enterprises based on the SVM (Support Vector Machine). We then construct the indicator system by R-type clustering, deleting indicators with redundant information while retaining those with strong credit identification ability, so that the selected credit indicators can significantly identify the credit status of micro enterprises without duplicating information. Finally, we apply the constructed model to the credit data of micro enterprises in Inner Mongolia in western China.

The innovations of this paper are as follows. First, the credit indicators are mapped into a high-dimensional space by a Gaussian kernel SVM to capture their nonlinearity, and the evaluation indicators with credit identification ability are then filtered out; this addresses the problem that traditional linear weighting models cannot reflect the nonlinear relationship between credit indicators and evaluation results. Second, the value of Levene’s variance homogeneity test statistic is used to measure each indicator’s credit identification ability; the indicators within each criterion layer are clustered by R-type hierarchical clustering, and the indicator with the largest Levene statistic in each cluster is kept, which deletes redundant information indicators while retaining the indicators with significant credit identification ability.

2. Difficulties and Ideas of the Problem

2.1. Difficulties of the Problem

Difficulty 1. The first difficulty is ensuring that each selected micro enterprise credit indicator can identify the credit status of micro enterprises. Indicators commonly used in credit evaluation do not necessarily have significant credit identification ability in micro enterprise credit ratings. To prevent enterprises with high default risk from obtaining high credit scores, the selected indicators must be able to identify the credit status of micro enterprises.

Difficulty 2. The second difficulty is avoiding micro enterprise credit indicators that reflect repeated information without mistakenly deleting indicators that have a strong ability to identify the credit status of micro enterprises. A good micro enterprise credit indicator system must not contain redundant information indicators, and every indicator with significant credit identification ability in the final model is essential to micro enterprise credit evaluation. Therefore, when constructing the micro enterprise credit indicator model, retaining the indicators with significant credit identification ability is even more important than avoiding overlapping information among credit indicators.

2.2. Ideas to Solve the Difficulties

(1) Ideas to Solve Difficulty 1. The credit identification ability of a set of credit indicators is the percentage of micro enterprises whose credit status is correctly identified. Using the SVM binary classification model to predict the credit status of micro enterprises, we obtain the credit identification ability of all the indicators and the credit identification ability of the indicators remaining after a given indicator is deleted; the difference between the latter and the former is taken as the impact of that indicator on the evaluation results.

Each indicator is then removed or retained according to the sign of this difference. Specifically, if the difference is greater than or equal to 0, the indicators remaining after the deletion identify credit status at least as well as the full set of indicators, which shows that the deleted indicator cannot identify the enterprises’ credit status, so it is deleted. If the difference is less than 0, the remaining indicators identify credit status less well than the full set, which shows that the deleted indicator can identify the credit status of micro enterprises, so it is retained. The ideas to solve Difficulty 1 are shown in Figure 1.

(2) Ideas to Solve Difficulty 2. After R-type clustering, indicators in the same cluster are considered to reflect similar information, and indicators in different clusters are considered to reflect different information. In this paper, R-type hierarchical clustering is applied, according to the principle of the minimum sum of squared deviations, to the indicators of each criterion layer (which reflect the same type of information), so that indicators reflecting repeated information fall into one cluster. Within each cluster, the indicator with the strongest credit identification ability is retained and all other indicators are deleted; this preserves the indicators with strong credit identification ability while deleting the indicators that reflect redundant information. The value of Levene’s variance homogeneity test statistic (hereinafter the Levene value) is used to measure the credit identification ability of a credit indicator. The Levene value reflects the idea that the further the mean absolute deviation of a credit indicator in the default enterprise samples lies from that of all enterprise samples, the stronger the indicator’s ability to significantly identify the credit status of micro enterprises. The ideas to solve Difficulty 2 are shown in Figure 2.

2.3. Principle of Building the Model

The principle of building the credit indicator model of micro enterprise based on the methods of SVM and R-type clustering is shown in Figure 3.

3. Method of Building the Model

3.1. Initial Selection and Standardization of Credit Indicators

The initial (mass) selection of indicators follows two principles: retain classic, frequently used indicators, and reflect the characteristics of micro enterprises. Indicators that are unobservable, whose data cannot be obtained, or whose original data are missing for more than 10% of the total sample are deleted directly. Missing data amounting to less than 10% of the total number of samples are filled by interpolation.

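Let $x_{ij}$ be the standardized value of the j-th indicator of the i-th enterprise, $v_{ij}$ the original value of the j-th indicator of the i-th enterprise, $n$ the total number of micro enterprise samples, $q_1$ the left border of the j-th indicator’s interval, and $q_2$ the right border of that interval.

Then the standardized value of a positive indicator is

$$x_{ij} = \frac{v_{ij} - \min_{1 \le i \le n} v_{ij}}{\max_{1 \le i \le n} v_{ij} - \min_{1 \le i \le n} v_{ij}} \tag{1}$$

The standardized value of a negative indicator is

$$x_{ij} = \frac{\max_{1 \le i \le n} v_{ij} - v_{ij}}{\max_{1 \le i \le n} v_{ij} - \min_{1 \le i \le n} v_{ij}} \tag{2}$$

The standardized value of an interval indicator is

$$x_{ij} = 1 - \frac{q_1 - v_{ij}}{\max\left(q_1 - \min_{1 \le i \le n} v_{ij},\ \max_{1 \le i \le n} v_{ij} - q_2\right)}, \quad v_{ij} < q_1 \tag{3a}$$

$$x_{ij} = 1, \quad q_1 \le v_{ij} \le q_2 \tag{3b}$$

$$x_{ij} = 1 - \frac{v_{ij} - q_2}{\max\left(q_1 - \min_{1 \le i \le n} v_{ij},\ \max_{1 \le i \le n} v_{ij} - q_2\right)}, \quad v_{ij} > q_2 \tag{3c}$$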
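The quantitative standardization rules can be illustrated with a short NumPy sketch. It assumes the common min–max forms, with the minimum and maximum taken over all n enterprise samples and, for interval indicators, a linear fall-off outside [q1, q2] scaled by the larger of the two distances to the sample extremes. The function names are hypothetical, qualitative indicators (Table 1) are not covered, and a constant column (maximum equal to minimum) is not handled.

```python
import numpy as np

def standardize_positive(v):
    """Positive indicator (larger is better): min-max scaling onto [0, 1]."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def standardize_negative(v):
    """Negative indicator (smaller is better): reversed min-max scaling onto [0, 1]."""
    v = np.asarray(v, dtype=float)
    return (v.max() - v) / (v.max() - v.min())

def standardize_interval(v, q1, q2):
    """Interval indicator: values inside [q1, q2] score 1; outside the interval
    the score falls off linearly, scaled by the larger distance from the
    interval to the sample minimum or maximum."""
    v = np.asarray(v, dtype=float)
    span = max(q1 - v.min(), v.max() - q2)
    x = np.ones_like(v)
    below, above = v < q1, v > q2
    x[below] = 1 - (q1 - v[below]) / span
    x[above] = 1 - (v[above] - q2) / span
    return x
```

For example, raw values 2, 4, 6 of a positive indicator map to 0, 0.5, 1, and an interval indicator with q1 = 4 and q2 = 12 maps raw values 0, 5, 10, 20 to 0.5, 1, 1, 0.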

The standardization rules for qualitative indicators are shown in Table 1.

3.2. The Method of the First Round of Indicator Selection Based on SVM

(1) The Determination of the Kernel Function. In the first round of indicator selection, which uses SVM classification prediction, the Gaussian radial basis function is selected as the SVM kernel for three main reasons. First, the linear kernel is suitable for linearly separable problems, whereas the Gaussian radial basis function is suitable for linearly inseparable problems; given the nonlinear relationship between credit indicators and evaluation results, the Gaussian radial basis function can give more accurate results than the linear kernel. Second, the number of kernel parameters affects model accuracy: kernels with fewer parameters help improve accuracy compared with other kernels, and the Gaussian radial basis function has few parameters. Third, using the Gaussian radial basis function as the SVM kernel also reduces the computational difficulty.
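For reference, the Gaussian radial basis function kernel in the form LIBSVM uses (kernel type 2) is K(x, z) = exp(−g‖x − z‖²), where g is the single kernel parameter tuned later. The NumPy sketch below computes the kernel matrix between two sets of indicator vectors; the function name `rbf_kernel` is our own and is not part of LIBSVM.

```python
import numpy as np

def rbf_kernel(X, Z, g):
    """Gaussian RBF kernel matrix K[a, b] = exp(-g * ||X[a] - Z[b]||^2)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    # Squared Euclidean distances between every row of X and every row of Z
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2 * X @ Z.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-g * np.maximum(sq, 0.0))
```

By contrast, LIBSVM’s polynomial kernel, (gamma·u′v + coef0)^degree, has three parameters to tune, which is the sense in which the RBF kernel has fewer parameters.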

(2) The Criteria of Selection

Criterion 1. If the difference is greater than 0, that is, the credit identification ability of the indicators remaining after an indicator is deleted is greater than A, the credit identification ability of all the indicators, then the deleted indicator cannot distinguish default enterprises from nondefault enterprises and is deleted.

Criterion 2. If the difference equals 0, that is, the credit identification ability of the indicators remaining after the deletion equals A, then the indicator cannot distinguish default enterprises from nondefault enterprises and is deleted.

Criterion 3. If the difference is less than 0, that is, the credit identification ability of the indicators remaining after the deletion is less than A, then the indicator can distinguish default enterprises from nondefault enterprises and is retained.

(3) Calculation of the Credit Identification Ability of Credit Indicators. Let $A$ be the credit identification ability of all the indicators for all micro enterprise samples, $N_0$ the total number of nondefault enterprises, $N_1$ the total number of default enterprises, $y_i$ the true default status of the i-th enterprise ($y_i = 0$: nondefault; $y_i = 1$: default), and $\hat{y}_i$ the predicted default status of the i-th enterprise. Then $A$, the proportion of enterprises whose credit status is predicted correctly, is given as follows:

$$A = \frac{\sum_{i=1}^{N_0+N_1} \left(1 - \left|y_i - \hat{y}_i\right|\right)}{N_0 + N_1} \tag{4}$$

Replacing $\hat{y}_i$ in the numerator of formula (4) with $\hat{y}_i^{(j)}$, the predicted credit status of the i-th enterprise computed from the indicators remaining after the j-th indicator is deleted, gives $A_j$, the credit identification ability of the remaining indicators for all micro enterprise samples.

The degree of influence of the j-th indicator on the evaluation results, $\Delta A_j$, is then given as follows:

$$\Delta A_j = A_j - A \tag{5}$$
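The leave-one-indicator-out procedure behind formulas (4) and (5) and Criteria 1–3 can be sketched in Python. This is a minimal sketch, not the authors’ MATLAB code: it assumes the credit identification ability is plain classification accuracy on the test set, and `fit_predict` is a hypothetical callable standing in for the trained classifier, so an RBF-kernel SVM with the tuned c and g (or any other classifier) can be plugged in.

```python
import numpy as np

def identification_ability(y_true, y_pred):
    """Share of enterprises whose predicted status (0 nondefault, 1 default)
    matches the true status."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def first_round_selection(X_train, y_train, X_test, y_test, fit_predict):
    """Delete each indicator (column) in turn and compare accuracies.

    fit_predict(X_train, y_train, X_test) -> predicted labels for X_test
    (hypothetical interface). Returns (A, deltas, decisions), where
    deltas[j] = A_j - A for indicator j.
    """
    A = identification_ability(y_test, fit_predict(X_train, y_train, X_test))
    n_indicators = X_train.shape[1]
    deltas, decisions = [], []
    for j in range(n_indicators):
        keep = [k for k in range(n_indicators) if k != j]  # drop indicator j
        A_j = identification_ability(
            y_test, fit_predict(X_train[:, keep], y_train, X_test[:, keep]))
        delta = A_j - A
        deltas.append(delta)
        # Criteria 1 and 2: delta >= 0, the indicator adds nothing, so delete it.
        # Criterion 3: delta < 0, accuracy drops without it, so retain it.
        decisions.append("Retain" if delta < 0 else "Delete")
    return A, deltas, decisions
```

A toy nearest-centroid classifier is enough to exercise the logic: an indicator whose removal lowers accuracy gets a negative difference and is retained, while one whose removal leaves accuracy unchanged is deleted.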

3.3. The Method of the Second Round of Indicator Selection Based on R-Type Clustering

(1) The Criteria of Selection. After the first round of indicator selection, the indicators within each criterion layer are clustered by R-type hierarchical clustering according to the principle of the minimum sum of squared deviations. When the total number of clusters reaches the preset value L, the validity of L is verified by the - test. If the - test is not passed, the number of clusters is reset; if it is passed, the indicator with the largest Levene value in each cluster is retained and all the other indicators are deleted, so that the indicator with the strongest credit identification ability is kept and the redundant information indicators are removed.

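(2) The Calculation of the Sum of Squared Deviations. Let $S_t$ be the sum of squared deviations of the t-th criterion layer, $K_t$ the number of clusters in the t-th criterion layer, $n_{tk}$ the number of indicators in the k-th cluster of the t-th criterion layer, $\mathbf{x}_{tkj}$ the vector of the j-th indicator in the k-th cluster of the t-th criterion layer, and $\bar{\mathbf{x}}_{tk}$ the mean vector of all the indicators in the k-th cluster of the t-th criterion layer. Then $S_t$ is given as follows:

$$S_t = \sum_{k=1}^{K_t} \sum_{j=1}^{n_{tk}} \left(\mathbf{x}_{tkj} - \bar{\mathbf{x}}_{tk}\right)^{\mathrm{T}} \left(\mathbf{x}_{tkj} - \bar{\mathbf{x}}_{tk}\right) \tag{6}$$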

(3) Test. In this paper, the nonparametric - test is used to check whether the number of clusters is reasonable, that is, whether the credit indicators within the same cluster differ significantly. If the - test is not passed, the indicators in the cluster differ significantly and cannot be grouped together, so the number of clusters must be reset. If the - test is passed, the indicators in the cluster do not differ significantly and can be grouped together; the second round of indicator selection is then completed by retaining the indicator with the largest Levene value in each cluster, which has the strongest credit identification ability, and deleting all the other indicators as redundant.

Specifically, the hypotheses of the - test are H₀: there is no significant difference between the indicators within the cluster; H₁: there is a significant difference between the indicators within the cluster. The significance level is set to 0.01. When sig. > 0.01, H₀ is accepted and the indicators can be grouped into one cluster; when sig. < 0.01, H₀ is rejected and the indicators cannot be grouped into one cluster.

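(4) The Calculation of the Levene Value. Let $W_j$ be the Levene value of the j-th indicator, $N$ the total number of enterprise samples, $N_0$ the total number of nondefault enterprise samples, $N_1$ the total number of default enterprise samples, $Z_{0i}$ the absolute difference between the j-th indicator of the i-th nondefault enterprise and the mean of that indicator over all nondefault enterprises, and $Z_{1i}$ the absolute difference between the j-th indicator of the i-th default enterprise and the mean of that indicator over all default enterprises. With $\bar{Z}_0$ and $\bar{Z}_1$ the means of the $Z_{0i}$ and the $Z_{1i}$, and $\bar{Z}$ the mean of all $N$ such values, the Levene value of the j-th indicator is given as follows:

$$W_j = \frac{(N-2)\left[N_0\left(\bar{Z}_0 - \bar{Z}\right)^2 + N_1\left(\bar{Z}_1 - \bar{Z}\right)^2\right]}{\sum_{i=1}^{N_0}\left(Z_{0i} - \bar{Z}_0\right)^2 + \sum_{i=1}^{N_1}\left(Z_{1i} - \bar{Z}_1\right)^2} \tag{7}$$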
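The Levene value described here is the two-group Levene statistic with mean centring. The sketch below computes it for one indicator from its standardized values in the nondefault and default samples; the function name is hypothetical.

```python
import numpy as np

def levene_statistic(x_nondefault, x_default):
    """Two-group Levene statistic with mean centring.

    Z values are absolute deviations from each group's own mean; the statistic
    compares how far the two groups' mean Z values lie from the grand mean of Z
    with the within-group variation of Z.
    """
    z0 = np.abs(np.asarray(x_nondefault, float) - np.mean(x_nondefault))
    z1 = np.abs(np.asarray(x_default, float) - np.mean(x_default))
    n0, n1 = len(z0), len(z1)
    n = n0 + n1
    z_bar = np.concatenate([z0, z1]).mean()  # grand mean of all Z values
    between = n0 * (z0.mean() - z_bar) ** 2 + n1 * (z1.mean() - z_bar) ** 2
    within = ((z0 - z0.mean()) ** 2).sum() + ((z1 - z1.mean()) ** 2).sum()
    return float((n - 2) * between / within)  # k - 1 = 1 for two groups
```

With SciPy installed, `scipy.stats.levene(x_nondefault, x_default, center='mean').statistic` should return the same number. Note that SciPy’s default, `center='median'`, is the Brown–Forsythe variant and can give different values.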

3.4. The Validity Test of Credit Indicators

The ROC curve is a comprehensive indicator that reflects the sensitivity and specificity of a continuous variable. Its vertical coordinate, sensitivity, is the proportion of default samples judged correctly; specificity is the proportion of nondefault samples judged correctly, so the horizontal coordinate, 1 − specificity, is the proportion of nondefault samples judged incorrectly. For a fixed horizontal coordinate, a larger vertical coordinate means a higher proportion of default samples judged correctly; accordingly, the larger the AUC (area under the ROC curve) of a credit indicator, the stronger its ability to identify default samples and the more effective the indicator. Based on the ROC curve, this paper tests the validity of the screened indicators using the following criteria: when AUC ≤ 0.5, the indicator does not identify the credit status of enterprise samples accurately; when AUC > 0.5, it does.
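The AUC criterion can be checked without plotting the curve, because the area under the ROC curve equals the probability that a randomly chosen default sample receives a higher score than a randomly chosen nondefault sample, with ties counted as one half. This sketch assumes the scores are oriented so that larger values point toward default; for a standardized indicator where larger values mean better credit, pass the negated values. The pairwise count is O(n0·n1), which is fine for a few hundred samples.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of default (label 1) and nondefault
    (label 0) samples; ties contribute one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def is_valid_indicator(scores, labels):
    """Validity criterion: the indicator passes only if AUC > 0.5."""
    return auc(scores, labels) > 0.5
```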

4. The Application of the Model

4.1. Data and Standardization of Indicators

We use micro enterprise credit data for 2010–2015 from a commercial bank in Inner Mongolia, western China. The 68 indicators remaining after the initial selection are listed in column (b) of Table 2, numbered 1 to 68. The standardized value of each indicator is obtained by substituting its raw data into Formulas (1), (2), and (3a)–(3c), or by applying Table 1, according to the indicator type; the standardized values of qualitative indicators are determined by professionals from universities and the community. The true credit status of each micro enterprise (0 for nondefault and 1 for default) and other related information are shown in Table 2.

4.2. The First Round of Indicator Selection Based on SVM

(1) Classification of Micro Enterprise Samples. In the first round, the 68 indicators remaining after the initial selection are filtered by SVM classification and prediction to pick out the indicators that can identify the credit status of micro enterprises. The division of the micro enterprise samples is shown in Table 3.

(2) Determination of Optimal Parameters. Computing $A$ in formula (4), and $A_j$ after the j-th indicator is deleted, by SVM classification and prediction first requires determining the penalty coefficient c and the Gaussian radial basis function parameter g. MATLAB and the LIBSVM toolbox are used for this purpose: c is searched from 2⁻⁴ to 2⁶ and g from 2⁻⁵ to 2⁵, with the exponent increased in steps of 0.5; 3-fold cross-validation is used; and the display step for discretizing the accuracy rate is set to 0.9. Running the program in MATLAB with these settings and with the training and test sets selected according to Table 2 gives an optimal Gaussian radial basis function parameter of $g = 5.6569 \approx 2^{2.5}$ and an optimal penalty coefficient of $c = 0.125 = 2^{-3}$.
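The parameter grid can be reproduced in Python: both ranges are stepped in the exponent of 2 by 0.5, giving 21 values each and 441 (c, g) pairs, and the reported optima correspond to exponents 2.5 for g and −3 for c. In this sketch `cv_accuracy` is a hypothetical callable standing in for the 3-fold cross-validated accuracy that the LIBSVM toolbox computes; only the search loop is shown.

```python
import itertools

def log2_grid(lo_exp, hi_exp, step=0.5):
    """Values 2**e for e = lo_exp, lo_exp + step, ..., hi_exp."""
    n = int(round((hi_exp - lo_exp) / step)) + 1
    return [2.0 ** (lo_exp + i * step) for i in range(n)]

def grid_search(cv_accuracy, c_range=(-4, 6), g_range=(-5, 5), step=0.5):
    """Exhaustive search over (c, g) pairs; returns (accuracy, c, g).

    cv_accuracy(c, g) is a hypothetical callable returning cross-validated
    accuracy. On ties the first pair found (smallest c, then smallest g) wins.
    """
    c_values = log2_grid(c_range[0], c_range[1], step)
    g_values = log2_grid(g_range[0], g_range[1], step)
    best = None
    for c, g in itertools.product(c_values, g_values):
        acc = cv_accuracy(c, g)
        if best is None or acc > best[0]:
            best = (acc, c, g)
    return best
```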

(3) Calculation of the Degree of Influence of Each Credit Indicator on the Evaluation Results. A training model is built in MATLAB from the selected training set with the optimal parameters c and g, and the predicted values $\hat{y}_i$ are obtained by predicting the credit status of the enterprises in the test set. The j-th indicator is then deleted from both the training set and the test set; a training model is built in MATLAB from the reduced training set with the same optimal c and g, and the predicted values $\hat{y}_i^{(j)}$ are obtained by predicting the credit status of the enterprises in the reduced test set. Substituting these two sets of predicted values, together with the true credit status values in row 69 of Table 2, into formula (4) gives the values of $A$ and $A_j$ shown in Table 4; substituting these values into formula (5) gives the degree of influence of each credit indicator on the evaluation results, $\Delta A_j = A_j - A$, also shown in Table 4.

(4) The First Round of Credit Indicator Selection. The degree of influence of each credit indicator on the evaluation results is shown in Table 4. The results of applying the first-round selection criteria are also shown in Table 4, where “Delete” means that the corresponding credit indicator is deleted and “Retain” means that it is retained in the first round of indicator selection based on the SVM.

After the first round of indicator selection, we delete 25 indicators and keep 43 indicators that can identify the credit status of micro enterprises.

4.3. The Second Round of Indicator Selection Based on R-Type Clustering

The second round of indicator selection applies R-type clustering to the 43 indicators remaining after the first round, in order to retain the indicators with strong credit identification ability and delete the redundant information indicators.

(1) Determine the Number of Clusters in Each Criterion Layer. The number of clusters in each criterion layer is set so that 20 credit indicators are retained in the final indicator model. Of the 43 indicators remaining after the first round of selection, 18 come from the first criterion layer, internal financial factors, and are divided into 8 clusters. Twenty come from the second criterion layer, internal nonfinancial factors; the collateral score indicator is kept directly, to correspond to the 5C factor analysis model, and treated as a cluster on its own, while the remaining 19 indicators are divided into clusters. Five come from the third criterion layer, external macro environmental factors, and are divided into clusters.

(2) Clustering the Indicators within Each Criterion Layer. The indicators in the first criterion layer, internal financial factors, are used as an example; the other two criterion layers are processed in the same way.

First, each of the 18 indicators of the first criterion layer, internal financial factors, that are marked “Retain” in Table 4 (numbers 1–35) is treated as a separate cluster, giving 18 clusters. Any two of these clusters are then merged into a new cluster, which reduces the first criterion layer to 17 clusters; there are $\binom{18}{2} = 153$ such clustering schemes. The standardized values of the indicators under each clustering scheme are substituted into formula (6) to calculate the scheme’s sum of squared deviations, and the scheme with the smallest sum is chosen, which completes the first round of clustering. Clustering continues in this way until the number of clusters in the first criterion layer reaches the preset number, 8. The clustering results for all the indicators are shown in Table 4.
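The merging procedure can be sketched as a brute-force loop that, at every step, tries all pairwise merges (153 of them when 18 clusters remain) and keeps the one with the smallest total sum of squared deviations. In this hypothetical sketch each row of `X` holds one indicator’s standardized values across all enterprise samples; the function names are our own.

```python
import itertools
import numpy as np

def within_ss(clusters, X):
    """Total sum of squared deviations of indicator vectors (rows of X)
    from their cluster mean vectors."""
    return sum(float(((X[c] - X[c].mean(axis=0)) ** 2).sum()) for c in clusters)

def r_type_cluster(X, n_clusters):
    """Start with every indicator in its own cluster; at each step try every
    pairwise merge and keep the one giving the smallest total sum of squares."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            merged = clusters[a] + clusters[b]
            candidate = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
            ss = within_ss(candidate, X)
            if best is None or ss < best[0]:
                best = (ss, candidate)
        clusters = best[1]
    return clusters
```

Minimizing the total after a merge is the same as minimizing the increase the merge causes, which is Ward’s criterion, so `scipy.cluster.hierarchy.linkage(X, method='ward')` should produce the same merge order (up to ties) much faster on larger problems.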

(3) Testing Whether the Number of Clusters Is Reasonable. The 43 credit indicators marked “Retain” in Table 4 are clustered within their criterion layers by R-type hierarchical clustering according to the principle of the minimum sum of squared deviations, and the clustering results are shown in Table 5. To avoid misclassifying indicators in this second round because of significant differences between the evaluation indicators within a cluster, the - test in SAS software is applied to the clustered credit indicators at a significance level of 0.01 (clusters containing only one indicator are not tested), and the - test sig. value for each cluster is shown in Table 5. According to the test criterion, the grouping into 20 clusters is reasonable, so the number of clusters does not need to be reset.

(4) The Calculation of the Levene Value. The standardized values of the 43 indicators marked “Retain” in column (3) of Table 4 are substituted into formula (7) to calculate their Levene values, which are shown in Table 5.

(5) The Second Round of Credit Indicator Selection. Based on the clustering results in Table 5, the indicator with the largest Levene value in each cluster is kept. The selection results are shown in Table 5, where the indicators marked “Delete” are deleted and the indicators marked “Retain” are retained in the second round of indicator selection based on R-type clustering.
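The second-round rule reduces to keeping, in each cluster, the indicator with the largest Levene value. A minimal sketch follows; the indicator names and values in the test are made up for illustration.

```python
def second_round_selection(clusters, levene_values):
    """Mark the indicator with the largest Levene value in each cluster as
    "Retain" and every other indicator in that cluster as "Delete".

    clusters: list of lists of indicator names.
    levene_values: dict mapping indicator name -> Levene value.
    """
    decisions = {}
    for members in clusters:
        keep = max(members, key=lambda name: levene_values[name])
        for name in members:
            decisions[name] = "Retain" if name == keep else "Delete"
    return decisions
```

Ties are resolved in favour of the indicator listed first in the cluster, because Python’s `max` returns the first maximal element.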

After the second round of indicator selection, we delete 23 indicators and keep 20 indicators that can significantly identify the credit status of enterprises and contain no redundant information.

4.4. Contrast with the 5C Model

We compare the constructed micro enterprise credit indicator model with the 5C factor analysis model; the results are shown in Table 6. The legal representative’s loan default records and four other evaluation indicators reflect the moral quality element of the 5C model; the cash recovery rate of all assets and 11 other evaluation indicators reflect repayment ability; the fixed rate of capital and 2 other evaluation indicators reflect capital strength; the collateral score reflects secured collateral; and the industry sentiment indicator and the Engel coefficient reflect operating environment conditions.

4.5. The Validity Test of Credit Indicators and the Final Indicator System

After pretreatment of the micro enterprise credit indicators and two rounds of indicator selection, this paper constructs a micro enterprise credit indicator system with the 20 credit indicators shown in column (b) of Table 5.

For the standardized values of the 20 credit indicators remaining after the final selection, shown in column (b) of Table 6, the ROC curve in SPSS software is used to test the validity of the indicators in the constructed micro enterprise credit indicator system. The ROC curve of each indicator is shown in Figure 4, and the AUC of each indicator is shown in Table 6. As Table 6 shows, the AUC values of all 20 credit indicators remaining after the final selection are greater than the critical value of 0.5, and the validity test results in Table 5 show that all of these indicators have passed the validity test.

5. Conclusions

In this paper, the micro enterprise credit indicator model is constructed through a double selection model combining SVM and R-type clustering, in which internal financial factors, internal nonfinancial factors, and external macro environmental factors form the criterion layers, and the cash recovery rate of all assets and 19 other credit indicators form the indicator layer.

Compared with the 5C element model, every credit indicator in the micro enterprise credit indicator model constructed in this paper can be related to an element of the 5C element model, so the information in the constructed model covers all the elements of the 5C element model.

The results of the validity test of the credit indicators of micro enterprise based on ROC curve show that all the credit indicators of the micro enterprise credit indicator model constructed in this paper pass the validity test, so all the indicators in the micro enterprise credit indicator model are valid.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Key Project of National Natural Science Foundation of China (71731003), China Postdoctoral Science Foundation (2015M582746XB), and Natural Science Foundation of Inner Mongolia Autonomous Region of China (2016MS0714).

References

[1] L. Zhanjiang, “Establishment of Evaluation Indicator System of Credit State of Micro Enterprises,” Technology Economics, vol. 36, no. 02, pp. 109–116, 2017.
[2] C. Guotai, Z. Yajing, and S. Baofeng, “The Debt Rating for Small Enterprises Based on Probit Regression,” Journal of Management Sciences in China, vol. 19, pp. 136–156, 2016.
[3] W. Zhang, J. Lu, and Y. Zhang, “Comprehensive Evaluation Index System of Low Carbon Road Transport Based on Fuzzy Evaluation Method,” in Proceedings of the Green Intelligent Transportation System and Safety, GITSS 2015, pp. 659–668, China.
[4] C. Honghai, “Study of Evaluation Indicators Screening Based on Information Substitutability,” Statistics & Information Forum, vol. 31, no. 10, pp. 17–22, 2016.
[5] L. Youxi, “A Summary of Comprehensive Evaluation Methods,” Market Modernization, vol. 02, pp. 254–255, 2016.

Copyright

Copyright © 2018 Zhanjiang Li and Chengrong Yang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
