Abstract

In China, small enterprises contribute directly to economic growth, but they have difficulty financing their development. To address this problem, this paper constructs a small business credit evaluation index system using a two-stage Bayesian discriminant model. In the first stage, customers are distinguished by whether they are in default; in the second stage, defaulting customers are divided into those with a high default loss rate and those with a low default loss rate. The literature to date has identified credit indicators only on the basis of the first stage; the credit evaluation index system proposed here is based on both stages and is therefore more sensitive. We conduct an empirical analysis of credit data on 3,111 small enterprises in China with a two-stage nonparametric Bayesian discriminant model and a two-stage parametric Bayesian discriminant model, and we test the two resulting indicator systems with discriminant accuracy and an ROC curve. The discriminant accuracy of the two index systems is 77.95% and 70.95%, respectively, and their prediction accuracy, measured by the area under the ROC curve, is 0.902 and 0.866, respectively, which shows that the constructed indicator systems are robust and effective. Finally, a comparative analysis of the discriminant accuracy of three models finds that the two-stage nonparametric model performs best, the two-stage logistic regression model second, and the two-stage parametric model worst.

1. Introduction

Small businesses are among the most active parts of the Chinese economy, but their development has been limited to some extent by difficulty in obtaining financing. To reduce risk, commercial banks in China often extend loans only against mortgage guarantees. The financial systems of small enterprises are not standardized, their financial information is incomplete, and it is difficult for them to provide mortgage guarantees. For these and other reasons, they face problems in obtaining bank loans. Many papers on small business credit evaluation address the problems with small business loans [1, 2]. At the same time, the issue facing banks is how to determine which factors explain the credit status of small enterprises. Credit indicators for small enterprises are so numerous that a problem of information duplication arises. Banks need a way to evaluate enterprise credit that is both more sensitive and less quantitatively intensive. Therefore, this paper proposes a two-stage discriminant method for selecting credit evaluation indicators in order to construct a scientific and complete indicator system that can distinguish the default state of small enterprises. In addition, we believe that the two-stage screening model can build a more sensitive credit evaluation index system, which can provide a reference for banks in evaluating the credit of small enterprises scientifically.

Most existing systems for evaluating credit select indicators based only on whether enterprises are in default [3, 4], rather than on the level of loss from that default. Discriminating credit indicators only by whether default occurs cannot fully reflect changes in the default loss. Different default losses, such as partial default and total default, can also significantly affect the credit status of small enterprises. However, most studies ignore the different credit characteristics of high default loss and low default loss. Moreover, the indicators in a credit evaluation index system must be able to identify different credit states and have high discriminability. Following this construction principle, we mine the information that identifies credit status from the perspective of two stages and then select the indicators. Therefore, in this paper, we propose a system for evaluating credit with two stages of discrimination. In the first stage, customers are divided into default and nondefault; in the second stage, the defaulting customers are further divided into those with a high default loss rate and those with a low default loss rate. The resulting credit evaluation index system is more sensitive because it draws on a broader selection of credit indicators, and it has economic significance for the future selection of credit indicators and construction of credit evaluation index systems. To construct this system for evaluating small business credit, we use nonparametric and parametric Bayesian discrimination and propose combination screening based on Bayesian discrimination and cluster analysis, which reduces the number of indicators that need to be included and thereby yields a more feasible evaluation method.
Our sample comprises credit data on 3,111 Chinese small enterprises. We test the models with discriminant accuracy and an ROC curve. The discriminant accuracy of the index systems constructed by the two-stage nonparametric Bayesian discriminant model and the two-stage parametric Bayesian discriminant model is 77.95% and 70.95%, respectively, and their prediction accuracy, measured by the area under the ROC curve, is 0.902 and 0.866, respectively, which shows that the constructed index systems are robust and effective. A comparative analysis of the discriminant accuracy of three models then shows that the two-stage nonparametric model performs best, the two-stage logistic regression model second, and the two-stage parametric model worst. The two-stage nonparametric Bayesian discriminant model therefore has stronger sensitivity and a higher ability to discriminate default, can be applied in practice, and opens up a new two-stage discriminant approach to constructing credit evaluation index systems. Moreover, when faced with the massive number of indicators available in big data, this approach can build a more sensitive indicator system with a more compact set of indicators, which is of practical value for the selection of credit indicators.

Small enterprises are of key importance in the Chinese economy, and traditional systems used for credit evaluation are typically based on the international 5C principles (character, capital, capacity, collateral, and business condition). Standard & Poor’s credit rating is mainly based on business and financial indicators. The US credit rating agency Moody’s evaluates enterprises based on their capital structure, sales growth, and other aspects. The Fitch rating agency mainly evaluates credit for enterprises based on their structure, corporate profitability, and corporate strategy. When the Industrial and Commercial Bank of China (ICBC) evaluates the creditworthiness of enterprises, it considers shareholders, economic conditions, development prospects, solvency, and other aspects. The China Construction Bank (CCB) uses a credit evaluation system that examines a small enterprise’s financial risk, account behavior, operating environment, operating status, development potential, number of personnel, credit standing, and other indicators. The rating objects of Standard & Poor’s, Moody’s, and Fitch are mainly bonds, national sovereigns, and listed companies. Therefore, the credit evaluation index systems they adopt are applicable to large and medium-sized enterprises but not appropriate for small enterprises in China, which have imperfect financial information.

In addition, many papers have been written on the evaluation of small business credit. They mainly used parametric methods to evaluate credit for small enterprises. Boguslauskas et al. (2011) screened the evaluation indicators by discriminant analysis, logistic regression, and neural networks [5]. Hammer et al. (2012) evaluated banks’ financial strength by screening indicators with a logistic regression [6]. Vega et al. (2013) used logistic regression to build a credit scoring model for Serbian companies [7]. Li et al. (2013) used a projection pursuit method to screen credit evaluation indicators [8]. Karminsky and Khromova (2016) constructed a representative scale of variables with a potential impact on scoring, based on a Bankscope database containing financial information on international banks from 1996 to 2011. Their ordered probit model leads to the conclusion that macroeconomic variables improve the explanatory power of the model [9]. Lv et al. (2017) used a model combining single-factor analysis and logistic regression not only to select the indicators that have a significant influence on personal credit but also to calculate the degree of influence of specific indicators on the borrower’s own credit status, that is, the weight of each index [10]. Dyatchkova et al. (2018) studied the relationship between the credit ratings of BRICS industrial enterprises and financial indicators and estimated Tobit regression models for the selected group to establish credit rating models for BRICS industrial companies [11]. Louzada et al. (2018) proposed a survival credit risk model that jointly accommodates three default times in bank loan portfolios and adopted maximum likelihood estimation for parameter estimation and Monte Carlo simulation to evaluate its finite-sample performance [12]. Dai et al. (2018) proposed a personal credit assessment method based on partial least squares for the current credit problems of commercial banks and tested it on German credit data. The study showed that this method was simple, feasible, and effective [13]. Zhang and Chi (2018) introduced multiobjective planning to establish a credit rating model and conducted an empirical analysis based on data for 6,155 enterprises. The results showed that this method could ensure a balance between the two criteria, avoid excessive concentration of debtors in a specific rating, and contribute to the establishment of a reasonable credit rating system [14].

Later, considering the limitations of parametric methods, researchers gradually turned to nonparametric and artificial intelligence methods. Traczynski (2017) used the Bayesian model averaging method to screen default prediction indicators and established a default prediction indicator system that includes, for example, the ratio of total liabilities to total assets and the volatility of market returns. The advantage of this screening method is that cross-model aggregation of information, or of impacts on specific industries, has a greater effect than a single model [15]. Bou-Hamad (2017) proposed a comprehensive framework based on the random forest (RF) method and Bayesian model averaging (BMA) to study the importance of ordinal variables in credit risk assessment and default prediction [16]. Sun et al. (2018) proposed a new decision tree (DT) ensemble model for imbalanced enterprise credit evaluation based on the synthetic minority oversampling technique (SMOTE) and a bagging ensemble learning algorithm with differential sampling rates (DSR), which is called DTE-SBD. The results showed that the DTE-SBD model is significantly better than five comparison models, such as pure DT and oversampling DT, and has a positive effect on imbalances in enterprise credit evaluation [17]. Shi et al. (2018) proposed a new attribute reduction method and applied it to the evaluation of small business financing ability [18]. Du (2018) used a genetic backpropagation (BP) algorithm to optimize the connection weights and thresholds of the neural network, which solved problems such as the slow convergence speed of the BP neural network. The research showed that the genetic neural network method could be used in enterprise credit rating [19]. Hsu et al. (2018) proposed a new classification model based on a bio-inspired computing mechanism, combining the artificial bee colony (ABC) method with support vector machine (SVM) technology and using an actual ten-year dataset extracted from the Compustat credit rating database, so as to improve the forecasting of credit ratings and credit rating changes [20]. Bai et al. (2019) proposed a new bank credit value assessment method combining fuzzy rough sets and fuzzy c-means clustering, using the results of a rule-based method to predict farmers’ reputations. The results show that education and skills are key factors in improving farmers’ reputations [21]. Some studies do consider the impact of default losses on credit ratings; Shi et al. (2019) optimized the credit rating and provided guidance on the mismatch between credit ratings and loss given default (LGD) in the existing credit rating literature [22]. They divided the credit ratings of enterprises mainly by the default loss rate but did not apply the default loss rate to the construction of the enterprise credit evaluation index system.

The existing literature focuses on the parametric screening of indicators; few studies screen credit evaluation indicators using nonparametric methods, and even fewer compare parametric and nonparametric screening. However, in other disciplines, nonparametric methods have achieved remarkable results: Abad and Briec (2019) analyzed the concept of pollution-generating technology, introduced the B-disposal assumption, and gave examples of the new B-disposal assumption for convex and nonconvex nonparametric technologies [23].

In the existing research on credit evaluation indicators, discriminating only by whether a customer defaults is a disadvantage [14–16, 21]. On the one hand, default status alone does not reflect differences in default loss, such as partial default and total default, and distinguishing between partial and total default reflects the credit status of small enterprises more accurately. We consider the impact of different default losses on credit evaluation index screening, which is the main contribution of this study. On the other hand, when the number of candidate indicators is very large, existing studies select many indicators, some of which cannot significantly distinguish the default status of small enterprises and therefore do not fully reflect their credit status. To solve this problem, we conduct first-stage indicator screening based on the default state and then second-stage indicator screening based on the default loss rate, which is the second contribution of this study. Moreover, most existing studies are based on parametric methods [5–14]; however, most evaluation indicators do not follow a normal distribution, and their distribution is unknown. Therefore, nonparametric methods are adopted in this study to screen the indicators, which is the third contribution of this study. Based on the above, we construct two complete credit evaluation index systems for small enterprises, one based on a two-stage nonparametric Bayesian discriminant model and one based on a two-stage parametric Bayesian discriminant model, and determine how to construct an index system with greater ability to identify the default loss rate by comparing changes in discriminant accuracy after screening the indicators.

The article is organized as follows. Section 2 discusses the principles on which our models are constructed and explains the methodology used in the construction of the models, Section 3 details how the sample data are screened for use in the models, Section 4 compares our model to the traditional model, and Section 5 offers our conclusions.

2. Principles and Methods

2.1. Two-Stage Bayesian Discriminant Principle

By constructing two-class Bayesian discriminant models, first for default versus nondefault customers and then for high default loss versus low default loss customers, we can determine the influence of each indicator on discriminant accuracy. The identification ability of the nonparametric and parametric Bayesian models is then tested by comparing their discriminant accuracy for the default loss rate of enterprises over all samples. The construction and comparison principles of the credit evaluation index system for small enterprises are illustrated in Figure 1.

2.2. Two-Stage Bayesian Discriminant Method
2.2.1. Two-Stage Nonparametric Bayesian Discriminant Method

(1) One-Stage Screening between Default and Nondefault. Let $P(G_i \mid x)$ be the posterior probability that the sample comes from population $G_i$, where $G_1$ is the population of default enterprises and $G_2$ is the population of nondefault enterprises; let $x$ be the sample to be classified, $q_i$ the prior probability that the sample comes from $G_i$, and $f_i(x)$ the kernel density function of $G_i$. The Bayesian discriminant function is as follows [24]:

$$P(G_i \mid x) = \frac{q_i f_i(x)}{\sum_{k=1}^{2} q_k f_k(x)}, \quad i = 1, 2, \tag{1}$$

in which the posterior probability of the sample from $G_i$ equals the ratio of the product of the prior probability and the kernel density function of $G_i$ to the sum, over both populations, of the product of each population's prior probability and kernel density function.

Let $n_i$ be the number of samples in population $G_i$; then, the prior probability $q_i$ of the sample from $G_i$ is as follows [24]:

$$q_i = \frac{n_i}{n_1 + n_2}, \tag{2}$$

in which the prior probability of the sample from $G_i$ equals the ratio of the sample number of $G_i$ to the total sample number.

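Let $h$ be the window width, $K(\cdot)$ the kernel function, and $x_{ij}$ the $j$th of the $n_i$ samples in population $G_i$; then, the kernel density function $f_i(x)$ of $G_i$ is as follows [25]:

$$f_i(x) = \frac{1}{n_i h} \sum_{j=1}^{n_i} K\!\left(\frac{x - x_{ij}}{h}\right). \tag{3}$$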

Using this equation, with the data of the known sample and the selected kernel function and the bandwidth, we can estimate the distribution density function of the population, and we can use a cross-validation method to get a reasonable window width from the existing data without making any assumptions about the estimated density function.

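By substituting the results calculated by equations (2)-(3) into (1), we can obtain the posterior probability of the sample from each population, and the rule for judging which population the sample comes from is as follows [26]:

$$x \in \begin{cases} G_1, & P(G_1 \mid x) > P(G_2 \mid x), \\ G_2, & P(G_1 \mid x) < P(G_2 \mid x), \end{cases} \tag{4}$$

in which $G_1$ is the default population, $G_2$ is the nondefault population, and $P(G_i \mid x)$ is the posterior probability from equation (1). If $P(G_1 \mid x) > P(G_2 \mid x)$, then the probability that the sample comes from $G_1$ is greater than the probability that it comes from $G_2$, and the sample to be classified is a default sample. If $P(G_1 \mid x) < P(G_2 \mid x)$, the probability that the sample comes from $G_1$ is less than the probability that it comes from $G_2$, so the sample to be classified is a nondefault sample.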
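As an illustration of equations (1) to (4), the following minimal sketch classifies a single standardized indicator score. It is not the SAS procedure used in this paper: the Gaussian kernel, the one-dimensional input, the fixed window width `h=0.1`, and the sample scores are all assumptions made for the example.

```python
import numpy as np

def gaussian_kde_1d(x, samples, h):
    """Kernel density estimate at x from 1-D samples with window width h."""
    u = (x - np.asarray(samples, dtype=float)) / h
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel (assumed)
    return kernel.sum() / (len(samples) * h)

def posterior_default(x, default_scores, nondefault_scores, h=0.1):
    """Posterior probability that x comes from the default population."""
    n1, n2 = len(default_scores), len(nondefault_scores)
    q1, q2 = n1 / (n1 + n2), n2 / (n1 + n2)  # priors from sample shares
    f1 = gaussian_kde_1d(x, default_scores, h)
    f2 = gaussian_kde_1d(x, nondefault_scores, h)
    return q1 * f1 / (q1 * f1 + q2 * f2)

def classify(x, default_scores, nondefault_scores, h=0.1):
    """With two populations, P(G1|x) > P(G2|x) is the same as P(G1|x) > 0.5."""
    p1 = posterior_default(x, default_scores, nondefault_scores, h)
    return "default" if p1 > 0.5 else "nondefault"

# Hypothetical standardized scores for one indicator (illustrative only).
default_scores = [0.10, 0.15, 0.20, 0.25]
nondefault_scores = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

print(classify(0.18, default_scores, nondefault_scores))
print(classify(0.82, default_scores, nondefault_scores))
```

In practice the window width would be chosen by cross-validation, as described above, rather than fixed by hand.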

Let $D$ be the number of default samples correctly identified by the nonparametric Bayesian discriminant and $D_0$ the number of actual default samples; let $U$ be the number of nondefault samples correctly identified by the nonparametric Bayesian discriminant and $U_0$ the number of actual nondefault samples; and let $M$ be the discriminant accuracy of all samples. The equation is as follows [27]:

$$M = \frac{1}{2}\left(\frac{D}{D_0} + \frac{U}{U_0}\right), \tag{5}$$

in which the discriminant accuracy of all samples equals the arithmetic average of the discriminant accuracy of default and nondefault samples. The greater the discriminant accuracy of all samples, the better the discriminant effect of the indicator system is.
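The all-sample accuracy can be computed as in the short sketch below. The counts of correctly identified samples are hypothetical; only the class sizes (71 default and 3,040 nondefault) come from the sample described in this paper. The sketch also prints the pooled share of correct decisions for comparison, since with classes this unbalanced the pooled figure is dominated by the nondefault class.

```python
def discriminant_accuracy(d_correct, d_actual, u_correct, u_actual):
    """Return default accuracy, nondefault accuracy, and their arithmetic average."""
    default_acc = d_correct / d_actual
    nondefault_acc = u_correct / u_actual
    return default_acc, nondefault_acc, (default_acc + nondefault_acc) / 2

# Hypothetical numbers of correctly identified samples (illustrative only).
d_acc, u_acc, m = discriminant_accuracy(60, 71, 2736, 3040)

# Pooled share of correct decisions, shown only for comparison with M.
pooled = (60 + 2736) / (71 + 3040)

print(f"default: {d_acc:.4f}, nondefault: {u_acc:.4f}, M: {m:.4f}, pooled: {pooled:.4f}")
```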

Step 1. The normalized data on indicators of all samples are substituted into equations (1)–(5) so that the discriminant accuracy of indicators can be obtained.

Step 2. The first indicator is deleted, and the remaining indicators are substituted into equations (1)–(5) so that the discriminant accuracy of indicators can be obtained.

Step 3. All the indicators are deleted one by one to obtain the discriminant accuracy of indicators.

Step 4. Let $\Delta M_i$ be the degree of influence of the $i$th index on the discriminant accuracy of the default state, $M$ the discriminant accuracy with all indicators, and $M_{(-i)}$ the discriminant accuracy after deleting the $i$th index. The equation is as follows:

$$\Delta M_i = M_{(-i)} - M,$$

in which the degree of influence of the $i$th index on the discriminant accuracy is the difference between the discriminant accuracy after deleting the $i$th index and the discriminant accuracy of all indicators, which reflects the importance of the $i$th index to discriminant accuracy in the indicator system.

Step 5. $\Delta M_i$ is calculated for all the indicators, and the retention or deletion of each indicator is determined by comparing $\Delta M_i$ with 0. If $\Delta M_i > 0$, that is, the discriminant accuracy after deleting the $i$th index is greater than that of all indicators, then the discriminant accuracy of the index system is improved after its deletion, so it should be deleted. If $\Delta M_i = 0$, that is, the discriminant accuracy after deleting the $i$th index equals the discriminant accuracy of all indicators, then the deletion of this index has no influence on the discriminant accuracy of the index system, so this index should be deleted. If $\Delta M_i < 0$, that is, the discriminant accuracy after deleting the $i$th index is less than that of all indicators, then the discriminant accuracy of the index system is lower after this index is deleted, so this index should be retained.

Step 6. Let $w_i$ be the proportion of the $i$th index in the total degree of influence on discriminant accuracy, $|\Delta M_i|$ the absolute value of the degree of influence of the $i$th index on discriminant accuracy, $m$ the number of indicators for which $\Delta M_i$ is less than 0, and $W_k$ the cumulative proportion of the first $k$ indicators in the degree of influence on discriminant accuracy. The equations are as follows:

$$w_i = \frac{|\Delta M_i|}{\sum_{j=1}^{m} |\Delta M_j|}, \qquad W_k = \sum_{i=1}^{k} w_i,$$

where we select the indicators according to the criterion that the cumulative proportion of the influence degree on the discrimination accuracy is greater than 95%.
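Steps 1 to 6 can be sketched as a leave-one-out loop around any accuracy function. In the sketch below, `screen_indicators`, the toy accuracy function, and the indicator names are hypothetical. Ranking the retained indicators by decreasing influence and stopping once the cumulative proportion first exceeds 95% is one reading of Step 6, not necessarily the exact procedure used in this paper.

```python
from typing import Callable, Dict, List, Sequence

def screen_indicators(
    indicators: Sequence[str],
    accuracy: Callable[[Sequence[str]], float],
    threshold: float = 0.95,
) -> List[str]:
    """Leave-one-out indicator screening following Steps 1 to 6."""
    base = accuracy(indicators)                      # Step 1: all indicators
    delta: Dict[str, float] = {}
    for name in indicators:                          # Steps 2-3: delete one at a time
        rest = [i for i in indicators if i != name]
        delta[name] = accuracy(rest) - base          # Step 4: degree of influence
    # Step 5: keep only indicators whose deletion lowers accuracy.
    useful = {k: abs(v) for k, v in delta.items() if v < 0}
    total = sum(useful.values())
    kept: List[str] = []
    cumulative = 0.0
    # Step 6 (assumed reading): add indicators by decreasing influence
    # until the cumulative proportion exceeds the threshold.
    for name, weight in sorted(useful.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(name)
        cumulative += weight / total
        if cumulative > threshold:
            break
    return kept

# Toy accuracy function with additive contributions (illustrative only).
contrib = {"quick_ratio": 0.10, "current_ratio": 0.06, "cash_ratio": 0.005,
           "noise_a": 0.0, "noise_b": -0.02}

def toy_accuracy(subset: Sequence[str]) -> float:
    return 0.60 + sum(contrib[i] for i in subset)

kept = screen_indicators(list(contrib), toy_accuracy)
print(kept)
```

In this toy setting, `noise_a` and `noise_b` are removed in Step 5 because deleting them does not lower accuracy, and `cash_ratio` is dropped in Step 6 because the two stronger indicators already account for more than 95% of the total influence.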
(2) Two-Stage Screening between High Default and Low Default. Nonparametric clustering of the indicators retained after the first-stage screening is carried out within the same criterion layer, and the 71 default samples are divided into subsamples with a high default loss rate and a low default loss rate.

In nonparametric clustering, a class is defined by a mode of the probability density function. Let $x_i$ be the $i$th index, $N$ the number of indicators, $N_i$ the number of indicators in the neighborhood of $x_i$, and $V_i$ the volume of the neighborhood of $x_i$, where the sphere centered at $x_i$ is called the neighborhood of $x_i$ and an index inside the neighborhood is called an adjacent index of $x_i$. The equation is as follows [28]:

$$\hat{f}(x_i) = \frac{N_i}{N V_i},$$

in which the estimated value of the probability density is the number of indicators contained in the sphere centered at this point divided by the product of the total number of indicators and the volume of the sphere.
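A one-dimensional sketch of this neighborhood density estimate is given below. The fixed radius, the sample points, and the choice to count each point as a member of its own neighborhood are assumptions for illustration; the clustering in this paper is carried out with SAS.

```python
import numpy as np

def neighborhood_density(points, radius):
    """Density at each 1-D point: neighbors in the sphere / (total points * sphere volume)."""
    pts = np.asarray(points, dtype=float)
    n_total = len(pts)
    volume = 2.0 * radius  # the "volume" of a 1-D sphere is the interval length
    counts = np.array([(np.abs(pts - p) <= radius).sum() for p in pts])
    return counts / (n_total * volume)

# Hypothetical values forming two groups (illustrative only).
points = [0.10, 0.12, 0.14, 0.80, 0.82]
density = neighborhood_density(points, radius=0.05)
print(density)
```

Points near a local maximum of this estimate would then be grouped into the same class.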

2.2.2. Two-Stage Parametric Bayesian Discriminant Method

(1) One-Stage Screening for Default and Nondefault. Let $P(G_i \mid x)$ be the posterior probability that the sample comes from the $i$th population $G_i$, where $G_1$ is the population of default enterprises and $G_2$ is the population of nondefault enterprises; let $x$ be the sample to be classified, $q_i$ the prior probability that the sample comes from the $i$th population, and $f_i(x)$ the density function of the $i$th population. The Bayesian discriminant function is calculated exactly as in equation (1), and the prior probability is calculated as in equation (2).

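Each population $G_i$ is assumed to follow a normal distribution with mean $\mu_i$ and covariance matrix $\Sigma_i$; the density function of the $i$th population is as follows [29]:

$$f_i(x) = \frac{1}{(2\pi)^{p/2} |\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_i)^{\mathsf{T}} \Sigma_i^{-1} (x - \mu_i)\right),$$

where $p$ is the number of indicators in $x$.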
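The parametric counterpart can be sketched in a few lines. The function names and the two-indicator sample data are hypothetical, and estimating a separate mean and covariance matrix for each population from its own samples is an assumption of this sketch.

```python
import numpy as np

def normal_density(x, mean, cov):
    """Multivariate normal density at x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    p = x.size
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    norm = np.sqrt((2.0 * np.pi) ** p * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def parametric_posterior(x, groups):
    """Posterior probability of each population; groups holds one sample array per population."""
    total = sum(len(g) for g in groups)
    weighted = []
    for g in groups:
        g = np.asarray(g, dtype=float)
        prior = len(g) / total              # prior from sample shares
        mean = g.mean(axis=0)
        cov = np.cov(g, rowvar=False)       # per-population covariance (assumed)
        weighted.append(prior * normal_density(x, mean, cov))
    weighted = np.array(weighted)
    return weighted / weighted.sum()

# Hypothetical standardized scores on two indicators (illustrative only).
default = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.25, 0.20]]
nondefault = [[0.80, 0.90], [0.90, 0.70], [0.70, 0.80], [0.85, 0.95], [0.75, 0.85]]

post = parametric_posterior([0.15, 0.20], [default, nondefault])
print(post)
```

The sample is assigned to the population with the larger posterior probability, exactly as in the nonparametric case.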

The determination of the Bayesian discriminant rule, the measurement of discriminant accuracy, and the specific steps in Bayesian discriminant index screening are the same as the nonparametric method mentioned earlier.

(2) Two-Stage Screening between High Default and Low Default. Sample ordered clustering is carried out within the same criterion layer for the indicators retained after the first stage screening, and 71 default samples are divided into subsamples with a high default loss rate and a low default loss rate.

Sample ordered clustering is based on the sum of the squares of deviations. Let $S_i$ be the sum of the squares of the deviations of the $i$th category, $l_i$ the number of indicators in the $i$th category, $z_{ij}$ the standardized data of the $j$th index in the $i$th category, and $\bar{z}_i$ the mean of the indicators in the $i$th category. The equation for $S_i$ is as follows [30]:

$$S_i = \sum_{j=1}^{l_i} \left(z_{ij} - \bar{z}_i\right)^2. \tag{10}$$

Based on equation (10), the best partition is determined by minimizing the total sum of the squares of the deviations, that is, the sum of $S_i$ over all categories.
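For two categories, minimizing the total sum of squared deviations over an ordered sequence reduces to trying every split point, as in the sketch below. The loss rates are hypothetical, and the sketch covers only the two-class case relevant to the high versus low default loss split.

```python
def sum_sq_dev(values):
    """Sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_two_way_split(ordered_values):
    """Return (split_index, total) minimising the total sum of squared deviations."""
    best = None
    for split in range(1, len(ordered_values)):
        total = sum_sq_dev(ordered_values[:split]) + sum_sq_dev(ordered_values[split:])
        if best is None or total < best[1]:
            best = (split, total)
    return best

# Hypothetical default loss rates in ascending order (illustrative only).
loss_rates = [0.05, 0.08, 0.10, 0.62, 0.70, 0.75, 0.90]
split, total = best_two_way_split(loss_rates)
low, high = loss_rates[:split], loss_rates[split:]
print(low, high)
```

Because the order of the samples is fixed, only adjacent samples can share a category, which is what distinguishes ordered clustering from ordinary clustering.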

The determination of Bayesian discriminant rules, the measurement of discriminant accuracy, and the specific steps in Bayesian discriminant index screening are the same as the screening methods for default and nondefault described earlier. The results of Bayesian discrimination and clustering can be obtained by SAS V9 software.

2.3. ROC Curve for Validity Test of the Index System

The purpose is to test the validity of parametric Bayesian discrimination and nonparametric Bayesian discrimination for index screening through the area under the curve (AUC) value of the receiver operating characteristic (ROC) curve. Because of the complexity of the calculation process, it can be obtained through SPSS 22 software.

The samples correctly judged as high default ($y = 1$) are recorded as TP (true positive); the high default samples misjudged as low default are denoted as FN (false negative); the samples correctly judged as low default ($y = 0$) are denoted as TN (true negative); the low default samples misjudged as high default are denoted as FP (false positive). Two indicators are required to draw the ROC curve, sensitivity and specificity, defined respectively as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}.$$
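The two quantities, and the AUC derived from them, can be illustrated as follows. The labels and scores are hypothetical, and the AUC is computed here from its pairwise-ranking interpretation rather than by the SPSS routine used in this paper.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are judged high default (1)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """Share of (high, low) pairs in which the high default sample scores higher; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = high default loss) and model scores (illustrative only).
labels = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
print(sens, spec, area)
```

Each threshold gives one (1 - specificity, sensitivity) point; sweeping the threshold traces out the ROC curve.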

3. Empirical Analysis

3.1. Samples and Data Sources

The empirical sample in this paper consists of data on loans for 3,111 small enterprises from the database of a Chinese commercial bank, of which 71 are in default and 3,040 are not. Based on the small business credit evaluation systems constructed by foreign financial institutions, such as S & P and Moody’s, as well as Chinese financial institutions such as the ICBC and CCB, we select a total of 107 debt rating indicators divided into two primary criterion layers (repayment ability and repayment willingness) and seven secondary criterion layers (internal financial factors and nonfinancial factors). The internal financial factors comprise four criteria (solvency, profitability, operational capacity, and growth capacity), each of which has three levels. Because some data are unavailable, 26 indicators, such as economic environment and national policies, are excluded from the initial 107, leaving a total of 81 indicators, as shown in column 1 of rows 1 to 81 in Table 1.

3.2. Standardization of Index Data

We obtain standardized scores for each indicator according to standardized scoring methods for different indicators [31], and then, insert standardized data into the relevant rows in Table 1.

3.3. Constructing a Two-Stage Nonparametric Bayesian Discriminant Model
3.3.1. One-Stage Screening Method between Default and Nondefault

Taking the “solvency” of the three-level criterion layer as an example, this paper illustrates the specific process of nonparametric Bayesian discriminant screening to distinguish the indicators of enterprise default status. The 20 indicators of “solvency” in the three-level criterion layer are in column 1 of Table 2.

Using the nonparametric Bayesian discriminant method, we obtain the default sample discrimination accuracy, the nondefault sample discrimination accuracy, and the all-sample discrimination accuracy of the 20 indicators. Finally, we calculate the degree of influence of each index on the discrimination accuracy and select the indicators according to Steps 5 and 6 in Section 2.2.1. The screening results are shown in Table 2.

We repeat this screening process for each criterion layer and thereby obtain the screening results of all indicators. The first stage of nonparametric Bayesian discrimination screens 81 indicators, leading 36 indicators to be retained and 45 indicators to be deleted.

3.3.2. Two-Stage Screening Method for High Default and Low Default

For the 36 indicators retained after the first-stage screening, nonparametric clustering is carried out within the same criterion layer, and the 71 default samples are divided into subsamples with a high default loss rate and a low default loss rate. The second-stage screening is then carried out through nonparametric Bayesian discrimination, and the indicators that can distinguish between a high and a low default loss rate are selected. The nonparametric clustering results show that 50 of the default samples have a high default loss rate and 21 have a low default loss rate. Because the calculation process is complex, we use SAS software to perform it.

In the criterion layer for solvency, eight indicators in the three-level criterion layer are retained and two are deleted. The screening results are shown in Table 3. Finally, in the second stage of nonparametric Bayesian discrimination, 36 indicators are screened, leading to the deletion of 12 indicators and retention of 24.

3.4. Constructing a Two-Stage Parametric Bayesian Discriminant Model
3.4.1. One-Stage Screening Method for Default and Nondefault

Similarly, solvency in the three-level criterion layer is used as an example. Using the parametric Bayesian discriminant method, 8 of the 20 indicators of “solvency” in the three-level criterion layer are retained, and 12 are deleted. The screening results are shown in Table 4. Finally, in the first stage of parametric Bayesian discrimination, 81 indicators are screened, of which 50 are deleted and 31 are retained.

3.4.2. Two-Stage Method for Screening High Default and Low Default

Sample ordered clustering is carried out within the same criterion layer for the indicators retained after the first-stage screening, and 71 default samples are divided into subsamples for a high default loss rate and a low default loss rate. The results show that 50 of the default samples have a high default loss rate, and the remaining 21 have a low default loss rate.

The solvency criterion layer has 20 indicators, of which 8 remain after the first-stage screening. The screening results are shown in Table 5. Finally, in the second stage of parametric Bayesian discrimination, 31 indicators are screened, of which 17 are deleted and 14 are retained.

3.5. Two-Stage Bayesian Discriminant Evaluation Index System
3.5.1. Two-Stage Nonparametric Bayesian Discriminant Evaluation Index System

By discriminating 81 indicators with the nonparametric Bayesian model, we identify the indicators that can distinguish between default and nondefault and construct a one-stage evaluation index system from them. This index system includes 36 indicators, such as current liabilities net cash flow ratio of operating activities and working experience in the related industry.

On the basis of one stage, the nonparametric Bayesian model is used to discriminate the remaining 36 indicators, and the indicators that can significantly distinguish a high default loss rate from a low default loss rate are selected, and a two-stage credit evaluation index system is constructed. The results are shown in Table 6.

3.5.2. Two-Stage Parametric Bayesian Discriminant Evaluation Index System

By discriminating 81 indicators with a parametric Bayesian model, we identify the indicators that can distinguish between default and nondefault and construct a one-stage evaluation index system from them. This index system includes 31 indicators, such as quick ratio and product sales scope.

On the basis of one stage, the parametric Bayesian model is used to discriminate the remaining 31 indicators, and we identify the indicators that can significantly distinguish a high default loss rate and a low default loss rate and construct a two-stage credit evaluation index system. The results are shown in Table 7.

4. Results and Discussion

4.1. Testing Validity of the Index System

By selecting different critical values, multiple pairs of sensitivity and specificity values can be obtained, as shown in Figure 2. The ROC curve can be obtained with SPSS software.

In Figure 2, the PER-1 curve represents the identification results of the 14 indicators screened with parametric Bayesian discrimination between a high and low default loss rate, and the PER-2 curve represents the identification results of the 24 indicators screened with nonparametric Bayesian discrimination between a high and low default loss rate. The AUC is 0.866 for the PER-1 curve and 0.902 for the PER-2 curve. The AUC obtained with nonparametric Bayesian discrimination is greater than that obtained with parametric Bayesian discrimination. Therefore, nonparametric Bayesian discrimination is effective at distinguishing a high from a low default loss rate, and the selected index system has a strong ability to distinguish between them.

4.2. Stability Test of the Index System

We randomly select 80% of the original data as the training set and the other 20% as the test set, and we repeat this simple cross-validation three times; the verification results are shown in Table 8.

Since the training set and the test set are randomly selected, the number of samples with high and low default loss rates in the two-stage index screening is not exactly the same across runs, and the final screened index system is not completely the same. However, the mean values of the three simple cross-validations show that the discriminant accuracy for all samples is 97.22% for the nonparametric Bayesian discrimination model, 56.90% for the parametric Bayesian discrimination model, and 74.56% for the logistic regression model, which shows that the two-stage credit evaluation index screening model is stable.
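The repeated 80/20 validation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the midpoint classifier, the toy records, and the seed are hypothetical stand-ins for the paper's discriminant models and bank data.

```python
import random
import statistics


def split_train_test(records, rng, train_fraction=0.8):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def repeated_holdout(records, fit, predict, repeats=3, seed=0):
    """Run `repeats` random 80/20 splits; return the accuracies and their mean."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        train, test = split_train_test(records, rng)
        model = fit(train)
        hits = sum(predict(model, x) == y for x, y in test)
        accuracies.append(hits / len(test))
    return accuracies, statistics.mean(accuracies)


def fit_midpoint(train):
    """Hypothetical stand-in model: threshold halfway between the class means."""
    lows = [x for x, y in train if y == 0]
    highs = [x for x, y in train if y == 1]
    return (statistics.mean(lows) + statistics.mean(highs)) / 2.0


def predict_midpoint(threshold, x):
    """Predict class 1 when the indicator value exceeds the threshold."""
    return 1 if x > threshold else 0


# Hypothetical records: (indicator value, class label).
toy_records = [(i / 100, 0) for i in range(50)] + [(1 + i / 100, 1) for i in range(50)]
fold_accuracies, mean_accuracy = repeated_holdout(toy_records, fit_midpoint, predict_midpoint)
print(fold_accuracies, mean_accuracy)
```

Because each split is drawn at random, the class counts in the training set vary from run to run, which is why the screened index system can differ slightly between repetitions.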

4.3. Comparing the Accuracy of the Two-Stage Evaluation Index Systems

Table 9 shows that, after nonparametric Bayesian discriminant screening, the index system composed of 24 indicators has a discriminant accuracy of 77.95% for all samples, with 94.00% for samples with a high default loss rate and 61.90% for samples with a low default loss rate. After parametric Bayesian discriminant screening, the index system composed of 14 indicators has a discriminant accuracy of 70.95% for all samples, with 80.00% for samples with a high default loss rate and 61.90% for samples with a low default loss rate. We also compare a binary logistic regression model with the Bayesian discriminant models. After logistic regression screening, the index system composed of 9 indicators has a discriminant accuracy of 71.57% for all samples, with 86.00% for samples with a high default loss rate and 57.14% for samples with a low default loss rate. These results confirm that the nonparametric Bayesian method improves the accuracy for all samples and is better at judging the default loss rate of small enterprises. Among the three models, the parametric method has the lowest overall accuracy because it assumes that the index data are normally distributed, whereas most of the data are not; this lowers the discriminant accuracy of the resulting credit evaluation index system and may cause higher misjudgment losses. Therefore, the index system constructed with the nonparametric method can not only enable banks and other financial institutions to correctly assess the credit status of small enterprises and ease their financing difficulties but also reduce the potential losses of banks and other financial institutions.
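As a quick arithmetic check, the overall accuracies quoted from Table 9 are numerically consistent with the unweighted mean of the two class accuracies. Whether Table 9 defines the overall figure this way (or whether the two classes simply have equal size) is an assumption; the sketch below only verifies the arithmetic and the resulting ranking.

```python
# Reported accuracies in percent: (high default loss rate, low default loss rate, all samples).
reported = {
    "nonparametric Bayesian": (94.00, 61.90, 77.95),
    "parametric Bayesian": (80.00, 61.90, 70.95),
    "logistic regression": (86.00, 57.14, 71.57),
}


def unweighted_mean(high, low):
    """Mean of the two class accuracies, giving each class equal weight."""
    return (high + low) / 2.0


# Compare the recomputed mean with the reported overall accuracy.
for model, (high, low, overall) in reported.items():
    recomputed = unweighted_mean(high, low)
    print(f"{model}: recomputed {recomputed:.2f}%, reported {overall:.2f}%")

# Rank the models by reported overall accuracy, best first.
ranking = sorted(reported, key=lambda m: reported[m][2], reverse=True)
print(ranking)
```

The ranking places the nonparametric model first, logistic regression second, and the parametric model last, matching the comparison in the text.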

5. Conclusions

Using a sample of small businesses, we construct two credit evaluation index systems and determine which method is more effective in evaluating creditworthiness. In the first stage, Bayesian discrimination divides enterprises into those in default and those not in default; then, using the clustering method, default customers are divided into those with a high default loss rate and those with a low default loss rate so as to build a more sensitive index system. Finally, we construct an index system composed of 24 indicators using the nonparametric Bayesian discriminant model and an index system composed of 14 indicators using the parametric Bayesian discriminant model. We confirm the effectiveness of both index systems with an ROC curve, showing that a more sensitive indicator system can be built.

A comparative analysis of the discriminant accuracy of the three models shows that the two-stage nonparametric model is optimal, the two-stage logistic regression model is suboptimal, and the two-stage parametric model is poor. The index system built with the nonparametric Bayesian discrimination model is therefore the best, has strong default discrimination ability, and can be applied in practice. This study still has some limitations. The credit evaluation index system constructed in this paper is based on isolated time points, which not only ignores potential trends in the samples but may also make the index system inaccurate when some sample data change abruptly. Constructing the credit evaluation index system by comprehensively considering the credit status in each period is one direction for future research.

Data Availability

The empirical sample in this paper consists of data on loans for 3,111 small enterprises from the database of a Chinese commercial bank.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research was supported by the Key Projects of the National Natural Science Foundation of China (71731003), Natural Science Foundation of Inner Mongolia Autonomous Region of China (2020MS07009), Research Project of Science and Technology of Inner Mongolia Autonomous Region (201605053), and the Post-Doctoral Funding Project of Inner Mongolia Agricultural University of China.

Supplementary Materials

The code used in this article and the results of the three simple cross-validations are included in the Supplementary Materials.