Abstract

A least squares fuzzy support vector machine (LS-FSVM) model that integrates the advantages of the fuzzy support vector machine (FSVM) and the least squares method is proposed for credit risk evaluation. In the proposed LS-FSVM model, fuzzy set concepts are incorporated to add generalization capability and outlier insensitivity, while the least squares method is adopted to reduce the computational complexity. For illustrative purposes, a real-world credit risk dataset is used to test the effectiveness and robustness of the proposed LS-FSVM methodology.

1. Introduction

Credit risk evaluation has been a major area of focus for the financial and banking industries due to recent financial crises as well as the Basel III regulations. Since the seminal work of Altman [1] was published, many different techniques, such as discriminant analysis [1], logit analysis [2], probit analysis [3], linear programming [4], integer programming [5], k-nearest neighbor (KNN) classifiers [6], and classification trees [7], have been widely applied to credit risk assessment tasks. With the advance of modern computing technology, artificial intelligence (AI) tools such as artificial neural networks (ANNs) [8, 9], genetic algorithms (GAs) [10, 11], self-organizing learning [12], support vector machines (SVMs) [13–18], and variants of SVMs [19–22] have also been employed for credit risk evaluation. Empirical results have revealed that these AI techniques offer advantages over traditional statistical models and optimization techniques in credit risk evaluation.

A review of the literature shows that almost any classification method can be applied to credit risk assessment. However, hybrid and combined (or ensemble) classifiers, which integrate two or more single classification methods, have shown higher predictive power than individual methods, and research on such classifiers is currently flourishing in credit risk evaluation. Recent examples are the neural discriminant technique [23], neurofuzzy systems [16, 24], fuzzy SVM [25], rough-set-based SVM [26], evolving neural networks [27], neural network ensembles [28, 29], SVM-based multiagent ensemble learning [30], and an AI-based fuzzy group decision making (GDM) model [31]. Two recent surveys [32, 33] and one monograph [34] cover credit risk analysis in more detail.

In this study, a new credit classification technique, the least squares fuzzy SVM (LS-FSVM), is proposed to discriminate good customers from bad ones in credit evaluation. In existing studies, the fuzzy SVM (FSVM) proposed by Lin and Wang [35] has been shown to be suitable for customer credit assessment [34]. The main reason is that in credit risk evaluation we usually cannot label a customer as an absolutely good one who is sure to repay in time or an absolutely bad one who will certainly default, while the FSVM treats every sample as belonging to the good and bad classes to some extent. This gives the FSVM a higher generalization capability without losing its insensitivity to outliers. Although the FSVM has good generalization capability and outlier insensitivity, its computational complexity makes its use rather difficult because the final solution of the FSVM is derived from a quadratic programming (QP) problem [35]. To reduce this computational burden, this study applies the least squares method to the FSVM and formulates a new classification method, the least squares FSVM (LS-FSVM). In the proposed LS-FSVM model, equality constraints are used instead of inequality constraints. As a result, the solution is obtained from a set of linear equations instead of the QP problem present in the classical FSVM approach [35], thereby reducing the computational complexity relative to the FSVM.

From the above description, the main advantages of the proposed LS-FSVM can be summarized in two aspects. On the one hand, fuzzification can increase the generalization capability and improve the suitability of the SVM, because uncertainty in the class labels can be well treated by fuzzy memberships. On the other hand, the least squares method can reduce the computational complexity of the FSVM, because the solution of the least squares FSVM is obtained from a set of linear equations instead of a QP problem, thus increasing the computational speed, which is attractive in solving fuzzy information engineering problems. In the existing literature, Tsujinishi and Abe [36] proposed a fuzzy LS-SVM method to solve multiclass problems; regrettably, the performance of their fuzzy LS-SVM model is inferior to that of fuzzy SVMs. In contrast, the LS-FSVM model proposed in this paper achieves good performance on two-class problems owing to the features described above.

The main motivation of this study is to formulate the least squares version of FSVM for binary classification problems and to compare its performance with some typical classification techniques in the area of credit risk evaluation. The rest of this study is organized as follows. Section 2 illustrates the formulation of the LS-FSVM methodology. In Section 3, a real-world credit dataset is used to test the performance of the LS-FSVM to classify different samples. Section 4 concludes the paper.

2. Methodology Formulation

In this section, a brief introduction of SVM classifiers [37] is first presented. Then a fuzzy SVM (FSVM) model [35] is briefly reviewed. Finally, the least squares FSVM (LS-FSVM) model is formulated.

2.1. SVM for Binary Classification (by Vapnik [37])

Given a training dataset $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^n$ is the $i$th input pattern and $y_i \in \{-1, +1\}$ is its corresponding observed result, which is a binary variable. In credit risk evaluation models, the $x_i$ denote the attributes of customers and $y_i$ is the observed outcome of repayment obligations. If the customer defaults, $y_i = -1$; or else $y_i = +1$. The generic idea of SVM is first to map the input data into a high-dimensional feature space through a mapping function $\varphi(\cdot)$ and then to find the optimal separating hyperplane with minimal classification errors. The separating hyperplane can be represented as
\[
w^T \varphi(x) + b = 0,
\tag{1}
\]
where $w$ is the normal vector of the hyperplane and $b$ is the bias.

Suppose $\varphi(\cdot)$ is a nonlinear function that maps the input space into a higher dimensional feature space. If a dataset is linearly separable in this feature space, the classifier is constructed as
\[
\begin{cases}
w^T \varphi(x_i) + b \ge 1 & \text{if } y_i = +1, \\
w^T \varphi(x_i) + b \le -1 & \text{if } y_i = -1,
\end{cases}
\tag{2}
\]
which is equivalent to
\[
y_i \left( w^T \varphi(x_i) + b \right) \ge 1, \quad i = 1, \dots, N.
\tag{3}
\]

In order to deal with datasets that are not linearly separable, the previous analysis can be generalized by introducing some nonnegative slack variables $\xi_i \ge 0$, such that (3) can be modified as follows:
\[
y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N.
\tag{4}
\]

The nonnegative $\xi_i$ in (4) are those for which the data point $(x_i, y_i)$ does not satisfy (3). Thus, the term $\sum_{i=1}^{N} \xi_i$ can be considered as a measure of the amount of misclassification, that is, of the tolerable misclassification errors.

According to the structural risk minimization principle [37], the risk bound is minimized by solving the following optimization problem:
\[
\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, N,
\tag{5}
\]
where $C$ is a free regularization parameter controlling the tradeoff between margin maximization and tolerable misclassification errors.

Searching for the optimal hyperplane in (5) is a quadratic programming (QP) problem. After introducing a set of Lagrangian multipliers $\alpha_i \ge 0$ and $\beta_i \ge 0$ for the constraints in (5), the primal problem in (5) becomes one of finding the saddle point of the Lagrangian function; that is,
\[
L(w, b, \xi; \alpha, \beta) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i
- \sum_{i=1}^{N} \alpha_i \left[ y_i \left( w^T \varphi(x_i) + b \right) - 1 + \xi_i \right]
- \sum_{i=1}^{N} \beta_i \xi_i.
\tag{6}
\]

Differentiating (6) with respect to $w$, $b$, and $\xi_i$, one obtains
\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i), \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i + \beta_i = C.
\tag{7}
\]

From (7) one has $w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i)$ and $0 \le \alpha_i \le C$. The key issue is how to determine the values of the $\alpha_i$. To obtain a solution, the dual of the primal problem (6) becomes
\[
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{subject to} \quad
\sum_{i=1}^{N} \alpha_i y_i = 0, \; 0 \le \alpha_i \le C, \; i = 1, \dots, N.
\tag{8}
\]

The function $K(x_i, x_j)$ in (8) is related to $\varphi(\cdot)$ by imposing
\[
K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j),
\tag{9}
\]
which is motivated by Mercer's theorem [37]. $K(x_i, x_j)$ is the kernel function in the input space that determines the inner product of two data points in the feature space. According to the Karush-Kuhn-Tucker (KKT) theorem [38], the KKT conditions are defined as
\[
\alpha_i \left[ y_i \left( w^T \varphi(x_i) + b \right) - 1 + \xi_i \right] = 0, \qquad
\left( C - \alpha_i \right) \xi_i = 0, \qquad i = 1, \dots, N.
\tag{10}
\]
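As a concrete instance of (9), the RBF kernel used later in the experiments of Section 3 can be computed as in the following minimal Python sketch; the code and the bandwidth parameter gamma are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2).

    gamma is an illustrative value; in practice it is tuned, for example,
    by cross-validation.
    """
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, computed for all pairs at once
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        - 2.0 * (X1 @ X2.T)
        + np.sum(X2 ** 2, axis=1)[None, :]
    )
    return np.exp(-gamma * sq_dists)
```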

From these equalities, it is deduced that the only nonzero values $\alpha_i$ in (10) are those for which the constraints in (4) are satisfied with the equality sign. The data points $x_i$ corresponding to $\alpha_i > 0$ are called support vectors (SVs), and there are two types of SVs in a nonseparable case. In the case of $0 < \alpha_i < C$, the corresponding SV satisfies the equalities $y_i(w^T \varphi(x_i) + b) = 1$ and $\xi_i = 0$. In the case of $\alpha_i = C$, the corresponding $\xi_i$ is not zero and the corresponding SV does not satisfy (2); such SVs are treated as misclassification errors. By the same reasoning, the data points $x_i$ corresponding to $\alpha_i = 0$ are classified correctly.

Using the support vectors, the optimal weight vector in (7) can be given by
\[
w^* = \sum_{i=1}^{N_s} \alpha_i y_i \varphi(x_i),
\tag{11}
\]
where $N_s$ is the number of SVs. Moreover, in the case of $0 < \alpha_i < C$, the condition $\xi_i = 0$ applies to (11) in terms of the KKT theorem [38]. Thus, one may determine the optimal bias as
\[
b^* = y_i - {w^*}^T \varphi(x_i)
\tag{12}
\]
by taking any such data point in the dataset. However, from the numerical perspective, it is better to take the mean value of $b^*$ over all such data points. Once the optimal parameter pair $(w^*, b^*)$ is determined, the decision function of the SVM classifier can be represented as
\[
f(x) = \operatorname{sign} \left( \sum_{i=1}^{N_s} \alpha_i y_i K(x, x_i) + b^* \right).
\tag{13}
\]
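As an illustration (not the paper's own code), the decision function (13) can be evaluated in a few lines of Python once the dual coefficients alpha, support-vector labels y_sv, and bias b have been obtained from any QP solver; rbf_kernel is the helper sketched above.

```python
import numpy as np

def svm_decision(X_new, X_sv, y_sv, alpha, b, kernel=rbf_kernel):
    """Evaluate f(x) = sign(sum_i alpha_i * y_i * K(x, x_i) + b), per (13)."""
    K = kernel(X_new, X_sv)            # (m, N_s) kernel values against the SVs
    scores = K @ (alpha * y_sv) + b    # weighted kernel expansion plus bias
    return np.sign(scores)
```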

2.2. FSVM (by Lin and Wang [35])

The SVM has proven to be a powerful tool for solving classification problems [37], but it has some inherent limitations. In the formulation discussed above, each training point belongs fully to one class or the other. But in many real-world applications, a training sample does not exactly belong to one of the two classes; it may belong to one class to the extent of, say, 80 percent, while the remaining 20 percent may be meaningless. That is to say, there is a membership grade $s_i$ associated with each training data point $x_i$. In this sense, the FSVM is an extension of the SVM that takes into consideration the varying significance of the training samples. In the FSVM, each training sample $(x_i, y_i)$ is associated with a membership value $s_i \in (0, 1]$. The membership value reflects the confidence degree of the data point: the higher the value, the higher the confidence in its class label. Similar to the SVM, the optimization problem of the FSVM [35] is formulated as follows:
\[
\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{N} s_i \xi_i
\quad \text{subject to} \quad
y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, N.
\tag{14}
\]

Similar to the SVM, the solution of the FSVM is obtained from the above quadratic programming (QP) problem. Note that the error term $\xi_i$ is scaled by the membership value $s_i$. The membership values used to weight the soft penalty term reflect the relative confidence degrees of the training samples during training. Important samples with larger membership values have more impact on FSVM training than those with smaller values.

Similar to Vapnik's SVM [37], the optimization problem of the FSVM can be transformed into the following dual problem:
\[
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{subject to} \quad
\sum_{i=1}^{N} \alpha_i y_i = 0, \; 0 \le \alpha_i \le s_i C, \; i = 1, \dots, N.
\tag{15}
\]

In the same way, the KKT conditions are defined as
\[
\alpha_i \left[ y_i \left( w^T \varphi(x_i) + b \right) - 1 + \xi_i \right] = 0, \qquad
\left( s_i C - \alpha_i \right) \xi_i = 0, \qquad i = 1, \dots, N.
\tag{16}
\]

A data point $x_i$ corresponding to $\alpha_i > 0$ is called a support vector. There are two types of SVs: the one corresponding to $0 < \alpha_i < s_i C$ lies on the margin of the hyperplane, and the other, corresponding to $\alpha_i = s_i C$, is treated as misclassified.

Solving (15) leads to a decision function similar to (13), but with different support vectors and corresponding weights $\alpha_i$. An important difference between the SVM and the FSVM is that data points with the same value of $\xi_i$ may be indicated as different types of SVs in the FSVM due to the membership factor $s_i$. Interested readers can refer to [35] for more details.

2.3. Least Squares FSVM

In both the SVM and the FSVM, the final solution is obtained from a quadratic programming (QP) problem. The main issue with the QP method is that finding the solution is time-consuming when handling large-scale real-world application problems. Motivated by Lai et al. [14] and Suykens and Vandewalle [39], the least squares FSVM (LS-FSVM) model is introduced by formulating the following optimization problem:
\[
\min_{w, b, \xi} \; \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{N} s_i \xi_i^2
\quad \text{subject to} \quad
y_i \left( w^T \varphi(x_i) + b \right) = 1 - \xi_i, \; i = 1, \dots, N.
\tag{17}
\]

One can define the Lagrangian function as
\[
L(w, b, \xi; \alpha) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{N} s_i \xi_i^2
- \sum_{i=1}^{N} \alpha_i \left[ y_i \left( w^T \varphi(x_i) + b \right) - 1 + \xi_i \right],
\tag{18}
\]
where $\alpha_i$ is the $i$th Lagrangian multiplier, which can be either positive or negative because of the equality constraints, in accordance with the KKT conditions [38].

The optimality conditions are obtained by differentiating (18):
\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i), \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0,
\]
\[
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i = C s_i \xi_i, \qquad
\frac{\partial L}{\partial \alpha_i} = 0 \;\Rightarrow\; y_i \left( w^T \varphi(x_i) + b \right) - 1 + \xi_i = 0.
\tag{19}
\]

Eliminating $w$ and $\xi_i$ from (19), one can obtain the following set of linear equations:
\[
y_i \left( \sum_{j=1}^{N} \alpha_j y_j K(x_i, x_j) + b \right) + \frac{\alpha_i}{C s_i} = 1, \quad i = 1, \dots, N, \qquad
\sum_{i=1}^{N} \alpha_i y_i = 0.
\tag{20}
\]

In matrix form, the optimality conditions in (20) can be expressed as
\[
\begin{bmatrix} \Omega & Y \\ Y^T & 0 \end{bmatrix}
\begin{bmatrix} \alpha \\ b \end{bmatrix}
=
\begin{bmatrix} \mathbf{1} \\ 0 \end{bmatrix},
\tag{21}
\]
where $\Omega$, $Y$, and $\mathbf{1}$ are given by (22), (23), and (24), respectively:
\[
\Omega_{ij} = y_i y_j K(x_i, x_j) + \frac{\delta_{ij}}{C s_i}, \quad i, j = 1, \dots, N,
\tag{22}
\]
\[
Y = (y_1, y_2, \dots, y_N)^T,
\tag{23}
\]
\[
\mathbf{1} = (1, 1, \dots, 1)^T,
\tag{24}
\]
where $\delta_{ij}$ denotes the Kronecker delta.

From (22), $\Omega$ is positive definite. Thus, $\alpha$ can be obtained from the first block equation in (21); that is,
\[
\alpha = \Omega^{-1} \left( \mathbf{1} - b Y \right).
\tag{25}
\]

Substituting (25) into the second block equation in (21), we obtain
\[
b = \frac{Y^T \Omega^{-1} \mathbf{1}}{Y^T \Omega^{-1} Y}.
\tag{26}
\]

Here, since $\Omega$ is positive definite, $\Omega^{-1}$ is also positive definite. In addition, since $Y$ is a nonzero vector, $Y^T \Omega^{-1} Y > 0$. Thus, $b$ can always be obtained. Substituting (26) into (25), $\alpha$ can be obtained.

Hence, the separating hyperplane of the LS-FSVM can be found by solving the linear system (21)–(24) instead of a quadratic programming (QP) problem, thereby reducing the computational complexity, especially for large-scale problems.
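To make this computational simplicity concrete, the following minimal Python sketch (an illustration under the formulas above, not the author's original implementation) assembles and solves the linear system (21)–(24) and then evaluates the LS-FSVM decision function; rbf_kernel is the helper sketched in Section 2.1, and the membership values s are assumed to be supplied.

```python
import numpy as np

def train_lsfsvm(X, y, s, C, kernel=rbf_kernel):
    """Solve the LS-FSVM linear system (21)-(24) for (alpha, b).

    X: (N, d) training inputs; y: (N,) labels in {-1, +1};
    s: (N,) fuzzy memberships in (0, 1]; C: regularization parameter.
    """
    N = len(y)
    K = kernel(X, X)
    # Omega_ij = y_i * y_j * K(x_i, x_j) + delta_ij / (C * s_i), per (22)
    Omega = np.outer(y, y) * K + np.diag(1.0 / (C * s))
    # Block system [[Omega, Y], [Y^T, 0]] [alpha; b] = [1; 0], per (21)
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = Omega
    A[:N, N] = y
    A[N, :N] = y
    rhs = np.append(np.ones(N), 0.0)
    sol = np.linalg.solve(A, rhs)
    return sol[:N], sol[N]          # alpha, b

def predict_lsfsvm(X_new, X, y, alpha, b, kernel=rbf_kernel):
    """Decision function f(x) = sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    return np.sign(kernel(X_new, X) @ (alpha * y) + b)
```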

The main advantages of the LS-FSVM can be summarized in the following five aspects. First of all, the LS-FSVM requires fewer prior assumptions about the input data, such as normal distribution and continuity, than statistical approaches do. Second, it can perform a nonlinear mapping from the original input space into a high-dimensional feature space, in which it constructs a linear discriminant function to replace the nonlinear function in the original input space. This characteristic also alleviates the "dimension disaster" (curse of dimensionality) problem, because its computational complexity does not depend on the dimension of the samples. Third, it attempts to learn the separating hyperplane that maximizes the classification margin, thereby implementing structural risk minimization and realizing good generalization capability. Fourth, a distinct trait of the LS-FSVM is that it further lowers the computational complexity by transforming a quadratic programming problem into a set of linear equations. Finally, an important advantage of the LS-FSVM is that its support values are proportional to the membership degrees as well as to the misclassification errors at the data points, thus making the LS-FSVM more suitable for some real-world problems; this is the main difference between the proposed LS-FSVM and the traditional SVM and LS-SVM models. These important characteristics also make the LS-FSVM preferable in many practical applications. In the following section, some experiments are presented for verification purposes.

3. Experiment Analysis

In this section, a real-world credit dataset is used to test the performance of LS-FSVM. For comparison purposes, linear regression (LinR) [14], logistic regression (LogR) [2], artificial neural network (ANN) [8, 9], Vapnik’s SVM [37], Lin and Wang’s FSVM [35], and LS-SVM [39] are also used.

The dataset in this study comes from a financial services company in England, obtained from a CD-ROM published by Thomas et al. [40]. Each applicant is described by 14 characteristics or variables, listed in Table 1. The dataset includes detailed information on 1225 applicants, of whom 323 are observed to be bad customers.

In this experiment, the LS-FSVM, FSVM, LS-SVM, and SVM models use the RBF kernel for classification. In the ANN model, a three-layer back-propagation neural network is used, with 10 sigmoidal neurons in the hidden layer and one linear neuron in the output layer. The network training algorithm is the Levenberg-Marquardt (LM) algorithm. The learning and momentum rates are set to 0.1 and 0.15, respectively, the accepted average squared error is 0.05, and the number of training epochs is 1600. These parameters were obtained by trial and error. The experiments are run in MATLAB 6.1 with the statistical toolbox, the NNET toolbox, and the LS-SVM toolbox. In addition, three evaluation criteria are used to measure the classification performance:
\[
\text{Type I accuracy} = \frac{\text{number of correctly classified bad customers}}{\text{number of observed bad customers}},
\]
\[
\text{Type II accuracy} = \frac{\text{number of correctly classified good customers}}{\text{number of observed good customers}},
\]
\[
\text{Total accuracy} = \frac{\text{number of correctly classified customers}}{\text{total number of customers}}.
\]
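A minimal Python sketch of these three criteria (an illustration, assuming bad customers are coded as -1 and good customers as +1, following the convention of Section 2.1):

```python
import numpy as np

def accuracy_report(y_true, y_pred):
    """Type I, Type II, and total accuracy, with bad customers coded as -1."""
    bad, good = (y_true == -1), (y_true == +1)
    type1 = np.mean(y_pred[bad] == -1)     # bad customers correctly flagged
    type2 = np.mean(y_pred[good] == +1)    # good customers correctly accepted
    total = np.mean(y_pred == y_true)
    return type1, type2, total
```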

To show the classification capability of the LS-FSVM in distinguishing potentially insolvent customers from good customers, we first test the LS-FSVM alone. This testing process includes the following five steps.

First, the observed bad customers are replicated so that their number is tripled, making it nearly equal to the number of observed good customers. The main purpose of this processing is to avoid the impact of imbalanced samples on performance. A similar processing method can be found in Wang et al. [25]. Of course, other imbalanced-data processing methods, for example, the information-granulation-based method [41], can also be used.
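This duplication step might be sketched as follows (illustrative code; the paper does not specify the implementation):

```python
import numpy as np

def triple_minority(X, y, minority_label=-1):
    """Replicate the minority-class rows so each appears three times in total."""
    idx = np.where(y == minority_label)[0]
    X_aug = np.vstack([X, X[idx], X[idx]])          # two extra copies -> tripled
    y_aug = np.concatenate([y, y[idx], y[idx]])
    return X_aug, y_aug
```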

Second, the original data are preprocessed to impute missing values and to transform categorical attributes. In this study, an interpolation method is used to impute the missing data, and categorical attributes are mapped to numerical codes.
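The paper does not state the exact interpolation or coding scheme; the pandas sketch below shows one plausible realization, with linear interpolation for numeric columns and integer codes for categorical ones (both choices are assumptions).

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing numeric values by interpolation; encode categories as integers."""
    df = df.copy()
    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.columns.difference(num_cols)
    df[num_cols] = df[num_cols].interpolate()             # fill numeric gaps
    for col in cat_cols:
        df[col] = df[col].astype("category").cat.codes    # category -> numeric code
    return df
```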

Third, the augmented dataset is randomly separated into two parts, that is, training samples and testing samples. In this study, 1500 samples are used for training and the remaining 371 samples are used for holdout testing and performance evaluation.

Fourth, membership grades are generated by the linear transformation function proposed by Wang et al. [25], applied to an initial score obtained from experts' experience and opinions. Of course, newer membership generation methods, such as those in [42], can also be adopted.
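The exact transformation function is given in Wang et al. [25]; purely as a hedged illustration, a linear rescaling of expert scores into memberships in (0, 1] might look as follows, where the floor value eps is an assumed constant that keeps all memberships positive.

```python
import numpy as np

def linear_membership(score, eps=0.05):
    """Map expert scores linearly onto [eps, 1]: higher score -> higher confidence.

    Assumes the scores are not all identical (max > min); eps is illustrative.
    """
    lo, hi = np.min(score), np.max(score)
    return eps + (1.0 - eps) * (score - lo) / (hi - lo)
```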

Finally, the LS-FSVM classifier is trained and the results are evaluated accordingly. The above five steps are repeated 20 times to confirm the robustness of the proposed method. The efficiency and robustness of credit risk evaluation using the LS-FSVM model are shown in Table 2.

As can be seen from Table 2, the proposed LS-FSVM model exhibits significant classification capability. Over the 20 experiments, the mean type I accuracy, type II accuracy, and total accuracy are 81.34%, 93.41%, and 89.21%, respectively. Furthermore, the standard deviations are rather small, revealing that the robustness of the LS-FSVM classifier is good. These results imply that the LS-FSVM model is a promising credit risk evaluation technique.

For further illustration, the classification power of the LS-FSVM is also compared with that of six other commonly used classifiers: linear regression (LinR) [14], logistic regression (LogR) [2], artificial neural network (ANN) [8, 9], Vapnik's SVM [37], FSVM [35], and LS-SVM [39]. The results of the comparison are reported in Table 3.

From Table 3, several important results can be observed.

(a) For type I accuracy, the LS-FSVM is the best of all the listed approaches, followed by the FSVM, LS-SVM, Vapnik's SVM, logistic regression, the artificial neural network model, and the linear regression model, implying that the LS-FSVM is a very promising technique for credit risk assessment. In particular, the performance of the two fuzzy SVM techniques (Lin and Wang's FSVM [35] and the LS-FSVM) is better than that of the other classifiers listed in this study, implying that fuzzy SVM classifiers may be more suitable for credit risk assessment tasks than deterministic classifiers such as linear regression (LinR) and logit regression (LogR).

(b) From the viewpoint of type II accuracy, the LS-FSVM and LS-SVM outperform the other five models, implying the strong capability of the least squares versions of the SVM model in credit risk evaluation. Meanwhile, the proposed LS-FSVM model seems to be slightly better than the LS-SVM, revealing that the LS-FSVM is a feasible way to improve the accuracy of credit risk evaluation. Interestingly, the performance of the FSVM is slightly worse than that of the LS-SVM; the reasons for this phenomenon are worth exploring further.

(c) According to the total accuracy, the performance of the two statistical models (LinR and LogR) is much worse than that of the other five models. The main reason is that the five intelligent models can effectively capture the nonlinear patterns hidden in the credit data. As is well known, many factors affect customer credit, and the relationships between customer default and these factors are usually subtle and complex. Besides some linear relationships, nonlinear relationships often exist in credit data. Therefore, nonlinear intelligent models can offer advantages over traditional linear models.

(d) From the perspective of computational time, the two traditional classification models (i.e., LinR and LogR) are faster than all the intelligent models due to their simplicity. Among the intelligent models, the LS-SVM and LS-FSVM are the fastest thanks to the least squares formulation. The LS-FSVM is slightly slower than the LS-SVM because the fuzzification needs some processing time, but it is faster than the SVM and FSVM, indicating that the proposed LS-FSVM model is a very efficient model for credit risk evaluation.

(e) Among the five intelligent models, the performance of the ANN and SVM models is much worse than that of the other three. The main reason is that the standard ANN and SVM models have their own shortcomings, such as sensitivity to parameters and outliers, which affect their classification performance. For example, ANN models often get trapped in local minima and suffer from overfitting, while SVM models occasionally encounter overfitting as well [43]; moreover, fuzzy information is not handled well by the standard SVM models.

(f) The LS-SVM, FSVM, and LS-FSVM models outperform the other four models listed in this study, implying that the variants of the SVM have strong classification potential for credit risk evaluation. The possible reason is that these variants overcome some inherent limitations of the standard SVM, as discussed by Van Gestel [44], thereby increasing the generalization capability. Overall, the LS-FSVM outperforms the other six classifiers on the above three measurements, revealing that the LS-FSVM can serve as an effective tool for credit risk evaluation.

In terms of Table 3 and the three measurements, it is easy to judge which model is best and which is worst. However, it remains unclear whether the differences between the good models and the bad ones are statistically significant. For this purpose, McNemar's test [45] is conducted to examine whether the proposed LS-FSVM classifier significantly outperforms the other six classifiers listed in this study.

As a nonparametric test for two related samples, McNemar's test is particularly useful for before-after measurements of the same subjects [46]. Taking the classification results on the testing data that underlie the total accuracy in Table 3, Table 4 shows the results of McNemar's test on the credit dataset, statistically comparing the performance of the seven classifiers on the testing data. It should be noted that the results listed in Table 4 are the Chi-squared values, with the corresponding p values in brackets.
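As an illustration of how the entries of Table 4 can be computed, the following Python sketch applies McNemar's Chi-squared test (with the common continuity correction; whether the paper used this correction is not stated) to the paired predictions of two classifiers on the same testing data.

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_chi2(y_true, pred_a, pred_b):
    """McNemar's test on paired predictions, with continuity correction.

    n01: cases classifier A gets right and B gets wrong; n10: the reverse.
    """
    right_a, right_b = (pred_a == y_true), (pred_b == y_true)
    n01 = np.sum(right_a & ~right_b)
    n10 = np.sum(~right_a & right_b)
    if n01 + n10 == 0:                                # classifiers never disagree
        return 0.0, 1.0
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)    # continuity-corrected statistic
    p_value = chi2.sf(stat, df=1)                     # one degree of freedom
    return stat, p_value
```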

According to the results reported in Table 4, some important conclusions can be drawn from McNemar's statistical test.

(1) The proposed LS-FSVM classifier outperforms the standard SVM, ANN, logit regression (LogR), and linear regression (LinR) models at the 1% statistical significance level. However, the proposed LS-FSVM model does not significantly outperform the LS-SVM and FSVM models. These results are consistent with those of Table 3.

(2) Similar to the LS-FSVM model, the LS-SVM and FSVM models outperform the other four individual models (i.e., the individual SVM, ANN, LogR, and LinR models) at the 1% significance level. But McNemar's test does not conclude that the LS-SVM model performs better than the FSVM model.

(3) For the SVM and ANN models, we find that these two models perform much better than the two statistical models (i.e., the LogR and LinR models) at the 1% significance level. Interestingly, the SVM model does not outperform the ANN model at the 10% significance level, although many applications have reported that the performance of the SVM was much better than that of the ANN. The possible reason lies in the data samples used in this study.

(4) Comparing the LogR and LinR models, it is easy to find that the LogR model performs better than the LinR model at the 5% significance level. All these findings are consistent with the results reported in Table 3.

In summary, according to the above experimental results and statistical tests, it can be concluded that the LS-FSVM model significantly outperforms some standard intelligent models (e.g., SVM and ANN) and some statistical models (e.g., LogR and LinR), revealing that the LS-FSVM can be used as a competitive solution for credit risk evaluation.

4. Conclusions

In this paper, a powerful classification method, the least squares fuzzy support vector machine (LS-FSVM), is proposed to evaluate credit risk. Through the least squares method, the quadratic programming (QP) problem of the FSVM is transformed into a set of linear equations, thereby reducing the computational complexity. Furthermore, the fuzzification processing in the proposed LS-FSVM model adds generalization capability and insensitivity to outliers. Experiments with a real-world dataset have produced good classification results with fast computation and have demonstrated that the proposed LS-FSVM model provides a feasible alternative for credit risk assessment. Beyond the credit risk evaluation problem, the proposed LS-FSVM model can also be extended to other applications, such as consumer credit rating and corporate failure prediction, which will be investigated in future research.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author would like to express his sincere appreciation to the two independent referees for their valuable comments and suggestions, which have improved the quality of the paper immensely. This work is partially supported by grants from the National Science Fund for Distinguished Young Scholars (NSFC no. 71025005), the National Natural Science Foundation of China (NSFC nos. 90924024 and 91224001), Hangzhou Normal University, and the Fundamental Research Funds for the Central Universities in BUCT.