Abstract

Corporate credit-rating prediction using statistical and artificial intelligence techniques has received considerable attention in the literature. Unlike existing approaches, which adapt support vector machines that were originally designed as binary classifiers, this paper proposes a new multiclassification method for corporate credit rating based on support vector domain combined with a fuzzy clustering algorithm. The data are first preprocessed with the fuzzy clustering algorithm so that only the boundary data points are selected as training samples for specifying the support vector domain, which reduces computational cost and also improves performance. To validate the proposed methodology, experiments are run on real-world cases, and the results are compared with conventional multiclassification support vector machine approaches and other artificial intelligence techniques. The results show that the proposed model improves the performance of corporate credit rating with less computational consumption.

1. Introduction

Credit rating techniques have been applied by bond investors, debt issuers, and governmental officials as one of the most efficient measures of risk management. However, company credit ratings are costly to obtain, because agencies including Standard and Poor’s (S&P) and Moody’s must invest considerable time and human resources in detailed critical analysis of aspects ranging from strategic competitiveness to the operational level [1–3]. Moreover, from a technical perspective, credit rating constitutes a typical multiclassification problem, because the agencies generally use many more than two rating categories. For example, ratings from S&P range from AAA for the highest-quality bonds to D for the lowest-quality ones.

The final objective of credit rating prediction is to develop models by which knowledge of credit risk evaluation can be extracted from the experience of experts and applied in a much broader scope. Besides prediction, such studies can also help users capture the fundamental characteristics of different financial markets by analyzing the information used by experts.

Although rating agencies place emphasis on experts’ subjective judgment in obtaining ratings, many promising results on credit rating prediction based on different statistical and Artificial Intelligence (AI) methods have been reported, under the broad assumption that financial variables extracted from general statements, such as financial ratios, carry much of the information about a company’s credit risk that is embedded in the experts’ experience [4, 5].

Among the AI-based technologies applied in credit rating prediction, Artificial Neural Networks (ANNs) have been used in the finance domain because of their ability to learn from training samples. However, given defects of ANN such as overfitting, the Support Vector Machine (SVM) has come to be regarded as one of the popular alternatives, because it performs much better than traditional approaches such as ANN [6–11]. That is, an SVM’s solution can be globally optimal because the model seeks to minimize the structural risk [12]. Conversely, the solutions found by ANN tend to fall into local optima because ANN seeks to minimize the empirical risk.

However, SVM, which was originally developed for binary classification, does not extend naturally to the multiclassification required by many problems, including credit rating. Thus, researchers have tried to extend the original SVM to multiclassification problems [13]. The proposed multiclassification SVM (MSVM) techniques include approaches that construct and combine several binary classifiers as well as approaches that consider all the data directly in a single optimization formulation.

For multiclassification in the credit rating domain, which involves large amounts of data, the approaches currently used in MSVM still have the following drawbacks when integrating multiple binary classifiers.

(1) Unclassifiable regions may exist, in which a data point is assigned to more than one class or to none.

(2) Training two-class SVM classifiers multiple times on the same data set often results in high time complexity, and thus heavy computational consumption, for large-scale problems such as credit rating prediction.

To overcome the drawbacks associated with current MSVM in credit rating prediction, a novel model based on support vector domain combined with kernel-based fuzzy clustering is proposed in the paper to accomplish multiclassification involved in credit ratings prediction.

2. Literature Review

2.1. Credit Rating Using Data Mining Techniques

A substantial body of research applying data mining techniques to bond rating prediction can be found in the literature.

Early investigations of credit rating techniques mainly focused on the applicability of statistical techniques such as multiple discriminant analysis (MDA) [14, 15] and logistic regression analysis (LRA) [16], while typical AI techniques such as ANN [17, 18] and case-based reasoning (CBR) [19] were applied in the second phase of research.

The important studies applying AI techniques to bond-rating prediction are listed in Table 1. In summary, most of the earlier ones perform prediction using ANN and compare it with statistical methods, with the general conclusion that neural networks outperform conventional statistical methods in bond rating prediction.

On the other hand, to overcome limitations of ANN such as overfitting, techniques based on MSVM have been applied to credit rating in recent years. Among the MSVM-based models, the method of Crammer and Singer was adopted early on by Huang et al., with experiments over different parameters to find the optimal model [29]. Moreover, methodologies based on One-Against-All, One-Against-One, and DAGSVM have also been proposed for predicting S&P’s bond ratings, using a Gaussian RBF kernel with the optimal parameters derived from a grid-search strategy [28]. Another automatic classification model for credit rating prediction based on the One-Against-One approach has also been applied [30]. Lee applied MSVM to corporate credit rating prediction [31], with experiments showing that the MSVM-based model outperformed other techniques such as ANN, MDA, and CBR.

2.2. Multiclassification by Support Vector Domain Description

Support Vector Domain Description (SVDD), proposed by Tax and Duin in 1999 [32] and extended in 2004 [33], is a classification method originally aimed at accurately describing a set of data points. Methods based on SVDD differ from two-class or multiclass classification in that they are interested in describing a single object type rather than in separating it from other classes. SVDD is a nonparametric method in the sense that it does not assume any particular form of distribution of the data points. The support of the unknown distribution of data points is modeled by a boundary function, and the boundary is “soft” in the sense that atypical points are allowed outside it.

The boundary function of SVDD is modeled by a hypersphere rather than the hyperplane used in the standard SVM. The hypersphere can be made less constrained by mapping the data points to a high-dimensional space, using the methodology known as the kernel trick, where the classification is performed.

SVDD has been widely used as a basis for new methodologies in statistics and machine learning. Its application in anomaly detection showed that a model based on it can improve accuracy and reduce computational complexity [34]. Moreover, the idea of improving the original SVDD by weighting each data point by an estimate of its corresponding density was also proposed [35] and applied in areas such as breast cancer, leukemia, and hepatitis. Other applications, including pump failure detection [36], face recognition [37], speaker recognition [38], and image retrieval [39], have been discussed by researchers.

Its modeling capability makes SVDD one of the alternatives to large-margin classifiers such as SVM. Some novel methods for multiclass classification have been proposed based on SVDD [40] combined with other algorithms such as fuzzy theories [41, 42] and Bayesian decision [36].

3. The Proposed Methodology

SVDD is a boundary-based method for data description, so it needs more boundary samples to construct a closely fitting boundary. Unfortunately, more boundary samples usually imply that more target objects have to be rejected, which gives rise to overfitting and increases computational consumption. To accomplish multiclassification in corporate credit rating, a method using Fuzzy SVDD combined with a fuzzy clustering algorithm is proposed in this paper. After the data points are mapped to a high-dimensional space by the kernel trick, the hypersphere for every category is specified by training samples selected as boundary ones, which are more likely to be candidates for support vectors. Because these samples are selected by preprocessing with the fuzzy clustering algorithm, rather than using the original samples directly as in standard SVDD [32, 33], accuracy can be improved and computational consumption reduced. Testing samples are then classified by classification rules based on the hyperspheres specified for every class. The idea and the framework of the proposed methodology are illustrated in Figures 1 and 2, respectively.

3.1. Fuzzy SVDD
3.1.1. Introduction to Hypersphere Specification Algorithm

The hypersphere by which SVDD models the data points is specified by its center $\mathbf{a}$ and radius $R$. Let $\mathbf{X}=(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3,\ldots)$ denote the data matrix with $n$ data points and $p$ variables, which implies that $\mathbf{a}$ is $p$-dimensional while $R$ is a scalar. The geometry of one solution to SVDD in two dimensions is illustrated in Figure 3, where $\omega_i$ represents the perpendicular distance from the boundary to an exterior point $\mathbf{x}_i$. For interior points and points positioned on the boundary, $\omega_i$ is assigned the value 0. Hence, $\omega_i$ can be calculated using the following equation:
$$\omega_i=\max\{0,\ \|\mathbf{x}_i-\mathbf{a}\|-R\}. \tag{3.1}$$

Another closely related measure for exterior points can be obtained as in (3.2):
$$\xi_i=\|\mathbf{x}_i-\mathbf{a}\|^2-R^2\ \Longrightarrow\ \|\mathbf{x}_i-\mathbf{a}\|^2=R^2+\xi_i. \tag{3.2}$$
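
To make the two measures concrete, the following minimal sketch evaluates (3.1) and (3.2) for a given center and radius. The function names and the sample values are illustrative assumptions, and the slack is clipped at 0 for non-exterior points, which matches the nonnegativity required of the slack variables later on.

```python
import math

def exterior_distance(x, a, R):
    """omega_i of (3.1): distance from the boundary to x, 0 for interior or boundary points."""
    return max(0.0, math.dist(x, a) - R)

def slack(x, a, R):
    """xi_i of (3.2) for an exterior point; clipped at 0 otherwise (illustrative choice)."""
    return max(0.0, math.dist(x, a) ** 2 - R ** 2)

a, R = (0.0, 0.0), 1.0          # hypothetical center and radius
inside, outside = (0.5, 0.0), (2.0, 0.0)
print(exterior_distance(inside, a, R), slack(inside, a, R))    # 0.0 0.0
print(exterior_distance(outside, a, R), slack(outside, a, R))  # 1.0 3.0
```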

To obtain an exact and compact representation of the data points, both the hypersphere radius and the slack $\xi_i$ of every exterior point must be minimized. Moreover, inspired by fuzzy set theory, the matrix $\mathbf{X}$ can be extended to $\mathbf{X}=((\mathbf{x}_1,s_1),(\mathbf{x}_2,s_2),(\mathbf{x}_3,s_3),\ldots)$, where the coefficients $s_i$ represent the fuzzy membership associated with $\mathbf{x}_i$. The data domain description can then be formulated as (3.3), where the nonnegative slack variables $\xi_i$ measure the error in SVDD and the terms $s_i\xi_i$ weight these errors according to fuzzy set theory:
$$\min_{\mathbf{a},R,\xi}\ R^2+C\sum_{i=1}^{l}s_i\xi_i\quad\text{s.t.}\quad\|\mathbf{x}_i-\mathbf{a}\|^2\le R^2+\xi_i,\quad \xi_i\ge 0,\quad i=1,\ldots,l. \tag{3.3}$$

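To solve the problem, the Lagrange function is introduced, where $\alpha_i\ge 0$ and $\beta_i\ge 0$ are Lagrange multipliers:
$$L(R,\mathbf{a},\xi,\alpha,\beta)=R^2+C\sum_{i=1}^{l}s_i\xi_i-\sum_{i=1}^{l}\alpha_i\left(R^2+\xi_i-\|\mathbf{x}_i-\mathbf{a}\|^2\right)-\sum_{i=1}^{l}\beta_i\xi_i. \tag{3.4}$$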

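Setting the partial derivatives of $L$ in (3.4) to zero leads to the following equations:
$$\frac{\partial L}{\partial R}=2R-2R\sum_{i=1}^{l}\alpha_i=0,\qquad \frac{\partial L}{\partial \mathbf{a}}=-2\sum_{i=1}^{l}\alpha_i(\mathbf{x}_i-\mathbf{a})=0,\qquad \frac{\partial L}{\partial \xi_i}=s_iC-\alpha_i-\beta_i=0. \tag{3.5}$$
That is,
$$\sum_{i=1}^{l}\alpha_i=1,\qquad \mathbf{a}=\sum_{i=1}^{l}\alpha_i\mathbf{x}_i,\qquad \beta_i=s_iC-\alpha_i. \tag{3.6}$$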

The Karush-Kuhn-Tucker complementarity conditions result in the following equations:
$$\alpha_i\left(R^2+\xi_i-\|\mathbf{x}_i-\mathbf{a}\|^2\right)=0,\qquad \beta_i\xi_i=0. \tag{3.7}$$

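Therefore, the dual form of the objective function can be obtained as follows:
$$L_D(\alpha,\beta)=\sum_{i=1}^{l}\alpha_i(\mathbf{x}_i\cdot\mathbf{x}_i)-\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j(\mathbf{x}_i\cdot\mathbf{x}_j). \tag{3.8}$$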

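The problem can then be formulated as follows:
$$\max_{\alpha}\ \sum_{i=1}^{l}\alpha_i(\mathbf{x}_i\cdot\mathbf{x}_i)-\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j(\mathbf{x}_i\cdot\mathbf{x}_j)\quad\text{s.t.}\quad 0\le\alpha_i\le s_iC,\ \ i=1,2,\ldots,l,\qquad \sum_{i=1}^{l}\alpha_i=1. \tag{3.9}$$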
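
The source does not say which solver is used for (3.9). As a rough sketch only, the box-and-sum constraints can be handled by a pairwise coordinate ascent in the spirit of SMO: weight is shifted between two multipliers at a time so that their sum stays equal to 1. The function name `solve_fuzzy_svdd`, the stopping tolerance, and the toy data are assumptions made for illustration.

```python
def solve_fuzzy_svdd(K, s, C, sweeps=200, tol=1e-12):
    """Maximise (3.9) by pairwise coordinate ascent (an SMO-like heuristic).

    K: Gram matrix as a list of lists, s: fuzzy memberships, C: penalty.
    A feasible alpha exists only if sum(s_i * C) >= 1.
    """
    l = len(K)
    upper = [si * C for si in s]
    # Feasible start: fill alpha greedily until the weights sum to 1.
    alpha, remaining = [0.0] * l, 1.0
    for i in range(l):
        alpha[i] = min(upper[i], remaining)
        remaining -= alpha[i]
    if remaining > 1e-12:
        raise ValueError("infeasible: sum of s_i * C must be at least 1")
    for _ in range(sweeps):
        moved = 0.0
        for i in range(l):
            for j in range(i + 1, l):
                # Gradient of the dual objective with respect to alpha_i, alpha_j.
                gi = K[i][i] - 2 * sum(alpha[t] * K[i][t] for t in range(l))
                gj = K[j][j] - 2 * sum(alpha[t] * K[j][t] for t in range(l))
                eta = K[i][i] + K[j][j] - 2 * K[i][j]
                if eta <= 0:
                    continue
                # Move delta from alpha_j to alpha_i; the sum constraint is preserved.
                delta = (gi - gj) / (2 * eta)
                lo = max(-alpha[i], alpha[j] - upper[j])
                hi = min(upper[i] - alpha[i], alpha[j])
                delta = max(lo, min(hi, delta))
                alpha[i] += delta
                alpha[j] -= delta
                moved = max(moved, abs(delta))
        if moved < tol:
            break
    return alpha

# Toy data: the third point lies inside the smallest ball around the first two.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
K = [[sum(p * q for p, q in zip(u, v)) for v in points] for u in points]
alpha = solve_fuzzy_svdd(K, s=[1.0, 1.0, 1.0], C=1.0)
center = [sum(a * p[d] for a, p in zip(alpha, points)) for d in range(2)]
print([round(a, 3) for a in alpha], [round(c, 3) for c in center])
```

In this toy case only the two outer points receive nonzero weight, so the center is their midpoint, which agrees with the statement that the hypersphere is described by the support vectors alone.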

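The center of the hypersphere is a linear combination of the data points with weighting factors $\alpha_i$ obtained by optimizing (3.9). The data points whose coefficients $\alpha_i$ are nonzero are selected as support vectors, and the hypersphere is specified and described by these points alone. Hence, to judge whether a data point $\mathbf{x}$ lies within the hypersphere, its distance to the center is calculated and compared with the radius $R$ as in (3.10). The radius follows from (3.11), where $\mathbf{x}_{i_0}$ denotes a support vector on the boundary, and the resulting decision function is (3.12):
$$\Big\|\mathbf{x}-\sum_{i=1}^{l}\alpha_i\mathbf{x}_i\Big\|^2\le R^2, \tag{3.10}$$
$$R^2=\Big\|\mathbf{x}_{i_0}-\sum_{i=1}^{l}\alpha_i\mathbf{x}_i\Big\|^2=\mathbf{x}_{i_0}\cdot\mathbf{x}_{i_0}-2\sum_{i=1}^{l}\alpha_i(\mathbf{x}_{i_0}\cdot\mathbf{x}_i)+\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j(\mathbf{x}_i\cdot\mathbf{x}_j), \tag{3.11}$$
$$\mathbf{x}\cdot\mathbf{x}-2\sum_{i=1}^{l}\alpha_i(\mathbf{x}\cdot\mathbf{x}_i)\le\mathbf{x}_{i_0}\cdot\mathbf{x}_{i_0}-2\sum_{i=1}^{l}\alpha_i(\mathbf{x}_{i_0}\cdot\mathbf{x}_i). \tag{3.12}$$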
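
A minimal sketch of decision rule (3.12) follows. It assumes the multipliers are already known; the two support vectors with equal weights are a hand-picked toy case (center (1, 0), radius 1), and the function names are illustrative.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def svdd_accepts(x, sv, alpha, boundary_sv):
    """Rule (3.12): True if x lies inside or on the hypersphere."""
    lhs = dot(x, x) - 2 * sum(a * dot(x, xi) for a, xi in zip(alpha, sv))
    rhs = dot(boundary_sv, boundary_sv) - 2 * sum(
        a * dot(boundary_sv, xi) for a, xi in zip(alpha, sv))
    return lhs <= rhs

sv = [(0.0, 0.0), (2.0, 0.0)]   # support vectors (toy values)
alpha = [0.5, 0.5]              # their multipliers, summing to 1
print(svdd_accepts((1.0, 0.5), sv, alpha, sv[0]))  # True
print(svdd_accepts((3.0, 0.0), sv, alpha, sv[0]))  # False
```

Only inner products appear in the test, which is what allows the kernel substitution of the next subsection.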

3.1.2. Introduction to Fuzzy SVDD Based on Kernel Trick

Similarly to the kernel-based methodology proposed by Vapnik [12], the Fuzzy SVDD can be generalized to a high-dimensional space by replacing its inner products with kernel functions $K(\mathbf{x}_i,\mathbf{x}_j)=\Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_j)$.

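For example, the RBF kernel, for which $K(\mathbf{x}_i,\mathbf{x}_i)=1$, can be introduced into the SVDD algorithm as follows:
$$\max_{\alpha}\ 1-\sum_{i=1}^{l}\alpha_i^2-\sum_{i=1}^{l}\sum_{j\ne i}\alpha_i\alpha_jK(\mathbf{x}_i,\mathbf{x}_j)\quad\text{s.t.}\quad 0\le\alpha_i\le s_iC,\ \ i=1,2,\ldots,l,\qquad \sum_{i=1}^{l}\alpha_i=1. \tag{3.13}$$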

Introducing the kernel function into (3.12), whether a testing data point $\mathbf{x}$ lies within the hypersphere can be determined with (3.14):
$$\sum_{i=1}^{l}\alpha_iK(\mathbf{x},\mathbf{x}_i)\ge\sum_{i=1}^{l}\alpha_iK(\mathbf{x}_{i_0},\mathbf{x}_i). \tag{3.14}$$
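
The kernelized test (3.14) can be sketched in the same way. The RBF parameterization `exp(-gamma * ||u - v||^2)` with `gamma=0.5`, the function names, and the toy multipliers are assumptions for illustration; the rule relies on `K(x, x)` being constant, as it is for the RBF kernel.

```python
import math

def rbf(u, v, gamma=0.5):
    """RBF kernel; gamma is an illustrative choice, not a value from the paper."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(u, v)))

def kernel_svdd_accepts(x, sv, alpha, boundary_sv, kernel=rbf):
    """Rule (3.14), valid when K(x, x) is constant (1 for the RBF kernel)."""
    score = sum(a * kernel(x, xi) for a, xi in zip(alpha, sv))
    threshold = sum(a * kernel(boundary_sv, xi) for a, xi in zip(alpha, sv))
    return score >= threshold

sv = [(0.0, 0.0), (2.0, 0.0)]   # support vectors (toy values)
alpha = [0.5, 0.5]              # symmetric multipliers for the symmetric pair
print(kernel_svdd_accepts((1.0, 0.0), sv, alpha, sv[0]))  # True
print(kernel_svdd_accepts((4.0, 0.0), sv, alpha, sv[0]))  # False
```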

3.2. Kernel-Based Fuzzy Clustering Algorithm
3.2.1. Introduction to Fuzzy Attribute C-Means Clustering

Based on the fuzzy clustering algorithm [42], Fuzzy Attribute C-means Clustering (FAMC) [43] was proposed as an extension of Attribute Means Clustering (AMC) and Fuzzy C-means (FCM).

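Let $\chi\subset R^d$ denote any finite sample set, where $\chi=\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N\}$ and each sample is defined as $\mathbf{x}_n=(x_{1n},x_{2n},\ldots,x_{dn})$ $(1\le n\le N)$. The category of the attribute space is $F=\{C_1,C_2,\ldots,C_c\}$, where $c$ is the number of clusters. For $\mathbf{x}\in\chi$, let $\mu_{\mathbf{x}}(C_k)$ denote the attribute measure of $\mathbf{x}$, with $\sum_{k=1}^{c}\mu_{\mathbf{x}}(C_k)=1$.

Let $\mathbf{p}_k=(p_{k1},p_{k2},\ldots,p_{kd})$ denote the prototype of the $k$th cluster $C_k$, where $1\le k\le c$.

Let $\mu_{kn}$ denote the attribute measure of the $n$th sample belonging to the $k$th cluster. That is, $\mu_{kn}=\mu_n(\mathbf{p}_k)$, $\mathbf{U}=(\mu_{kn})$, and $\mathbf{p}=(\mathbf{p}_1,\mathbf{p}_2,\ldots,\mathbf{p}_c)$. The task of fuzzy clustering is to calculate the attribute measures $\mu_{kn}$ and to assign $\mathbf{x}_n$ to the cluster for which its attribute measure is largest.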

Fuzzy C-means (FCM) is a clustering algorithm based on an inner-product-induced distance and the least-squared error criterion. A brief review of FCM, using the coefficient definitions given above, can be found in the Appendix.

Attribute Means Clustering (AMC) is an iterative algorithm built on the notion of a stable function [44]. Suppose $\rho(t)$ is a positive differentiable function on $[0,\infty)$ and let $\omega(t)=\rho'(t)/2t$. If $\omega(t)$, called the weight function, is a positive nonincreasing function, then $\rho(t)$ is called a stable function. Accordingly, $\rho(t)$ can be written as follows:
$$\rho(t)=\int_0^t 2s\,\omega(s)\,ds. \tag{3.15}$$

Hence, the stable function describes the relationship between the objective function $\rho(t)$ and its weight function, and it was introduced in order to propose AMC.

According to current research, alternative functions including the squared stable function, the Cauchy stable function, and the exponential stable function are recommended.

Based on previous research, AMC and FCM are extended to FAMC, which is also an iterative algorithm. It minimizes the objective function (3.16), where $m>1$ is the FCM coefficient introduced in the Appendix:
$$P(\mathbf{U},\mathbf{p})=\sum_{k=1}^{c}\sum_{n=1}^{N}\rho\left(\mu_{kn}^{m/2}\|\mathbf{x}_n-\mathbf{p}_k\|\right). \tag{3.16}$$
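
As a worked instance of (3.15) and (3.16), take the constant weight function. This is presumably the "squared" case named above, although that identification is an assumption, since the stable functions are not written out in the text. A constant is positive and nonincreasing, so it qualifies as a weight function:

```latex
\omega(t)\equiv 1
\;\Longrightarrow\;
\rho(t)=\int_0^t 2s\,ds=t^2,
\qquad
P(\mathbf{U},\mathbf{p})
=\sum_{k=1}^{c}\sum_{n=1}^{N}\left(\mu_{kn}^{m/2}\,\|\mathbf{x}_n-\mathbf{p}_k\|\right)^2
=\sum_{k=1}^{c}\sum_{n=1}^{N}\mu_{kn}^{m}\,\|\mathbf{x}_n-\mathbf{p}_k\|^2 .
```

With this choice the FAMC objective coincides with the FCM objective (A.1) under the Euclidean distance, which illustrates in what sense FAMC extends FCM.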

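Moreover, the minimization of (3.16) can be converted into the minimization of the iterative objective function (3.17) [43]:
$$Q^{(i)}(\mathbf{U},\mathbf{p})=\sum_{k=1}^{c}\sum_{n=1}^{N}\omega\left(\big(\mu_{kn}^{(i)}\big)^{m/2}\big\|\mathbf{x}_n-\mathbf{p}_k^{(i)}\big\|\right)\mu_{kn}^{m}\|\mathbf{x}_n-\mathbf{p}_k\|^2. \tag{3.17}$$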

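The following update equations are obtained by minimizing $Q^{(i)}(\mathbf{U}^{(i)},\mathbf{p})$ and $Q^{(i)}(\mathbf{U},\mathbf{p}^{(i+1)})$, respectively; details can be found in [43, 45]:
$$\mathbf{p}_k^{(i+1)}=\frac{\sum_{n=1}^{N}\omega\left(\big(\mu_{kn}^{(i)}\big)^{m/2}\big\|\mathbf{x}_n-\mathbf{p}_k^{(i)}\big\|\right)\big(\mu_{kn}^{(i)}\big)^{m}\mathbf{x}_n}{\sum_{n=1}^{N}\omega\left(\big(\mu_{kn}^{(i)}\big)^{m/2}\big\|\mathbf{x}_n-\mathbf{p}_k^{(i)}\big\|\right)\big(\mu_{kn}^{(i)}\big)^{m}},$$
$$\mu_{kn}^{(i+1)}=\frac{\left[\omega\left(\big(\mu_{kn}^{(i)}\big)^{m/2}\big\|\mathbf{x}_n-\mathbf{p}_k^{(i)}\big\|\right)\big\|\mathbf{x}_n-\mathbf{p}_k^{(i+1)}\big\|^2\right]^{-1/(m-1)}}{\sum_{j=1}^{c}\left[\omega\left(\big(\mu_{jn}^{(i)}\big)^{m/2}\big\|\mathbf{x}_n-\mathbf{p}_j^{(i)}\big\|\right)\big\|\mathbf{x}_n-\mathbf{p}_j^{(i+1)}\big\|^2\right]^{-1/(m-1)}}. \tag{3.18}$$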
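
One pass of the updates in (3.18) might look as follows. This is only a sketch: the Cauchy-type weight function $\omega(t)=1/(1+t^2)$, the small constant `eps` that guards against division by zero, and the one-dimensional toy data are all assumptions made for illustration.

```python
import math

def famc_step(X, P, U, m=2.0, omega=lambda t: 1.0 / (1.0 + t * t), eps=1e-12):
    """One pass of (3.18): returns updated prototypes and memberships.

    X: samples, P: current prototypes, U[k][n]: current memberships.
    """
    c, N, d = len(P), len(X), len(X[0])
    # Weights omega(mu^(m/2) * ||x_n - p_k||), fixed from the current iterate.
    W = [[omega(U[k][n] ** (m / 2) * math.dist(X[n], P[k])) for n in range(N)]
         for k in range(c)]
    new_P = []
    for k in range(c):
        coef = [W[k][n] * U[k][n] ** m for n in range(N)]
        total = sum(coef) + eps
        new_P.append(tuple(sum(coef[n] * X[n][j] for n in range(N)) / total
                           for j in range(d)))
    new_U = [[0.0] * N for _ in range(c)]
    for n in range(N):
        inv = [(W[k][n] * math.dist(X[n], new_P[k]) ** 2 + eps) ** (-1.0 / (m - 1))
               for k in range(c)]
        norm = sum(inv)
        for k in range(c):
            new_U[k][n] = inv[k] / norm
    return new_P, new_U

X = [(0.0,), (0.2,), (5.0,), (5.2,)]           # two well-separated groups
P = [(0.0,), (5.0,)]                            # initial prototypes
U = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
new_P, new_U = famc_step(X, P, U)
print([round(p[0], 3) for p in new_P])
```

Passing `omega=lambda t: 1.0` turns the step into the plain FCM update reviewed in the Appendix.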

3.2.2. Introduction to Kernel-Based Fuzzy Clustering

To gain a high-dimensional discriminant, FAMC can be extended to Kernel-based Fuzzy Attribute C-means Clustering (KFAMC). That is, the training samples can be first mapped into high-dimensional space by the mapping Φ using kernel function methods addressed in Section 3.1.2.

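Since
$$\|\Phi(\mathbf{x}_n)-\Phi(\mathbf{p}_k)\|^2=\big(\Phi(\mathbf{x}_n)-\Phi(\mathbf{p}_k)\big)^T\big(\Phi(\mathbf{x}_n)-\Phi(\mathbf{p}_k)\big)=\Phi(\mathbf{x}_n)^T\Phi(\mathbf{x}_n)-\Phi(\mathbf{x}_n)^T\Phi(\mathbf{p}_k)-\Phi(\mathbf{p}_k)^T\Phi(\mathbf{x}_n)+\Phi(\mathbf{p}_k)^T\Phi(\mathbf{p}_k)=K(\mathbf{x}_n,\mathbf{x}_n)+K(\mathbf{p}_k,\mathbf{p}_k)-2K(\mathbf{x}_n,\mathbf{p}_k), \tag{3.19}$$
when the RBF kernel is introduced, (3.19) becomes
$$\|\Phi(\mathbf{x}_n)-\Phi(\mathbf{p}_k)\|^2=2\big(1-K(\mathbf{x}_n,\mathbf{p}_k)\big). \tag{3.20}$$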
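
The identity can be checked numerically without ever forming $\Phi$. In the sketch below, the RBF parameterization with width `sigma` and the sample points are illustrative assumptions.

```python
import math

def rbf(u, v, sigma=1.0):
    """RBF kernel with an illustrative width parameter sigma."""
    sq = sum((p - q) ** 2 for p, q in zip(u, v))
    return math.exp(-sq / (2 * sigma ** 2))

def feature_dist_sq(x, p, kernel):
    """Squared feature-space distance via (3.19), using kernel values only."""
    return kernel(x, x) + kernel(p, p) - 2 * kernel(x, p)

x, p = (1.0, 2.0), (0.0, 0.5)
lhs = feature_dist_sq(x, p, rbf)
rhs = 2 * (1 - rbf(x, p))        # shortcut (3.20), valid because K(x, x) = 1
print(abs(lhs - rhs) < 1e-12)    # True
```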

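The parameters in KFAMC can be estimated by
$$\mu_{kn}=\frac{\big(1-K(\mathbf{x}_n,\mathbf{p}_k)\big)^{-1/(m-1)}}{\sum_{j=1}^{c}\big(1-K(\mathbf{x}_n,\mathbf{p}_j)\big)^{-1/(m-1)}},\qquad \mathbf{p}_k=\frac{\sum_{n=1}^{N}\mu_{kn}^{m}K(\mathbf{x}_n,\mathbf{p}_k)\,\mathbf{x}_n}{\sum_{n=1}^{N}\mu_{kn}^{m}K(\mathbf{x}_n,\mathbf{p}_k)}, \tag{3.21}$$
where $n=1,2,\ldots,N$ and $k=1,2,\ldots,c$.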

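Moreover, the objective functions of KFAMC are obtained by replacing (3.16) and (3.17) with (3.22) and (3.23), respectively:
$$P(\mathbf{U},\mathbf{p})=\sum_{k=1}^{c}\sum_{n=1}^{N}\rho\left(\mu_{kn}^{m/2}\|\Phi(\mathbf{x}_n)-\Phi(\mathbf{p}_k)\|\right), \tag{3.22}$$
$$Q^{(i)}(\mathbf{U},\mathbf{p})=\sum_{k=1}^{c}\sum_{n=1}^{N}\omega\left(\big(\mu_{kn}^{(i)}\big)^{m/2}\big(1-K(\mathbf{x}_n,\mathbf{p}_k^{(i)})\big)^{1/2}\right)\mu_{kn}^{m}\big(1-K(\mathbf{x}_n,\mathbf{p}_k)\big). \tag{3.23}$$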

3.2.3. Algorithms of Kernel-Based Fuzzy Attribute C-Means Clustering

Based on the theorem proved in [45], the updating procedure of KFAMC can be summarized in the following iterative scheme.

Step 1. Set $c$, $m$, $\varepsilon$, and $t_{\max}$, and initialize $U^{(0)}$ and $W^{(0)}$.

Step 2. For $i=1$, calculate the fuzzy cluster centers $P^{(i)}$, $U^{(i)}$, and $W^{(i)}$.

Step 3. If $|Q^{(i)}(U,P)-Q^{(i+1)}(U,P)|<\varepsilon$ or $i>t_{\max}$, stop; otherwise go to Step 4.

Step 4. Set $i=i+1$, update $P^{(i+1)}$, $U^{(i+1)}$, and $W^{(i)}$, and return to Step 3.

Here $i$ denotes the iteration step, $t_{\max}$ the maximum number of iterations, and $W^{(i)}$ the weighting matrix; details can be found in [45].
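
The scheme above can be sketched as a loop over the updates (3.21) with the stopping test of Step 3. This is a simplified illustration rather than the procedure of [45]: it assumes the constant weight function $\omega\equiv 1$, so the weighting matrix $W$ stays all ones and is omitted, and it assumes an RBF kernel with width `sigma`; the function names and toy data are hypothetical.

```python
import math

def rbf(u, v, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))

def kfamc(X, P, m=2.0, eps=1e-6, t_max=100, sigma=1.0):
    """Iterate the updates (3.21); assumes omega = 1, so W stays all ones."""
    c, N, d = len(P), len(X), len(X[0])
    prev_q = None
    for _ in range(t_max):
        K = [[rbf(X[n], P[k], sigma) for n in range(N)] for k in range(c)]
        # Membership update; the small constant guards against 1 - K = 0.
        U = [[0.0] * N for _ in range(c)]
        for n in range(N):
            inv = [(1.0 - K[k][n] + 1e-12) ** (-1.0 / (m - 1)) for k in range(c)]
            norm = sum(inv)
            for k in range(c):
                U[k][n] = inv[k] / norm
        # Prototype update.
        P = [tuple(sum(U[k][n] ** m * K[k][n] * X[n][j] for n in range(N)) /
                   sum(U[k][n] ** m * K[k][n] for n in range(N))
                   for j in range(d)) for k in range(c)]
        # Objective (3.23) with omega = 1, used for the stopping test of Step 3.
        q = sum(U[k][n] ** m * (1.0 - rbf(X[n], P[k], sigma))
                for k in range(c) for n in range(N))
        if prev_q is not None and abs(prev_q - q) < eps:
            break
        prev_q = q
    return U, P

X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
U, P = kfamc(X, [(0.0, 0.0), (3.0, 3.0)])
print([round(U[0][n], 2) for n in range(len(X))])
```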

3.3. The Proposed Algorithm
3.3.1. Classifier Establishment

In SVDD, only the support vectors are needed to specify the hyperspheres. In the original algorithms [32, 33, 41], however, all the training samples are analyzed, and the computational cost is therefore high. Hence, if the data points that are more likely to be candidates for support vectors can be selected as training samples, the hypersphere can be specified with much less computational consumption.

As illustrated in Figure 4, only data points such as M and N, which are positioned in fuzzy areas and are more likely to be candidates for support vectors, need to be classified with SVDD, while the points in deterministic areas can be regarded as belonging to a certain class.

The new methodology applied to SVDD is therefore proposed as follows.

(1) Preprocess the data points using KFAMC to reduce the number of training samples. That is, if the fuzzy membership of a data point to a class is large enough, the data point is ranked to that class directly. As shown in Figure 5, the data points positioned in the deterministic area (shadow area A) are regarded as samples belonging to the class, while the other ones are selected as training samples.

(2) Accomplish SVDD specification with the training samples positioned in fuzzy areas, which have been selected using KFAMC. That is, only the data points in the fuzzy area, rather than all the data points, are treated as candidates for support vectors. The classifier for multiclassification is then developed based on Fuzzy SVDD by specifying a hypersphere for every class.

The main idea of Fuzzy SVDD establishment combined with KFAMC is illustrated in Figure 6.

The process of methods proposed in the paper can be depicted as follows.

In the high-dimensional space, the training samples are selected according to their fuzzy memberships to the clustering centers. After preprocessing with KFAMC, a set of training samples is given, represented by $\mathbf{X}_m^{0}=\{(\mathbf{x}_1,\mu_{m1}),(\mathbf{x}_2,\mu_{m2}),\ldots,(\mathbf{x}_l,\mu_{ml})\}$, where $l\in N$, $\mathbf{x}_i\in R^n$, and $\mu_{mi}\in[0,1]$ denote the number of training data, the input pattern, and the membership to class $m$, respectively.

Hence, the process of Fuzzy SVDD specification can be summarized as follows.

Step 1. Set a threshold $\theta>0$ and apply KFAMC to calculate the membership of each $\mathbf{x}_i$, $i=1,2,\ldots,l$, to each class. If $\mu_{mi}\ge\theta$, set $\mu_{mi}$ to 1 and set $\mu_{ti}$, $t\ne m$, to 0.

Step 2. Survey the membership of each $\mathbf{x}_i$, $i=1,2,\ldots,l$. If $\mu_{mi}=1$, rank $\mathbf{x}_i$ to class $m$ directly and remove it from the training set. An updated training set is thus obtained.

Step 3. Specify a hypersphere for each class using the updated training set obtained in Step 2, and establish the classifier for credit rating using the Fuzzy SVDD algorithm, as illustrated in Figure 6.
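
Steps 1 and 2 amount to a simple split of the samples by their largest membership. The sketch below assumes the memberships have already been computed; the function name, the layout `U[m][i]`, and the toy values are illustrative.

```python
def split_by_threshold(U, theta=0.9):
    """Steps 1-2: U[m][i] is the membership of sample i to class m.

    Returns (assigned, fuzzy): `assigned` maps sample index -> class for samples
    whose largest membership reaches theta; `fuzzy` lists the indices kept as
    training samples for Fuzzy SVDD.
    """
    assigned, fuzzy = {}, []
    n_classes, n_samples = len(U), len(U[0])
    for i in range(n_samples):
        best = max(range(n_classes), key=lambda m: U[m][i])
        if U[best][i] >= theta:
            assigned[i] = best      # deterministic area: rank to the class directly
        else:
            fuzzy.append(i)         # fuzzy area: candidate support vector
    return assigned, fuzzy

U = [[0.95, 0.55, 0.05],
     [0.05, 0.45, 0.95]]
print(split_by_threshold(U, theta=0.9))  # ({0: 0, 2: 1}, [1])
```

Only the indices in `fuzzy` would be passed on to the hypersphere specification of Step 3.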

3.3.2. Classification Rules for Testing Data Points

To accomplish multiclassification of testing data points using the hyperspheres specified in Section 3.3.1, the following two factors should be taken into consideration, as illustrated in Figure 7:

(1) the distances from the data point to the centers of the hyperspheres;

(2) the density of the data points belonging to each class, as implied by the radius of each hypersphere.

As shown in Figure 7, $D(\mathbf{x},A)$ and $D(\mathbf{x},B)$ denote the distances from data point $\mathbf{x}$ to the centers of class A and class B, respectively. Even if $D(\mathbf{x},A)=D(\mathbf{x},B)$, data point $\mathbf{x}$ is more likely to belong to class A than to class B because of the difference in the distributions of the data points. That is, the data points enclosed by the hypersphere of class A are sparser than those enclosed by the hypersphere of class B, since $R_a$ is greater than $R_b$.

The classification rules can therefore be stated as follows.

Let $d$ denote the number of hyperspheres containing the data point.

Case I (𝑑=1). Data point belongs to the class represented by the hypersphere.

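Case II ($d=0$ or $d>1$). Calculate the index of membership of the data point to each hypersphere using (3.24), where $R_c$ denotes the radius of hypersphere $c$ and $D(\mathbf{x}_i,c)$ denotes the distance from data point $\mathbf{x}_i$ to the center of hypersphere $c$:
$$\varphi(\mathbf{x}_i,c)=\begin{cases}\lambda\,\dfrac{1-D(\mathbf{x}_i,c)/R_c}{1+D(\mathbf{x}_i,c)/R_c}+\gamma, & 0\le D(\mathbf{x}_i,c)\le R_c,\\[2ex] \gamma\,\dfrac{R_c}{D(\mathbf{x}_i,c)}, & D(\mathbf{x}_i,c)>R_c,\end{cases}\qquad \lambda,\gamma\in R^{+},\ \lambda+\gamma=1. \tag{3.24}$$

The testing data points are then classified according to the following rule:
$$F(\mathbf{x}_i)=\arg\max_{c}\ \varphi(\mathbf{x}_i,c). \tag{3.25}$$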
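
The two cases and the index (3.24) can be sketched as follows. The values $\lambda=\gamma=0.5$ are an assumption, since the text only requires that they be positive and sum to 1, and the function names are illustrative.

```python
def membership(D, R, lam=0.5, gam=0.5):
    """Index (3.24) for distance D to a hypersphere of radius R; lam + gam = 1."""
    if D <= R:
        return lam * (1 - D / R) / (1 + D / R) + gam
    return gam * R / D

def classify(distances, radii, lam=0.5, gam=0.5):
    """Cases I and II: distances[c] and radii[c] describe hypersphere c."""
    containing = [c for c, (D, R) in enumerate(zip(distances, radii)) if D <= R]
    if len(containing) == 1:                                  # Case I
        return containing[0]
    scores = [membership(D, R, lam, gam) for D, R in zip(distances, radii)]
    return max(range(len(scores)), key=scores.__getitem__)    # Case II, rule (3.25)

# Equal distances, but class 0 has the larger radius (the Figure 7 situation).
print(classify([3.0, 3.0], [2.0, 1.0]))  # 0
```

The index is continuous at the boundary: at $D=R_c$ both branches give $\gamma$, and it decreases from 1 at the center toward 0 far away, so a larger radius yields a larger index at equal distance.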

4. Experiments

4.1. Data Sets

For the purpose of this study, two bond-rating data sets from the Korean and Chinese markets, which have been used in [46, 47], are applied to validate the proposed methodology. The data are divided into the following four classes: A1, A2, A3, and A4.

4.2. Variables Selection

Methods including independent-samples 𝑡-test and F-value are applied in variable selection.

For the Korea data set, 14 variables, which are listed in Table 2, are selected from the original ones known to affect bond rating. For better comparison, similar methods are used for the China data set, from which 12 variables are selected.

4.3. Experiment Results and Discussions

Based on the two data sets, several AI-based models are introduced for the experiments. To evaluate the prediction performance, 10-fold cross validation, which has shown good performance in model selection [48], is followed. All features of the data points, which are represented by the variables listed in Table 2, range from 0 to 1 after Min-max transformation. To validate the methodology on the multiclassification problem in credit rating, ten percent of the data points of each class are selected as testing samples. The results of the experiments on the proposed method, with 0.9 chosen intuitively as the threshold value, are shown in Table 3.
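
The Min-max transformation mentioned above can be sketched as a column-wise rescaling. The function name and the handling of constant columns are assumptions, and the numbers are toy values rather than data from the paper.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]
    return [[(v - l) / s if s else 0.0 for v, l, s in zip(row, lo, span)]
            for row in rows]

print(min_max_scale([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```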

For comparison, the proposed model is evaluated against ANN, several MSVM techniques, namely One-Against-All, One-Against-One, DAGSVM, Crammer & Singer, and OMSVM [46], and standard SVDD. All results reported in the paper are average values obtained by 10-fold cross validation on the Matlab 7.0 platform.

To compare the performance of the algorithms, the hit-ratio, which is defined according to the samples classified correctly, is applied. The experiment results are listed in Table 4.
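
The text does not give a formula for the hit-ratio. A minimal sketch, assuming it is the fraction of testing samples whose predicted class equals the true class, is shown below with made-up labels.

```python
def hit_ratio(y_true, y_pred):
    """Fraction of samples classified correctly (assumed definition)."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

print(hit_ratio(["A1", "A2", "A3", "A4"], ["A1", "A2", "A4", "A4"]))  # 0.75
```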

As shown in Table 4, the proposed method, which is based on hyperspheres, achieves better performance than conventional SVM models, which are based on hyperplanes. Moreover, the results imply that the proposed method, as a modified model, has better generalization ability and less computational complexity than standard SVDD; the latter can be partially measured by the training time, labeled “Time.”

Furthermore, as a modified model based on standard SVDD, the proposed method accomplishes data preprocessing using KFAMC. Since the fuzzy area is determined by the threshold $\theta$, a greater value of $\theta$ leads to a bigger fuzzy area. In particular, when $\theta=1$, the proposed algorithm reduces to standard SVDD because almost all data points are positioned in the fuzzy area. Hence, a model with too large a threshold may differ little from standard SVDD, while too small a value leaves too few essential training samples to establish the sphere-based classifier well. The choice of an appropriate threshold is therefore examined by empirical trials in the paper.

In the following experiment, the proposed method is tested with various threshold values on the different data sets, as shown in Figure 8.

The results illustrated in Figure 8 show that the proposed method achieves its best performance with a threshold of 0.9 on the Korea data set. On the China data set, however, it achieves its best performance with a threshold of 0.8 rather than a larger one, owing to the larger number of outliers in that data set.

Moreover, the training time of the proposed method can also be compared with that of standard SVDD, as illustrated in Figure 9.

As shown in Figure 9, the results of the experiments on the different data sets are similar. That is, as the threshold declines, more samples are eliminated from the training set through the KFAMC preprocessing, which reduces training time. Hence, smaller threshold values lead to less computational consumption, as partly indicated by training time, while classification accuracy may decrease for lack of necessary training samples. Overall, threshold selection, which involves a tradeoff between computational consumption and classification accuracy, is essential to the proposed method.

5. Conclusions and Directions for Future Research

In this study, a novel algorithm for credit rating based on Fuzzy SVDD combined with fuzzy clustering is proposed. The underlying assumption of the proposed method is that sufficient boundary points can support a close boundary around the target data, but too many of them may cause overfitting and poor generalization ability. In contrast to prior research, which only applied conventional MSVM algorithms to credit rating, a sphere-based classifier is introduced, with samples preprocessed using a fuzzy clustering algorithm.

As a result, with an appropriate threshold setting, the generalization performance of the proposed method, measured by hit-ratio, is better than that of standard SVDD, which in turn outperformed many of the conventional MSVM algorithms discussed in the prior literature. Moreover, as a modified sphere-based classifier, the proposed method has much less computational consumption than standard SVDD.

One future direction is to conduct survey studies comparing different bond-rating processes, together with a deeper analysis of market structure. Moreover, as one of the MSVM algorithms, the proposed method can be applied in areas other than credit rating. Further experiments on data sets such as those in the UCI repository [49] are also planned.

Appendix

Brief Review of FCM
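Bezdek-type FCM is a nonlinear optimization algorithm with constraints, based on an inner-product-induced distance and the least-squared error criterion:
$$J_m(U,P)=\sum_{k=1}^{c}\sum_{n=1}^{N}u_{kn}^{m}\|\mathbf{x}_n-\mathbf{p}_k\|_A^2,\quad\text{s.t.}\quad U\in M_{fc}=\Big\{U\in R^{c\times N}\ \Big|\ u_{kn}\in[0,1],\ \forall n,k;\ \sum_{k=1}^{c}u_{kn}=1,\ \forall n;\ 0<\sum_{n=1}^{N}u_{kn}<N,\ \forall k\Big\}, \tag{A.1}$$
where $u_{kn}$ is the measure of the $n$th sample belonging to the $k$th cluster and $m\ge 1$ is the weighting exponent. The distance between $\mathbf{x}_n$ and the prototype $\mathbf{p}_k$ of the $k$th cluster is as follows:
$$\|\mathbf{x}_n-\mathbf{p}_k\|_A^2=(\mathbf{x}_n-\mathbf{p}_k)^TA(\mathbf{x}_n-\mathbf{p}_k). \tag{A.2}$$
The above formula is also called the Mahalanobis distance, where $A$ is a positive definite matrix. When $A$ is the unit matrix, $\|\mathbf{x}_n-\mathbf{p}_k\|_A^2$ is the Euclidean distance. We denote it by $\|\mathbf{x}_n-\mathbf{p}_k\|^2$ and adopt the Euclidean distance in the rest of the paper. The parameters of FCM are then estimated by minimizing $J_m(U,P)$ iteratively according to the following formulas, which require $m>1$:
$$\mathbf{p}_k=\frac{\sum_{n=1}^{N}u_{kn}^{m}\mathbf{x}_n}{\sum_{n=1}^{N}u_{kn}^{m}},\qquad u_{kn}=\frac{\|\mathbf{x}_n-\mathbf{p}_k\|^{-2/(m-1)}}{\sum_{i=1}^{c}\|\mathbf{x}_n-\mathbf{p}_i\|^{-2/(m-1)}}. \tag{A.3}$$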
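
One pass of the updates (A.3) with the Euclidean distance can be sketched as follows; the function name, the small constant `eps` guarding against zero distances, and the one-dimensional toy data are illustrative assumptions.

```python
import math

def fcm_step(X, U, m=2.0, eps=1e-12):
    """One pass of (A.3): prototypes from U, then memberships from the prototypes."""
    c, N, d = len(U), len(X), len(X[0])
    P = [tuple(sum(U[k][n] ** m * X[n][j] for n in range(N)) /
               sum(U[k][n] ** m for n in range(N)) for j in range(d))
         for k in range(c)]
    new_U = [[0.0] * N for _ in range(c)]
    for n in range(N):
        inv = [(math.dist(X[n], P[k]) + eps) ** (-2.0 / (m - 1)) for k in range(c)]
        norm = sum(inv)
        for k in range(c):
            new_U[k][n] = inv[k] / norm
    return P, new_U

X = [(0.0,), (1.0,), (9.0,), (10.0,)]
U = [[0.8, 0.8, 0.2, 0.2], [0.2, 0.2, 0.8, 0.8]]
P, new_U = fcm_step(X, U)
print([round(p[0], 2) for p in P])
```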

Acknowledgment

The paper was sponsored by the 985-3 project of Xi’an Jiaotong University.