Research Article  Open Access
Guotai Chi, Shijie Ding, Xiankun Peng, "Data-Driven Robust Credit Portfolio Optimization for Investment Decisions in P2P Lending", Mathematical Problems in Engineering, vol. 2019, Article ID 1902970, 10 pages, 2019. https://doi.org/10.1155/2019/1902970
Data-Driven Robust Credit Portfolio Optimization for Investment Decisions in P2P Lending
Abstract
Peer-to-peer (P2P) lending has attracted increasing attention recently. As an emerging microfinance platform, P2P lending removes intermediaries, reduces transaction costs, and increases the benefits of both borrowers and lenders. However, P2P lending investment faces two major challenges: the lack of historical observations for any given borrower and the ambiguity of the estimated distribution of loan returns. To address these difficulties, this paper proposes a data-driven robust portfolio optimization model with relative entropy constraints, built on an “instance-based” credit risk assessment framework. The model uses a nonparametric kernel approach to estimate the expected return and risk of P2P loans when historical data on the same borrower are unavailable. Furthermore, we construct a robust mean–variance optimization problem based on the relative entropy method for P2P loan investment decisions. We validate the proposed model on a real-world dataset from a notable P2P lending platform, Prosper. Empirical results show that our model delivers better investment performance than the existing model.
1. Introduction
Peer-to-peer lending, as an emerging form of online microfinance, brings borrowers and lenders together virtually and helps them lend to and borrow from each other directly. P2P lending platforms remove traditional financial intermediaries, reduce transaction costs, and increase the benefits of both borrowers and lenders; they therefore improve the efficiency of the financial market. However, traditional financial intermediaries can use collateral, certified accounts, and other means to enhance the creditworthiness of borrowers. In their absence, the information asymmetry between borrowers and lenders is severe and the credit risk of P2P loan investment is very high.
Credit risk in P2P lending refers to the potential monetary loss arising from a borrower’s default on a loan. Efficient and reasonable investment in P2P loans must be based on a reliable assessment of the credit risk distribution. Estimating this distribution is very challenging because historical return (or loss) data for a loan awaiting investment are difficult to obtain. In other words, historical yield data for the same borrower are usually unavailable. Moreover, even if the distribution of loans’ returns (or losses) is approximated from the limited available data or from expert knowledge, the approximation is usually inaccurate; this is known as the distribution ambiguity (probability measure uncertainty) problem. In this paper, we formulate a data-driven robust portfolio optimization model based on an “instance-based” credit risk assessment method for investment decisions in P2P lending.
To help personal lenders mitigate risk, current online P2P lending platforms have taken some risk-reducing measures, such as filtering out high-risk borrowers whose FICO score is below a threshold, assigning a preliminary rating to each loan, and providing investors with the risk level of each loan. Thus, each loan is assigned a grade, such as AA, A, B, C, D, E, or NR, and loans with the same grade are considered to have the same risk level. These rating-based models are more suitable for traditional banks and lending institutions, which can grant large numbers of loans to diversify their investments. Individual investors, however, possess only small amounts of funds; they need more refined risk assessment methods and investment strategies.
Similar to bond investment, P2P investors can fund a portion, rather than the whole, of each loan. Therefore, investors can decide which loans to invest in and, at the same time, determine the amount to invest in each loan. This mechanism allows investors to construct a credit portfolio to mitigate risk.
Markowitz [1] proposed the famous mean-variance model, which is still widely used in portfolio selection and risk management. Since then, researchers have proposed a variety of mean-risk models, such as the mean-downside risk model [2], the mean-VaR model [3], and the mean-CVaR model [4]. In practice, the distribution of the assets must be estimated first, and the optimal portfolio is then identified by the optimization model.
For P2P lending investment, as mentioned above, such procedures face at least two major challenges: the lack of historical observations of loans and the ambiguity of the estimated distribution of loans (the probability measure uncertainty problem). Thus, this paper proposes a data-driven robust portfolio optimization model based on relative entropy constraints combined with an instance-based credit risk assessment method.
Specifically, we use the “instance-based” credit risk assessment method proposed by Guo et al. [5] to evaluate the return and risk of each loan when sufficient historical data for the individual borrower are not available. In this instance-based framework, the expected return of each loan is predicted as a weighted average of the returns of historical loans to other, similar borrowers, where the optimal weights are learnt by kernel regression. Furthermore, using the moment information (mean and variance) of the new loans, we formulate the robust portfolio optimization model with relative entropy constraints, which obtains an optimal portfolio under the worst scenario and can reduce the potential loss caused by uncertainty in the distribution of loans.
Our work is related to the papers by Guo et al. [5] and Yam et al. [6]. Guo et al. [5] introduce the instance-based framework into credit risk assessment of P2P loans and use the classical mean-variance model to obtain the optimal allocation. Yam et al. [6] derive a robust mean-variance optimization model with relative entropy constraints on the uncertainty of the interaction between the returns of different assets and discuss its mathematical and financial properties in portfolio selection. Although other scholars have contributed novel insights into credit risk assessment of P2P lending and into robust optimization, to the best of our knowledge, few have considered the two together. The main contribution of this paper is a data-driven robust portfolio optimization model, based on relative entropy constraints combined with an instance-based risk assessment framework, for P2P loan investment; the model obtains superior performance in numerical experiments.
The rest of this paper is organized as follows. Section 2 provides the literature review. Section 3 introduces the instance-based model for credit risk assessment, as well as the mathematical framework of the kernel regression approach. In Section 4, we elaborate the robust optimization model based on the relative entropy method and formulate a robust mean-variance optimization model for P2P lending investment. The empirical results on the effectiveness of our model are reported in Section 5. Finally, Section 6 concludes this work.
2. Literature Review
To assess risk and assist investment decision making in P2P lending, researchers have conducted many studies. Emekter et al. [7] explore the dominant factors that explain funding success and credit risk and measure the performance of P2P loans. They find that credit grade, debt-to-income ratio, FICO score, and revolving line utilization play an important role in loan defaults. Furthermore, loans with lower credit grade and longer duration are associated with a higher mortality rate, and the higher interest rates charged to borrowers with low credit grades are not sufficient to cover the potential loss from the higher likelihood of default. Thus, the authors suggest that investors should invest more in high-grade loans. Similarly, Berkovich [8] finds that high-quality loans offer excess return.
The studies above investigate the factors that determine credit risk and analyze the performance of P2P loans; however, they do not propose a mechanism that assists individual investors in allocating funds among loans effectively and making optimal investment decisions.
To help personal lenders mitigate risk, popular online P2P platforms, such as Lending Club and Prosper, have developed credit scoring systems that assess the creditworthiness of each borrower using data mining or machine learning techniques. There is a large body of literature on credit rating using data mining techniques, for example, linear discriminant analysis (LDA) [9], k-nearest neighbors [10], logistic regression [11], classification and regression trees (CART) [12], Markov chains [13], survival analysis [14], artificial neural networks (ANN) [15], genetic methods [16], support vector machines (SVM) [17, 18], and lasso-probit [19].
In the portfolio selection problem, full knowledge of the assets’ distribution is usually assumed to determine the optimal portfolio. In most reallife applications, we need to approximate the assets’ distribution. However, the approximations are not necessarily accurate, and it is known as the distribution ambiguity (probability measure uncertainty) problem.
Robust optimization is an attractive way to solve the portfolio selection problem under distribution ambiguity. As the exact parameters are unavailable, Natarajan et al. [20] use a set of parameters (representing different distributions or scenarios) rather than a point estimate of the parameters to formulate the asset allocation problem. Following this idea, there are different ways to model ambiguity using a set of parameters. Chen et al. [21] take the lower partial moments and CVaR as two risk measures and consider tight bounds that are likely to cover the possible parameters. Epstein [22] considers intervals that may include the actual parameters. Natarajan et al. [23] use a piecewise-linear concave utility function to derive exact and approximate optimal strategies for the expected utility model in portfolio optimization under worst-case scenarios. Paç and Pinar [24] use an ellipsoidal uncertainty set to represent distribution ambiguity and identify the optimal portfolio.
Since relative entropy measures the difference between two probability distributions (probability measures), it can be used to construct the uncertainty set for robust optimization. In the studies of Hansen and Sargent [25] and Calafiore [26], relative entropy is used to model uncertainty and obtain the optimal investment decision. Yam et al. [6] derive a robust mean-variance optimization model with relative entropy constraints on the uncertainty of the interaction between the returns of different assets and discuss its mathematical and financial properties in portfolio selection.
Data-driven methods have been studied extensively in recent years. In this framework, investors are assumed to possess only historical data on asset returns. Bertsimas et al. [27] use the KS test, χ^{2} test, Anderson-Darling test, and other testing tools to construct uncertainty sets and take the worst case over each set to formulate the robust optimization. They assume that the uncertainty sets are defined by certain structures and sizes based on the available data points. In contrast, the structure of the uncertainty set in our study is not predefined, and we consider the uncertainty of mean, covariance, and distribution jointly. Kang et al. [28] propose a data-driven robust mean-CVaR portfolio selection model under distribution ambiguity and adopt a nonparametric bootstrap approach to calibrate the level of ambiguity. Their work is based on the mean-CVaR framework with stock index data, while ours is based on the mean-variance framework with P2P loan data.
3. Instance-Based Model for Credit Risk Assessment
Using historical data to evaluate future performance and potential loss is standard practice. However, unlike bond or stock investment, historical yield data for the same P2P borrower are usually unavailable. Thus, the risk assessment of a new loan is very challenging. In this section, we briefly introduce the instance-based credit risk assessment model proposed by Guo et al. [5].
3.1. Instance-Based Assessment Framework
In this instance-based assessment framework, the expected return of each loan is estimated as a weighted average of historical observations of other borrowers’ closed loans. Specifically, for a new loan i, using n past loans, each with a historical return R_{j} (j = 1, 2,..., n), we can calculate the expected return of loan i, μ_{i}, as a weighted average of the past loans’ actual returns:
$$\mu_i = \sum_{j=1}^{n} w_{ij} R_j, \tag{1}$$
where w_{ij} denotes the weight of loan j for predicting the expected return of loan i. The weight depends on the similarity between loan i and loan j. Intuitively, the greater the similarity, the greater the weight. The calculation of the weight will be introduced in Section 3.2.
The weighted returns of the past loans are treated as historical observations of the new loan. Following this line of thought and taking variance as the risk measure, the weighted variance of the past loans is used to assess the new loan’s risk, that is,
$$\sigma_i^2 = \sum_{j=1}^{n} w_{ij} \left(R_j - \mu_i\right)^2, \tag{2}$$
where w_{ij}, R_{j}, and μ_{i} have the same meanings as in (1).
The absolute deviation between two loans’ default probabilities is used to measure the similarity; the smaller the absolute deviation, the greater the similarity and, therefore, the larger the weight. In particular, the absolute deviation of default probabilities between loans i and j is defined as d_{ij} = |p_{i} − p_{j}|, where p_{i} and p_{j} are the default probabilities of loans i and j, respectively. Kernel regression is used to capture the nonlinear relationship between the absolute deviation and the weight. This process will be introduced in the next subsection.
3.2. Kernel Regression of Return and Risk
Kernel regression is a nonparametric statistical method for investigating the nonlinear relation between random variables; it is based on kernel density estimation. We first introduce the preliminaries of kernel estimation.
Given n realizations z_{j}, j = 1,..., n, of a random variable z, the kernel estimate of the probability density function p(z) is defined by
$$\hat{p}(z) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left(\frac{z - z_j}{h}\right), \tag{3}$$
where K(·) is a kernel function and h is a smoothing parameter.
Kernel function K(·) is nonnegative and bounded and, meanwhile, satisfies the following properties:
(a) ∫K(u)du = 1; (b) ∫uK(u)du = 0; (c) ∫u^{2}K(u)du < ∞.
There is a range of commonly used kernel functions, such as uniform, triangular, biweight, triweight, and Gaussian [29]. Because kernel estimation is insensitive to the choice of kernel function, we use the Gaussian kernel for its convenient mathematical properties; it is written as K(u) = (1/√(2π)) exp(−u^{2}/2).
The smoothing parameter h = h(n), also called the bandwidth, depends on the sample size n. Specifically, h(n) → 0 and n·h(n) → ∞ as n tends to ∞.
The literature shows that the choice of kernel function does not affect the estimation significantly, whereas the choice of bandwidth is vital [30, 31]. The determination of the bandwidth will be shown in detail in Section 5.3.
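As a minimal illustration of the kernel density estimate with a Gaussian kernel, the sketch below evaluates the estimate at one point for several bandwidths. The sample values and the bandwidth grid are hypothetical and are not taken from the Prosper data; the function names are ours.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(z, samples, h):
    """Kernel density estimate at point z with bandwidth h."""
    n = len(samples)
    return sum(gaussian_kernel((z - zj) / h) for zj in samples) / (n * h)

# Hypothetical sample (illustrative values only).
samples = [0.05, 0.08, 0.10, 0.12, 0.30]

# A small h gives a spiky estimate, a large h a flat one.
for h in (0.01, 0.05, 0.20):
    print(f"h = {h:.2f}  density at 0.10 = {kde(0.10, samples, h):.3f}")
```

The estimate changes markedly with h while the kernel itself stays fixed, which is why the bandwidth, rather than the kernel, is the parameter that needs tuning.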
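In the following, we introduce the kernel regression model proposed by Nadaraya [32]. We assume that each observation is denoted as (X, Y), an R^{2}-valued random vector. With the sample set {(x_{j}, y_{j}), j = 1, 2,..., n}, the kernel estimator of the target y given its predictive observation x is defined as
$$\hat{y}(x) = \frac{\sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h}\right) y_j}{\sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h}\right)}, \tag{4}$$
where K(·) is a kernel function and h is the bandwidth.
For the instance-based credit risk modeling, the set of historical observations is represented as {(p_{j}, R_{j}), j = 1, 2,..., n}, where p_{j} and R_{j} are the default probability and return rate of the jth loan, respectively. Thereby, the estimate of the ith loan’s return can be written as
$$\mu_i = \frac{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right) R_j}{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right)}. \tag{5}$$
Note that the determination of loans’ default probability will be introduced in Section 5.1.
Comparing (1) to (5), we can represent the optimal weight as
$$w_{ij} = \frac{K\!\left(\frac{p_i - p_j}{h}\right)}{\sum_{k=1}^{n} K\!\left(\frac{p_i - p_k}{h}\right)}. \tag{6}$$
Using the optimal weight and the expected return derived from (5), (2) can be rewritten as
$$\sigma_i^2 = \frac{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right) \left(R_j - \mu_i\right)^2}{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right)}. \tag{7}$$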
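The kernel-weighted mean and variance can be sketched in a few lines. The example below assumes a Gaussian kernel and uses hypothetical default probabilities and returns; `kernel_weights` and `assess_loan` are illustrative names, not functions from the original study.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_weights(p_new, p_hist, h):
    """Weights proportional to K((p_new - p_j) / h), normalised to sum to 1."""
    raw = [gaussian_kernel((p_new - pj) / h) for pj in p_hist]
    total = sum(raw)
    return [k / total for k in raw]

def assess_loan(p_new, p_hist, r_hist, h):
    """Return (expected return, variance) of a new loan as the
    kernel-weighted mean and variance of historical returns."""
    w = kernel_weights(p_new, p_hist, h)
    mu = sum(wj * rj for wj, rj in zip(w, r_hist))
    var = sum(wj * (rj - mu) ** 2 for wj, rj in zip(w, r_hist))
    return mu, var

# Hypothetical closed loans: default probabilities and realised returns.
p_hist = [0.02, 0.05, 0.10, 0.25, 0.40]
r_hist = [0.06, 0.08, 0.11, -0.30, -0.60]

# New loan with an assumed default probability of 0.08 and bandwidth 0.05.
mu, var = assess_loan(0.08, p_hist, r_hist, h=0.05)
print(f"expected return = {mu:.4f}, variance = {var:.4f}")
```

Historical loans whose default probability is close to that of the new loan dominate both estimates, while distant loans contribute almost nothing.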
4. Robust Investment Decision Model
Similar to bond investment, P2P lenders can invest in a portion of each loan. Thus, P2P loan investment decisions can be transformed into a credit portfolio optimization problem. This section introduces the portfolio optimization model for investment decisions in P2P lending, which accounts for the uncertainty of the distribution of the loans. We start from the classical mean-variance optimization model proposed by Markowitz [1] and then move to its tractable robust counterpart.
4.1. Robust Optimization Model Based on Relative Entropy Constraints
In the classical mean-variance optimization model, the optimal asset allocation strategy is identified by solving the tradeoff between risk and return according to investors’ risk preference. A portfolio that invests in n assets is represented as a vector of weights, λ ∈ R^{n}, where each weight denotes the proportion of wealth allocated to an asset. Then the return and risk of the portfolio become λ^{T}μ and λ^{T}Vλ, respectively, where μ ∈ R^{n} and V ∈ R^{n×n} are the expected return and the covariance matrix of the assets’ returns under the probability measure (or probability distribution) P, respectively. Here, P represents the ideal estimated market condition, in which μ and V, estimated using all available information (historical observations, news, expert knowledge, and so on), are assumed to be the actual expected return and covariance matrix. Thus, the classical mean-variance portfolio selection problem (MV) can be formulated as
$$\min_{\lambda \in \Omega} \ \lambda^{T} V \lambda \quad \text{s.t.} \quad \lambda^{T} \mu \ge r_0, \tag{8}$$
where Ω ⊆ R^{n} denotes the set of feasible portfolios and r_{0} is the required return rate specified by the investor.
In reality, the assumption that the expected return μ and covariance matrix V are known with certainty is hardly reasonable. It is quite possible that the estimated parameters differ from the actual ones. Thus, the optimal portfolio identified by using the estimated input parameters μ and V directly may be inappropriate. Robust optimization seeks portfolios that are insensitive to the uncertainty in the parameters and solutions that remain feasible whatever the actual values of the parameters are.
Investors might consider a set of probability measures, i.e., an uncertainty set, to cover a range of scenarios based on their assessments, and then use robust optimization to obtain approximately optimal strategies for the worst scenarios within the uncertainty set. In this paper, we define 𝒬 as the set of probability measures representing the possible scenarios, and μ_{Q} and V_{Q} as the expected return and covariance matrix estimated under the probability measure Q ∈ 𝒬. Mathematically, the robust counterpart of the classical mean-variance optimization problem (RMV) can be written as
$$\min_{\lambda \in \Omega} \ \max_{Q \in \mathcal{Q}} \ \lambda^{T} V_Q \lambda \quad \text{s.t.} \quad \lambda^{T} \mu_Q \ge r_0 \ \ \text{for all } Q \in \mathcal{Q}. \tag{9}$$
It is rational to assume that the actual value of the parameters is in the neighborhood of the estimator. Thus, we can generate the uncertainty set based on the assumption that the measures in the set should not be far from the ideal measure P. Relative entropy, also known as the Kullback–Leibler divergence, can be used to measure the difference between probability measures. The relative entropy of the measure Q in 𝒬 with respect to the measure P is
$$D(Q \,\|\, P) = \int f_Q(x) \ln \frac{f_Q(x)}{f_P(x)} \, dx, \tag{10}$$
where f_{P} and f_{Q} are the probability density functions (pdf) of the loans’ returns under probability measures P and Q, respectively. In the context of mean-variance analysis, relative entropy can be rewritten as
$$D(Q \,\|\, P) = \frac{1}{2} \left[ \operatorname{tr}\!\left(V^{-1} V_Q\right) + \left(\mu - \mu_Q\right)^{T} V^{-1} \left(\mu - \mu_Q\right) - n - \ln \frac{|V_Q|}{|V|} \right], \tag{11}$$
where μ, V, μ_{Q}, and V_{Q} carry the same meaning as in (8) and (9); tr(V), |V|, and V^{T} denote the trace, the determinant, and the transpose of V, respectively; and n is the number of assets in the portfolio.
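The closed-form relative entropy between two normal distributions is easy to evaluate when both covariance matrices are diagonal, which matches the later assumption that correlations between loans are negligible. The sketch below assumes normally distributed returns and diagonal covariances; the function name, the inputs, and the bound K = 0.1 are hypothetical.

```python
import math

def gaussian_relative_entropy(mu_q, var_q, mu_p, var_p):
    """Relative entropy D(Q || P) between two normal distributions with
    diagonal covariances, given as lists of means and variances."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        ratio = vq / vp
        # Per-asset contribution: trace term + mean shift - 1 - log-determinant term.
        total += ratio + (mp - mq) ** 2 / vp - 1.0 - math.log(ratio)
    return 0.5 * total

# Nominal estimates under P and a hypothetical alternative scenario Q.
mu_p, var_p = [0.08, 0.10], [0.010, 0.020]
mu_q, var_q = [0.06, 0.09], [0.012, 0.025]

K = 0.1  # hypothetical bound on the size of the uncertainty set
d = gaussian_relative_entropy(mu_q, var_q, mu_p, var_p)
print(f"D(Q || P) = {d:.4f}, inside uncertainty set: {d <= K}")
```

The divergence is zero only when Q coincides with P and grows as the means or variances drift apart, so a bound on it restricts how pessimistic the worst-case scenario can be.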
Let 𝒰 denote the set of parameters (μ_{Q}, V_{Q}) under the measure Q in 𝒬. Using the constraint of relative entropy, we can rewrite the robust optimization model, (9), as
$$\min_{\lambda \in \Omega} \ \max_{(\mu_Q, V_Q) \in \mathcal{U}} \ \lambda^{T} V_Q \lambda \quad \text{s.t.} \quad \lambda^{T} \mu_Q \ge r_0, \quad D(Q \,\|\, P) \le K, \tag{12}$$
where K is a positive constant that determines the size of the uncertainty set. Parameter K measures the level of uncertainty and reflects the investors’ confidence in μ and V estimated under probability measure P: the greater the value of K, the lower the confidence.
Yam et al. [6] prove that the robust mean-variance portfolio selection model based on the relative entropy method (RMVRE) can be formulated as a quadratic optimization problem, which is tractable and can be solved efficiently. That is,
$$\min_{\lambda \in \Omega} \ \lambda^{T} \tilde{V} \lambda \quad \text{s.t.} \quad \lambda^{T} \tilde{\mu} \ge r_0. \tag{13}$$
Herein, μ̃ = ζμ, Ṽ = V + ζ(1 − ζ)μμ^{T}, and ζ is closely related to K in (12) and reflects the level of confidence in μ and V estimated under measure P. For example, ζ = 1 means that investors believe the estimated μ and V are the true parameters, and as ζ decreases, the investor’s confidence weakens. For the details of the proof, see Yam et al. [6].
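To make the role of ζ concrete, the following sketch computes the adjusted inputs of the robust problem from the nominal ones. It assumes the adjustment to the covariance matrix is the outer product of μ with itself scaled by ζ(1 − ζ), which is what makes the adjusted covariance an n×n matrix; the numbers are hypothetical and `robust_parameters` is an illustrative name.

```python
def robust_parameters(mu, V, zeta):
    """Return (mu_tilde, V_tilde) with mu_tilde = zeta * mu and
    V_tilde = V + zeta * (1 - zeta) * mu mu^T (assumed outer-product form)."""
    n = len(mu)
    mu_tilde = [zeta * m for m in mu]
    V_tilde = [
        [V[i][j] + zeta * (1.0 - zeta) * mu[i] * mu[j] for j in range(n)]
        for i in range(n)
    ]
    return mu_tilde, V_tilde

# Hypothetical two-loan example with uncorrelated returns.
mu = [0.08, 0.12]
V = [[0.01, 0.0], [0.0, 0.04]]

for zeta in (1.0, 0.75):
    mu_t, V_t = robust_parameters(mu, V, zeta)
    print(f"zeta = {zeta}: mu_tilde = {mu_t}, V_tilde = {V_t}")
```

With ζ = 1 the inputs are unchanged and the problem reduces to the classical model; with ζ < 1 expected returns shrink and every entry of the covariance matrix, including the off-diagonal ones, is inflated.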
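4.2. Robust Mean-Variance Portfolio Optimization Model in P2P Lending
In Section 3.2, we estimated each loan’s expected return and variance of return, i.e., μ_{i} and σ_{i}^{2}, using the instance-based credit risk assessment model. Let
$$\mu = (\mu_1, \mu_2, \ldots, \mu_n)^{T}, \qquad V = \operatorname{diag}\!\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\right) \tag{14}$$
denote the expected return vector and the covariance matrix of the loans’ returns under the probability measure P. Here, we assume that the correlation between P2P loans is negligible, so V is diagonal. Now we can rewrite (13) as
$$\min_{\lambda \in \Omega} \ \sum_{i=1}^{n} \lambda_i^2 \sigma_i^2 + \zeta (1 - \zeta) \left( \sum_{i=1}^{n} \lambda_i \mu_i \right)^{2} \quad \text{s.t.} \quad \zeta \sum_{i=1}^{n} \lambda_i \mu_i \ge r_0. \tag{15}$$
The feasible region Ω of our problem is defined by the following constraints:
(1) The value of the portfolio remains at its initial value, i.e., λ_{1} + λ_{2} + ... + λ_{n} = 1.
(2) Short-selling is forbidden; thus λ_{i} ≥ 0.
(3) For each loan, the amount that the lender can invest is no more than the amount requested by the borrower, m_{i}; thereby, λ_{i}M ≤ m_{i}, where M is the total investment amount the investor has available.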
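The robust objective and the feasibility constraints can be checked for a candidate portfolio without a solver. The sketch below assumes a diagonal covariance matrix and the outer-product adjustment described in Section 4.1; all loan figures are hypothetical, and the helper names are ours. It evaluates portfolios only and does not search for the optimum.

```python
def robust_return(lam, mu, zeta):
    """Worst-case expected return: zeta * lambda^T mu."""
    return zeta * sum(l * m for l, m in zip(lam, mu))

def robust_objective(lam, mu, var, zeta):
    """Worst-case variance for uncorrelated loans:
    sum_i lam_i^2 var_i + zeta * (1 - zeta) * (lambda^T mu)^2."""
    own_risk = sum(l * l * v for l, v in zip(lam, var))
    port_ret = sum(l * m for l, m in zip(lam, mu))
    return own_risk + zeta * (1.0 - zeta) * port_ret ** 2

def is_feasible(lam, mu, zeta, r0, M, caps, tol=1e-9):
    """Budget, no short-selling, per-loan cap, and required return."""
    budget = abs(sum(lam) - 1.0) <= tol
    no_short = all(l >= -tol for l in lam)
    within_caps = all(l * M <= c + tol for l, c in zip(lam, caps))
    enough_return = robust_return(lam, mu, zeta) >= r0 - tol
    return budget and no_short and within_caps and enough_return

# Hypothetical loans: expected returns, variances, requested amounts.
mu = [0.08, 0.10, 0.12]
var = [0.01, 0.02, 0.05]
caps = [10000.0, 8000.0, 5000.0]
zeta, r0, M = 0.75, 0.05, 15000.0

for lam in ([0.5, 0.3, 0.2], [0.2, 0.2, 0.6]):
    print(lam, is_feasible(lam, mu, zeta, r0, M, caps),
          round(robust_objective(lam, mu, var, zeta), 6))
```

The second portfolio is rejected because 60% of 15,000 dollars exceeds the 5,000 dollars requested by the third borrower. A quadratic programming solver would minimize `robust_objective` over all portfolios that pass `is_feasible`.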
5. Empirical Analysis
In this section, we investigate the validity of the robust mean-variance portfolio optimization model in P2P lending using a real-world dataset from a notable P2P lending platform, Prosper. All numerical experiments are performed in MATLAB on a PC.
5.1. Data Description and Preprocess
The dataset for the empirical study comes from Prosper, a notable P2P lending platform in the United States. It consists of 17,001 loans, including 3,039 defaulted loans and 13,908 completed loans, whose issue dates fall within the period from November 2005 to March 2014.
Using the data, a credit scoring model is learnt to transform the loan attributes into the default probability. The loan attributes are as follows: the borrower’s FICO score, which reflects the borrower’s creditworthiness; the borrower’s number of inquiries in the past six months; the monetary amount of the loan; the homeownership status of the borrower; the debt-to-income ratio of the borrower; the borrower’s current delinquencies, i.e., the number of delinquent accounts; and the borrower’s number of public records in the past 10 years (Rows 1–7 in Table 1). The target variable is binary (0 represents completed and 1 represents default), as described in Row 8 of Table 1.

Many credit scoring models exist for predicting the default probability of a loan, such as the XGBoost model [33–35], the hybrid KMV model [36], and credit scoring based on genetic algorithms [37, 38]. However, how to choose and construct the optimal credit scoring model is beyond the scope of this study, and we use the most popular model, logistic regression, to make the prediction in this preprocessing step.
We randomly divide the dataset into two parts: the first contains 40% of all loans and is used to determine the optimal bandwidth h in (5), as described in detail in Section 5.3; the second contains the remaining 60% of the loans. Using k-fold cross-validation, we then randomly divide the second part into 20 subsets, each containing approximately 510 loans. In each round, one subset is used as the testing set, which consists of loans waiting to be invested in and whose payback statuses are therefore treated as unknown, and all other subsets are taken together as the training set, which consists of historical loans with known yield.
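The partition described above can be sketched as follows. Loan identifiers are stand-in integers, the seed is arbitrary, and `split_dataset` and `rounds` are hypothetical helper names; the original experiments were run in MATLAB, so this is only an illustration of the splitting logic.

```python
import random

def split_dataset(loan_ids, bandwidth_share=0.4, n_folds=20, seed=0):
    """Shuffle the loans, reserve a share for bandwidth selection,
    and cut the remainder into n_folds nearly equal subsets."""
    rng = random.Random(seed)
    ids = list(loan_ids)
    rng.shuffle(ids)
    cut = int(round(bandwidth_share * len(ids)))
    bandwidth_part, rest = ids[:cut], ids[cut:]
    folds = [rest[k::n_folds] for k in range(n_folds)]
    return bandwidth_part, folds

def rounds(folds):
    """Yield (training set, testing set) for each cross-validation round."""
    for k, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test

# 17,001 stand-in loan identifiers.
bandwidth_part, folds = split_dataset(range(17001))
print(len(bandwidth_part), len(folds), sorted({len(f) for f in folds}))
```

With 17,001 loans, the second part holds about 10,200 loans, so each of the 20 subsets has 510 or 511 loans, in line with the figure quoted in the text.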
5.2. Model Description
In this paper, we propose a robust credit portfolio optimization model for investment decisions in P2P lending. To show its effectiveness, we compare it with a benchmark model proposed by Guo et al. [5]. In the following, we describe the two models in detail.
IOM is the instance-based model proposed by Guo et al. [5]. Each loan is assessed using kernel weights and the historical performance of similar loans. The classical mean-variance model (8) is then used to identify the optimal allocation strategy. This model outperforms some rating-based models, as the results of Guo et al. [5] show.
RIOM is the robust instance-based model of this study. The expected return and risk of each loan are again assessed with the “instance-based” assessment framework. However, we use the robust credit portfolio optimization model based on the relative entropy method, Equation (15), to obtain the optimal investment decision.
We compare the two models by the following procedure:
(1) Train the credit risk assessment model with the training set, and use the trained model to predict the expected return (μ_{i}) and variance (σ_{i}^{2}) of each loan in the testing set. Thus, the expected return vector and the covariance matrix, μ and V, can be obtained.
(2) For each model, feed the predicted expected return vector μ and the covariance matrix V of the testing loans into the portfolio optimization algorithm, and compute the performance of investment in the optimal portfolio.
(3) Compare the return rates of the two models.
5.3. Analysis of Results
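As mentioned before, we select the Gaussian kernel, K(u) = (1/√(2π)) exp(−u^{2}/2), as the kernel function. The key parameter of the kernel regression model, the bandwidth h, is optimized by leave-one-out cross-validation:
$$CV(h) = \frac{1}{n} \sum_{i=1}^{n} \left( R_i - \hat{\mu}_{-i} \right)^{2}, \tag{16}$$
where \hat{μ}_{-i} is the leave-one-out estimate of the expected return rate μ_{i}, specifically,
$$\hat{\mu}_{-i} = \frac{\sum_{j \ne i} K\!\left(\frac{p_i - p_j}{h}\right) R_j}{\sum_{j \ne i} K\!\left(\frac{p_i - p_j}{h}\right)}. \tag{17}$$
The curve of CV(h) is shown in Figure 1. The curve has a clear minimum, and the value of h at that minimum is the optimal bandwidth for the model.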
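Leave-one-out bandwidth selection can be sketched as a grid search. The data and the candidate grid below are hypothetical, the score is averaged over the sample (which does not change the minimizer), and the function names are ours.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def loo_prediction(i, p, r, h):
    """Kernel-weighted return estimate for loan i using all loans except i.
    Returns None if every kernel weight underflows to zero."""
    num = den = 0.0
    for j in range(len(p)):
        if j == i:
            continue
        k = gaussian_kernel((p[i] - p[j]) / h)
        num += k * r[j]
        den += k
    return num / den if den > 0.0 else None

def cv_score(p, r, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i in range(len(p)):
        pred = loo_prediction(i, p, r, h)
        if pred is None:  # bandwidth too small to borrow from any neighbour
            return math.inf
        total += (r[i] - pred) ** 2
    return total / len(p)

def best_bandwidth(p, r, grid):
    """Candidate bandwidth with the smallest cross-validation score."""
    return min(grid, key=lambda h: cv_score(p, r, h))

# Hypothetical default probabilities and realised returns.
p = [0.02, 0.05, 0.08, 0.12, 0.20, 0.30, 0.45]
r = [0.07, 0.08, 0.09, 0.05, -0.05, -0.20, -0.45]
grid = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]

for h in grid:
    print(f"h = {h:<5} CV(h) = {cv_score(p, r, h):.5f}")
print("selected bandwidth:", best_bandwidth(p, r, grid))
```

A grid search is the simplest choice here; a continuous one-dimensional optimizer would work equally well once the range containing the minimum is known.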
To apply the robust credit portfolio optimization method and obtain the optimal investment strategy in problem (13), we select the parameter ζ = 0.75, the investment amount M = 15 thousand dollars, and the required rate of return r_{0} = 0.05. We also set the risk-free return rate to 0.025, which is approximately the average yield of T-bills over the same period. We use the MATLAB built-in solver “quadprog” to solve the two portfolio optimization problems.
Table 2 summarizes the investment return rate for each test subset and the average performance on the Prosper dataset. It shows that the two portfolios are almost always efficient and feasible, except for subset 16. The results also show that the actual performance of the optimal portfolio derived from RIOM always exceeds that of the optimal portfolio from IOM. The Sharpe ratio likewise shows that the median-based optimal portfolio performs better.
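For reference, a Sharpe ratio with the stated risk-free rate of 0.025 can be computed as below. This sketch assumes the ratio is taken over the return rates of the test subsets, which the text does not state explicitly, and the return figures are hypothetical rather than values from Table 2.

```python
import statistics

def sharpe_ratio(returns, risk_free=0.025):
    """Mean excess return over the risk-free rate divided by the
    sample standard deviation of the returns."""
    excess = statistics.mean(returns) - risk_free
    return excess / statistics.stdev(returns)

# Hypothetical per-subset return rates for two portfolios.
returns_a = [0.05, 0.07, 0.09]
returns_b = [0.02, 0.07, 0.12]

print(round(sharpe_ratio(returns_a), 4))
print(round(sharpe_ratio(returns_b), 4))
```

Both hypothetical portfolios have the same mean return, but the first has the higher Sharpe ratio because its returns vary less across subsets.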

To verify that the conclusions from the above experiments are stable, we vary the investment amount and the required return used as input parameters for portfolio selection and keep the other conditions unchanged. As summarized in Table 3, we consider nine parameter pairs of required return rate r_{0} and investment amount M.

The computational results for each parameter pair are summarized in Table 4, which compares the two optimal portfolios in terms of the actual return rate of investment. The results are shown more intuitively in Figure 2, which compares the actual return rates of the two models. The first 9 numbers on the horizontal axis of Figure 2 represent the corresponding parameter combinations (sets 1 through 9 from Table 3), and the number 10 shows the average. The RIOM model outperforms the IOM model across all parameter combinations.

In conclusion, the optimal portfolio identified from the robust optimization model in this study is more efficient than the existing model. And the performance of our model is more robust and stable.
6. Conclusions
In this paper, we formulate a data-driven robust portfolio optimization model with relative entropy constraints, based on an instance-based credit risk assessment framework, for investment decisions in P2P lending. This P2P lending investment decision model has at least three advantages. Firstly, it provides a more refined measure of P2P loans’ risk and gives investors a more intuitive and quantified risk estimate, instead of just labelling each loan with a credit grade. Secondly, the model can estimate each loan’s expected return and risk when historical observations of the same borrower are unavailable. Finally, the model considers the loans’ distribution ambiguity (probability measure uncertainty) problem and uses relative entropy to model parameter uncertainty, so that the optimal allocation strategy remains efficient and feasible under various actual scenarios. Numerical experiments indicate that the P2P lending investment decision model using robust optimization with relative entropy constraints performs better than the existing model.
Data Availability
The data used in this paper were downloaded from the Prosper website: https://www.prosper.com/invest/download.aspx.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The research is supported by the National Natural Science Foundation of China (Grants nos. 71471027, 71731003, and 71873103), the National Social Science Foundation of China (Grant no. 16BTJ017), National Natural Science Foundation of China Youth Project (Grant no. 71601041), Liaoning Economic and Social Development Key Issues (Grant no. 2015lslktzdian05), and Liaoning Provincial Social Science Planning Fund Project (Grant no. L16BJY016). The authors acknowledge the organizations mentioned above.
References
 H. Markowitz, “Portfolio selection,” The Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952. View at: Publisher Site  Google Scholar
 H. M. Markowitz, Portfolio Selection: Efficient Diversication of Investment, Wiley, New York, NY, USA, 1959. View at: MathSciNet
 N. Larsen, H. Mausser, and S. Uryasev, “Algorithms for optimization of ValueatRisk,” in Financial Engineering, ECommerce and Supply Chain, Applied Optimization, P. M. Pardalos and V. K. Tsitsiringos, Eds., vol. 70, Kluwer Academic Publishers, Dordrecht, 2002. View at: Google Scholar
 R. T. Rockafellar and S. Uryasev, “Conditional valueatrisk for general loss distributions,” Journal of Banking & Finance, vol. 26, no. 7, pp. 1443–1471, 2002. View at: Publisher Site  Google Scholar
 Y. H. Guo, W. J. Zhou, C. Y. Luo, C. R. Liu, and H. Xiong, “Instancebased credit risk assessment for investment decisions in P2P Lending,” European Journal of Operational Research, vol. 249, no. 2, pp. 417–426, 2016. View at: Publisher Site  Google Scholar  MathSciNet
 S. C. P. Yam, H. Yang, and F. L. Yuen, “Optimal asset allocation: Risk and information uncertainty,” European Journal of Operational Research, vol. 251, no. 2, pp. 554–561, 2016. View at: Publisher Site  Google Scholar
 R. Emekter, Y. Tu, B. Jirasakuldech, and M. Lu, “Evaluating credit risk and loan performance in online PeertoPeer (P2P) lending,” Applied Economics, vol. 47, no. 1, pp. 54–70, 2014. View at: Publisher Site  Google Scholar
 E. Berkovich, “Search and herding effects in peertopeer lending: evidence from prosper.com,” Annals of Finance, vol. 7, no. 3, pp. 389–405, 2011. View at: Publisher Site  Google Scholar
 E. I. Altman, “Financial ratios, discriminant analysis and the prediction of corporate bankruptcy,” The Journal of Finance, vol. 23, no. 4, pp. 589–609, 1968. View at: Publisher Site  Google Scholar
 S. Chatterjee and S. Barcun, “A nonparametric approach to credit screening,” Journal of the American Statistical Association, vol. 65, no. 329, pp. 150–154, 1970.
 J. C. Wiginton, “A note on the comparison of logit and discriminant models of consumer credit behavior,” Journal of Financial and Quantitative Analysis, vol. 15, no. 3, pp. 757–770, 1980.
 L. Breiman, J. H. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Wadsworth, Belmont, Calif, USA, 1983.
 M. M. So and L. C. Thomas, “Modelling the profitability of credit cards by Markov decision processes,” European Journal of Operational Research, vol. 212, no. 1, pp. 123–130, 2011.
 G. Andreeva, J. Ansell, and J. Crook, “Modelling profitability using survival combination scores,” European Journal of Operational Research, vol. 183, no. 3, pp. 1537–1549, 2007.
 D. West, “Neural network credit scoring models,” Computers & Operations Research, vol. 27, pp. 1131–1152, 2000.
 J. J. Huang, G. H. Tzeng, and C. S. Ong, “Two-stage genetic programming (2SGP) for the credit scoring model,” Applied Mathematics and Computation, vol. 174, no. 2, pp. 1039–1053, 2006.
 C. L. Huang, M. C. Chen, and C. J. Wang, “Credit scoring with a data mining approach based on support vector machines,” Expert Systems with Applications, vol. 33, no. 4, pp. 847–856, 2007.
 P. Danenas and G. Garsva, “Selection of support vector machines based classifiers for credit risk domain,” Expert Systems with Applications, vol. 42, no. 6, pp. 3194–3204, 2015.
 G. Sermpinis, S. Tsoukas, and P. Zhang, “Modelling market implied ratings using LASSO variable selection techniques,” Journal of Empirical Finance, vol. 48, pp. 19–35, 2018.
 K. Natarajan, D. Pachamanova, and M. Sim, “Constructing risk measures from uncertainty sets,” Operations Research, vol. 57, no. 5, pp. 1129–1141, 2009.
 L. Chen, S. He, and S. Zhang, “Tight bounds for some risk measures, with applications to robust portfolio selection,” Operations Research, vol. 59, no. 4, pp. 847–865, 2011.
 L. G. Epstein, “A paradox for the ‘smooth ambiguity’ model of preference,” Econometrica, vol. 78, no. 6, pp. 2085–2099, 2010.
 K. Natarajan, M. Sim, and J. Uichanco, “Tractable robust expected utility and risk models for portfolio optimization,” Mathematical Finance, vol. 20, no. 4, pp. 695–731, 2010.
 A. B. Paç and M. Ç. Pınar, “Robust portfolio choice with CVaR and VaR under distribution and mean return ambiguity,” TOP, vol. 22, no. 3, pp. 875–891, 2014.
 L. P. Hansen and T. J. Sargent, “Robust control and model uncertainty,” The American Economic Review, vol. 91, no. 2, pp. 60–66, 2001.
 G. C. Calafiore, “Ambiguous risk measures and optimal robust portfolios,” SIAM Journal on Optimization, vol. 18, no. 3, pp. 853–877, 2007.
 D. Bertsimas, V. Gupta, and N. Kallus, “Data-driven robust optimization,” Mathematical Programming, vol. 167, no. 2, pp. 235–292, 2018.
 Z. Kang, X. Li, Z. Li, and S. Zhu, “Data-driven robust mean-CVaR portfolio selection under distribution ambiguity,” Quantitative Finance, pp. 1–17, 2018.
 Q. Li and J. S. Racine, Nonparametric Econometrics: Theory and Practice, Princeton University Press, 2007.
 O. Scaillet, “Nonparametric estimation and sensitivity analysis of expected shortfall,” Mathematical Finance, vol. 14, no. 1, pp. 115–129, 2004.
 H. Yao, Z. Li, and Y. Lai, “Mean–CVaR portfolio selection: A nonparametric estimation framework,” Computers & Operations Research, vol. 40, no. 4, pp. 1014–1022, 2013.
 E. A. Nadaraja, “On nonparametric estimates of density functions and regression,” Theory of Probability & Its Applications, vol. 10, no. 1, pp. 186–190, 1965.
 T. Chen and T. He, “Higgs boson discovery with boosted trees,” in Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning, pp. 69–80, 2015.
 Y. Xia, C. Liu, Y. Li, and N. Liu, “A boosted decision tree approach using Bayesian hyperparameter optimization for credit scoring,” Expert Systems with Applications, vol. 78, pp. 225–241, 2017.
 H. He, W. Zhang, and S. Zhang, “A novel ensemble method for credit scoring: Adaption of different imbalance ratios,” Expert Systems with Applications, vol. 98, pp. 105–117, 2018.
 C.-C. Yeh, F. Lin, and C.-Y. Hsu, “A hybrid KMV model, random forests and rough set theory approach for credit rating,” Knowledge-Based Systems, vol. 33, no. 3, pp. 166–172, 2012.
 S. Oreski, D. Oreski, and G. Oreski, “Hybrid system with genetic algorithm and artificial neural networks and its application to retail credit risk assessment,” Expert Systems with Applications, vol. 39, no. 16, pp. 12605–12617, 2012.
 V. Kozeny, “Genetic algorithms for credit scoring: Alternative fitness function performance comparison,” Expert Systems with Applications, vol. 42, no. 6, pp. 2998–3004, 2015.
Copyright
Copyright © 2019 Guotai Chi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.